Getting started¶
Build your first iscc-usearch index, add vectors, search for nearest neighbors, and save the
index to disk.
Prerequisites¶
- Python 3.10 or later
- pip or uv
Install¶
Install the package with pip or uv.
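Assuming the package is published on PyPI under its project name, the install commands would be:

```shell
# Install with pip
pip install iscc-usearch

# Or with uv
uv add iscc-usearch
```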
Verify the installation¶
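A quick check is importing the index class used throughout this guide. The `iscc_usearch` module name is inferred from the package name, so treat it as an assumption:

```python
# Module name assumed from the package name iscc-usearch
from iscc_usearch import NphdIndex

print(NphdIndex)  # should print the class without raising ImportError
```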
What gets installed¶
iscc-usearch brings in three runtime dependencies:
| Dependency | Purpose |
|---|---|
| usearch-iscc | Patched USearch fork with native NPHD metric, fast view(), and GIL release for parallel shard loading |
| fastbloom-rs | Rust-based bloom filter for O(1) key rejection in sharded indexes |
| loguru | Structured logging |
Create an index¶
NphdIndex stores binary bit-vectors up to a given maximum dimension. Here we create one that
accepts vectors up to 256 bits (32 bytes):
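A minimal sketch; the `iscc_usearch` import path is assumed from the package name:

```python
from iscc_usearch import NphdIndex  # import path assumed

# Accepts binary vectors up to 256 bits (32 bytes)
index = NphdIndex(max_dim=256)
```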
max_dim is the upper bound on vector length in bits. Every vector you add must fit within this
limit.
Constraints on max_dim
max_dim must be a multiple of 8 and at most 256 (the maximum resolution of ISCC content
fingerprints). The constructor raises ValueError if either constraint is violated.
Add vectors¶
Vectors are NumPy uint8 arrays where each byte holds 8 bits of the binary code. Each vector
requires an integer key:
```python
import numpy as np

# Add three 32-bit vectors (4 bytes each)
index.add(1, np.array([255, 128, 64, 32], dtype=np.uint8))
index.add(2, np.array([255, 128, 64, 33], dtype=np.uint8))
index.add(3, np.array([255, 128, 64, 32], dtype=np.uint8))
```
Batch insertion works too:
```python
keys = [10, 11, 12]
vectors = np.array(
    [
        [255, 128, 64, 32],
        [255, 128, 64, 33],
        [0, 0, 0, 0],
    ],
    dtype=np.uint8,
)
index.add(keys, vectors)
```
Search for nearest neighbors¶
Pass a query vector to find the closest matches:
```python
query = np.array([255, 128, 64, 32], dtype=np.uint8)
matches = index.search(query, count=3)
print(matches.keys)       # Array of matching keys, sorted by distance
print(matches.distances)  # Corresponding NPHD distances in [0.0, 1.0]
```
Distances range from 0.0 (identical) to 1.0 (every bit differs).
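For equal-length vectors the prefix is the whole vector, so NPHD should reduce to plain normalized Hamming distance, which you can reproduce with NumPy alone to build intuition. The `normalized_hamming` helper below is illustrative, not part of the library:

```python
import numpy as np

def normalized_hamming(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of differing bits between two equal-length uint8 bit-vectors."""
    # XOR the byte arrays, count differing bits, divide by total bits
    diff_bits = np.unpackbits(np.bitwise_xor(a, b)).sum()
    return diff_bits / (a.size * 8)

a = np.array([255, 128, 64, 32], dtype=np.uint8)
b = np.array([255, 128, 64, 33], dtype=np.uint8)

print(normalized_hamming(a, a))             # 0.0     -> identical
print(normalized_hamming(a, b))             # 0.03125 -> 1 of 32 bits differs
print(normalized_hamming(a, np.invert(a)))  # 1.0     -> every bit differs
```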
Retrieve vectors by key¶
Fetch a stored vector by its key:
```python
vector = index.get(1)
print(vector)  # array([255, 128, 64, 32], dtype=uint8)

# Missing keys return None
missing = index.get(999)
print(missing)  # None
```
Check key existence¶
Check whether a key exists in the index without retrieving its vector:
```python
print(index.contains(1))    # True
print(index.contains(999))  # False

# Python 'in' operator works too
print(1 in index)  # True
```
Save and reload¶
Save the index to a file and load it back later:
```python
# Save
index.save("my_index.usearch")

# Restore (loads into RAM)
restored = NphdIndex.restore("my_index.usearch")

# Verify it works
matches = restored.search(query, count=3)
print(matches.keys)
```
Tip
For read-only access with lower memory usage, use restore(..., view=True) to memory-map the
file instead of loading it fully into RAM. See the
Persistence how-to for details.
Single-process only
Running multiple processes against the same index files may corrupt data. See Architecture for details.
Next steps¶
- Variable-length vectors -- Mix vectors of different bit-lengths in the same index.
- Scaling up -- Use ShardedNphdIndex for datasets that exceed RAM.
- Persistence -- save(), load(), view(), and restore() explained.
- API reference -- Full API documentation.