API Reference¶
Auto-generated documentation for all public classes in iscc-usearch.
NphdIndex¶
Single-file index for variable-length binary bit-vectors with NPHD metric.
NphdIndex ¶
Bases: Index
Fast approximate nearest neighbor search for variable-length binary bit-vectors.
Supports Normalized Prefix Hamming Distance (NPHD) metric and packed binary vectors as np.uint8 arrays of variable length. Vector keys must be integers.
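The NPHD formula itself is not spelled out in this reference, but a plausible reading of "Normalized Prefix Hamming Distance" over packed `np.uint8` vectors is the Hamming distance computed on the shorter vector's prefix, divided by that prefix's length in bits. A minimal numpy sketch under that assumption (the exact formula used by iscc-usearch may differ):

```python
import numpy as np

def nphd(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized Prefix Hamming Distance over the common prefix (sketch).

    Compares only the first min(len(a), len(b)) bytes and normalizes the
    popcount of the XOR by the number of compared bits.
    """
    n = min(a.size, b.size)  # length of the common prefix in bytes
    if n == 0:
        return 1.0  # no overlap: treat as maximally distant
    xor = np.bitwise_xor(a[:n], b[:n])
    # Popcount of the XOR gives the Hamming distance in bits.
    distance_bits = int(np.unpackbits(xor).sum())
    return distance_bits / (n * 8)

a = np.frombuffer(b"\xff\x00\xff", dtype=np.uint8)
b = np.frombuffer(b"\xff\x00", dtype=np.uint8)
print(nphd(a, b))  # identical 2-byte prefix -> 0.0
```

Because only the shared prefix is compared, a 64-bit code can be matched against a 256-bit code without padding tricks at the metric level.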
CONCURRENCY: Single-process only. The underlying .usearch files have no file locking or multi-process coordination. Running multiple processes against the same index may corrupt data. Use a single process with async/await for concurrent connections.
UPSERT: Batch upsert requires uniform-length vectors. For variable-length batch upsert, call `upsert()` individually for each vector: `for k, v in zip(keys, vecs): idx.upsert(k, v)`.
Create a new NPHD index.
add ¶
Add variable-length binary vectors to the index.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | | Integer key(s) or None for auto-generation | *required* |
| `vectors` | | Single vector, 2D array of uniform vectors, or list of variable-length vectors | *required* |
| `kwargs` | | Additional arguments passed to parent `Index.add()` | `{}` |

Returns:

| Type | Description |
|---|---|
| | Array of keys for added vectors |
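The index expects bit-vectors already packed into `np.uint8` arrays; numpy's `packbits` is a convenient way to produce that form. A short illustration (the 64- and 128-bit lengths are arbitrary examples, not library constants):

```python
import numpy as np

# Two bit-vectors of different lengths (64 and 128 bits).
bits_64 = np.random.randint(0, 2, 64, dtype=np.uint8)
bits_128 = np.random.randint(0, 2, 128, dtype=np.uint8)

# Pack 8 bits per byte -> uint8 arrays of 8 and 16 bytes.
vec_a = np.packbits(bits_64)
vec_b = np.packbits(bits_128)
print(vec_a.shape, vec_b.shape)  # (8,) (16,)
```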
get ¶
Retrieve unpadded variable-length vectors by key(s).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | | Integer key(s) to look up | *required* |
| `dtype` | | Optional data type (defaults to index dtype) | `None` |

Returns:

| Type | Description |
|---|---|
| | Unpadded vector(s) or None for missing keys |
search ¶
Search for nearest neighbors of query vector(s).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vectors` | | Single vector or batch of variable-length vectors to query | *required* |
| `count` | | Maximum number of nearest neighbors to return per query | `10` |
| `kwargs` | | Additional arguments passed to parent `Index.search()` | `{}` |

Returns:

| Type | Description |
|---|---|
| | Matches for single query or BatchMatches for batch queries |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If count < 1 |
load ¶
Load index from file or buffer and restore max_dim from saved ndim.
CRITICAL: After loading, we must restore the custom NPHD metric because usearch's load() overwrites it with the saved metric (standard Hamming).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path_or_buffer` | | Path or buffer to load from (defaults to self.path) | `None` |
| `progress` | | Optional progress callback | `None` |
view ¶
Memory-map index from file or buffer and restore max_dim from saved ndim.
CRITICAL: After viewing, we must restore the custom NPHD metric because usearch's view() overwrites it with the saved metric (standard Hamming).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path_or_buffer` | | Path or buffer to view from (defaults to self.path) | `None` |
| `progress` | | Optional progress callback | `None` |
copy ¶
Create a copy of this index.
Returns:

| Type | Description |
|---|---|
| | New NphdIndex with same configuration and data |
restore staticmethod ¶
Restore a NphdIndex from a saved file or buffer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path_or_buffer` | | Path or buffer to restore from | *required* |
| `view` | | If True, memory-map the index instead of loading | `False` |
| `kwargs` | | Additional arguments passed to NphdIndex constructor | `{}` |

Returns:

| Type | Description |
|---|---|
| | Restored NphdIndex or None if file is invalid |
ShardedNphdIndex¶
Multi-shard index combining automatic sharding with NPHD support for variable-length vectors.
ShardedNphdIndex ¶
Bases: ShardedIndex
Sharded index for variable-length binary bit-vectors with NPHD metric.
Combines ShardedIndex's automatic sharding with NphdIndex's support for variable-length vectors and Normalized Prefix Hamming Distance metric.
CONCURRENCY: Single-process only. No file locking. Use async/await within a single process for concurrent access.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_dim` | `int \| None` | Maximum bits per vector (auto-detected from existing shards if omitted) | `None` |
| `path` | `str \| PathLike` | Directory path for shard storage (required) | *required* |
| `shard_size` | | Size limit in bytes before rotating shards (default 1GB) | *required* |
| `connectivity` | | HNSW connectivity parameter (M) | *required* |
| `expansion_add` | | Search depth on insertions (efConstruction) | *required* |
| `expansion_search` | | Search depth on queries (ef) | *required* |
Initialize a sharded NPHD index.
vectors property ¶
Lazy iterator over all unpadded vectors across all shards.
Returns a ShardedNphdIndexedVectors object that supports:

- Iteration: `for vec in idx.vectors`
- Length: `len(idx.vectors)`
- Indexing: `idx.vectors[0]`, `idx.vectors[-1]`
- Slicing: `idx.vectors[:10]`
- Numpy conversion: `np.asarray(idx.vectors)` (requires uniform vector lengths)

Vectors are returned unpadded (variable-length), consistent with the `get()` API. This is a live view: it reflects the current state at iteration time.

Returns:

| Type | Description |
|---|---|
| `ShardedNphdIndexedVectors` | ShardedNphdIndexedVectors iterator |
add ¶
add(
keys: int | None | Any,
vectors: NDArray[Any],
*,
copy: bool = True,
threads: int = 0,
log: str | bool = False,
progress: Callable[[int, int], bool] | None = None,
) -> int | NDArray[np.uint64]
Add variable-length binary vectors to the index.
Pads vectors before adding to ensure consistent storage across shards.
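Padding to a uniform width can be pictured as zero-extending each packed vector to the index's `max_dim` in bytes. A rough sketch of that idea, not the library's actual storage layout (which must also record each vector's original length somewhere; `MAX_DIM_BITS` is a hypothetical setting):

```python
import numpy as np

MAX_DIM_BITS = 256  # hypothetical max_dim
max_bytes = MAX_DIM_BITS // 8

def pad(vec: np.ndarray) -> np.ndarray:
    """Zero-pad a packed uint8 vector up to the uniform byte width."""
    out = np.zeros(max_bytes, dtype=np.uint8)
    out[: vec.size] = vec
    return out

short = np.arange(8, dtype=np.uint8)  # a 64-bit vector
padded = pad(short)
print(padded.size)  # 32
```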
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `int \| None \| Any` | Integer key(s) or None for auto-generation | *required* |
| `vectors` | `NDArray[Any]` | Single vector or batch of variable-length vectors to add | *required* |
| `copy` | `bool` | Whether to copy vectors into index | `True` |
| `threads` | `int` | Number of threads (0 = auto) | `0` |
| `log` | `str \| bool` | Enable progress logging | `False` |
| `progress` | `Callable[[int, int], bool] \| None` | Progress callback | `None` |

Returns:

| Type | Description |
|---|---|
| `int \| NDArray[uint64]` | Key(s) for added vectors |
search ¶
search(
vectors: NDArray[Any],
count: int = 10,
*,
radius: float = float("inf"),
threads: int = 0,
exact: bool = False,
log: str | bool = False,
progress: Callable[[int, int], bool] | None = None,
) -> Matches | BatchMatches
Search for nearest neighbors of query vector(s).
Pads query vectors before searching to match stored format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vectors` | `NDArray[Any]` | Query vector or batch of variable-length vectors to query | *required* |
| `count` | `int` | Maximum number of nearest neighbors to return per query | `10` |
| `radius` | `float` | Maximum distance for results | `float('inf')` |
| `threads` | `int` | Number of threads (0 = auto) | `0` |
| `exact` | `bool` | Perform exact search | `False` |
| `log` | `str \| bool` | Enable progress logging | `False` |
| `progress` | `Callable[[int, int], bool] \| None` | Progress callback | `None` |

Returns:

| Type | Description |
|---|---|
| `Matches \| BatchMatches` | Matches for single query, BatchMatches for batch |
get ¶
Retrieve unpadded variable-length vectors by key(s) from any shard.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `int \| Any` | Integer key(s) to look up | *required* |
| `dtype` | `Any` | Optional data type for returned vectors | `None` |

Returns:

| Type | Description |
|---|---|
| `NDArray[Any] \| list \| None` | Unpadded vector(s) or None for missing keys |
ShardedIndex¶
Generic sharded index for any metric. Use ShardedNphdIndex for NPHD workloads.
ShardedIndex ¶
ShardedIndex(
*,
ndim: int | None = None,
metric: MetricKind | Any = MetricKind.Cos,
dtype: ScalarKind | str | None = None,
connectivity: int | None = None,
expansion_add: int | None = None,
expansion_search: int | None = None,
multi: bool = False,
path: str | PathLike,
shard_size: int = DEFAULT_SHARD_SIZE,
bloom_filter: bool = True,
)
Sharded vector index for scalable append-only storage.
Wraps usearch Index/Indexes to provide automatic sharding when the active shard exceeds the configured size limit. Finished shards are memory-mapped (view mode) for efficient read-only access, while the active shard is fully loaded (load mode) for read-write operations.
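The rotation policy described above can be sketched as: append to the active shard until its size crosses `shard_size`, then freeze it and open a fresh one. A toy illustration with in-memory shards, where byte counts stand in for on-disk file sizes (this is a conceptual model, not the library's implementation):

```python
class ToyShardedStore:
    """Toy append-only store that rotates shards at a size limit."""

    def __init__(self, shard_size: int) -> None:
        self.shard_size = shard_size
        self.frozen: list[bytes] = []  # finished, read-only shards
        self.active = b""              # current read-write shard

    def add(self, record: bytes) -> None:
        if len(self.active) + len(record) > self.shard_size:
            # Rotate: freeze the active shard and start a new one.
            self.frozen.append(self.active)
            self.active = b""
        self.active += record

store = ToyShardedStore(shard_size=10)
for _ in range(5):
    store.add(b"xxxx")  # 4 bytes each
print(len(store.frozen), len(store.active))  # 2 frozen shards, 4 bytes active
```

In the real index the frozen shards would be memory-mapped for reads while only the active shard stays fully loaded.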
CONCURRENCY: Single-process only. No file locking. Use async/await within a single process for concurrent access.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `ndim` | `int \| None` | Number of vector dimensions (auto-detected from existing shards if omitted) | `None` |
| `metric` | `MetricKind \| Any` | Distance metric (MetricKind or CompiledMetric) | `Cos` |
| `dtype` | `ScalarKind \| str \| None` | Scalar type for vectors (ScalarKind) | `None` |
| `connectivity` | `int \| None` | HNSW connectivity parameter (M) | `None` |
| `expansion_add` | `int \| None` | Search depth on insertions (efConstruction) | `None` |
| `expansion_search` | `int \| None` | Search depth on queries (ef) | `None` |
| `multi` | `bool` | Allow multiple vectors per key | `False` |
| `path` | `str \| PathLike` | Directory path for shard storage (required) | *required* |
| `shard_size` | `int` | Size limit in bytes before rotating shards (default 1GB) | `DEFAULT_SHARD_SIZE` |
| `bloom_filter` | `bool` | Enable bloom filter for fast non-existent key rejection | `True` |
Initialize a sharded index.
keys property ¶
Lazy iterator over all keys across all shards.
Returns a ShardedIndexedKeys object that supports:

- Iteration: `for key in idx.keys`
- Length: `len(idx.keys)`
- Indexing: `idx.keys[0]`, `idx.keys[-1]`
- Slicing: `idx.keys[:10]`
- Numpy conversion: `np.asarray(idx.keys)`

This is a live view: it reflects the current state at iteration time.

Returns:

| Type | Description |
|---|---|
| `ShardedIndexedKeys` | ShardedIndexedKeys iterator |
vectors property ¶
Lazy iterator over all vectors across all shards.
Returns a ShardedIndexedVectors object that supports:

- Iteration: `for vec in idx.vectors`
- Length: `len(idx.vectors)`
- Indexing: `idx.vectors[0]`, `idx.vectors[-1]`
- Slicing: `idx.vectors[:10]`
- Numpy conversion: `np.asarray(idx.vectors)`

This is a live view: it reflects the current state at iteration time.

Note: Unlike usearch `Index.vectors`, which returns an `np.ndarray` immediately, this returns a lazy iterator appropriate for larger-than-RAM indexes.

Returns:

| Type | Description |
|---|---|
| `ShardedIndexedVectors` | ShardedIndexedVectors iterator |
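The eager-versus-lazy distinction in the note above can be sketched with a tiny sequence wrapper that yields vectors shard by shard instead of materializing them all at once (class and shard layout are illustrative, not the library's internals):

```python
import numpy as np

class LazyVectors:
    """Minimal lazy view over vectors spread across several shards."""

    def __init__(self, shards: list[list[np.ndarray]]) -> None:
        self.shards = shards

    def __len__(self) -> int:
        return sum(len(s) for s in self.shards)

    def __iter__(self):
        for shard in self.shards:  # touch one shard at a time
            yield from shard

    def __getitem__(self, i: int) -> np.ndarray:
        if i < 0:
            i += len(self)
        for shard in self.shards:
            if i < len(shard):
                return shard[i]
            i -= len(shard)
        raise IndexError(i)

view = LazyVectors([[np.zeros(4, np.uint8)], [np.ones(4, np.uint8)]])
print(len(view), view[-1].tolist())  # 2 [1, 1, 1, 1]
```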
add ¶
add(
keys: int | None | Any,
vectors: NDArray[Any],
*,
copy: bool = True,
threads: int = 0,
log: str | bool = False,
progress: Callable[[int, int], bool] | None = None,
) -> int | NDArray[np.uint64]
Add vectors to the active shard, rotating if size exceeded.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `int \| None \| Any` | Integer key(s) or None for auto-generation | *required* |
| `vectors` | `NDArray[Any]` | Vector or batch of vectors to add | *required* |
| `copy` | `bool` | Whether to copy vectors into index | `True` |
| `threads` | `int` | Number of threads (0 = auto) | `0` |
| `log` | `str \| bool` | Enable progress logging | `False` |
| `progress` | `Callable[[int, int], bool] \| None` | Progress callback | `None` |

Returns:

| Type | Description |
|---|---|
| `int \| NDArray[uint64]` | Key(s) for added vectors |
search ¶
search(
vectors: NDArray[Any],
count: int = 10,
*,
radius: float = float("inf"),
threads: int = 0,
exact: bool = False,
log: str | bool = False,
progress: Callable[[int, int], bool] | None = None,
) -> Matches | BatchMatches
Search across all shards, merging and sorting results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vectors` | `NDArray[Any]` | Query vector or batch of vectors | *required* |
| `count` | `int` | Maximum number of results per query | `10` |
| `radius` | `float` | Maximum distance for results | `float('inf')` |
| `threads` | `int` | Number of threads (0 = auto) | `0` |
| `exact` | `bool` | Perform exact search | `False` |
| `log` | `str \| bool` | Enable progress logging | `False` |
| `progress` | `Callable[[int, int], bool] \| None` | Progress callback | `None` |

Returns:

| Type | Description |
|---|---|
| `Matches \| BatchMatches` | Matches for single query, BatchMatches for batch |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If count < 1 |
get ¶
Retrieve vectors by key from any shard.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `int \| Any` | Integer key(s) to look up | *required* |
| `dtype` | `Any` | Optional data type for returned vectors | `None` |

Returns:

| Type | Description |
|---|---|
| `NDArray[Any] \| list \| None` | Vector(s) or None for missing keys |
contains ¶
Check if keys exist in any shard.
When bloom_filter=True (default), uses bloom filter to quickly reject non-existent keys.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `int \| Any` | Integer key(s) to check | *required* |

Returns:

| Type | Description |
|---|---|
| `bool \| NDArray[bool_]` | Boolean or array of booleans |
count ¶
Count occurrences of keys across all shards (sum aggregation).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `int \| Any` | Integer key(s) to count | *required* |

Returns:

| Type | Description |
|---|---|
| `int \| NDArray[uint64]` | Count or array of counts |
save ¶
save(
path_or_buffer: str | PathLike | None = None,
progress: Callable[[int, int], bool] | None = None,
) -> None
Save active shard and bloom filter to disk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path_or_buffer` | `str \| PathLike \| None` | Ignored (uses internal path management) | `None` |
| `progress` | `Callable[[int, int], bool] \| None` | Progress callback | `None` |
rebuild_bloom ¶
Rebuild bloom filter from all existing keys.
Use this to populate the bloom filter for an existing index that was created without bloom filter support, or to repair a corrupted filter.
Processes keys shard-by-shard in batches for efficiency.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `save` | `bool` | Whether to save the bloom filter to disk after rebuilding | `True` |
| `log_progress` | `bool` | Whether to log progress per shard | `True` |

Returns:

| Type | Description |
|---|---|
| `int` | Number of keys added to the bloom filter |
metadata staticmethod ¶
Extract metadata from a sharded index directory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| PathLike` | Directory containing shard files | *required* |

Returns:

| Type | Description |
|---|---|
| `dict \| None` | Metadata dict or None if invalid |
pairwise_distance ¶
Not supported for sharded indexes.
ScalableBloomFilter¶
Scalable bloom filter for efficient probabilistic key existence checks.
ScalableBloomFilter ¶
ScalableBloomFilter(
initial_capacity: int = 10000000,
fpr: float = 0.01,
growth_factor: float = 2.0,
)
Scalable bloom filter that grows automatically as elements are added.
Chains multiple fixed-size bloom filters to support unlimited growth while maintaining the target false positive rate. Each new filter has progressively tighter FPR to keep the overall rate bounded.
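The "progressively tighter FPR" works because a geometric series of per-filter rates has a bounded sum. One common scheme (following the scalable bloom filter literature; the actual ratio used by this class is not stated here) assigns filter *i* the budget `fpr * (1 - r) * r**i` with tightening ratio `r < 1`, so the chain's total false positive rate never exceeds the target:

```python
# Per-filter false positive budget for a scalable bloom filter (sketch).
# With fpr_i = target * (1 - r) * r**i the rates sum to at most `target`.
target = 0.01  # overall FPR goal
r = 0.5        # tightening ratio (a common choice, assumed here)

rates = [target * (1 - r) * r**i for i in range(20)]
print(sum(rates) <= target)  # the chain never exceeds the budget
```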
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `initial_capacity` | `int` | Initial number of elements before first growth | `10000000` |
| `fpr` | `float` | Target false positive rate (0.0-1.0) | `0.01` |
| `growth_factor` | `float` | Capacity multiplier for each new filter | `2.0` |
Initialize a scalable bloom filter.
add ¶
Add a single key to the bloom filter.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `int` | Integer key to add (uint64) | *required* |
add_batch ¶
Add multiple keys to the bloom filter efficiently.
Uses native batch operations and handles capacity growth properly.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `Sequence[int]` | Sequence of integer keys to add | *required* |
contains ¶
Check if a key might be in the filter.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `int` | Integer key to check | *required* |

Returns:

| Type | Description |
|---|---|
| `bool` | False if definitely not present, True if possibly present |
contains_batch ¶
Check if multiple keys might be in the filter.
Uses native Rust batch operations for throughput. Each filter in the chain is checked via a single batch call, and results are OR-combined.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `keys` | `Sequence[int]` | Sequence of integer keys to check | *required* |

Returns:

| Type | Description |
|---|---|
| `list[bool]` | List of booleans (False=definitely not, True=possibly present) |
save ¶
Save bloom filter to disk.
File format:

- 4 bytes: magic ("ISBF")
- 1 byte: version
- 8 bytes: count (uint64)
- 4 bytes: initial_capacity (uint32)
- 8 bytes: fpr (float64)
- 8 bytes: growth_factor (float64)
- 4 bytes: num_filters (uint32)
- For each filter:
    - 4 bytes: capacity (uint32)
    - 4 bytes: hashes (uint32)
    - 4 bytes: data_len (uint32)
    - data_len bytes: filter data
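The layout maps directly onto Python's `struct` module. A round-trip sketch of the header fields documented above (byte order is an assumption here, little-endian, since the docstring does not specify it):

```python
import io
import struct

def write_header(buf, count, initial_capacity, fpr, growth_factor, num_filters):
    """Write the ISBF header fields (little-endian assumed)."""
    buf.write(b"ISBF")               # 4-byte magic
    buf.write(struct.pack("<B", 1))  # 1-byte version
    buf.write(struct.pack("<QIddI", count, initial_capacity,
                          fpr, growth_factor, num_filters))

def read_header(buf):
    """Read the header back, validating the magic bytes."""
    if buf.read(4) != b"ISBF":
        raise ValueError("invalid file format")
    (version,) = struct.unpack("<B", buf.read(1))
    count, cap, fpr, growth, n = struct.unpack("<QIddI", buf.read(32))
    return version, count, cap, fpr, growth, n

buf = io.BytesIO()
write_header(buf, count=42, initial_capacity=10_000_000,
             fpr=0.01, growth_factor=2.0, num_filters=3)
buf.seek(0)
print(read_header(buf))  # (1, 42, 10000000, 0.01, 2.0, 3)
```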
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | File path to save to | *required* |
load classmethod ¶
Load bloom filter from disk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | File path to load from | *required* |

Returns:

| Type | Description |
|---|---|
| `ScalableBloomFilter` | Restored ScalableBloomFilter |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If file format is invalid |
timer¶
Context manager for timing operations with loguru integration.
timer ¶
Context manager for timing code blocks and logging elapsed duration.
Logs a message with the elapsed time on exit using loguru.
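A minimal version of such a context manager looks roughly like the following, using the stdlib `logging` module instead of loguru so the sketch stays self-contained:

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

@contextmanager
def timer(message: str, log_start: bool = False):
    """Log the elapsed wall-clock time for the enclosed block."""
    if log_start:
        log.info("%s started", message)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s took %.3fs", message, elapsed)

with timer("sleep demo"):
    time.sleep(0.01)
```

The `finally` clause ensures the elapsed time is logged even if the timed block raises.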
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `message` | `str` | Description of the operation being timed. | *required* |
| `log_start` | | If True, log a "started" message on entry. | `False` |