API Reference

Auto-generated documentation for all public classes in iscc-usearch.

NphdIndex

Single-file index for variable-length binary bit-vectors with NPHD metric.

NphdIndex

NphdIndex(max_dim=256, **kwargs)

Bases: Index

Fast approximate nearest neighbor search for variable-length binary bit-vectors.

Supports the Normalized Prefix Hamming Distance (NPHD) metric and packed binary vectors stored as variable-length np.uint8 arrays. Vector keys must be integers.

CONCURRENCY: Single-process only. The underlying .usearch files have no file locking or multi-process coordination. Running multiple processes against the same index may corrupt data. Use a single process with async/await for concurrent connections.

UPSERT: Batch upsert requires uniform-length vectors. For variable-length batch upsert, call upsert() individually for each vector: for k, v in zip(keys, vecs): idx.upsert(k, v)

Create a new NPHD index.
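
Example: a minimal usage sketch. The iscc_usearch import path is an assumption; everything else follows the API documented below.

import numpy as np

from iscc_usearch import NphdIndex  # assumed import path

# Index accepting packed binary vectors of up to 256 bits (32 bytes).
idx = NphdIndex(max_dim=256)

# Variable-length packed vectors: 64 bits (8 bytes) and 128 bits (16 bytes).
v_a = np.random.randint(0, 256, size=8, dtype=np.uint8)
v_b = np.random.randint(0, 256, size=16, dtype=np.uint8)

# Integer keys; a list of arrays allows mixed lengths in one call.
idx.add([1, 2], [v_a, v_b])

# get() returns vectors unpadded, at their original length.
assert len(idx.get(1)) == 8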

add

add(keys, vectors, **kwargs)

Add variable-length binary vectors to the index.

Parameters:

- keys: Integer key(s) or None for auto-generation. Required.
- vectors: Single vector, 2D array of uniform vectors, or list of variable-length vectors. Required.
- kwargs: Additional arguments passed to parent Index.add(). Default: {}.

Returns:

Array of keys for added vectors.

get

get(keys, dtype=None)

Retrieve unpadded variable-length vectors by key(s).

Parameters:

- keys: Integer key(s) to look up. Required.
- dtype: Optional data type (defaults to index dtype). Default: None.

Returns:

Unpadded vector(s), or None for missing keys.

search

search(vectors, count=10, **kwargs)

Search for nearest neighbors of query vector(s).

Parameters:

- vectors: Single vector or batch of variable-length vectors to query. Required.
- count: Maximum number of nearest neighbors to return per query. Default: 10.
- kwargs: Additional arguments passed to parent Index.search(). Default: {}.

Returns:

Matches for a single query, or BatchMatches for batch queries.

Raises:

- ValueError: If count < 1.
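
For example, a single query yields a Matches object while a batch yields BatchMatches (a sketch continuing the example above; the keys/distances attributes and BatchMatches indexing are usearch result-type behavior, assumed here):

# Single query: nearest keys and their NPHD distances.
res = idx.search(v_a, count=5)
print(res.keys, res.distances)

# Batch query over variable-length vectors: one result set per query.
batch = idx.search([v_a, v_b], count=5)
first = batch[0]  # Matches for the first query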

load

load(path_or_buffer=None, progress=None)

Load index from file or buffer and restore max_dim from saved ndim.

CRITICAL: After loading, we must restore the custom NPHD metric because usearch's load() overwrites it with the saved metric (standard Hamming).

Parameters:

- path_or_buffer: Path or buffer to load from (defaults to self.path). Default: None.
- progress: Optional progress callback. Default: None.

view

view(path_or_buffer=None, progress=None)

Memory-map index from file or buffer and restore max_dim from saved ndim.

CRITICAL: After viewing, we must restore the custom NPHD metric because usearch's view() overwrites it with the saved metric (standard Hamming).

Parameters:

- path_or_buffer: Path or buffer to view from (defaults to self.path). Default: None.
- progress: Optional progress callback. Default: None.

copy

copy()

Create a copy of this index.

Returns:

A new NphdIndex with the same configuration and data.

restore staticmethod

restore(path_or_buffer, view=False, **kwargs)

Restore a NphdIndex from a saved file or buffer.

Parameters:

- path_or_buffer: Path or buffer to restore from. Required.
- view: If True, memory-map the index instead of loading it. Default: False.
- kwargs: Additional arguments passed to the NphdIndex constructor. Default: {}.

Returns:

The restored NphdIndex, or None if the file is invalid.
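
A persistence round trip might look as follows (a sketch; save() is inherited from the usearch Index base class):

# Write the index to disk, then bring it back with the NPHD metric
# re-applied (a plain usearch load() would leave standard Hamming).
idx.save("vectors.usearch")
idx2 = NphdIndex.restore("vectors.usearch")

# view=True memory-maps the file instead of loading it into RAM.
idx3 = NphdIndex.restore("vectors.usearch", view=True)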

ShardedNphdIndex

Multi-shard index combining automatic sharding with NPHD support for variable-length vectors.

ShardedNphdIndex

ShardedNphdIndex(
    *,
    max_dim: int | None = None,
    path: str | PathLike,
    **kwargs: Any,
)

Bases: ShardedIndex

Sharded index for variable-length binary bit-vectors with NPHD metric.

Combines ShardedIndex's automatic sharding with NphdIndex's support for variable-length vectors and Normalized Prefix Hamming Distance metric.

CONCURRENCY: Single-process only. No file locking. Use async/await within a single process for concurrent access.

Parameters:

- max_dim (int | None): Maximum bits per vector (auto-detected from existing shards if omitted). Default: None.
- path (str | PathLike): Directory path for shard storage. Required.
- shard_size: Size limit in bytes before rotating shards (default 1GB). Forwarded via kwargs.
- connectivity: HNSW connectivity parameter (M). Forwarded via kwargs.
- expansion_add: Search depth on insertions (efConstruction). Forwarded via kwargs.
- expansion_search: Search depth on queries (ef). Forwarded via kwargs.

Initialize a sharded NPHD index.
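
A construction sketch (import path assumed; shard_size and the HNSW parameters are forwarded keyword arguments per the table above):

import numpy as np

from iscc_usearch import ShardedNphdIndex  # assumed import path

# Shard files live under the given directory; max_dim is auto-detected
# from existing shards when the directory is reopened.
sidx = ShardedNphdIndex(max_dim=256, path="nphd_shards")

# keys=None auto-generates integer keys; a list allows mixed lengths.
vecs = [np.random.randint(0, 256, size=n, dtype=np.uint8) for n in (8, 16, 32)]
keys = sidx.add(None, vecs)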

max_dim property

max_dim: int

Maximum number of bits per vector.

max_bytes property

max_bytes: int

Maximum number of bytes per vector.

vectors property

vectors: ShardedNphdIndexedVectors

Lazy iterator over all unpadded vectors across all shards.

Returns a ShardedNphdIndexedVectors object that supports:

- Iteration: for vec in idx.vectors
- Length: len(idx.vectors)
- Indexing: idx.vectors[0], idx.vectors[-1]
- Slicing: idx.vectors[:10]
- Numpy conversion: np.asarray(idx.vectors) (requires uniform vector lengths)

Vectors are returned unpadded (variable-length), consistent with the get() API. This is a live view that reflects the current state at iteration time.

Returns:

ShardedNphdIndexedVectors iterator.
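
A sketch of the supported access patterns (continuing the construction example above):

# Lazy, live view over all stored vectors across shards.
for vec in sidx.vectors:
    pass  # each vec is an unpadded np.uint8 array

total = len(sidx.vectors)
first = sidx.vectors[0]
tail = sidx.vectors[:10]

# np.asarray(sidx.vectors) works only if all vectors share one length.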

add

add(
    keys: int | None | Any,
    vectors: NDArray[Any],
    *,
    copy: bool = True,
    threads: int = 0,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> int | NDArray[np.uint64]

Add variable-length binary vectors to the index.

Pads vectors before adding to ensure consistent storage across shards.

Parameters:

- keys (int | None | Any): Integer key(s) or None for auto-generation. Required.
- vectors (NDArray[Any]): Single vector or batch of variable-length vectors to add. Required.
- copy (bool): Whether to copy vectors into the index. Default: True.
- threads (int): Number of threads (0 = auto). Default: 0.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

int | NDArray[uint64]: Key(s) for the added vectors.

search

search(
    vectors: NDArray[Any],
    count: int = 10,
    *,
    radius: float = float("inf"),
    threads: int = 0,
    exact: bool = False,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> Matches | BatchMatches

Search for nearest neighbors of query vector(s).

Pads query vectors before searching to match stored format.

Parameters:

- vectors (NDArray[Any]): Query vector or batch of variable-length vectors to query. Required.
- count (int): Maximum number of nearest neighbors to return per query. Default: 10.
- radius (float): Maximum distance for results. Default: float('inf').
- threads (int): Number of threads (0 = auto). Default: 0.
- exact (bool): Perform exact search. Default: False.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

Matches | BatchMatches: Matches for a single query, BatchMatches for a batch.

get

get(
    keys: int | Any, dtype: Any = None
) -> NDArray[Any] | list | None

Retrieve unpadded variable-length vectors by key(s) from any shard.

Parameters:

- keys (int | Any): Integer key(s) to look up. Required.
- dtype (Any): Optional data type for returned vectors. Default: None.

Returns:

NDArray[Any] | list | None: Unpadded vector(s), or None for missing keys.

__repr__

__repr__() -> str

Return string representation of the sharded NPHD index.

ShardedIndex

Generic sharded index for any metric. Use ShardedNphdIndex for NPHD workloads.

ShardedIndex

ShardedIndex(
    *,
    ndim: int | None = None,
    metric: MetricKind | Any = MetricKind.Cos,
    dtype: ScalarKind | str | None = None,
    connectivity: int | None = None,
    expansion_add: int | None = None,
    expansion_search: int | None = None,
    multi: bool = False,
    path: str | PathLike,
    shard_size: int = DEFAULT_SHARD_SIZE,
    bloom_filter: bool = True,
)

Sharded vector index for scalable append-only storage.

Wraps usearch Index/Indexes to provide automatic sharding when the active shard exceeds the configured size limit. Finished shards are memory-mapped (view mode) for efficient read-only access, while the active shard is fully loaded (load mode) for read-write operations.

CONCURRENCY: Single-process only. No file locking. Use async/await within a single process for concurrent access.

Parameters:

- ndim (int | None): Number of vector dimensions (auto-detected from existing shards if omitted). Default: None.
- metric (MetricKind | Any): Distance metric (MetricKind or CompiledMetric). Default: MetricKind.Cos.
- dtype (ScalarKind | str | None): Scalar type for vectors (ScalarKind). Default: None.
- connectivity (int | None): HNSW connectivity parameter (M). Default: None.
- expansion_add (int | None): Search depth on insertions (efConstruction). Default: None.
- expansion_search (int | None): Search depth on queries (ef). Default: None.
- multi (bool): Allow multiple vectors per key. Default: False.
- path (str | PathLike): Directory path for shard storage. Required.
- shard_size (int): Size limit in bytes before rotating shards (default 1GB). Default: DEFAULT_SHARD_SIZE.
- bloom_filter (bool): Enable bloom filter for fast non-existent key rejection. Default: True.

Initialize a sharded index.
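
A construction sketch for the generic variant (import path assumed; MetricKind comes from usearch):

import numpy as np

from usearch.index import MetricKind

from iscc_usearch import ShardedIndex  # assumed import path

# Fixed-dimension cosine index with 1 GiB shards and bloom-filtered
# key-existence checks.
gidx = ShardedIndex(
    ndim=128,
    metric=MetricKind.Cos,
    path="cos_shards",
    shard_size=1 << 30,
    bloom_filter=True,
)
keys = gidx.add(None, np.random.rand(1000, 128).astype(np.float32))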

size property

size: int

Total number of vectors across all shards.

ndim property

ndim: int

Vector dimensionality.

dtype property

dtype: ScalarKind

Scalar type for vectors.

metric property

metric: MetricKind | Any

Distance metric.

metric_kind property

metric_kind: MetricKind

Distance metric kind.

connectivity property

connectivity: int

HNSW connectivity parameter.

expansion_add property writable

expansion_add: int

Expansion parameter for additions.

expansion_search property writable

expansion_search: int

Expansion parameter for searches.

multi property

multi: bool

Whether multiple vectors per key are allowed.

path property

path: Path

Directory path for shard storage.

shard_count property

shard_count: int

Number of shard files.

memory_usage property

memory_usage: int

Estimated memory usage across all shards.

serialized_length property

serialized_length: int

Serialized length of active shard.

capacity property

capacity: int

Capacity of active shard.

keys property

keys: ShardedIndexedKeys

Lazy iterator over all keys across all shards.

Returns a ShardedIndexedKeys object that supports:

- Iteration: for key in idx.keys
- Length: len(idx.keys)
- Indexing: idx.keys[0], idx.keys[-1]
- Slicing: idx.keys[:10]
- Numpy conversion: np.asarray(idx.keys)

This is a live view that reflects the current state at iteration time.

Returns:

ShardedIndexedKeys iterator.

vectors property

vectors: ShardedIndexedVectors

Lazy iterator over all vectors across all shards.

Returns a ShardedIndexedVectors object that supports:

- Iteration: for vec in idx.vectors
- Length: len(idx.vectors)
- Indexing: idx.vectors[0], idx.vectors[-1]
- Slicing: idx.vectors[:10]
- Numpy conversion: np.asarray(idx.vectors)

This is a live view that reflects the current state at iteration time.

Note: Unlike usearch Index.vectors, which returns an np.ndarray immediately, this returns a lazy iterator appropriate for larger-than-RAM indexes.

Returns:

ShardedIndexedVectors iterator.

add

add(
    keys: int | None | Any,
    vectors: NDArray[Any],
    *,
    copy: bool = True,
    threads: int = 0,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> int | NDArray[np.uint64]

Add vectors to the active shard, rotating if size exceeded.

Parameters:

- keys (int | None | Any): Integer key(s) or None for auto-generation. Required.
- vectors (NDArray[Any]): Vector or batch of vectors to add. Required.
- copy (bool): Whether to copy vectors into the index. Default: True.
- threads (int): Number of threads (0 = auto). Default: 0.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

int | NDArray[uint64]: Key(s) for the added vectors.

search

search(
    vectors: NDArray[Any],
    count: int = 10,
    *,
    radius: float = float("inf"),
    threads: int = 0,
    exact: bool = False,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> Matches | BatchMatches

Search across all shards, merging and sorting results.

Parameters:

- vectors (NDArray[Any]): Query vector or batch of vectors. Required.
- count (int): Maximum number of results per query. Default: 10.
- radius (float): Maximum distance for results. Default: float('inf').
- threads (int): Number of threads (0 = auto). Default: 0.
- exact (bool): Perform exact search. Default: False.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

Matches | BatchMatches: Matches for a single query, BatchMatches for a batch.

Raises:

- ValueError: If count < 1.

get

get(
    keys: int | Any, dtype: Any = None
) -> NDArray[Any] | list | None

Retrieve vectors by key from any shard.

Parameters:

- keys (int | Any): Integer key(s) to look up. Required.
- dtype (Any): Optional data type for returned vectors. Default: None.

Returns:

NDArray[Any] | list | None: Vector(s), or None for missing keys.

contains

contains(keys: int | Any) -> bool | NDArray[np.bool_]

Check if keys exist in any shard.

When bloom_filter=True (default), uses bloom filter to quickly reject non-existent keys.

Parameters:

- keys (int | Any): Integer key(s) to check. Required.

Returns:

bool | NDArray[bool_]: Boolean or array of booleans.
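
A membership-check sketch (continuing the construction example above):

# Scalar check goes through __contains__; the bloom filter rejects most
# non-existent keys without touching any shard.
assert keys[0] in gidx

# Batched check returns one boolean per key.
mask = gidx.contains(keys[:10])

# count() sums occurrences across shards (relevant when multi=True).
counts = gidx.count(keys[:10])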

__contains__

__contains__(keys: int | Any) -> bool | NDArray[np.bool_]

Support 'in' operator.

count

count(keys: int | Any) -> int | NDArray[np.uint64]

Count occurrences of keys across all shards (sum aggregation).

Parameters:

- keys (int | Any): Integer key(s) to count. Required.

Returns:

int | NDArray[uint64]: Count or array of counts.

save

save(
    path_or_buffer: str | PathLike | None = None,
    progress: Callable[[int, int], bool] | None = None,
) -> None

Save active shard and bloom filter to disk.

Parameters:

- path_or_buffer (str | PathLike | None): Ignored (uses internal path management). Default: None.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

rebuild_bloom

rebuild_bloom(
    save: bool = True, log_progress: bool = True
) -> int

Rebuild bloom filter from all existing keys.

Use this to populate the bloom filter for an existing index that was created without bloom filter support, or to repair a corrupted filter.

Processes keys shard-by-shard in batches for efficiency.

Parameters:

- save (bool): Whether to save the bloom filter to disk after rebuilding. Default: True.
- log_progress (bool): Whether to log progress per shard. Default: True.

Returns:

int: Number of keys added to the bloom filter.
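
For example (continuing the sketch above):

# Repopulate the filter for an index created without bloom filter
# support, persisting it to disk afterwards.
n_keys = gidx.rebuild_bloom(save=True, log_progress=True)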

metadata staticmethod

metadata(path: str | PathLike) -> dict | None

Extract metadata from a sharded index directory.

Parameters:

- path (str | PathLike): Directory containing shard files. Required.

Returns:

dict | None: Metadata dict, or None if invalid.

__len__

__len__() -> int

Total number of vectors across all shards.

remove

remove(*args: Any, **kwargs: Any) -> None

Not supported - append-only design.

__delitem__

__delitem__(keys: Any) -> None

Not supported - append-only design.

rename

rename(*args: Any, **kwargs: Any) -> None

Not supported - append-only design.

join

join(*args: Any, **kwargs: Any) -> None

Not supported for sharded indexes.

cluster

cluster(*args: Any, **kwargs: Any) -> None

Not supported for sharded indexes.

pairwise_distance

pairwise_distance(*args: Any, **kwargs: Any) -> None

Not supported for sharded indexes.

copy

copy() -> None

Not supported - too complex with multiple shards.

clear

clear() -> None

Not supported - would need to handle multiple files.

reset

reset() -> None

Not supported - would need to handle multiple files.

__repr__

__repr__() -> str

Return string representation of the sharded index.

ScalableBloomFilter

Scalable bloom filter for efficient probabilistic key existence checks.

ScalableBloomFilter

ScalableBloomFilter(
    initial_capacity: int = 10000000,
    fpr: float = 0.01,
    growth_factor: float = 2.0,
)

Scalable bloom filter that grows automatically as elements are added.

Chains multiple fixed-size bloom filters to support unlimited growth while maintaining the target false positive rate. Each new filter has progressively tighter FPR to keep the overall rate bounded.

Parameters:

- initial_capacity (int): Initial number of elements before the first growth. Default: 10000000.
- fpr (float): Target false positive rate (0.0-1.0). Default: 0.01.
- growth_factor (float): Capacity multiplier for each new filter. Default: 2.0.

Initialize a scalable bloom filter.
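
A usage sketch (import path assumed):

from iscc_usearch import ScalableBloomFilter  # assumed import path

bf = ScalableBloomFilter(initial_capacity=1_000_000, fpr=0.01)
bf.add_batch(range(100_000))  # native batch insertion

# No false negatives: every added key reports as possibly present.
assert bf.contains(42) and 42 in bf

# len() is the approximate element count; filter_count shows how many
# chained filters have been created so far.
print(len(bf), bf.filter_count)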

count property

count: int

Approximate number of elements added.

current_capacity property

current_capacity: int

Total capacity across all filters.

filter_count property

filter_count: int

Number of bloom filters in the chain.

add

add(key: int) -> None

Add a single key to the bloom filter.

Parameters:

- key (int): Integer key to add (uint64). Required.

add_batch

add_batch(keys: Sequence[int]) -> None

Add multiple keys to the bloom filter efficiently.

Uses native batch operations and handles capacity growth properly.

Parameters:

- keys (Sequence[int]): Sequence of integer keys to add. Required.

contains

contains(key: int) -> bool

Check if a key might be in the filter.

Parameters:

- key (int): Integer key to check. Required.

Returns:

bool: False if definitely not present, True if possibly present.

contains_batch

contains_batch(keys: Sequence[int]) -> list[bool]

Check if multiple keys might be in the filter.

Uses native Rust batch operations for throughput. Each filter in the chain is checked via a single batch call, and results are OR-combined.

Parameters:

- keys (Sequence[int]): Sequence of integer keys to check. Required.

Returns:

list[bool]: List of booleans (False = definitely not present, True = possibly present).

clear

clear() -> None

Clear all filters and reset to initial state.

save

save(path: str | Path) -> None

Save bloom filter to disk.

File format:

- 4 bytes: magic ("ISBF")
- 1 byte: version
- 8 bytes: count (uint64)
- 4 bytes: initial_capacity (uint32)
- 8 bytes: fpr (float64)
- 8 bytes: growth_factor (float64)
- 4 bytes: num_filters (uint32)
- For each filter:
  - 4 bytes: capacity (uint32)
  - 4 bytes: hashes (uint32)
  - 4 bytes: data_len (uint32)
  - data_len bytes: filter data

Parameters:

- path (str | Path): File path to save to. Required.

load classmethod

load(path: str | Path) -> ScalableBloomFilter

Load bloom filter from disk.

Parameters:

- path (str | Path): File path to load from. Required.

Returns:

ScalableBloomFilter: The restored ScalableBloomFilter.

Raises:

- ValueError: If the file format is invalid.
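
A persistence round trip (continuing the sketch above):

bf.save("keys.bloom")
bf2 = ScalableBloomFilter.load("keys.bloom")
assert len(bf2) == len(bf)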

__len__

__len__() -> int

Return approximate number of elements.

__contains__

__contains__(key: int) -> bool

Support 'in' operator.

__repr__

__repr__() -> str

Return string representation.

timer

Context manager for timing operations with loguru integration.

timer

timer(message: str, log_start=False)

Context manager for timing code blocks and logging elapsed duration.

Logs a message with the elapsed time on exit using loguru.

Parameters:

- message (str): Description of the operation being timed. Required.
- log_start: If True, log a "started" message on entry. Default: False.
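
A usage sketch (import path assumed):

import time

from iscc_usearch import timer  # assumed import path

# Logs the message with the elapsed duration on exit via loguru;
# log_start=True also logs a "started" message on entry.
with timer("building index", log_start=True):
    time.sleep(0.1)  # stand-in for any timed operation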

__enter__

__enter__()

Start the timer.

__exit__

__exit__(exc_type, exc_value, traceback)

Stop the timer and log elapsed duration.
