API Reference

Auto-generated documentation for all public classes in iscc-usearch.

NphdIndex

Single-file index for variable-length binary bit-vectors with NPHD metric.

NphdIndex

NphdIndex(max_dim=256, **kwargs)

Bases: Index

Fast approximate nearest neighbor search for variable-length binary bit-vectors.

Supports the Normalized Prefix Hamming Distance (NPHD) metric and packed binary vectors stored as variable-length np.uint8 arrays. Vector keys must be integers.

CONCURRENCY: Single-process only. The underlying .usearch files have no file locking or multi-process coordination. Running multiple processes against the same index may corrupt data. Use a single process with async/await for concurrent connections.

UPSERT: Batch upsert requires uniform-length vectors. For variable-length batch upsert, call upsert() individually for each vector: for k, v in zip(keys, vecs): idx.upsert(k, v)

Create a new NPHD index.
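
Example: a minimal usage sketch. The iscc_usearch import path is an assumption; everything else follows the API documented below.

import numpy as np

from iscc_usearch import NphdIndex  # assumed import path

# Index accepting packed binary vectors of up to 256 bits (32 bytes).
idx = NphdIndex(max_dim=256)

# Variable-length packed vectors: 64 bits (8 bytes) and 128 bits (16 bytes).
v_a = np.random.randint(0, 256, size=8, dtype=np.uint8)
v_b = np.random.randint(0, 256, size=16, dtype=np.uint8)

# Integer keys; a list of arrays allows mixed lengths in one call.
idx.add([1, 2], [v_a, v_b])

# get() returns vectors unpadded, at their original length.
assert len(idx.get(1)) == 8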

add

add(keys, vectors, **kwargs)

Add variable-length binary vectors to the index.

Parameters:

- keys: Integer key(s) or None for auto-generation. Required.
- vectors: Single vector, 2D array of uniform vectors, or list of variable-length vectors. Required.
- kwargs: Additional arguments passed to parent Index.add(). Default: {}.

Returns:

Array of keys for added vectors.

get

get(keys, dtype=None)

Retrieve unpadded variable-length vectors by key(s).

Parameters:

- keys: Integer key(s) to look up. Required.
- dtype: Optional data type (defaults to index dtype). Default: None.

Returns:

Unpadded vector(s), or None for missing keys.

search

search(vectors, count=10, **kwargs)

Search for nearest neighbors of query vector(s).

Parameters:

- vectors: Single vector or batch of variable-length vectors to query. Required.
- count: Maximum number of nearest neighbors to return per query. Default: 10.
- kwargs: Additional arguments passed to parent Index.search(). Default: {}.

Returns:

Matches for a single query, or BatchMatches for batch queries.

Raises:

- ValueError: If count < 1.
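
For example, a single query yields a Matches object while a batch yields BatchMatches (a sketch continuing the example above; the keys/distances attributes and BatchMatches indexing are usearch result-type behavior, assumed here):

# Single query: nearest keys and their NPHD distances.
res = idx.search(v_a, count=5)
print(res.keys, res.distances)

# Batch query over variable-length vectors: one result set per query.
batch = idx.search([v_a, v_b], count=5)
first = batch[0]  # Matches for the first query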

load

load(path_or_buffer=None, progress=None)

Load index from file or buffer and restore max_dim from saved ndim.

CRITICAL: After loading, we must restore the custom NPHD metric because usearch's load() overwrites it with the saved metric (standard Hamming).

Parameters:

- path_or_buffer: Path or buffer to load from (defaults to self.path). Default: None.
- progress: Optional progress callback. Default: None.

view

view(path_or_buffer=None, progress=None)

Memory-map index from file or buffer and restore max_dim from saved ndim.

CRITICAL: After viewing, we must restore the custom NPHD metric because usearch's view() overwrites it with the saved metric (standard Hamming).

Parameters:

- path_or_buffer: Path or buffer to view from (defaults to self.path). Default: None.
- progress: Optional progress callback. Default: None.

copy

copy()

Create a copy of this index.

Returns:

A new NphdIndex with the same configuration and data.

restore staticmethod

restore(path_or_buffer, view=False, **kwargs)

Restore a NphdIndex from a saved file or buffer.

Parameters:

- path_or_buffer: Path or buffer to restore from. Required.
- view: If True, memory-map the index instead of loading it. Default: False.
- kwargs: Additional arguments passed to the NphdIndex constructor. Default: {}.

Returns:

The restored NphdIndex, or None if the file is invalid.
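
A persistence round trip might look as follows (a sketch; save() is inherited from the usearch Index base class):

# Write the index to disk, then bring it back with the NPHD metric
# re-applied (a plain usearch load() would leave standard Hamming).
idx.save("vectors.usearch")
idx2 = NphdIndex.restore("vectors.usearch")

# view=True memory-maps the file instead of loading it into RAM.
idx3 = NphdIndex.restore("vectors.usearch", view=True)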

ShardedNphdIndex

Multi-shard index combining automatic sharding with NPHD support for variable-length vectors.

ShardedNphdIndex

ShardedNphdIndex(
    *,
    max_dim: int | None = None,
    path: str | PathLike,
    **kwargs: Any,
)

Bases: ShardedIndex

Sharded index for variable-length binary bit-vectors with NPHD metric.

Combines ShardedIndex's automatic sharding with NphdIndex's support for variable-length vectors and Normalized Prefix Hamming Distance metric.

CONCURRENCY: Single-process only. No file locking. Use async/await within a single process for concurrent access.

Parameters:

- max_dim (int | None): Maximum bits per vector (auto-detected from existing shards if omitted). Default: None.
- path (str | PathLike): Directory path for shard storage. Required.
- shard_size: Size limit in bytes before rotating shards (default 1GB). Forwarded via kwargs.
- connectivity: HNSW connectivity parameter (M). Forwarded via kwargs.
- expansion_add: Search depth on insertions (efConstruction). Forwarded via kwargs.
- expansion_search: Search depth on queries (ef). Forwarded via kwargs.

Initialize a sharded NPHD index.
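
A construction sketch (import path assumed; shard_size and the HNSW parameters are forwarded keyword arguments per the table above):

import numpy as np

from iscc_usearch import ShardedNphdIndex  # assumed import path

# Shard files live under the given directory; max_dim is auto-detected
# from existing shards when the directory is reopened.
sidx = ShardedNphdIndex(max_dim=256, path="nphd_shards")

# keys=None auto-generates integer keys; a list allows mixed lengths.
vecs = [np.random.randint(0, 256, size=n, dtype=np.uint8) for n in (8, 16, 32)]
keys = sidx.add(None, vecs)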

max_dim property

max_dim: int

Maximum number of bits per vector.

max_bytes property

max_bytes: int

Maximum number of bytes per vector.

vectors property

vectors: ShardedNphdIndexedVectors

Lazy iterator over all unpadded vectors across all shards.

Returns a ShardedNphdIndexedVectors object that supports:

- Iteration: for vec in idx.vectors
- Length: len(idx.vectors)
- Indexing: idx.vectors[0], idx.vectors[-1]
- Slicing: idx.vectors[:10]
- Numpy conversion: np.asarray(idx.vectors) (requires uniform vector lengths)

Vectors are returned unpadded (variable-length), consistent with the get() API. This is a live view that reflects the current state at iteration time.

Returns:

ShardedNphdIndexedVectors iterator.
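
A sketch of the supported access patterns (continuing the construction example above):

# Lazy, live view over all stored vectors across shards.
for vec in sidx.vectors:
    pass  # each vec is an unpadded np.uint8 array

total = len(sidx.vectors)
first = sidx.vectors[0]
tail = sidx.vectors[:10]

# np.asarray(sidx.vectors) works only if all vectors share one length.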

add

add(
    keys: int | None | Any,
    vectors: NDArray[Any],
    *,
    copy: bool = True,
    threads: int = 0,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> int | NDArray[np.uint64]

Add variable-length binary vectors to the index.

Pads vectors before adding to ensure consistent storage across shards.

Parameters:

- keys (int | None | Any): Integer key(s) or None for auto-generation. Required.
- vectors (NDArray[Any]): Single vector or batch of variable-length vectors to add. Required.
- copy (bool): Whether to copy vectors into the index. Default: True.
- threads (int): Number of threads (0 = auto). Default: 0.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

int | NDArray[uint64]: Key(s) for the added vectors.

search

search(
    vectors: NDArray[Any],
    count: int = 10,
    *,
    radius: float = float("inf"),
    threads: int = 0,
    exact: bool = False,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> Matches | BatchMatches

Search for nearest neighbors of query vector(s).

Pads query vectors before searching to match stored format.

Parameters:

- vectors (NDArray[Any]): Query vector or batch of variable-length vectors to query. Required.
- count (int): Maximum number of nearest neighbors to return per query. Default: 10.
- radius (float): Maximum distance for results. Default: float('inf').
- threads (int): Number of threads (0 = auto). Default: 0.
- exact (bool): Perform exact search. Default: False.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

Matches | BatchMatches: Matches for a single query, BatchMatches for a batch.

get

get(
    keys: int | Any, dtype: Any = None
) -> NDArray[Any] | list | None

Retrieve unpadded variable-length vectors by key(s) from any shard.

Parameters:

- keys (int | Any): Integer key(s) to look up. Required.
- dtype (Any): Optional data type for returned vectors. Default: None.

Returns:

NDArray[Any] | list | None: Unpadded vector(s), or None for missing keys.

__repr__

__repr__() -> str

Return string representation of the sharded NPHD index.

ShardedIndex

Generic sharded index for any metric. Use ShardedNphdIndex for NPHD workloads.

ShardedIndex

ShardedIndex(
    *,
    ndim: int | None = None,
    metric: MetricKind | Any = MetricKind.Cos,
    dtype: ScalarKind | str | None = None,
    connectivity: int | None = None,
    expansion_add: int | None = None,
    expansion_search: int | None = None,
    multi: bool = False,
    path: str | PathLike,
    shard_size: int = DEFAULT_SHARD_SIZE,
    bloom_filter: bool = True,
)

Sharded vector index for scalable append-only storage.

Wraps usearch Index/Indexes to provide automatic sharding when the active shard exceeds the configured size limit. Finished shards are memory-mapped (view mode) for efficient read-only access, while the active shard is fully loaded (load mode) for read-write operations.

CONCURRENCY: Single-process only. No file locking. Use async/await within a single process for concurrent access.

Parameters:

- ndim (int | None): Number of vector dimensions (auto-detected from existing shards if omitted). Default: None.
- metric (MetricKind | Any): Distance metric (MetricKind or CompiledMetric). Default: MetricKind.Cos.
- dtype (ScalarKind | str | None): Scalar type for vectors (ScalarKind). Default: None.
- connectivity (int | None): HNSW connectivity parameter (M). Default: None.
- expansion_add (int | None): Search depth on insertions (efConstruction). Default: None.
- expansion_search (int | None): Search depth on queries (ef). Default: None.
- multi (bool): Allow multiple vectors per key. Default: False.
- path (str | PathLike): Directory path for shard storage. Required.
- shard_size (int): Size limit in bytes before rotating shards (default 1GB). Default: DEFAULT_SHARD_SIZE.
- bloom_filter (bool): Enable bloom filter for fast non-existent key rejection. Default: True.

Initialize a sharded index.
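
A construction sketch for the generic variant (import path assumed; MetricKind comes from usearch):

import numpy as np

from usearch.index import MetricKind

from iscc_usearch import ShardedIndex  # assumed import path

# Fixed-dimension cosine index with 1 GiB shards and bloom-filtered
# key-existence checks.
gidx = ShardedIndex(
    ndim=128,
    metric=MetricKind.Cos,
    path="cos_shards",
    shard_size=1 << 30,
    bloom_filter=True,
)
keys = gidx.add(None, np.random.rand(1000, 128).astype(np.float32))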

size property

size: int

Total number of vectors across all shards.

ndim property

ndim: int

Vector dimensionality.

dtype property

dtype: ScalarKind

Scalar type for vectors.

metric property

metric: MetricKind | Any

Distance metric.

metric_kind property

metric_kind: MetricKind

Distance metric kind.

connectivity property

connectivity: int

HNSW connectivity parameter.

expansion_add property writable

expansion_add: int

Expansion parameter for additions.

expansion_search property writable

expansion_search: int

Expansion parameter for searches.

multi property

multi: bool

Whether multiple vectors per key are allowed.

path property

path: Path

Directory path for shard storage.

shard_count property

shard_count: int

Number of shard files.

memory_usage property

memory_usage: int

Estimated memory usage across all shards.

serialized_length property

serialized_length: int

Serialized length of active shard.

capacity property

capacity: int

Capacity of active shard.

keys property

keys: ShardedIndexedKeys

Lazy iterator over all keys across all shards.

Returns a ShardedIndexedKeys object that supports:

- Iteration: for key in idx.keys
- Length: len(idx.keys)
- Indexing: idx.keys[0], idx.keys[-1]
- Slicing: idx.keys[:10]
- Numpy conversion: np.asarray(idx.keys)

This is a live view that reflects the current state at iteration time.

Returns:

ShardedIndexedKeys iterator.

vectors property

vectors: ShardedIndexedVectors

Lazy iterator over all vectors across all shards.

Returns a ShardedIndexedVectors object that supports:

- Iteration: for vec in idx.vectors
- Length: len(idx.vectors)
- Indexing: idx.vectors[0], idx.vectors[-1]
- Slicing: idx.vectors[:10]
- Numpy conversion: np.asarray(idx.vectors)

This is a live view that reflects the current state at iteration time.

Note: Unlike usearch Index.vectors, which returns an np.ndarray immediately, this returns a lazy iterator appropriate for larger-than-RAM indexes.

Returns:

ShardedIndexedVectors iterator.

add

add(
    keys: int | None | Any,
    vectors: NDArray[Any],
    *,
    copy: bool = True,
    threads: int = 0,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> int | NDArray[np.uint64]

Add vectors to the active shard, rotating if size exceeded.

Parameters:

- keys (int | None | Any): Integer key(s) or None for auto-generation. Required.
- vectors (NDArray[Any]): Vector or batch of vectors to add. Required.
- copy (bool): Whether to copy vectors into the index. Default: True.
- threads (int): Number of threads (0 = auto). Default: 0.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

int | NDArray[uint64]: Key(s) for the added vectors.

search

search(
    vectors: NDArray[Any],
    count: int = 10,
    *,
    radius: float = float("inf"),
    threads: int = 0,
    exact: bool = False,
    log: str | bool = False,
    progress: Callable[[int, int], bool] | None = None,
) -> Matches | BatchMatches

Search across all shards, merging and sorting results.

Parameters:

- vectors (NDArray[Any]): Query vector or batch of vectors. Required.
- count (int): Maximum number of results per query. Default: 10.
- radius (float): Maximum distance for results. Default: float('inf').
- threads (int): Number of threads (0 = auto). Default: 0.
- exact (bool): Perform exact search. Default: False.
- log (str | bool): Enable progress logging. Default: False.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

Returns:

Matches | BatchMatches: Matches for a single query, BatchMatches for a batch.

Raises:

- ValueError: If count < 1.

get

get(
    keys: int | Any, dtype: Any = None
) -> NDArray[Any] | list | None

Retrieve vectors by key from any shard.

Parameters:

- keys (int | Any): Integer key(s) to look up. Required.
- dtype (Any): Optional data type for returned vectors. Default: None.

Returns:

NDArray[Any] | list | None: Vector(s), or None for missing keys.

contains

contains(keys: int | Any) -> bool | NDArray[np.bool_]

Check if keys exist in any shard.

When bloom_filter=True (default), uses bloom filter to quickly reject non-existent keys.

Parameters:

- keys (int | Any): Integer key(s) to check. Required.

Returns:

bool | NDArray[bool_]: Boolean or array of booleans.
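
A membership-check sketch (continuing the construction example above):

# Scalar check goes through __contains__; the bloom filter rejects most
# non-existent keys without touching any shard.
assert keys[0] in gidx

# Batched check returns one boolean per key.
mask = gidx.contains(keys[:10])

# count() sums occurrences across shards (relevant when multi=True).
counts = gidx.count(keys[:10])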

__contains__

__contains__(keys: int | Any) -> bool | NDArray[np.bool_]

Support 'in' operator.

count

count(keys: int | Any) -> int | NDArray[np.uint64]

Count occurrences of keys across all shards (sum aggregation).

Parameters:

- keys (int | Any): Integer key(s) to count. Required.

Returns:

int | NDArray[uint64]: Count or array of counts.

save

save(
    path_or_buffer: str | PathLike | None = None,
    progress: Callable[[int, int], bool] | None = None,
) -> None

Save active shard and bloom filter to disk.

Parameters:

- path_or_buffer (str | PathLike | None): Ignored (uses internal path management). Default: None.
- progress (Callable[[int, int], bool] | None): Progress callback. Default: None.

rebuild_bloom

rebuild_bloom(
    save: bool = True, log_progress: bool = True
) -> int

Rebuild bloom filter from all existing keys.

Use this to populate the bloom filter for an existing index that was created without bloom filter support, or to repair a corrupted filter.

Processes keys shard-by-shard in batches for efficiency.

Parameters:

- save (bool): Whether to save the bloom filter to disk after rebuilding. Default: True.
- log_progress (bool): Whether to log progress per shard. Default: True.

Returns:

int: Number of keys added to the bloom filter.
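
For example (continuing the sketch above):

# Repopulate the filter for an index created without bloom filter
# support, persisting it to disk afterwards.
n_keys = gidx.rebuild_bloom(save=True, log_progress=True)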

metadata staticmethod

metadata(path: str | PathLike) -> dict | None

Extract metadata from a sharded index directory.

Parameters:

- path (str | PathLike): Directory containing shard files. Required.

Returns:

dict | None: Metadata dict, or None if invalid.

__len__

__len__() -> int

Total number of vectors across all shards.

remove

remove(*args: Any, **kwargs: Any) -> None

Not supported - append-only design.

__delitem__

__delitem__(keys: Any) -> None

Not supported - append-only design.

rename

rename(*args: Any, **kwargs: Any) -> None

Not supported - append-only design.

join

join(*args: Any, **kwargs: Any) -> None

Not supported for sharded indexes.

cluster

cluster(*args: Any, **kwargs: Any) -> None

Not supported for sharded indexes.

pairwise_distance

pairwise_distance(*args: Any, **kwargs: Any) -> None

Not supported for sharded indexes.

copy

copy() -> None

Not supported - too complex with multiple shards.

clear

clear() -> None

Not supported - would need to handle multiple files.

reset

reset() -> None

Not supported - would need to handle multiple files.

__repr__

__repr__() -> str

Return string representation of the sharded index.

ScalableBloomFilter

Scalable bloom filter for efficient probabilistic key existence checks.

ScalableBloomFilter

ScalableBloomFilter(
    initial_capacity: int = 10000000,
    fpr: float = 0.01,
    growth_factor: float = 2.0,
)

Scalable bloom filter that grows automatically as elements are added.

Chains multiple fixed-size bloom filters to support unlimited growth while maintaining the target false positive rate. Each new filter has progressively tighter FPR to keep the overall rate bounded.

Parameters:

- initial_capacity (int): Initial number of elements before the first growth. Default: 10000000.
- fpr (float): Target false positive rate (0.0-1.0). Default: 0.01.
- growth_factor (float): Capacity multiplier for each new filter. Default: 2.0.

Initialize a scalable bloom filter.
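
A usage sketch (import path assumed):

from iscc_usearch import ScalableBloomFilter  # assumed import path

bf = ScalableBloomFilter(initial_capacity=1_000_000, fpr=0.01)
bf.add_batch(range(100_000))  # native batch insertion

# No false negatives: every added key reports as possibly present.
assert bf.contains(42) and 42 in bf

# len() is the approximate element count; filter_count shows how many
# chained filters have been created so far.
print(len(bf), bf.filter_count)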

count property

count: int

Approximate number of elements added.

current_capacity property

current_capacity: int

Total capacity across all filters.

filter_count property

filter_count: int

Number of bloom filters in the chain.

add

add(key: int) -> None

Add a single key to the bloom filter.

Parameters:

- key (int): Integer key to add (uint64). Required.

add_batch

add_batch(keys: Sequence[int]) -> None

Add multiple keys to the bloom filter efficiently.

Uses native batch operations and handles capacity growth properly.

Parameters:

- keys (Sequence[int]): Sequence of integer keys to add. Required.

contains

contains(key: int) -> bool

Check if a key might be in the filter.

Parameters:

- key (int): Integer key to check. Required.

Returns:

bool: False if definitely not present, True if possibly present.

contains_batch

contains_batch(keys: Sequence[int]) -> list[bool]

Check if multiple keys might be in the filter.

Uses native Rust batch operations for throughput. Each filter in the chain is checked via a single batch call, and results are OR-combined.

Parameters:

- keys (Sequence[int]): Sequence of integer keys to check. Required.

Returns:

list[bool]: List of booleans (False = definitely not present, True = possibly present).

clear

clear() -> None

Clear all filters and reset to initial state.

save

save(path: str | Path) -> None

Save bloom filter to disk.

File format:

- 4 bytes: magic ("ISBF")
- 1 byte: version
- 8 bytes: count (uint64)
- 4 bytes: initial_capacity (uint32)
- 8 bytes: fpr (float64)
- 8 bytes: growth_factor (float64)
- 4 bytes: num_filters (uint32)
- For each filter:
  - 4 bytes: capacity (uint32)
  - 4 bytes: hashes (uint32)
  - 4 bytes: data_len (uint32)
  - data_len bytes: filter data

Parameters:

- path (str | Path): File path to save to. Required.

load classmethod

load(path: str | Path) -> ScalableBloomFilter

Load bloom filter from disk.

Parameters:

- path (str | Path): File path to load from. Required.

Returns:

ScalableBloomFilter: The restored ScalableBloomFilter.

Raises:

- ValueError: If the file format is invalid.
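
A persistence round trip (continuing the sketch above):

bf.save("keys.bloom")
bf2 = ScalableBloomFilter.load("keys.bloom")
assert len(bf2) == len(bf)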

__len__

__len__() -> int

Return approximate number of elements.

__contains__

__contains__(key: int) -> bool

Support 'in' operator.

__repr__

__repr__() -> str

Return string representation.

timer

Context manager for timing operations with loguru integration.

timer

timer(message: str, log_start=False)

Context manager for timing code blocks and logging elapsed duration.

Logs a message with the elapsed time on exit using loguru.

Parameters:

- message (str): Description of the operation being timed. Required.
- log_start: If True, log a "started" message on entry. Default: False.
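
A usage sketch (import path assumed):

import time

from iscc_usearch import timer  # assumed import path

# Logs the message with the elapsed duration on exit via loguru;
# log_start=True also logs a "started" message on entry.
with timer("building index", log_start=True):
    time.sleep(0.1)  # stand-in for any timed operation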

__enter__

__enter__()

Start the timer.

__exit__

__exit__(exc_type, exc_value, traceback)

Stop the timer and log elapsed duration.
