Sharding design¶
The problem¶
HNSW graph construction slows as the graph grows because each insertion must search a larger
neighborhood. A single USearch Index averages ~11.7K vectors/sec over 1M inserts, with throughput
declining throughout. At hundreds of millions of vectors, this is a bottleneck.
A single index file must also fit in RAM for writes. Memory-mapping helps for reads, but write-heavy workloads need the full graph loaded.
Active shard vs. view shards¶
ShardedIndex splits storage into two tiers:
- Active shard (one, fully loaded in RAM): handles all writes. Stays small for consistent insert throughput.
- View shards (zero or more, memory-mapped): handle reads. Memory footprint is low because the OS pages data in on demand.
```mermaid
stateDiagram-v2
    [*] --> Created: new shard
    Created --> Filling: add vectors
    Filling --> Full: size > shard_size
    Full --> Saved: save to disk
    Saved --> Viewed: reopen as mmap
    Viewed --> [*]
    note right of Filling: Active shard (RAM, read-write)
    note right of Viewed: View shard (mmap, read-only)
```
When the active shard exceeds the configured `shard_size`, it is:

- Saved to disk as `shard_NNN.usearch`.
- Reopened in view mode (memory-mapped, read-only).
- Replaced by a fresh, empty active shard.
This rotation resets the HNSW insert curve and keeps throughput consistent.
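The rotation logic can be sketched in miniature. This is an illustrative model, not the real implementation: plain dicts stand in for USearch indexes, and the size threshold counts keys rather than the real byte-based `shard_size`. The class name `RotatingShards` is hypothetical.

```python
class RotatingShards:
    """Toy model of active-shard rotation (dicts stand in for USearch indexes)."""

    def __init__(self, shard_size):
        self.shard_size = shard_size  # here: max keys per shard (real code uses bytes)
        self.active = {}              # active shard: in RAM, read-write
        self.views = []               # view shards: read-only snapshots

    def add(self, key, vector):
        self.active[key] = vector
        if len(self.active) > self.shard_size:
            self._rotate()

    def _rotate(self):
        # 1. "Save" the full active shard (stands in for shard_NNN.usearch),
        # 2. reopen it as a read-only view,
        # 3. start a fresh, empty active shard.
        self.views.append(dict(self.active))
        self.active = {}
```

Because the active shard never grows past the threshold, each insert searches a bounded HNSW neighborhood, which is what keeps throughput flat.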
Bloom filter integration¶
ShardedIndex has a ScalableBloomFilter that tracks all keys across all shards. This allows
O(1) rejection of non-existent keys in get(), contains(), and count():
```mermaid
graph TD
    Q["get(key)"] --> BF{"Bloom filter<br/>contains(key)?"}
    BF -->|"Definitely no"| NONE["Return None"]
    BF -->|"Maybe yes"| AS{"Active shard<br/>contains(key)?"}
    AS -->|"Yes"| RA["Return from active"]
    AS -->|"No"| VS{"View shards<br/>(iterate)"}
    VS -->|"Found"| RV["Return from view shard"]
    VS -->|"Not found"| NONE2["Return None"]
```
Without the bloom filter, every key lookup must query each shard sequentially. With the bloom filter, keys that don't exist are rejected instantly.
The bloom filter is:

- Persisted alongside shard files as `bloom.isbf`.
- Updated automatically when vectors are added.
- Rebuilt via `rebuild_bloom()` if corrupted or missing.
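The lookup path in the diagram above can be sketched as follows. `TinyBloom`, `get()`, and the dict-based shards are all illustrative stand-ins for the real `ScalableBloomFilter` and USearch indexes:

```python
import hashlib

class TinyBloom:
    """Toy bloom filter (the real class is a ScalableBloomFilter)."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        # k deterministic bit positions derived from the key
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

def get(key, bloom, active, views):
    if key not in bloom:      # "definitely no": O(1) rejection
        return None
    if key in active:         # check the single active shard first
        return active[key]
    for shard in views:       # then fan out over view shards
        if key in shard:
            return shard[key]
    return None               # bloom false positive: key never existed
```

The false-positive path is harmless: the shard scan simply comes up empty and returns `None`, at the cost of one wasted fan-out.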
Search fan-out¶
Each search query fans out across all shards, then results are merged. View shards are searched
via USearch's Indexes class (which may parallelize internally), and the active shard is
searched separately. The two result sets are then combined:
- Query all view shards (via `Indexes.search()`).
- Query the active shard.
- Merge results using vectorized NumPy operations (concatenate, argsort, advanced indexing).
- Return top-k results sorted by distance.
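The merge step can be sketched with NumPy. The helper name `merge_topk` is hypothetical; the real merge lives inside `ShardedIndex`:

```python
import numpy as np

def merge_topk(keys_a, dists_a, keys_b, dists_b, k):
    """Merge two per-shard result sets and keep the global top-k by distance."""
    keys = np.concatenate([keys_a, keys_b])
    dists = np.concatenate([dists_a, dists_b])
    order = np.argsort(dists)[:k]   # vectorized sort + advanced indexing
    return keys[order], dists[order]
```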
Trade-offs¶
| Factor | Fewer shards (large shard_size) | More shards (small shard_size) |
|---|---|---|
| Query latency | Lower (fewer shards to search) | Higher (more shards to merge) |
| Add throughput | Degrades as shard fills | Stays consistent (frequent resets) |
| Memory usage | Higher (large active shard in RAM) | Lower (small active shard) |
| Disk I/O | Less frequent rotation | More frequent rotation |
The default shard_size of 1 GB is a reasonable starting point for most workloads. Tune it based
on your read/write ratio -- see Performance for benchmark data.
Tombstone-based deletion¶
View shards are memory-mapped and read-only at the filesystem level. remove() handles this by
using two strategies:
- Active shard entries: removed immediately via USearch's lazy deletion (the HNSW node is marked as deleted internally).
- View shard entries: tracked in a `_tombstones` set. Tombstoned keys are suppressed in search results and iterators without modifying the view shard files.
Tombstones are persisted as tombstones.npy alongside shard files, so deletion state survives
save/load cycles.
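The two deletion strategies can be sketched together. `TombstoneTracker` is an illustrative name, and dicts again stand in for shards; the real active-shard path uses USearch's lazy deletion rather than a `del`:

```python
class TombstoneTracker:
    """Sketch of the two remove() strategies (names illustrative)."""

    def __init__(self, active, views):
        self.active = active           # dict standing in for the active shard
        self.views = views             # list of read-only view-shard dicts
        self._tombstones = set()

    def remove(self, key):
        if key in self.active:
            del self.active[key]       # active shard: deleted in place
        else:
            self._tombstones.add(key)  # view shard: marked only, file untouched

    def filter_results(self, results):
        # suppress tombstoned keys without modifying view shard files
        return [(k, d) for k, d in results if k not in self._tombstones]
```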
Compaction¶
Over time, tombstoned entries waste disk space and increase search fan-out overhead. compact()
rebuilds view shards to physically remove tombstoned and cross-shard duplicate entries:
- Collect live entries from each view shard (newest-to-oldest, active shard keys authoritative).
- Release all memory-mapped references (required on Windows).
- Rebuild shard files containing only live entries; delete empty shards.
- Reopen rebuilt shards in view mode, rebuild the bloom filter, and save.
Compaction is optional — tombstoned entries are already filtered from reads. Compact when disk space matters or tombstone density is high.
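Step 1 of compaction (collecting live entries with newest-to-oldest precedence) can be sketched like this. `collect_live` is a hypothetical helper, and dicts stand in for shard files:

```python
def collect_live(view_shards, active_keys, tombstones):
    """Collect live entries per view shard, honoring precedence rules.

    Newer shards win cross-shard duplicates, active shard keys are
    authoritative, and tombstoned keys are dropped entirely.
    """
    seen = set(active_keys)              # active shard keys are authoritative
    live = []
    for shard in reversed(view_shards):  # iterate newest shard first
        keep = {k: v for k, v in shard.items()
                if k not in seen and k not in tombstones}
        seen.update(keep)
        live.append(keep)
    return list(reversed(live))          # restore oldest-to-newest order
```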
Dirty counter¶
All writable indexes expose a dirty property that counts unsaved key mutations (adds and
removes). It resets to 0 on save() and reset() but not on shard rotation — bloom filter and
tombstone state may still need flushing. Read-only indexes always return 0. Use dirty to
implement caller-driven flush policies (e.g., "save every 1000 writes").
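A caller-driven flush policy is a few lines. `FakeIndex` below is a test double standing in for a real writable index; only the `dirty`/`save()` contract matters:

```python
class FakeIndex:
    """Stand-in exposing the dirty/save() contract described above."""

    def __init__(self):
        self.dirty = 0
        self.saves = 0

    def add(self, key):
        self.dirty += 1      # each key mutation bumps the counter

    def save(self):
        self.saves += 1
        self.dirty = 0       # dirty resets to 0 on save()

def flush_if_dirty(index, threshold):
    # save only once unsaved key mutations reach the threshold
    if index.dirty >= threshold:
        index.save()
```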
Additional operations¶
- `upsert()` provides insert-or-update semantics by calling `remove()` then `add()`. Batch upsert deduplicates within the batch (last occurrence wins).
- `add_once()` provides skip-if-exists semantics: it adds a vector only when its key does not already exist. Useful for idempotent batch loads.
- `reset()` releases all in-memory resources (view shards, active shard, bloom filter, tombstones) without deleting files on disk.
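The "last occurrence wins" deduplication used by batch upsert can be sketched with an insertion-ordered dict. The helper name `dedupe_batch` is illustrative:

```python
def dedupe_batch(keys, vectors):
    """Deduplicate a batch within itself: the last occurrence of a key wins."""
    latest = {}
    for k, v in zip(keys, vectors):
        latest[k] = v  # later duplicates overwrite earlier ones in place
    return list(latest.keys()), list(latest.values())
```

Python dicts preserve insertion order, so the first position of each key is kept while its value reflects the final occurrence.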