Persistence¶
This guide shows how to save, load, and memory-map NphdIndex instances.
Save an index to disk¶
save() serializes the index to an in-memory buffer, writes it to a temporary file via os.write(),
flushes to stable storage with fdatasync, then atomically renames the temp file into place.
The result is both atomic (no partial files on crash) and durable (data survives power loss).
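The write pattern described above can be sketched with the standard library alone. This is an illustrative sketch, not the library's internal code; the helper name `durable_write` is hypothetical:

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data atomically and durably: temp file + fdatasync + rename."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)                       # write the serialized buffer
        getattr(os, "fdatasync", os.fsync)(fd)   # flush to stable storage (fsync fallback where fdatasync is unavailable)
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomic replace: readers see the old or new file, never a partial one
```

Because the rename only happens after the flush, a crash at any point leaves either the previous complete file or the new complete file on disk.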
Sharded indexes use the same durable file-save path for .usearch shard files. ShardedIndex.save()
does not accept a path argument; it saves the bloom filter, active shard, and tombstones into the
directory configured at construction time:
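A usage sketch follows. The import path and the constructor signature (in particular how the directory is supplied) are assumptions for illustration:

```python
from usearch_iscc import ShardedIndex  # import path assumed

# Directory is fixed at construction time (kwarg name assumed).
index = ShardedIndex(directory="my_shards")
# ... add vectors ...
index.save()  # no path argument: writes the bloom filter, active shard,
              # and tombstones into "my_shards"
```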
Long-running sharded saves log start and completion messages at INFO, including the shard name
and vector count.
Persistence ordering¶
Sharded saves (both explicit save() and automatic shard rotation) persist files in this order:
- Bloom filter (`bloom.isbf`) — extra entries are harmless false positives.
- Shard file (`shard_NNN.usearch`) — the actual vector data.
- Tombstones (`tombstones.npy`) — tombstone removals only become visible after the shard data they depend on is durable.
This ordering prevents previously deleted keys from reappearing after a crash. If the process dies after writing the shard but before updating tombstones, the stale tombstone entries just hide the key from view shards — but the key is safely in the shard.
Background rotation (background_rotation=True) preserves this ordering: tombstone state is
captured in-memory at rotation time, and the background thread persists it after durable_write
completes — matching the synchronous path.
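The ordering can be sketched as follows. File names come from the list above; the helpers are illustrative, not the library's internals:

```python
import os

def _durable(path: str, data: bytes) -> None:
    """Temp file + flush + atomic rename, as in the durable save path."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)
    os.rename(tmp, path)

def save_shard(directory: str, shard_no: int,
               bloom: bytes, shard: bytes, tombstones: bytes) -> None:
    """Persist in dependency order: bloom first, tombstones last."""
    _durable(os.path.join(directory, "bloom.isbf"), bloom)  # 1. stale extras are harmless
    _durable(os.path.join(directory, f"shard_{shard_no:03d}.usearch"), shard)  # 2. vector data
    _durable(os.path.join(directory, "tombstones.npy"), tombstones)  # 3. only after the shard is durable
```

A crash between steps leaves files whose staleness is safe in the direction the ordering guarantees: an over-full bloom filter or an out-of-date tombstone file, never a tombstone update that refers to shard data that was not yet durable.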
Crash recovery¶
On load, ShardedIndex applies defensive recovery:
- Stale temp files (`*.usearch.tmp`, `*.isbf.tmp`, `*.npy.tmp`) from interrupted durable writes are deleted automatically.
- Missing or corrupt bloom filter — rebuilt from all shard keys with a logged warning. The bloom filter is a derived index, not a source of truth.
- Missing tombstone file — assumed no tombstones. Previously tombstoned keys may reappear from view shards.
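The temp-file cleanup step can be sketched as follows, using the patterns from the list above (the helper name is illustrative):

```python
import glob
import os

def clean_stale_temps(directory: str) -> list[str]:
    """Delete leftovers of interrupted durable writes; return what was removed."""
    removed = []
    for pattern in ("*.usearch.tmp", "*.isbf.tmp", "*.npy.tmp"):
        for path in glob.glob(os.path.join(directory, pattern)):
            os.remove(path)  # safe: a .tmp file was never renamed into place
            removed.append(path)
    return removed
```

Deleting a `.tmp` file is always safe because the rename step of a durable write is what publishes a file; anything still carrying the `.tmp` suffix was never visible to readers.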
Load an index from disk¶
load() reads the entire file into RAM:
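A minimal usage sketch (the import path is an assumption; the constructor and paths mirror the dirty-counter example later in this guide):

```python
from usearch_iscc import NphdIndex  # import path assumed

index = NphdIndex(max_dim=256)
index.load("my_index.usearch")  # reads the whole file into RAM
```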
You can also use restore() to create and load in one step:
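For example:

```python
# Create and load in one step (import path assumed).
from usearch_iscc import NphdIndex

index = NphdIndex.restore("my_index.usearch")
```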
Memory-map an index¶
view() memory-maps the file for read-only access. The OS pages data in on demand, so startup is
fast and memory usage stays low:
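For example, via the convenience dispatcher shown later in this guide:

```python
index = NphdIndex.restore("my_index.usearch", view=True)  # memory-mapped, read-only
```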
Or explicitly:
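A sketch of the explicit form; that view() takes a path argument like save() and load() is an assumption here:

```python
index = NphdIndex(max_dim=256)
index.view("my_index.usearch")  # memory-map instead of loading into RAM
```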
Warning
A viewed index is read-only. Calling add() on a viewed index raises an error from
USearch's C++ core.
Restore with auto-detect¶
NphdIndex.restore() calls either load() or view() based on the view parameter:
```python
# Full load (default)
index = NphdIndex.restore("my_index.usearch")

# Memory-mapped
index = NphdIndex.restore("my_index.usearch", view=True)
```
Copy an index¶
copy() creates an independent in-memory clone with the same configuration and data:
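A minimal sketch, assuming an already-populated `index`:

```python
clone = index.copy()  # independent in-memory clone, same configuration and data
```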
The copy is independent. Modifying one does not affect the other.
Choosing a method¶
| Method | RAM usage | Startup speed | Writable | Use case |
|---|---|---|---|---|
| `load()` | High | Slower | Yes | Read-write workloads |
| `view()` | Low | Fast | No | Read-only serving, many shards |
| `restore()` | Either | Either | Either | Convenience dispatcher |
| `copy()` | High | Instant | Yes | Fork an index for experiments |
Dirty counter¶
NphdIndex tracks unsaved mutations via the dirty property. It increments on each add() or
remove() call and resets to 0 on save(), load(), view(), and reset():
```python
index = NphdIndex(max_dim=256)
index.add(1, vec)
print(index.dirty)  # 1

index.save("my_index.usearch")
print(index.dirty)  # 0
```
Use dirty to implement caller-driven flush policies (e.g., "save every N writes").
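One such policy can be sketched generically against any index exposing add(), save(), and dirty; the helper name and threshold are illustrative:

```python
FLUSH_EVERY = 100  # example threshold, not a library default

def add_with_flush(index, key, vector, path, every=FLUSH_EVERY):
    """Add a vector, then persist once `dirty` reaches the threshold."""
    index.add(key, vector)
    if index.dirty >= every:
        index.save(path)  # save() resets dirty to 0
```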
Metric persistence¶
The native MetricKind.NPHD metric is correctly serialized and deserialized by usearch-iscc.
No manual metric restoration is needed after load() or view() operations.