
Upsert

upsert() is an idempotent insert-or-update operation.

What upsert does

upsert(keys, vectors) ensures that each key maps to the given vector:

  • Key is new: The vector is inserted (same as add()).
  • Key exists, vector unchanged: No operation (skip).
  • Key exists, vector changed: The old vector is removed and the new vector is inserted.

upsert() is idempotent: calling it multiple times with the same inputs produces the same result.
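As a mental model only, the three cases can be sketched with a plain dict standing in for the index. toy_upsert is a hypothetical helper, not part of iscc_usearch. In a dict, "remove the old vector, insert the new one" collapses into a single overwrite.

```python
def toy_upsert(store, key, vector):
    """Apply one upsert to `store` (a dict) and report which case applied."""
    vector = tuple(vector)  # tuples compare by value and are hashable
    if key not in store:
        store[key] = vector  # key is new: insert (like add())
        return "insert"
    if store[key] == vector:
        return "skip"  # key exists, vector unchanged: no-op
    store[key] = vector  # key exists, vector changed: replace old with new
    return "replace"


store = {}
actions = [
    toy_upsert(store, 1, [255, 128, 64, 32]),
    toy_upsert(store, 1, [255, 128, 64, 32]),  # same input again
    toy_upsert(store, 1, [0, 0, 0, 0]),
]
print(actions)  # ['insert', 'skip', 'replace']
print(store)    # {1: (0, 0, 0, 0)}
```

Repeating the last call leaves the store exactly as it is, which is the idempotency property described above.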

Single upsert

import numpy as np
from iscc_usearch import NphdIndex

index = NphdIndex(max_dim=256)

vec = np.array([255, 128, 64, 32], dtype=np.uint8)
index.upsert(1, vec)

# Calling again with same data is a no-op
index.upsert(1, vec)
print(len(index))  # 1

# Update with different vector
vec_new = np.array([0, 0, 0, 0], dtype=np.uint8)
index.upsert(1, vec_new)
print(index.get(1))  # [0 0 0 0]

Batch upsert

Batch upsert takes a list of keys and a 2D array with one uniform-length vector per row:

keys = [1, 2, 3]
vectors = np.array(
    [
        [255, 128, 64, 32],
        [0, 0, 0, 0],
        [1, 2, 3, 4],
    ],
    dtype=np.uint8,
)

index.upsert(keys, vectors)

Variable-length batch upsert

On single-file indexes (NphdIndex), batch upsert() requires all vectors to have the same length because it normalizes inputs to a 2D array internally. For variable-length vectors, call upsert() one at a time:

variable_keys = [10, 11, 12]
variable_vecs = [
    np.array([255, 128], dtype=np.uint8),  # 16-bit
    np.array([255, 128, 64, 32], dtype=np.uint8),  # 32-bit
    np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.uint8),  # 64-bit
]

for key, vec in zip(variable_keys, variable_vecs):
    index.upsert(key, vec)
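If you would rather keep batching on NphdIndex, one option is to split the input into groups that each share a single length, which satisfies the uniform-length requirement described above. group_by_length is a hypothetical helper written for illustration, not part of iscc_usearch.

```python
from collections import defaultdict


def group_by_length(keys, vectors):
    """Split (key, vector) pairs into groups whose vectors share one length.

    Returns {length: (group_keys, group_vectors)}.
    """
    if len(keys) != len(vectors):
        raise ValueError("keys and vectors must have the same length")
    groups = defaultdict(lambda: ([], []))  # each new length gets fresh lists
    for key, vec in zip(keys, vectors):
        group_keys, group_vecs = groups[len(vec)]
        group_keys.append(key)
        group_vecs.append(vec)
    return dict(groups)


keys = [10, 11, 12, 13]
vecs = [[255, 128], [255, 128, 64, 32], [1, 2], [1, 2, 3, 4, 5, 6, 7, 8]]
for length, (group_keys, group_vecs) in sorted(group_by_length(keys, vecs).items()):
    print(length, group_keys)
# 2 [10, 12]
# 4 [11]
# 8 [13]
```

Each group could then be stacked with np.array(group_vecs, dtype=np.uint8) and passed to index.upsert(group_keys, ...) as one batch. This assumes keys are unique across the input: grouping reorders the writes, so a key that appears in two groups of different lengths could end up with either vector.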

ShardedNphdIndex and ShardedNphdIndex128 accept mixed-length vectors in batch upsert() directly — no need to loop one at a time.

Upsert on sharded indexes

upsert() is available on all sharded index variants (ShardedIndex, ShardedIndex128, ShardedNphdIndex, ShardedNphdIndex128). If the key already has an entry in a view shard, that entry is not rewritten in place; it is marked as deleted with a tombstone, and the new vector is inserted into the active shard:

import numpy as np
from iscc_usearch import ShardedNphdIndex

index = ShardedNphdIndex(max_dim=256, path="./my_index")

vec = np.array([255, 128, 64, 32], dtype=np.uint8)
index.upsert(1, vec)

# Update with different vector
vec_new = np.array([0, 0, 0, 0], dtype=np.uint8)
index.upsert(1, vec_new)
print(index.get(1))  # [0 0 0 0]
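The tombstone approach can be pictured with a small in-memory model. This is a hypothetical sketch, assuming view shards are never modified in place and that tombstones are tracked per (shard index, key) pair; ToyShardedStore and its attributes are illustrative names, not iscc_usearch API.

```python
class ToyShardedStore:
    """Hypothetical model of tombstone-based upsert across shards.

    view_shards: list of {key: vector} dicts, oldest first, never rewritten.
    active: the single writable shard.
    tombstones: (shard_index, key) pairs marking stale view-shard entries.
    """

    def __init__(self, view_shards):
        self.view_shards = view_shards
        self.active = {}
        self.tombstones = set()

    def get(self, key):
        if key in self.active:
            return self.active[key]
        # Search view shards newest first, ignoring tombstoned entries
        for idx in range(len(self.view_shards) - 1, -1, -1):
            shard = self.view_shards[idx]
            if key in shard and (idx, key) not in self.tombstones:
                return shard[key]
        return None

    def upsert(self, key, vector):
        vector = tuple(vector)
        if self.get(key) == vector:
            return  # unchanged: no-op
        # Hide existing copies in view shards instead of rewriting those shards
        for idx, shard in enumerate(self.view_shards):
            if key in shard:
                self.tombstones.add((idx, key))
        self.active[key] = vector  # the new vector always goes to the active shard


store = ToyShardedStore(view_shards=[{1: (255, 128, 64, 32), 2: (1, 2, 3, 4)}])
store.upsert(1, [0, 0, 0, 0])
print(store.get(1))             # (0, 0, 0, 0)
print(store.get(2))             # (1, 2, 3, 4)
print(store.tombstones)         # {(0, 1)}
print(store.view_shards[0][1])  # (255, 128, 64, 32), stale but still stored
```

The last line is the point of the model: the old entry still occupies space in its view shard until it is removed by compaction.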

Batch upsert deduplicates within the batch — last occurrence wins:

keys = [1, 2, 1]  # duplicate key 1
vecs = np.random.randint(0, 256, size=(3, 8), dtype=np.uint8)
index.upsert(keys, vecs)  # key 1 gets the third vector
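Last-occurrence-wins deduplication is easy to express in plain Python. The sketch below is hypothetical (dedupe_last_wins is not a library function) and only illustrates the documented rule; it keeps the surviving rows in their original relative order.

```python
def dedupe_last_wins(keys, vectors):
    """Keep only the last occurrence of each key in a batch."""
    # Later indices overwrite earlier ones, so each key maps to its last position
    last_index = {key: i for i, key in enumerate(keys)}
    keep = sorted(last_index.values())
    return [keys[i] for i in keep], [vectors[i] for i in keep]


keys = [1, 2, 1]  # duplicate key 1
vectors = [[10, 10], [20, 20], [30, 30]]
print(dedupe_last_wins(keys, vectors))  # ([2, 1], [[20, 20], [30, 30]])
```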

Tip

After many upserts, stale duplicate entries accumulate in view shards. Call compact() to rebuild view shards and reclaim disk space. See the Sharding how-to for details.
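To make the effect of compaction concrete, here is a hypothetical sketch that rebuilds each view shard without its tombstoned entries and counts what was dropped. It assumes tombstones are tracked as (shard_index, key) pairs; the real compact() may also merge or re-pack shards, and compact_shards is not a library function.

```python
def compact_shards(view_shards, tombstones):
    """Rebuild view shards without tombstoned entries.

    view_shards: list of {key: vector} dicts.
    tombstones: set of (shard_index, key) pairs.
    Returns (new_shards, reclaimed), where reclaimed counts dropped entries.
    """
    new_shards = []
    reclaimed = 0
    for idx, shard in enumerate(view_shards):
        kept = {k: v for k, v in shard.items() if (idx, k) not in tombstones}
        reclaimed += len(shard) - len(kept)
        new_shards.append(kept)
    return new_shards, reclaimed


# Key 1 was upserted twice, so both view-shard copies are stale
view_shards = [{1: (255, 128), 2: (1, 2)}, {1: (7, 7), 3: (9, 9)}]
tombstones = {(0, 1), (1, 1)}
new_shards, reclaimed = compact_shards(view_shards, tombstones)
print(new_shards)  # [{2: (1, 2)}, {3: (9, 9)}]
print(reclaimed)   # 2
```

Because shard positions are preserved, the tombstone set can simply be cleared once the rebuilt shards replace the old ones.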

Skip-if-exists with add_once()

add_once() adds a vector only if its key does not already exist — first-write-wins:

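import numpy as np
from iscc_usearch import ShardedNphdIndex

# Use a fresh directory so key 1 does not already exist from the upsert example
index = ShardedNphdIndex(max_dim=256, path="./my_add_once_index")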

vec = np.array([255, 128, 64, 32], dtype=np.uint8)

# First add succeeds
index.add_once(1, vec)

# Second add is silently skipped — original vec is kept
index.add_once(1, np.array([0, 0, 0, 0], dtype=np.uint8))
print(index.get(1))  # [255 128  64  32]

Unlike upsert(), add_once() does not update existing vectors — it only prevents duplicates. Use it for idempotent batch loads where the first write should win.
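The difference between the two policies can be seen with plain dicts: dict.setdefault keeps the first value written for a key, while ordinary assignment keeps the latest. This is only an analogy for the documented semantics, not how iscc_usearch implements them.

```python
writes = [(1, [255, 128, 64, 32]), (1, [0, 0, 0, 0])]

first_wins = {}
for key, vec in writes:
    first_wins.setdefault(key, tuple(vec))  # add_once()-like: keep the first write

last_wins = {}
for key, vec in writes:
    last_wins[key] = tuple(vec)  # upsert()-like: keep the latest write

print(first_wins)  # {1: (255, 128, 64, 32)}
print(last_wins)   # {1: (0, 0, 0, 0)}
```

Re-running the first loop over the same writes leaves first_wins unchanged, which is why first-write-wins suits re-runnable batch loads.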

When to use add vs upsert vs add_once

| Scenario | Use | Available on |
| --- | --- | --- |
| Keys are guaranteed unique | add() | All indexes |
| Keys may repeat, update vectors | upsert() | All indexes |
| Keys may repeat, keep first write | add_once() | Sharded indexes |
| Bulk initial load | add() | All indexes |
| Incremental updates | upsert() | All indexes |
| Idempotent batch load (sharded) | add_once() | Sharded indexes |

Note

upsert() and add_once() both require explicit keys. Passing keys=None raises ValueError. The number of keys and vectors must match, or ValueError is raised.
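If you want a large batch to fail before it reaches the index, the two documented checks can be mirrored in a small pre-flight helper. check_batch_args is hypothetical and covers only the batch form (a list of keys plus one vector per key); the library still performs its own validation.

```python
def check_batch_args(keys, vectors):
    """Raise ValueError for the two documented failure cases."""
    if keys is None:
        raise ValueError("keys must be given explicitly; keys=None is not supported")
    if len(keys) != len(vectors):
        raise ValueError(f"got {len(keys)} keys but {len(vectors)} vectors")


errors = []
for bad_keys in (None, [1, 2]):
    try:
        check_batch_args(bad_keys, [[255, 128, 64, 32]])
    except ValueError as exc:
        errors.append(str(exc))
print(errors)
```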