Cache

This page covers the two cache classes you instantiate, SemanticCache and AsyncSemanticCache. Both share a Protocol-shaped surface; the async one is a thin wrapper that awaits the embedder directly and dispatches blocking store work to a thread pool. Behavior is otherwise identical.

SemanticCache

SemanticCache(path: str | Path | None = None, embedder: Embedder | None = None, *, store: Store | None = None, similarity_threshold: float = 0.85, default_ttl: int | None = None, max_entries: int | None = None, namespace_quotas: dict[str, int] | None = None, confidence_fn: ConfidenceFn | None = None, validator: Validator | None = None, metrics_hook: MetricsHook | None = None, normalize: bool = True, index_backend: IndexBackend = 'auto', index_options: dict[str, Any] | None = None, vector_dtype: VectorDtype = 'float32', multi_process_mode: MultiProcessMode = 'single', stale_check_interval: float = 0.0, max_query_bytes: int = 32768, max_response_bytes: int = 4 * 1048576, max_metadata_bytes: int = 65536)

Synchronous layered semantic cache.

Construct with either path (creates a default SQLiteStore) or store (any Store impl). Every public method is RLock-guarded.

similarity_threshold property

similarity_threshold: float

Current Layer-2 similarity threshold.

get

get(query: str, *, embedding: NDArray[Any] | None = None, namespace: str = 'default', bypass: bool = False) -> Hit | None

Layered lookup: exact match (Layer 1), then semantic match (Layer 2).

Parameters:

Name Type Description Default
query str

The query string. Normalized per normalize= constructor flag.

required
embedding NDArray[Any] | None

Optional precomputed embedding. If supplied, the cache skips its own embedder call. Useful when you've already embedded the query for another purpose (RAG retrieval, etc.).

None
namespace str

Multi-tenant scope. Layer-1 hashes are namespace-scoped; Layer-2 search is restricted to the namespace's vectors.

'default'
bypass bool

If True, force a miss — skip both Layer 1 and Layer 2, emit a miss metric with reason="bypass", and return None. Useful for forcing the caller to invoke the underlying LLM (e.g. to refresh a stale-but-not-yet-TTL'd answer, or to A/B test cached vs. fresh responses).

False

Returns:

Type Description
Hit | None

A Hit if Layer 1 or Layer 2 found a match passing the validator and confidence cutoff; None otherwise (cache miss, embedder failure, or bypass=True).
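The layered lookup can be illustrated with a toy, in-memory stand-in. This is a minimal sketch, not the library's implementation: `ToyLayeredCache`, `toy_embed`, the SHA-256 key, the lowercase-and-collapse-whitespace normalization, and the brute-force cosine scan are all assumptions made for illustration, and the tuples it returns stand in for the real `Hit` object.

```python
import hashlib
import math

VOCAB = ["reset", "password", "change", "my", "how", "weather"]


def toy_embed(text):
    """Hypothetical bag-of-words embedder over a tiny fixed vocabulary."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyLayeredCache:
    """Illustrative stand-in for the Layer 1 / Layer 2 flow; not the real class."""

    def __init__(self, embed, similarity_threshold=0.85):
        self.embed = embed
        self.similarity_threshold = similarity_threshold
        self.exact = {}    # (namespace, hash) -> response
        self.vectors = {}  # namespace -> list of (vector, response)

    @staticmethod
    def _key(namespace, query):
        normalized = " ".join(query.lower().split())  # assumed normalization
        return namespace, hashlib.sha256(normalized.encode()).hexdigest()

    def put(self, query, response, *, namespace="default"):
        self.exact[self._key(namespace, query)] = response
        self.vectors.setdefault(namespace, []).append((self.embed(query), response))

    def get(self, query, *, embedding=None, namespace="default", bypass=False):
        if bypass:  # forced miss: skip both layers
            return None
        hit = self.exact.get(self._key(namespace, query))
        if hit is not None:  # Layer 1: namespace-scoped exact match
            return ("exact", hit)
        # Layer 2: reuse a precomputed embedding if the caller supplied one.
        vec = embedding if embedding is not None else self.embed(query)
        candidates = self.vectors.get(namespace, [])
        best = max(candidates, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.similarity_threshold:
            return ("semantic", best[1])
        return None


cache = ToyLayeredCache(toy_embed, similarity_threshold=0.7)
cache.put("How do I reset my password?", "Use the reset link.")

exact = cache.get("how do i reset my password?")  # Layer 1: same normalized hash
semantic = cache.get("reset my password")         # Layer 2: cosine is about 0.87
miss = cache.get("weather")                       # no overlap with stored vectors
other_tenant = cache.get("reset my password", namespace="tenant-b")
forced = cache.get("How do I reset my password?", bypass=True)
```

The `tenant-b` lookup misses even though the same text hits in `default`, which mirrors the namespace scoping described for both layers.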

put

put(query: str, response: str, *, embedding: NDArray[Any] | None = None, namespace: str = 'default', metadata: dict[str, Any] | None = None, ttl: int | None = None) -> None

Store a query → response mapping with the embedder's vector.

Calling put for an existing query (same normalized hash + namespace) replaces the entry: a fresh created_at is set and the TTL re-applies from now. So put-as-refresh extends the entry's life by default_ttl (or ttl= if specified) — it doesn't preserve the remaining TTL.

Parameters:

Name Type Description Default
query str

The query string. Normalized per normalize= constructor flag.

required
response str

The cached response. Capped at max_response_bytes (default 4 MB).

required
embedding NDArray[Any] | None

Optional precomputed embedding. If supplied, the cache skips its own embedder call.

None
namespace str

Multi-tenant scope. Defaults to "default".

'default'
metadata dict[str, Any] | None

Optional JSON-serializable dict; capped at max_metadata_bytes.

None
ttl int | None

Per-entry TTL in seconds. Falls back to the constructor's default_ttl. None means no expiry.

None

Raises:

Type Description
ValueError

If response or metadata exceeds its byte cap.

CacheClosedError

If the cache has been closed.
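The put-as-refresh behavior described above is easy to get wrong when reasoning about expiry, so here is a small sketch of it. It is a toy under stated assumptions, not the library's store: `ToyTtlStore`, its injectable `clock`, the `expires_at` helper, and measuring the response cap in UTF-8 bytes are all hypothetical.

```python
import time


class ToyTtlStore:
    """Illustrative stand-in showing put-as-refresh; not the library's store."""

    def __init__(self, default_ttl=None, max_response_bytes=4 * 1048576, clock=None):
        self.default_ttl = default_ttl
        self.max_response_bytes = max_response_bytes
        self.clock = clock or time.time
        self.entries = {}  # (namespace, query) -> (response, created_at, ttl)

    def put(self, query, response, *, namespace="default", ttl=None):
        # Assumption: the cap is checked against the UTF-8 encoded size.
        if len(response.encode("utf-8")) > self.max_response_bytes:
            raise ValueError("response exceeds max_response_bytes")
        effective_ttl = ttl if ttl is not None else self.default_ttl
        # Replacing an entry stamps a fresh created_at, so the TTL restarts now.
        self.entries[(namespace, query)] = (response, self.clock(), effective_ttl)

    def expires_at(self, query, *, namespace="default"):
        _, created_at, ttl = self.entries[(namespace, query)]
        return None if ttl is None else created_at + ttl


now = [0.0]  # fake clock so the example is deterministic
store = ToyTtlStore(default_ttl=60, clock=lambda: now[0])

store.put("q", "v1")
first_expiry = store.expires_at("q")      # created at t=0 with a 60 s TTL

now[0] = 50.0
store.put("q", "v2")                      # refresh 10 s before expiry
refreshed_expiry = store.expires_at("q")  # TTL restarts from t=50

store.put("p", "x", ttl=10)               # per-entry ttl overrides default_ttl
short_expiry = store.expires_at("p")
```

In this sketch the refreshed entry expires at t=110 rather than t=60, which is the "extends the entry's life" behavior: the remaining 10 seconds are discarded, not added.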

delete

delete(query: str, *, namespace: str = 'default') -> bool

Remove the entry for a query, if it exists.

Both store and in-memory index are updated. The index row becomes a tombstone; its memory is reclaimed on the next compact() (or by vacuum(), which auto-compacts).

Parameters:

Name Type Description Default
query str

The query string. Normalized per normalize= constructor flag.

required
namespace str

Multi-tenant scope. Only the entry in this namespace is removed.

'default'

Returns:

Type Description
bool

True if an entry was found and removed; False otherwise.

vacuum

vacuum(*, namespace: str | None = None, compact: bool = True) -> int

Sweep TTL-expired entries and (by default) compact the index.

Parameters:

Name Type Description Default
namespace str | None

Limit the sweep to a single namespace. None sweeps all.

None
compact bool

If True (default), call compact() after the sweep so the in-memory index actually releases the deleted rows' memory. Set False if you want to schedule compaction separately (e.g. less often than vacuum).

True

Returns:

Type Description
int

The number of expired entries removed.
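As a rough illustration of the sweep and its namespace filter, consider the following sketch. It is not the library's code: `toy_vacuum`, the `(namespace, query) -> (created_at, ttl)` layout, and treating an entry as expired when `created_at + ttl <= now` are assumptions for the example.

```python
def toy_vacuum(entries, now, *, namespace=None):
    """Remove TTL-expired entries in place and return how many were removed."""
    expired = [
        key
        for key, (created_at, ttl) in entries.items()
        if ttl is not None                      # ttl=None means no expiry
        and created_at + ttl <= now             # assumed expiry comparison
        and (namespace is None or key[0] == namespace)
    ]
    for key in expired:
        del entries[key]
    return len(expired)


entries = {
    ("tenant-a", "q1"): (0.0, 30),    # expired at t=30
    ("tenant-a", "q2"): (0.0, None),  # no TTL, never expires
    ("tenant-b", "q3"): (0.0, 30),    # expired at t=30
}

removed_a = toy_vacuum(entries, now=100.0, namespace="tenant-a")  # tenant-a only
removed_rest = toy_vacuum(entries, now=100.0)                     # all namespaces
```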

compact

compact() -> int

Reclaim memory occupied by tombstoned (soft-deleted) index rows.

remove() calls, TTL expiry, and LRU eviction all mark index rows deleted without freeing their underlying matrix bytes. Long-running caches with churn accumulate tombstone memory; calling compact rebuilds the in-memory matrix at the live size and releases the rest.

Cheap when there are no tombstones (early-return). The store is not touched — entries already deleted from the store remain deleted.

Returns:

Type Description
int

The number of tombstones reclaimed.
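The tombstone-then-compact pattern can be sketched with plain lists. This is a minimal illustration under assumptions, not the library's index: `ToyTombstoneIndex` and its parallel `keys`/`rows`/`live` lists are hypothetical, and a real backend would hold a numeric matrix rather than Python lists.

```python
class ToyTombstoneIndex:
    """Illustrative soft-delete index; not the library's index backend."""

    def __init__(self):
        self.keys = []  # entry identifiers, parallel to rows
        self.rows = []  # stored vectors
        self.live = []  # False marks a tombstoned row

    def add(self, key, vector):
        self.keys.append(key)
        self.rows.append(vector)
        self.live.append(True)

    def remove(self, key):
        # Soft delete: flag the row but keep its storage allocated.
        for i, k in enumerate(self.keys):
            if k == key and self.live[i]:
                self.live[i] = False
                return True
        return False

    def compact(self):
        tombstones = self.live.count(False)
        if tombstones == 0:  # cheap early return when there is nothing to do
            return 0
        kept = [(k, r) for k, r, alive in zip(self.keys, self.rows, self.live) if alive]
        self.keys = [k for k, _ in kept]
        self.rows = [r for _, r in kept]
        self.live = [True] * len(kept)
        return tombstones


index = ToyTombstoneIndex()
for key, vec in [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]:
    index.add(key, vec)

index.remove("b")
rows_before = len(index.rows)  # tombstone still occupies a row
reclaimed = index.compact()    # rebuild at the live size
rows_after = len(index.rows)
second_pass = index.compact()  # nothing left to reclaim
```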

clear_namespace

clear_namespace(namespace: str) -> int

Wipe every entry under a single namespace.

Other namespaces are untouched. Multi-tenant safe: useful for per-tenant data deletion (GDPR-style requests, demo resets) without affecting other tenants. Removed rows become tombstones in the in-memory index until the next compact() reclaims the memory.

Parameters:

Name Type Description Default
namespace str

The namespace to wipe.

required

Returns:

Type Description
int

The number of entries removed.

clear

clear() -> int

Wipe every entry across every namespace.

Returns the total number of entries removed. Backend-agnostic: works against any Store implementation (Memory, SQLite, Redis, Postgres, DynamoDB, custom). Bumps the store's version_counter once per namespace cleared, so multi-process readers see the change.

The in-memory index is rebuilt empty rather than tombstoned per-row — cheaper than O(n) remove() calls at scale.

set_similarity_threshold

set_similarity_threshold(value: float) -> None

Adjust the cosine-similarity threshold for Layer-2 matches at runtime.

Affects subsequent get calls only; entries already cached are not re-evaluated. value must be in [-1.0, 1.0]. For L2-normalized embeddings the useful range is [0.0, 1.0]; higher means stricter matching, lower means more permissive.
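To make "higher means stricter" concrete, the sketch below scores three hand-picked 2-D vectors against a query at different thresholds. It is illustrative only: `validate_threshold`, raising `ValueError` for out-of-range values, and using `>=` as the comparison are assumptions, not confirmed library behavior.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def validate_threshold(value):
    """Hypothetical range check mirroring the documented [-1.0, 1.0] bound."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("similarity threshold must be in [-1.0, 1.0]")
    return float(value)


def matches(query_vec, candidates, threshold):
    """Names of candidates whose similarity meets the threshold (assumed >=)."""
    threshold = validate_threshold(threshold)
    return [name for name, vec in candidates.items() if cosine(query_vec, vec) >= threshold]


query = [1.0, 0.0]
candidates = {
    "identical": [1.0, 0.0],   # cosine 1.0
    "paraphrase": [3.0, 1.0],  # cosine about 0.949
    "related": [1.0, 1.0],     # cosine about 0.707
}

permissive = matches(query, candidates, 0.70)
default = matches(query, candidates, 0.85)
strict = matches(query, candidates, 0.95)
```

Raising the threshold from 0.85 to 0.95 drops the paraphrase in this toy data, which is the trade-off to weigh when tuning at runtime: fewer false hits at the cost of more misses.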

dumps

dumps(dest: str | Path) -> None

Write a checkpoint archive to dest (tar.gz).

loads classmethod

loads(source: str | Path, path: str | Path, embedder: Embedder, **kwargs: Any) -> SemanticCache

Restore a checkpoint into a fresh SemanticCache at path.

AsyncSemanticCache

AsyncSemanticCache(path: str | Path | None = None, embedder: AsyncEmbedder | None = None, *, store: Store | None = None, similarity_threshold: float = 0.85, default_ttl: int | None = None, max_entries: int | None = None, namespace_quotas: dict[str, int] | None = None, confidence_fn: ConfidenceFn | None = None, validator: Validator | None = None, metrics_hook: MetricsHook | None = None, normalize: bool = True, index_backend: IndexBackend = 'auto', index_options: dict[str, Any] | None = None, vector_dtype: VectorDtype = 'float32', multi_process_mode: MultiProcessMode = 'single', stale_check_interval: float = 0.0, max_query_bytes: int = 32768, max_response_bytes: int = 4 * 1048576, max_metadata_bytes: int = 65536)

Async layered semantic cache.

Same constructor signature as SemanticCache except embedder is an AsyncEmbedder (its embed returns a coroutine).
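The "await the embedder, push blocking store work to a thread" split can be sketched with the standard library alone. This is an illustration of the pattern, not the library's wrapper: `BlockingStore`, `toy_async_embed`, and `ToyAsyncCache` are hypothetical, and `asyncio.to_thread` is one way to reach a thread pool (the actual dispatch mechanism is not stated in this page).

```python
import asyncio
import time


class BlockingStore:
    """Hypothetical store whose calls block, as a disk-backed store might."""

    def __init__(self):
        self.data = {}

    def write(self, key, vector, value):
        time.sleep(0.01)  # simulate blocking I/O
        self.data[key] = (vector, value)

    def read(self, key):
        time.sleep(0.01)
        return self.data.get(key)


async def toy_async_embed(text):
    """Hypothetical async embedder: calling it returns a coroutine."""
    await asyncio.sleep(0)
    return [float(len(text))]


class ToyAsyncCache:
    """Illustrative thin async wrapper; not the real AsyncSemanticCache."""

    def __init__(self, embed, store):
        self._embed = embed
        self._store = store

    async def put(self, query, response):
        vector = await self._embed(query)  # embedder is awaited directly
        # Blocking store work runs on a worker thread, off the event loop.
        await asyncio.to_thread(self._store.write, query, vector, response)

    async def get(self, query):
        entry = await asyncio.to_thread(self._store.read, query)
        return None if entry is None else entry[1]


async def demo():
    cache = ToyAsyncCache(toy_async_embed, BlockingStore())
    await cache.put("hello", "world")
    return await cache.get("hello"), await cache.get("missing")


result = asyncio.run(demo())
```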

loads async classmethod

loads(source: str | Path, path: str | Path, embedder: AsyncEmbedder, **kwargs: Any) -> AsyncSemanticCache

Restore a checkpoint into a fresh AsyncSemanticCache.

The checkpoint store is restored via the sync code path on a thread; once the file is in place, an AsyncSemanticCache is constructed around it (which validates fingerprint/dim against the supplied async embedder via the underlying sync core).

clear

clear() -> int

Wipe every entry across every namespace. See SemanticCache.clear.

set_similarity_threshold

set_similarity_threshold(value: float) -> None

Adjust the Layer-2 similarity threshold. See SemanticCache.set_similarity_threshold.