Cache¶

The two cache classes you instantiate directly are `SemanticCache` and `AsyncSemanticCache`. Both share a Protocol-shaped surface; the async one is a thin wrapper that awaits the embedder directly and dispatches blocking store work to a thread pool. Behavior is otherwise identical.
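The split between "await the embedder" and "send blocking store work to a thread pool" can be sketched roughly as follows. This is a minimal illustration of the pattern, not the library's implementation: `ToyStore`, `toy_embed`, and `ToyAsyncCache` are hypothetical stand-ins, and `asyncio.to_thread` is only one of several ways to reach a thread pool.

```python
from __future__ import annotations

import asyncio
import time


class ToyStore:
    """Hypothetical blocking store; stands in for any synchronous Store."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        time.sleep(0.01)  # simulate blocking I/O
        self._rows[key] = value

    def read(self, key: str) -> str | None:
        time.sleep(0.01)  # simulate blocking I/O
        return self._rows.get(key)


async def toy_embed(text: str) -> list[float]:
    """Hypothetical async embedder; it is awaited directly on the event loop."""
    await asyncio.sleep(0)
    return [float(len(text)), 1.0]


class ToyAsyncCache:
    """Assumed shape of a thin async wrapper around synchronous store calls."""

    def __init__(self, store: ToyStore) -> None:
        self._store = store

    async def put(self, query: str, response: str) -> list[float]:
        vector = await toy_embed(query)  # embedder: awaited directly
        # Store work blocks, so it runs in a worker thread instead of the loop.
        await asyncio.to_thread(self._store.write, query, response)
        return vector

    async def get(self, query: str) -> str | None:
        return await asyncio.to_thread(self._store.read, query)


async def demo() -> str | None:
    cache = ToyAsyncCache(ToyStore())
    await cache.put("hello", "world")
    return await cache.get("hello")


result = asyncio.run(demo())
```

The event loop stays responsive while the store call sleeps, because only the worker thread is blocked.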

SemanticCache ¶

SemanticCache(path: str | Path | None = None, embedder: Embedder | None = None, *, store: Store | None = None, similarity_threshold: float = 0.85, default_ttl: int | None = None, max_entries: int | None = None, namespace_quotas: dict[str, int] | None = None, confidence_fn: ConfidenceFn | None = None, validator: Validator | None = None, metrics_hook: MetricsHook | None = None, normalize: bool = True, index_backend: IndexBackend = 'auto', index_options: dict[str, Any] | None = None, vector_dtype: VectorDtype = 'float32', multi_process_mode: MultiProcessMode = 'single', stale_check_interval: float = 0.0, max_query_bytes: int = 32768, max_response_bytes: int = 4 * 1048576, max_metadata_bytes: int = 65536)

Synchronous layered semantic cache.

Construct with either `path` (which creates a default `SQLiteStore`) or `store` (any `Store` implementation). Every public method is guarded by an `RLock`.

similarity_threshold `property` ¶

similarity_threshold: float

Current Layer-2 similarity threshold.

get ¶

get(query: str, *, embedding: NDArray[Any] | None = None, namespace: str = 'default', bypass: bool = False) -> Hit | None

Layered lookup: exact match (Layer 1), then semantic match (Layer 2).

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string. Normalized per `normalize=` constructor flag.	required
`embedding`	`NDArray[Any] \| None`	Optional precomputed embedding. If supplied, the cache skips its own embedder call. Useful when you've already embedded the query for another purpose (RAG retrieval, etc.).	`None`
`namespace`	`str`	Multi-tenant scope. Layer-1 hashes are namespace-scoped; Layer-2 search is restricted to the namespace's vectors.	`'default'`
`bypass`	`bool`	If `True`, force a miss — skip both Layer 1 and Layer 2, emit a `miss` metric with `reason="bypass"`, and return `None`. Useful for forcing the caller to invoke the underlying LLM (e.g. to refresh a stale-but-not-yet-TTL'd answer, or to A/B test cached vs. fresh responses).	`False`

Returns:

Type	Description
`Hit \| None`	A `Hit` if Layer 1 or Layer 2 found a match passing the validator and confidence cutoff; `None` otherwise (cache miss, embedder failure, or `bypass=True`).
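The layered lookup described above can be approximated with a small stand-in. Everything here is an assumption made for illustration: `ToyLayeredCache` is hypothetical, the normalization (lowercasing and collapsing whitespace) is a guess at what `normalize=` might do, Layer 2 is a brute-force cosine scan, and the validator and confidence cutoff are left out.

```python
from __future__ import annotations

import hashlib
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyLayeredCache:
    """Hypothetical two-layer cache: exact hash first, then nearest vector."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self._exact: dict[tuple[str, str], str] = {}
        self._vectors: dict[str, list[tuple[list[float], str]]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def put(self, query: str, response: str, embedding: list[float],
            namespace: str = "default") -> None:
        self._exact[(namespace, self._key(query))] = response
        self._vectors.setdefault(namespace, []).append((embedding, response))

    def get(self, query: str, embedding: list[float],
            namespace: str = "default", bypass: bool = False):
        if bypass:
            return None  # forced miss: neither layer is consulted
        hit = self._exact.get((namespace, self._key(query)))
        if hit is not None:
            return ("exact", hit)  # Layer 1: namespace-scoped hash match
        best_score, best_response = -1.0, None
        for vector, response in self._vectors.get(namespace, []):
            score = cosine(embedding, vector)  # Layer 2: this namespace only
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return ("semantic", best_response)
        return None


cache = ToyLayeredCache()
cache.put("What is TTL?", "Time to live.", [1.0, 0.0])
exact = cache.get("what is   ttl?", [0.0, 1.0])
semantic = cache.get("Explain TTL", [0.9, 0.1])
miss = cache.get("Unrelated question", [0.0, 1.0])
other_namespace = cache.get("What is TTL?", [1.0, 0.0], namespace="tenant-b")
bypassed = cache.get("What is TTL?", [1.0, 0.0], bypass=True)
```

In this sketch the exact layer wins even when the supplied embedding points elsewhere, and a query stored under one namespace is invisible from another.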

put ¶

put(query: str, response: str, *, embedding: NDArray[Any] | None = None, namespace: str = 'default', metadata: dict[str, Any] | None = None, ttl: int | None = None) -> None

Store a query → response mapping with the embedder's vector.

Calling `put` for an existing query (same normalized hash + namespace) replaces the entry: `created_at` is reset and the TTL restarts from now. A refreshing `put` therefore gives the entry a full `default_ttl` (or `ttl=`, if specified) from the time of the call; the remaining TTL of the old entry is not preserved.

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string. Normalized per `normalize=` constructor flag.	required
`response`	`str`	The cached response. Capped at `max_response_bytes` (default 4 MiB, i.e. `4 * 1048576` bytes).	required
`embedding`	`NDArray[Any] \| None`	Optional precomputed embedding. If supplied, the cache skips its own embedder call.	`None`
`namespace`	`str`	Multi-tenant scope. Defaults to `"default"`.	`'default'`
`metadata`	`dict[str, Any] \| None`	Optional JSON-serializable dict; capped at `max_metadata_bytes`.	`None`
`ttl`	`int \| None`	Per-entry TTL in seconds. Falls back to the constructor's `default_ttl`. `None` means no expiry.	`None`

Raises:

Type	Description
`ValueError`	If `response` or `metadata` exceeds its byte cap.
`CacheClosedError`	If the cache has been closed.
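The replace-on-`put` TTL behavior described above can be illustrated with a toy model. `ToyTTLCache` and its explicit `now` argument are hypothetical, introduced only so the arithmetic is deterministic; the real cache presumably reads the clock itself.

```python
from __future__ import annotations


class ToyTTLCache:
    """Hypothetical model of replace-on-put TTL semantics."""

    def __init__(self, default_ttl: int | None = None) -> None:
        self.default_ttl = default_ttl
        # query -> (response, created_at, expires_at or None for no expiry)
        self._entries: dict[str, tuple[str, float, float | None]] = {}

    def put(self, query: str, response: str, *, now: float,
            ttl: int | None = None) -> None:
        effective = ttl if ttl is not None else self.default_ttl
        expires_at = None if effective is None else now + effective
        # Replacing the entry resets created_at and restarts the TTL.
        self._entries[query] = (response, now, expires_at)

    def expires_at(self, query: str) -> float | None:
        return self._entries[query][2]


cache = ToyTTLCache(default_ttl=60)
cache.put("q", "first answer", now=0.0)
first_expiry = cache.expires_at("q")

# Refresh 50 seconds later: the entry gets a full 60 seconds from now,
# not the 10 seconds that remained on the old entry.
cache.put("q", "fresh answer", now=50.0)
refreshed_expiry = cache.expires_at("q")

# A per-entry ttl overrides default_ttl for that put only.
cache.put("q", "short-lived answer", now=50.0, ttl=5)
short_expiry = cache.expires_at("q")
```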

delete ¶

delete(query: str, *, namespace: str = 'default') -> bool

Remove the entry for a query, if it exists.

Both store and in-memory index are updated. The index row becomes a tombstone; its memory is reclaimed on the next compact() (or by vacuum(), which auto-compacts).

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string. Normalized per `normalize=` constructor flag.	required
`namespace`	`str`	Multi-tenant scope. Only the entry in this namespace is removed.	`'default'`

Returns:

Type	Description
`bool`	`True` if an entry was found and removed; `False` otherwise.

vacuum ¶

vacuum(*, namespace: str | None = None, compact: bool = True) -> int

Sweep TTL-expired entries and (by default) compact the index.

Parameters:

Name	Type	Description	Default
`namespace`	`str \| None`	Limit the sweep to a single namespace. `None` sweeps all.	`None`
`compact`	`bool`	If `True` (default), call `compact()` after the sweep so the in-memory index actually releases the deleted rows' memory. Set `False` if you want to schedule compaction separately (e.g. less often than vacuum).	`True`

Returns:

Type	Description
`int`	The number of expired entries removed.
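A TTL sweep with an optional namespace filter might look like the sketch below. The `vacuum` function, the `(namespace, query)` key layout, and the "expired when `expires_at <= now`" boundary are all assumptions for illustration; index compaction is omitted.

```python
from __future__ import annotations

Key = tuple[str, str]  # (namespace, query); hypothetical layout


def vacuum(entries: dict[Key, float | None], *, now: float,
           namespace: str | None = None) -> int:
    """Remove expired entries, optionally within one namespace; return the count."""
    expired = [
        key
        for key, expires_at in entries.items()
        if expires_at is not None  # None means the entry never expires
        and expires_at <= now
        and (namespace is None or key[0] == namespace)
    ]
    for key in expired:
        del entries[key]
    return len(expired)


entries: dict[Key, float | None] = {
    ("a", "q1"): 10.0,   # expired at now=50
    ("a", "q2"): None,   # no expiry
    ("b", "q3"): 5.0,    # expired at now=50
    ("b", "q4"): 100.0,  # still live at now=50
}
removed_in_a = vacuum(entries, now=50.0, namespace="a")
removed_everywhere = vacuum(entries, now=50.0)
remaining = sorted(entries)
```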

compact ¶

compact() -> int

Reclaim memory occupied by tombstoned (soft-deleted) index rows.

Deletion, TTL expiry, and LRU eviction all mark index rows deleted without freeing their underlying matrix bytes. Long-running caches with churn accumulate tombstone memory; calling `compact` rebuilds the in-memory matrix at the live size and releases the rest.

Cheap when there are no tombstones (early-return). The store is not touched — entries already deleted from the store remain deleted.

Returns:

Type	Description
`int`	The number of tombstones reclaimed.
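The tombstone-then-compact idea can be sketched with plain lists. `ToyIndex` is a hypothetical stand-in: a real index would hold a contiguous matrix and keep a mapping from entry ids to rows, which this sketch skips.

```python
from __future__ import annotations


class ToyIndex:
    """Hypothetical vector index with soft deletes and explicit compaction."""

    def __init__(self) -> None:
        self.rows: list[list[float]] = []
        self.deleted: list[bool] = []

    def add(self, vector: list[float]) -> int:
        self.rows.append(vector)
        self.deleted.append(False)
        return len(self.rows) - 1

    def remove(self, row_id: int) -> None:
        # Soft delete: the row is flagged, but its storage is kept.
        self.deleted[row_id] = True

    def compact(self) -> int:
        tombstones = sum(self.deleted)
        if tombstones == 0:
            return 0  # early return: nothing to reclaim
        # Rebuild at the live size. Row ids shift here; a real index would
        # also update its id-to-row mapping.
        self.rows = [row for row, dead in zip(self.rows, self.deleted) if not dead]
        self.deleted = [False] * len(self.rows)
        return tombstones


index = ToyIndex()
for vector in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    index.add(vector)
index.remove(1)
size_before = len(index.rows)  # the tombstoned row still occupies a slot
reclaimed = index.compact()
size_after = len(index.rows)
reclaimed_again = index.compact()
```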

clear_namespace ¶

clear_namespace(namespace: str) -> int

Wipe every entry under a single namespace.

Other namespaces are untouched. Multi-tenant safe: useful for per-tenant data deletion (GDPR-style requests, demo resets) without affecting other tenants. Removed rows become tombstones in the in-memory index until the next compact() reclaims the memory.

Parameters:

Name	Type	Description	Default
`namespace`	`str`	The namespace to wipe.	required

Returns:

Type	Description
`int`	The number of entries removed.

clear ¶

clear() -> int

Wipe every entry across every namespace.

Returns the total number of entries removed. Backend-agnostic: works against any Store implementation (Memory, SQLite, Redis, Postgres, DynamoDB, custom). Bumps the store's version_counter once per namespace cleared, so multi-process readers see the change.

The in-memory index is rebuilt empty rather than tombstoned per-row — cheaper than O(n) remove() calls at scale.

set_similarity_threshold ¶

set_similarity_threshold(value: float) -> None

Adjust the cosine-similarity threshold for Layer-2 matches at runtime.

Affects subsequent get calls only; entries already cached are not re-evaluated. value must be in [-1.0, 1.0]. For L2-normalized embeddings the useful range is [0.0, 1.0]; higher means stricter matching, lower means more permissive.
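To make the effect of the threshold concrete, here is a toy matcher that applies it to a fixed similarity score. `ToyMatcher` is hypothetical; raising `ValueError` for an out-of-range value and treating a score equal to the threshold as a match are assumptions, not documented behavior.

```python
from __future__ import annotations


class ToyMatcher:
    """Hypothetical holder of a runtime-adjustable similarity threshold."""

    def __init__(self, similarity_threshold: float = 0.85) -> None:
        self._threshold = self._validated(similarity_threshold)

    @staticmethod
    def _validated(value: float) -> float:
        # Cosine similarity lies in [-1.0, 1.0], so the threshold must too.
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"threshold must be in [-1.0, 1.0], got {value!r}")
        return float(value)

    @property
    def similarity_threshold(self) -> float:
        return self._threshold

    def set_similarity_threshold(self, value: float) -> None:
        self._threshold = self._validated(value)

    def accepts(self, score: float) -> bool:
        return score >= self._threshold


matcher = ToyMatcher()
score = 0.80  # similarity between a new query and its nearest cached entry
strict = matcher.accepts(score)      # evaluated against the default 0.85
matcher.set_similarity_threshold(0.75)
permissive = matcher.accepts(score)  # same score, lower threshold

try:
    matcher.set_similarity_threshold(1.5)
    rejected = False
except ValueError:
    rejected = True
```

A failed update leaves the previous threshold in place, since validation happens before assignment.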

dumps ¶

dumps(dest: str | Path) -> None

Write a checkpoint archive to dest (tar.gz).

loads `classmethod` ¶

loads(source: str | Path, path: str | Path, embedder: Embedder, **kwargs: Any) -> SemanticCache

Restore a checkpoint into a fresh SemanticCache at path.

AsyncSemanticCache ¶

AsyncSemanticCache(path: str | Path | None = None, embedder: AsyncEmbedder | None = None, *, store: Store | None = None, similarity_threshold: float = 0.85, default_ttl: int | None = None, max_entries: int | None = None, namespace_quotas: dict[str, int] | None = None, confidence_fn: ConfidenceFn | None = None, validator: Validator | None = None, metrics_hook: MetricsHook | None = None, normalize: bool = True, index_backend: IndexBackend = 'auto', index_options: dict[str, Any] | None = None, vector_dtype: VectorDtype = 'float32', multi_process_mode: MultiProcessMode = 'single', stale_check_interval: float = 0.0, max_query_bytes: int = 32768, max_response_bytes: int = 4 * 1048576, max_metadata_bytes: int = 65536)

Async layered semantic cache.

Same constructor signature as SemanticCache except embedder is an AsyncEmbedder (its embed returns a coroutine).

loads `async` `classmethod` ¶

loads(source: str | Path, path: str | Path, embedder: AsyncEmbedder, **kwargs: Any) -> AsyncSemanticCache

Restore a checkpoint into a fresh AsyncSemanticCache.

The checkpoint store is restored via the sync code path on a thread; once the file is in place, an AsyncSemanticCache is constructed around it (which validates fingerprint/dim against the supplied async embedder via the underlying sync core).

clear ¶

clear() -> int

Wipe every entry across every namespace. See SemanticCache.clear.

set_similarity_threshold ¶

set_similarity_threshold(value: float) -> None

Adjust the Layer-2 similarity threshold. See SemanticCache.set_similarity_threshold.
