Cache
The two cache classes you instantiate. Both share a Protocol-shaped surface; the async one is a thin wrapper that awaits the embedder directly and dispatches blocking store work to a thread pool. Behavior is otherwise identical.
SemanticCache
SemanticCache(path: str | Path | None = None, embedder: Embedder | None = None, *, store: Store | None = None, similarity_threshold: float = 0.85, default_ttl: int | None = None, max_entries: int | None = None, namespace_quotas: dict[str, int] | None = None, confidence_fn: ConfidenceFn | None = None, validator: Validator | None = None, metrics_hook: MetricsHook | None = None, normalize: bool = True, index_backend: IndexBackend = 'auto', index_options: dict[str, Any] | None = None, vector_dtype: VectorDtype = 'float32', multi_process_mode: MultiProcessMode = 'single', stale_check_interval: float = 0.0, max_query_bytes: int = 32768, max_response_bytes: int = 4 * 1048576, max_metadata_bytes: int = 65536)
Synchronous layered semantic cache.
Construct with either path (creates a default SQLiteStore) or
store (any Store implementation). Every public method is RLock-guarded.
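The RLock guard matters when one public method calls another. The sketch below illustrates that pattern with a hypothetical `GuardedCache` class (an illustration only, not the library's implementation): a reentrant lock lets a guarded method re-enter the lock it already holds, where a plain `threading.Lock` would deadlock.

```python
import threading
from typing import Optional


class GuardedCache:
    """Hypothetical stand-in showing RLock-guarded public methods."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._data: dict[str, str] = {}

    def put(self, query: str, response: str) -> None:
        with self._lock:
            self._data[query] = response

    def get(self, query: str) -> Optional[str]:
        with self._lock:
            return self._data.get(query)

    def get_or_put(self, query: str, response: str) -> str:
        # Holds the lock, then calls two other guarded methods.
        # The same thread re-acquires the RLock without blocking.
        with self._lock:
            cached = self.get(query)
            if cached is None:
                self.put(query, response)
                return response
            return cached


cache = GuardedCache()
first = cache.get_or_put("q", "a")   # miss: stores and returns "a"
second = cache.get_or_put("q", "b")  # hit: returns the stored "a"
```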
get
get(query: str, *, embedding: NDArray[Any] | None = None, namespace: str = 'default', bypass: bool = False) -> Hit | None
Layered lookup: exact match (Layer 1), then semantic match (Layer 2).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The query string. Normalized per […] | required |
| embedding | NDArray[Any] \| None | Optional precomputed embedding. If supplied, the cache skips its own embedder call. Useful when you've already embedded the query for another purpose (RAG retrieval, etc.). | None |
| namespace | str | Multi-tenant scope. Layer-1 hashes are namespace-scoped; Layer-2 search is restricted to the namespace's vectors. | 'default' |
| bypass | bool | If […] | False |
Returns:
| Type | Description |
|---|---|
| Hit \| None | A […] and confidence cutoff; […] failure, or […] |
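The two-layer lookup described above can be sketched in plain Python. Everything here is an assumption made for illustration: the `LayeredLookup` class, the whitespace and lowercase normalization, the SHA-256 key, and the `>=` comparison against the threshold are stand-ins, not the library's internals.

```python
import hashlib
import math


def normalize(query: str) -> str:
    # Assumed normalization: collapse whitespace and lowercase.
    return " ".join(query.split()).lower()


def key_for(namespace: str, query: str) -> str:
    # Namespace-scoped hash, so identical queries in different
    # namespaces never collide at Layer 1.
    raw = f"{namespace}\x00{normalize(query)}".encode()
    return hashlib.sha256(raw).hexdigest()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LayeredLookup:
    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self.exact = {}    # hash key -> response
        self.vectors = {}  # namespace -> list of (vector, response)

    def put(self, query, response, embedding, namespace="default"):
        self.exact[key_for(namespace, query)] = response
        self.vectors.setdefault(namespace, []).append((embedding, response))

    def get(self, query, embedding, namespace="default"):
        # Layer 1: exact match on the namespace-scoped hash.
        hit = self.exact.get(key_for(namespace, query))
        if hit is not None:
            return ("exact", hit)
        # Layer 2: best cosine match among this namespace's vectors only.
        best_score, best_response = -1.0, None
        for vec, response in self.vectors.get(namespace, []):
            score = cosine(embedding, vec)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return ("semantic", best_response)
        return None


cache = LayeredLookup(threshold=0.85)
cache.put("What is the capital of France?", "Paris", [1.0, 0.0], namespace="geo")
exact = cache.get("  what is the CAPITAL of france? ", [0.0, 1.0], namespace="geo")
near = cache.get("France's capital city?", [0.9, 0.1], namespace="geo")
other = cache.get("France's capital city?", [0.9, 0.1], namespace="other")
far = cache.get("Best pizza topping?", [0.0, 1.0], namespace="geo")
```

In this sketch the first lookup hits Layer 1 despite a dissimilar vector, the second falls through to Layer 2, and the same query in a different namespace misses entirely.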
put
put(query: str, response: str, *, embedding: NDArray[Any] | None = None, namespace: str = 'default', metadata: dict[str, Any] | None = None, ttl: int | None = None) -> None
Store a query → response mapping with the embedder's vector.
Calling put for an existing query (same normalized hash + namespace)
replaces the entry: a fresh created_at is set and the TTL
re-applies from now. A put used as a refresh therefore extends the
entry's life by default_ttl (or ttl= if specified); it does not
preserve the remaining TTL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The query string. Normalized per […] | required |
| response | str | The cached response. Capped at […] | required |
| embedding | NDArray[Any] \| None | Optional precomputed embedding. If supplied, the cache skips its own embedder call. | None |
| namespace | str | Multi-tenant scope. Defaults to […] | 'default' |
| metadata | dict[str, Any] \| None | Optional JSON-serializable dict; capped at […] | None |
| ttl | int \| None | Per-entry TTL in seconds. Falls back to the constructor's […] | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If […] |
| CacheClosedError | If the cache has been closed. |
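The refresh semantics of put (a replaced entry gets a fresh created_at and a TTL that restarts from now) can be illustrated with a small stand-in. `TtlStore`, its injectable `clock`, and the `remaining` helper are hypothetical names used only for this sketch.

```python
from typing import Optional


class TtlStore:
    """Hypothetical store with an injectable clock, for illustration only."""

    def __init__(self, default_ttl: Optional[int], clock) -> None:
        self.default_ttl = default_ttl
        self.clock = clock
        self.entries = {}  # query -> (response, created_at, expires_at)

    def put(self, query: str, response: str, ttl: Optional[int] = None) -> None:
        now = self.clock()
        effective = ttl if ttl is not None else self.default_ttl
        expires_at = None if effective is None else now + effective
        # Replacing an entry resets created_at and restarts the TTL from now.
        self.entries[query] = (response, now, expires_at)

    def remaining(self, query: str) -> Optional[float]:
        _, _, expires_at = self.entries[query]
        return None if expires_at is None else expires_at - self.clock()


now = [1000.0]  # mutable fake clock
store = TtlStore(default_ttl=60, clock=lambda: now[0])
store.put("q", "v1")
now[0] += 45
before = store.remaining("q")  # 15 seconds left on the original entry
store.put("q", "v2")           # refresh: TTL restarts from the current time
after = store.remaining("q")   # full 60 seconds again, not 15
```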
delete
Remove the entry for a query, if it exists.
Both store and in-memory index are updated. The index row becomes a
tombstone; its memory is reclaimed on the next compact() (or by
vacuum(), which auto-compacts).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | The query string. Normalized per […] | required |
| namespace | str | Multi-tenant scope. Only the entry in this namespace is removed. | 'default' |
Returns:
| Type | Description |
|---|---|
| bool | […] |
vacuum
Sweep TTL-expired entries and (by default) compact the index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| namespace | str \| None | Limit the sweep to a single namespace. | None |
| compact | bool | If […] | True |
Returns:
| Type | Description |
|---|---|
| int | The number of expired entries removed. |
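As a rough illustration of the sweep, the function below removes expired entries, optionally restricted to one namespace, and returns the count. The `sweep_expired` name, the `(namespace, query) -> expires_at` layout, and the `<=` expiry comparison are assumptions for this sketch; the compaction step is left out.

```python
def sweep_expired(entries: dict, now: float, namespace=None) -> int:
    """Remove expired entries and return how many were removed.

    entries maps (namespace, query) -> expires_at, where None means
    the entry never expires.
    """
    expired = [
        key
        for key, expires_at in entries.items()
        if expires_at is not None
        and expires_at <= now
        and (namespace is None or key[0] == namespace)
    ]
    for key in expired:
        del entries[key]
    return len(expired)


entries = {
    ("a", "q1"): 10.0,   # expired at now=50
    ("a", "q2"): None,   # never expires
    ("b", "q3"): 5.0,    # expired at now=50
    ("b", "q4"): 100.0,  # still live
}
removed_a = sweep_expired(entries, now=50.0, namespace="a")  # only namespace "a"
removed_rest = sweep_expired(entries, now=50.0)              # every namespace
```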
compact
Reclaim memory occupied by tombstoned (soft-deleted) index rows.
remove, TTL expiry, and LRU eviction all mark index rows deleted
without freeing their underlying matrix bytes. Long-running caches
with churn accumulate tombstone memory; calling compact rebuilds
the in-memory matrix at the live size and releases the rest.
Cheap when there are no tombstones (early-return). The store is not touched — entries already deleted from the store remain deleted.
Returns:
| Type | Description |
|---|---|
| int | The number of tombstones reclaimed. |
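The tombstone-then-compact behavior can be sketched with a toy index. `TombstoneIndex` is a hypothetical stand-in that uses Python lists where the real index presumably holds a vector matrix; it shows only the bookkeeping, not the actual memory layout.

```python
class TombstoneIndex:
    """Hypothetical in-memory index: rows are soft-deleted, then compacted."""

    def __init__(self) -> None:
        self.keys = []     # one key per row
        self.rows = []     # stand-in for the vector matrix
        self.deleted = []  # tombstone flag per row

    def add(self, key: str, vector: list) -> None:
        self.keys.append(key)
        self.rows.append(vector)
        self.deleted.append(False)

    def remove(self, key: str) -> bool:
        # Mark the row as a tombstone; its storage stays allocated.
        for i, existing in enumerate(self.keys):
            if existing == key and not self.deleted[i]:
                self.deleted[i] = True
                return True
        return False

    def compact(self) -> int:
        tombstones = sum(self.deleted)
        if tombstones == 0:
            return 0  # early return: nothing to reclaim
        live = [i for i, dead in enumerate(self.deleted) if not dead]
        # Rebuild storage at the live size and drop the rest.
        self.keys = [self.keys[i] for i in live]
        self.rows = [self.rows[i] for i in live]
        self.deleted = [False] * len(live)
        return tombstones


index = TombstoneIndex()
for name in ("a", "b", "c"):
    index.add(name, [0.0, 1.0])
index.remove("b")
size_before = len(index.rows)  # tombstone still occupies a row
reclaimed = index.compact()
size_after = len(index.rows)
second_pass = index.compact()  # no tombstones left, so this is a no-op
```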
clear_namespace
Wipe every entry under a single namespace.
Other namespaces are untouched. Multi-tenant safe: useful for
per-tenant data deletion (GDPR-style requests, demo resets) without
affecting other tenants. Removed rows become tombstones in the
in-memory index until the next compact() reclaims the memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| namespace | str | The namespace to wipe. | required |
Returns:
| Type | Description |
|---|---|
| int | The number of entries removed. |
clear
Wipe every entry across every namespace.
Returns the total number of entries removed. Backend-agnostic:
works against any Store implementation (Memory, SQLite,
Redis, Postgres, DynamoDB, custom). Bumps the store's
version_counter once per namespace cleared, so multi-process
readers see the change.
The in-memory index is rebuilt empty rather than tombstoned
per-row — cheaper than O(n) remove() calls at scale.
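To make the version-counter behavior concrete, here is a minimal sketch in which a reader detects a clear by comparing counters. `VersionedStore` and `Reader` are hypothetical classes; only the "one bump per namespace cleared" rule and the "index rebuilt empty" step come from the description above, and in this sketch no other operation touches the counter.

```python
class VersionedStore:
    """Hypothetical store that exposes a version_counter."""

    def __init__(self) -> None:
        self.entries = {}  # (namespace, query) -> response
        self.index = []    # stand-in for the in-memory index
        self.version_counter = 0

    def put(self, namespace: str, query: str, response: str) -> None:
        self.entries[(namespace, query)] = response
        self.index.append((namespace, query))

    def clear(self) -> int:
        namespaces = {ns for ns, _ in self.entries}
        removed = len(self.entries)
        self.entries = {}
        self.index = []  # rebuilt empty instead of tombstoning each row
        self.version_counter += len(namespaces)  # one bump per namespace
        return removed


class Reader:
    """Hypothetical reader that remembers the last version it saw."""

    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.seen_version = store.version_counter

    def is_stale(self) -> bool:
        return self.store.version_counter != self.seen_version


store = VersionedStore()
store.put("a", "q1", "r1")
store.put("a", "q2", "r2")
store.put("b", "q3", "r3")
reader = Reader(store)
stale_before = reader.is_stale()
removed = store.clear()
stale_after = reader.is_stale()
```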
set_similarity_threshold
Adjust the cosine-similarity threshold for Layer-2 matches at runtime.
Affects subsequent get calls only; entries already cached are
not re-evaluated. value must be in [-1.0, 1.0]. For
L2-normalized embeddings the useful range is [0.0, 1.0];
higher means stricter matching, lower means more permissive.
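A small sketch of how the threshold changes match decisions. The `validate_threshold` and `matches` helpers are hypothetical, and raising `ValueError` for an out-of-range value and comparing with `>=` are assumptions here, not confirmed library behavior.

```python
import math


def validate_threshold(value: float) -> float:
    # Cosine similarity lies in [-1.0, 1.0], so the threshold must too.
    if not -1.0 <= value <= 1.0:
        raise ValueError(f"threshold must be in [-1.0, 1.0], got {value}")
    return float(value)


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches(query_vec: list, cached_vec: list, threshold: float) -> bool:
    return cosine(query_vec, cached_vec) >= validate_threshold(threshold)


query_vec = [0.8, 0.6]   # unit length
cached_vec = [1.0, 0.0]  # unit length; cosine with query_vec is about 0.8
permissive = matches(query_vec, cached_vec, 0.75)  # lower threshold accepts
strict = matches(query_vec, cached_vec, 0.85)      # higher threshold rejects

try:
    validate_threshold(1.5)
    rejected = False
except ValueError:
    rejected = True
```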
loads (classmethod)
Restore a checkpoint into a fresh SemanticCache at path.
AsyncSemanticCache
AsyncSemanticCache(path: str | Path | None = None, embedder: AsyncEmbedder | None = None, *, store: Store | None = None, similarity_threshold: float = 0.85, default_ttl: int | None = None, max_entries: int | None = None, namespace_quotas: dict[str, int] | None = None, confidence_fn: ConfidenceFn | None = None, validator: Validator | None = None, metrics_hook: MetricsHook | None = None, normalize: bool = True, index_backend: IndexBackend = 'auto', index_options: dict[str, Any] | None = None, vector_dtype: VectorDtype = 'float32', multi_process_mode: MultiProcessMode = 'single', stale_check_interval: float = 0.0, max_query_bytes: int = 32768, max_response_bytes: int = 4 * 1048576, max_metadata_bytes: int = 65536)
Async layered semantic cache.
Same constructor signature as SemanticCache except embedder is
an AsyncEmbedder (its embed returns a coroutine).
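The split between awaiting the embedder and offloading blocking store work can be sketched with `asyncio.to_thread`. `BlockingStore`, `fake_embed`, and `AsyncWrapper` are hypothetical stand-ins; whether the library uses `asyncio.to_thread` or its own executor is not stated in this reference.

```python
import asyncio
import time


class BlockingStore:
    """Hypothetical store whose calls block, as disk or network I/O would."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key, value) -> None:
        time.sleep(0.01)  # simulate blocking I/O
        self._data[key] = value

    def get(self, key):
        time.sleep(0.01)
        return self._data.get(key)


async def fake_embed(text: str) -> list:
    # Stand-in for an async embedder: returns a coroutine result.
    await asyncio.sleep(0)
    return [float(len(text))]


class AsyncWrapper:
    def __init__(self, store: BlockingStore, embed) -> None:
        self._store = store
        self._embed = embed

    async def put(self, query: str, response: str) -> None:
        vector = await self._embed(query)  # awaited directly on the event loop
        # Blocking store work runs in a worker thread, not on the loop.
        await asyncio.to_thread(self._store.put, query, (vector, response))

    async def get(self, query: str):
        entry = await asyncio.to_thread(self._store.get, query)
        return None if entry is None else entry[1]


async def demo():
    cache = AsyncWrapper(BlockingStore(), fake_embed)
    await cache.put("hello", "world")
    return await cache.get("hello"), await cache.get("missing")


result = asyncio.run(demo())
```

Keeping the embedder call on the loop avoids a thread hop for work that is already non-blocking, while the store calls cannot stall other coroutines.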
loads (async, classmethod)
loads(source: str | Path, path: str | Path, embedder: AsyncEmbedder, **kwargs: Any) -> AsyncSemanticCache
Restore a checkpoint into a fresh AsyncSemanticCache.
The checkpoint store is restored via the sync code path on a thread;
once the file is in place, an AsyncSemanticCache is constructed
around it (which validates fingerprint/dim against the supplied
async embedder via the underlying sync core).
set_similarity_threshold
Adjust the Layer-2 similarity threshold. See SemanticCache.set_similarity_threshold.