Custom stores¶
The Store Protocol from mneme._types is the contract every backend must satisfy. Implement it and the cache works against your code with no other changes.
When you'd write one¶
- A backend mneme doesn't ship. Memcached, FoundationDB, Cassandra, Cockroach, S3-with-an-index - all reasonable.
- Custom durability semantics. A read replica with eventual consistency, a multi-region active/active setup, a tiered hot/cold split.
- Compliance constraints. A backend that encrypts at rest with a customer-managed key, or one that audits every read.
- Testing fakes. tests/stores/inmemory_store.py is the reference custom impl used to verify the conformance battery is portable.
The Protocol¶
class Store(Protocol):
    # --- Lifecycle ---
    def open(self, embedder_fingerprint: str, embedder_dim: int) -> None: ...
    def close(self) -> None: ...
    # --- Read ---
    def get_by_hash(self, namespace: str, query_hash: str) -> StoredEntry | None: ...
    def get_by_id(self, id: int) -> StoredEntry | None: ...
    def count(self, namespace: str | None = None) -> int: ...
    def list_namespaces(self) -> list[str]: ...
    def iter_lru_ids(self, n: int, namespace: str | None = None) -> Iterable[int]: ...
    def iter_all(self) -> Iterable[StoredEntry]: ...
    def iter_since(self, last_id: int) -> Iterable[StoredEntry]: ...
    # --- Write (must be transactional) ---
    def insert(self, entry: StoredEntry) -> int: ...
    def update_access(self, id: int, now: int) -> None: ...
    def delete_by_id(self, id: int) -> bool: ...
    def delete_expired(self, now: int, namespace: str | None = None) -> int: ...
    def clear_namespace(self, namespace: str) -> int: ...
    # --- Quotas ---
    def set_namespace_quota(self, namespace: str, max_entries: int) -> None: ...
    def get_namespace_quota(self, namespace: str) -> int | None: ...
    # --- Coordination ---
    def read_version_counter(self) -> int: ...
    def read_meta(self, key: str) -> str | None: ...
    def write_meta(self, key: str, value: str) -> None: ...
    # --- Health ---
    def integrity_check(self) -> bool: ...
    # --- Backup ---
    def snapshot_to(self, dest_path: str | Path) -> None: ...
    @classmethod
    def restore_from(cls, source_path: str | Path, dest_path: str | Path) -> Store: ...
The listing above has 22 methods; most are 1–10 lines. The library code in _cache.py depends only on this Protocol; it never imports a concrete store.
The contract¶
A few invariants the cache relies on:
- insert returns the assigned id. Stable across reopens. The id-to-entry mapping must survive a close() / open cycle.
- version_counter increments in the same transaction as the data write. Multi-process readers poll this; if they see the counter advance, the data must be durable.
- get_by_hash is exact-match only. No fuzzy logic. The cache layer normalizes + hashes the query before calling.
- iter_* returns items ordered by id. iter_since(last_id) returns entries with id > last_id, in ascending id order.
- update_access is idempotent and side-effect-light. Called on every Layer-1 hit. Make it cheap.
- integrity_check returns False rather than raising on corruption. The cache uses this on health() calls.
The library doesn't enforce these; the conformance battery does.
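As a rough illustration of three of these invariants (stable ids on duplicate insert, a version counter bumped with every write, and ascending iter_since), here is a dict-backed toy. Entry and ToyStore are hypothetical stand-ins, not mneme's StoredEntry or any shipped store, and an in-memory dict has no transactions, so the "same transaction" requirement is only marked by a comment.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for StoredEntry; the real fields may differ."""
    namespace: str
    query_hash: str
    response: str
    id: Optional[int] = None


class ToyStore:
    """Dict-backed toy illustrating a few invariants; not a full Store."""

    def __init__(self) -> None:
        self._rows: Dict[int, Entry] = {}
        self._by_hash: Dict[Tuple[str, str], int] = {}
        self._next_id = 1
        self._version = 0

    def insert(self, entry: Entry) -> int:
        key = (entry.namespace, entry.query_hash)
        # Duplicate (namespace, query_hash): reuse the existing id (upsert).
        entry_id = self._by_hash.get(key)
        if entry_id is None:
            entry_id = self._next_id
            self._next_id += 1
        self._rows[entry_id] = replace(entry, id=entry_id)
        self._by_hash[key] = entry_id
        # A real backend would bump this inside the same transaction as
        # the row write, so readers never observe one without the other.
        self._version += 1
        return entry_id

    def get_by_hash(self, namespace: str, query_hash: str) -> Optional[Entry]:
        # Exact match only: no normalization or fuzzy lookup here.
        entry_id = self._by_hash.get((namespace, query_hash))
        return None if entry_id is None else self._rows[entry_id]

    def iter_since(self, last_id: int) -> Iterator[Entry]:
        # Strictly greater than last_id, in ascending id order.
        for entry_id in sorted(self._rows):
            if entry_id > last_id:
                yield self._rows[entry_id]

    def read_version_counter(self) -> int:
        return self._version


store = ToyStore()
first = store.insert(Entry("ns", "h1", "a"))
second = store.insert(Entry("ns", "h2", "b"))
again = store.insert(Entry("ns", "h1", "c"))  # same key as the first insert
```

Here the third insert reuses the id of the first because the (namespace, query_hash) pair matches, while the counter still advances on every write.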
The conformance battery¶
Every shipped store passes tests/test_store_protocol_compliance.py. It exercises 50+ behaviors against your store. Pass that battery and your store is "done" by mneme's definition.
Wire your store into the parametrized fixture:
# In your test file:
import pytest
from my_module import MyStore
from tests.test_store_protocol_compliance import _STORE_NAMES
# Add your store's identifier:
_STORE_NAMES.append("MyStore")
@pytest.fixture
def my_store(request):
    # request.param exists only when this fixture is parametrized.
    if request.param == "MyStore":
        return MyStore(...)
Or copy the test file into your own project and adapt the fixture parametrization to instantiate your store. The conformance assertions are independent of the test harness.
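To show what harness-independent assertions can look like, the sketch below runs plain check functions against a store factory without pytest. Everything in it is hypothetical: TinyStore stores bare strings rather than StoredEntry, the check names are invented, and the assumed meaning of the delete_by_id return value (True only when a row was removed) is a guess at the contract, not something taken from the battery.

```python
from typing import Callable, Dict, List


class TinyStore:
    """Hypothetical minimal store; payloads are plain strings for brevity."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}
        self._next_id = 1

    def insert(self, payload: str) -> int:
        entry_id = self._next_id
        self._next_id += 1
        self._rows[entry_id] = payload
        return entry_id

    def delete_by_id(self, id: int) -> bool:
        return self._rows.pop(id, None) is not None

    def count(self) -> int:
        return len(self._rows)


def check_delete_reports_existence(make_store: Callable[[], TinyStore]) -> None:
    store = make_store()
    entry_id = store.insert("payload")
    assert store.delete_by_id(entry_id) is True
    assert store.delete_by_id(entry_id) is False  # already gone
    assert store.count() == 0


def check_ids_are_unique(make_store: Callable[[], TinyStore]) -> None:
    store = make_store()
    ids = [store.insert(f"p{i}") for i in range(5)]
    assert len(set(ids)) == 5


CHECKS = [check_delete_reports_existence, check_ids_are_unique]


def run_checks(make_store: Callable[[], TinyStore]) -> List[str]:
    """Run each check against a fresh store; return names of failed checks."""
    failures = []
    for check in CHECKS:
        try:
            check(make_store)
        except AssertionError:
            failures.append(check.__name__)
    return failures


failures = run_checks(TinyStore)
```

Because each check takes only a factory, the same functions could be wrapped in pytest tests, unittest cases, or a standalone script.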
Reference impl¶
examples/custom_store.py is a full working DictStore - a Python-dict-backed Store that satisfies the Protocol. ~150 lines. Use it as scaffolding when you start a new backend.
The structure to follow:
- Constructor. Accept connection params, pool, schema/prefix, etc.
- open(). Connect, provision schema if missing, validate fingerprint+dim, initialize counters.
- close(). Release connection / pool.
- Reads. Direct backend lookups. Map your row format to StoredEntry.
- Writes. Transactional. Bump version_counter in the same transaction.
- snapshot_to / restore_from. Implement if your backend has a native backup primitive; otherwise raise CheckpointError (most server-backed stores do this).
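The constructor / open() / close() split above might look roughly like the following. This is a sketch under assumptions: a plain dict stands in for a real connection, the metadata key names and the FingerprintMismatch error are invented for illustration, and the actual stores may validate the fingerprint and dimension differently.

```python
from typing import Dict, Optional


class FingerprintMismatch(Exception):
    """Hypothetical error; the library may define its own type for this."""


class SkeletonStore:
    def __init__(self, backend: Optional[Dict[str, str]] = None) -> None:
        # The constructor only records parameters; no I/O happens here.
        # A plain dict stands in for a real connection or pool.
        self._backend: Dict[str, str] = backend if backend is not None else {}
        self._is_open = False

    def open(self, embedder_fingerprint: str, embedder_dim: int) -> None:
        stored_fp = self.read_meta("embedder_fingerprint")
        stored_dim = self.read_meta("embedder_dim")
        if stored_fp is None:
            # First open: provision metadata and initialize counters.
            self.write_meta("embedder_fingerprint", embedder_fingerprint)
            self.write_meta("embedder_dim", str(embedder_dim))
            self._backend.setdefault("version_counter", "0")
        elif stored_fp != embedder_fingerprint or stored_dim != str(embedder_dim):
            # Existing data was written with a different embedder.
            raise FingerprintMismatch(
                f"store has {stored_fp}/{stored_dim}, "
                f"got {embedder_fingerprint}/{embedder_dim}"
            )
        self._is_open = True

    def close(self) -> None:
        # A real store would release its connection or pool here.
        self._is_open = False

    def read_meta(self, key: str) -> Optional[str]:
        return self._backend.get("meta:" + key)

    def write_meta(self, key: str, value: str) -> None:
        self._backend["meta:" + key] = value


shared: Dict[str, str] = {}
store = SkeletonStore(shared)
store.open("model-a", 384)
store.close()

reopened = SkeletonStore(shared)
reopened.open("model-a", 384)  # same embedder: accepted
```

Opening the same backing data with a different fingerprint or dimension raises instead of silently mixing incompatible embeddings.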
Common gotchas¶
- Forgetting to bump version_counter. Multi-process readers won't see your writes. The conformance test test_version_counter_increments_on_insert catches this.
- Returning a different id on duplicate (namespace, query_hash). The cache treats insert of a duplicate as an upsert; the id must match the existing row, not get newly allocated. test_insert_replaces_on_hash_collision catches this.
- Auto-incrementing without a transaction. If your backend allocates ids out-of-band (e.g. UUIDs at app level, or a sequence outside the txn), the data write and id allocation can drift. Use the backend's primitives for atomic allocation.
- iter_* not sorted by id. The cache assumes ascending id. Sort if your backend doesn't natively.
- Storing embeddings as strings or JSON. Wasteful. Use a binary type - BLOB (SQLite), BYTEA (Postgres), binary-safe strings (Redis), or the Binary attribute type (DynamoDB).
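For the last gotcha, one way to produce a value suitable for a binary column is to pack the vector with the standard struct module. The sketch assumes little-endian float32 embeddings; the dtype and byte order mneme actually uses are not stated here, and pack_embedding / unpack_embedding are illustrative names.

```python
import json
import struct
from typing import List, Sequence


def pack_embedding(vector: Sequence[float]) -> bytes:
    """Encode as little-endian float32: 4 bytes per dimension."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes) -> List[float]:
    """Decode bytes produced by pack_embedding back into floats."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


# Values chosen to be exactly representable in float32.
vector = [0.25, -1.5, 3.0, 0.125]
blob = pack_embedding(vector)
as_json = json.dumps(vector).encode("utf-8")
```

The packed form costs a fixed 4 bytes per dimension, whereas the JSON text grows with the number of digits in each value.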
Wiring it in¶
from mneme import SemanticCache
from my_module import MyStore
store = MyStore(connection_string="...")
with SemanticCache(store=store, embedder=embedder) as cache:
    cache.put("hello", "world")
That's it. No registration, no plugin system. The Protocol is structurally typed, so any object with the right methods works.
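Structural typing can be seen in isolation with a small example. MetaStore below is a hypothetical two-method slice of the Store Protocol, and runtime_checkable is added only so that isinstance works in the demo; whether mneme's own Protocol is runtime-checkable is not something this page states.

```python
from typing import Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class MetaStore(Protocol):
    """Hypothetical two-method slice of the Store Protocol."""

    def read_meta(self, key: str) -> Optional[str]: ...
    def write_meta(self, key: str, value: str) -> None: ...


class DictMeta:
    # No base class: having the right method names is enough.
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def read_meta(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def write_meta(self, key: str, value: str) -> None:
        self._data[key] = value


class Incomplete:
    # Missing write_meta, so it does not satisfy MetaStore.
    def read_meta(self, key: str) -> Optional[str]:
        return None


def remember(store: MetaStore, key: str, value: str) -> Optional[str]:
    """Write a value and read it back through the Protocol's methods."""
    store.write_meta(key, value)
    return store.read_meta(key)


result = remember(DictMeta(), "schema", "v1")
```

Note that the isinstance check on a runtime-checkable Protocol only verifies that the methods exist, not that their signatures match; a static type checker is what compares signatures.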
Where to go next¶
- examples/custom_store.py - a runnable reference.
- API reference: types - the Protocol with full type hints.
- Stores: Memory / SQLite / Redis / Postgres / DynamoDB - five implementations to learn from.