Changelog
All notable changes to mneme are documented here. The format is based on Keep a Changelog and the project follows Semantic Versioning.
1.0.0 - 2026-04-29
The first stable release. The public surface in mneme/__init__.py is locked; future minor versions are additive.
Added
Core cache
- SemanticCache - sync layered cache with exact-match (Layer 1) over normalized SHA-256 hashes and semantic-match (Layer 2) over cosine similarity on L2-normalized embeddings.
- AsyncSemanticCache - async wrapper that awaits the embedder directly and dispatches blocking store work via asyncio.to_thread. Sync ↔ async parity.
- Hit, Stats, Health, StoredEntry - frozen, slotted dataclasses for the public read surface.
- Embedder, AsyncEmbedder, Index, Store - runtime-checkable Protocols for bring-your-own backends.
- to_async_embedder / to_sync_embedder - adapter helpers between the two embedder Protocols.
- SemanticCache.clear() - backend-agnostic whole-cache wipe across all namespaces. Works against any Store implementation; bumps version_counter per cleared namespace; rebuilds the in-memory index empty.
- SemanticCache.set_similarity_threshold(value) and .similarity_threshold property - adjust the Layer-2 threshold at runtime. Validates [-1.0, 1.0]. Affects subsequent get calls only.
- SemanticCache.compact() - reclaim memory occupied by tombstoned (soft-deleted) index rows after delete/TTL/LRU churn. Returns the count of reclaimed rows. Cheap when there are no tombstones (early-return). Available on both sync and async caches.
- SemanticCache.vacuum(compact=True) - the default now auto-compacts the index after the TTL sweep so memory is actually released. Pass compact=False to keep the legacy split-call behavior.
- Stats.index_memory_bytes and Stats.index_tombstone_count - new fields exposing the actual matrix bytes and tombstone count from the in-memory index, so monitoring can detect and alert on RAM drift before it becomes a problem.
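The two-layer lookup described for SemanticCache can be illustrated with a small standalone sketch. This is not mneme's implementation: ToyLayeredCache, toy_embed, and the normalization rule (lowercase plus whitespace collapse) are assumptions made for illustration, and only the overall shape (SHA-256 exact match first, then cosine similarity on L2-normalized vectors against a threshold in [-1.0, 1.0]) comes from the entries above.

```python
import hashlib
import math


def normalize(text: str) -> str:
    # Assumed normalization rule: lowercase and collapse whitespace runs.
    return " ".join(text.lower().split())


def exact_key(text: str) -> str:
    # Layer 1 key: SHA-256 over the normalized text.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedder: letter counts over a-z.
    counts = [0.0] * 26
    for ch in normalize(text):
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class ToyLayeredCache:
    """Illustrative two-layer cache; not mneme's SemanticCache."""

    def __init__(self, embed, similarity_threshold: float = 0.9) -> None:
        if not -1.0 <= similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be in [-1.0, 1.0]")
        self._embed = embed
        self._threshold = similarity_threshold
        self._exact: dict[str, str] = {}
        self._rows: list[tuple[list[float], str]] = []

    def put(self, text: str, response: str) -> None:
        self._exact[exact_key(text)] = response
        self._rows.append((l2_normalize(self._embed(text)), response))

    def get(self, text: str):
        # Layer 1: exact match on the hash of the normalized text.
        hit = self._exact.get(exact_key(text))
        if hit is not None:
            return ("exact", hit)
        # Layer 2: for unit vectors the dot product equals cosine similarity.
        query = l2_normalize(self._embed(text))
        best_score, best_response = -2.0, None
        for vec, response in self._rows:
            score = sum(a * b for a, b in zip(vec, query))
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self._threshold:
            return ("semantic", best_response)
        return None


cache = ToyLayeredCache(toy_embed, similarity_threshold=0.9)
cache.put("What is the capital of France?", "Paris")
print(cache.get("  what is the CAPITAL of france? "))  # Layer 1 hit
print(cache.get("What's the capital of France?"))      # Layer 2 hit
print(cache.get("zzzz qqqq"))                          # miss
```

A linear scan keeps the sketch short; the entries under Index backends describe how the library itself searches.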
Changed
- max_response_bytes default raised from 1 MB to 4 MB. Modern long-context LLM responses (Claude Opus, GPT-4 with verbose JSON, agent traces) routinely exceed 1 MB. The cap still exists; the default just stops getting in the way. Users who explicitly set the value are unaffected.
Stores (5 backends, one Protocol)
- MemoryStore - dict-backed; tests, scratch, ephemeral.
- SQLiteStore (default) - single-file, WAL mode, atomic writes, mode 0o600. Implements snapshot_to / restore_from via SQLite's online backup API.
- RedisStore - [redis] extra. MULTI/EXEC for atomicity. version_counter incremented in the same pipeline as every write.
- PostgresStore - [postgres] extra. Schema-scoped, BIGSERIAL ids, BYTEA embeddings, JSONB metadata. version_counter UPDATE in the same transaction as data writes.
- DynamoDBStore - [dynamodb] extra. Single table + 2 GSIs. TransactWriteItems pairs every data op with a counter Update. Auto-create-on-open opt-in. Snapshot stub directs operators to AWS native backup.
All five satisfy the same Store Protocol and pass the same conformance battery.
Index backends
- NumpyIndex - default; in-memory matrix, cosine via M @ q, L2-normalized, geometric capacity growth. Bandwidth-bound exact search; comfortable at d=768 to ~500k entries on baseline hardware, scales further at lower dim.
- HnswIndex - [hnsw] extra. Approximate-NN, sub-millisecond at 1M+ entries. Per-namespace hnswlib indices, configurable M / ef_construction / ef.
- index_backend="auto" - picks NumPy below 500k entries, hnsw above (heuristic targeting d=768); falls back to NumPy + WARNING when hnswlib is missing. Force the backend explicitly when your dim is far from 768.
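The "auto" heuristic above can be read as a small decision function. The sketch below is an assumption-laden illustration, not the library's code: choose_backend, matrix_bytes, and the behavior at exactly 500k entries are hypothetical. The byte estimate shows why the 500k cutoff is tied to d=768: the exact-search matrix grows linearly with both entry count and dimension.

```python
import logging

logger = logging.getLogger("index_backend_sketch")

# Cutoff taken from the changelog entry; it targets d=768.
AUTO_SWITCH_ENTRIES = 500_000


def matrix_bytes(entries: int, dim: int, itemsize: int = 4) -> int:
    # Size of a dense entries x dim matrix; itemsize=4 corresponds to float32.
    return entries * dim * itemsize


def choose_backend(entry_count: int, hnsw_available: bool) -> str:
    # Hypothetical helper mirroring the described heuristic.
    # Treating exactly 500k as "above" is an assumption.
    if entry_count < AUTO_SWITCH_ENTRIES:
        return "numpy"
    if not hnsw_available:
        logger.warning("hnswlib is missing; falling back to the NumPy index")
        return "numpy"
    return "hnsw"


print(choose_backend(10_000, hnsw_available=True))
print(choose_backend(2_000_000, hnsw_available=True))
print(choose_backend(2_000_000, hnsw_available=False))
print(matrix_bytes(500_000, 768))  # float32 bytes scanned per exact search
```

At d=768 and float32, 500k entries come to 1,536,000,000 bytes that an exact search reads per query, which is why a much smaller or larger dimension shifts the point where forcing the backend makes sense.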
Vector dtypes
- float32 (default) - fastest, no cast at search time.
- float16 - 2× memory cut, negligible accuracy drift.
- int8 - 4× memory cut. Symmetric quantization, scale 127, assumes L2-normalized input.
- requantize(dtype) - switch dtypes at runtime without rebuilding the store.
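Symmetric int8 quantization with scale 127 can be sketched in a few lines. The helper names below are hypothetical and the rounding and clipping details are assumptions; the only facts taken from the entry above are the scale of 127 and the requirement that inputs are L2-normalized, which guarantees every component lies in [-1, 1].

```python
import math

SCALE = 127  # symmetric scale from the changelog entry


def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def quantize_int8(unit_vec: list[float]) -> list[int]:
    # Components of a unit vector lie in [-1, 1], so x * 127 fits in int8.
    # Clipping guards against tiny floating-point overshoot.
    return [max(-SCALE, min(SCALE, round(x * SCALE))) for x in unit_vec]


def dequantize_int8(q: list[int]) -> list[float]:
    return [v / SCALE for v in q]


def approx_cosine(qa: list[int], qb: list[int]) -> float:
    # Integer dot product, rescaled once at the end.
    return sum(a * b for a, b in zip(qa, qb)) / (SCALE * SCALE)


v = l2_normalize([3.0, 4.0])   # [0.6, 0.8]
q = quantize_int8(v)
print(q)
print(dequantize_int8(q))
print(approx_cosine(q, q))
```

With round-to-nearest, the per-component error after dequantization is at most 0.5 / 127, roughly 0.004.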
Multi-process modes
- single (default) - no coordination, fastest. One owner.
- stale-tolerant - periodic poll of version_counter + iter_since deltas. Eventually consistent. The right mode for multi-worker apps on one host.
- mmap-shared - single mmap matrix shared across processes under fcntl.flock (POSIX) / msvcrt.locking (Windows, best-effort). Strong consistency on the same host.
Multi-tenant
- Per-namespace get / put. Layer 2 search is namespace-scoped.
- namespace_quotas - per-namespace LRU caps. Eviction batches at 10% of the cap (min 1).
- clear_namespace(ns) - wipe one tenant.
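The quota rule above (evict in batches of 10% of the cap, never fewer than one entry) can be sketched with an OrderedDict. QuotaLRU and eviction_batch_size are hypothetical names, and rounding the 10% down with integer division is an assumption; the changelog does not state the rounding rule.

```python
from collections import OrderedDict


def eviction_batch_size(cap: int) -> int:
    # 10% of the cap, with a floor of one entry. Rounding down is assumed.
    return max(1, cap // 10)


class QuotaLRU:
    """Illustrative per-namespace LRU cap; not mneme's implementation."""

    def __init__(self, cap: int) -> None:
        if cap < 1:
            raise ValueError("cap must be at least 1")
        self.cap = cap
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> list:
        # Returns the keys evicted by this write (empty when under the cap).
        self._items[key] = value
        self._items.move_to_end(key)
        evicted = []
        if len(self._items) > self.cap:
            for _ in range(eviction_batch_size(self.cap)):
                old_key, _value = self._items.popitem(last=False)
                evicted.append(old_key)
        return evicted


lru = QuotaLRU(cap=20)
for i in range(20):
    lru.put(i, f"value-{i}")
lru.get(0)                    # touch key 0 so it is no longer the oldest
print(lru.put(20, "value-20"))  # evicts the two least recently used keys
```

Evicting a batch rather than a single entry means a namespace sitting at its cap does not pay an eviction on every write.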
Calibration and migration
- mneme.tools.calibrate.find_threshold - picks a similarity_threshold against your paraphrase + distractor pairs.
- mneme.tools.calibrate.precision_recall_curve - full sweep for hand-picking.
- mneme.tools.migrate.reembed / areembed - re-embed an existing cache through a new embedder when model or dim changes.
- CLI: python -m mneme.tools.calibrate --help.
Checkpoint
- SemanticCache.dumps(dest) and SemanticCache.loads(source, ...) - round-trip a full cache as a single tar.gz archive (manifest + store snapshot + optional vectors). Validates fingerprint + dim before unpacking.
Metrics
- metrics_hook - single Callable[[event, fields], None] for fan-out. Fires hit_exact, hit_semantic, miss, put, put_rejected, evict, expire, embedder_failure. Hook exceptions are caught and downgraded to WARNING.
- PrometheusMetricsHook - [prometheus] extra. Standard mneme_* counters and histograms.
- OTelMetricsHook - [otel] extra. Same metric shape, OTel meter API.
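A hook matching the Callable[[event, fields], None] shape could look like the sketch below. The concrete types (a str event and a mapping of fields), the CountingHook class, the "namespace" field, and the fire dispatcher are all assumptions for illustration; the dispatcher only mimics the documented behavior of catching hook exceptions and logging them at WARNING, and how the hook is registered with the cache is not shown here.

```python
import logging
from collections import Counter
from typing import Any, Callable, Mapping

logger = logging.getLogger("metrics_hook_sketch")

# Assumed concrete types for the documented (event, fields) signature.
MetricsHook = Callable[[str, Mapping[str, Any]], None]


class CountingHook:
    """Counts events by name; a stand-in for a real metrics backend."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, event: str, fields: Mapping[str, Any]) -> None:
        self.counts[event] += 1

    def hit_rate(self) -> float:
        hits = self.counts["hit_exact"] + self.counts["hit_semantic"]
        total = hits + self.counts["miss"]
        return hits / total if total else 0.0


def fire(hook: MetricsHook, event: str, fields: Mapping[str, Any]) -> None:
    # Hypothetical dispatcher: a failing hook must never break the caller.
    try:
        hook(event, fields)
    except Exception:
        logger.warning("metrics hook raised on %r", event, exc_info=True)


hook = CountingHook()
fire(hook, "hit_exact", {"namespace": "tenant-a"})  # field name is made up
fire(hook, "hit_semantic", {})
fire(hook, "miss", {})
fire(hook, "miss", {})
print(hook.hit_rate())
```

Because there is a single callable, fan-out to several sinks can be done inside the hook itself, for example by looping over a list of downstream callables.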
Examples
- 7 standalone runnable scripts: quickstart.py, async_quickstart.py, multi_tenant.py, high_dim_quantized.py, custom_store.py, calibration.py, dynamodb_quickstart.py.
- 4 reference embedder snippets: OpenAI, sentence-transformers, AWS Bedrock (Titan + Cohere), Ollama.
- Flask showcase under examples/showcase/ - multi-page UI covering all five use cases (Classify, Dedup, Translate, Agent memory, RAG retrieval) against nemotron-3-nano on a DGX Spark, plus operational pages (Dashboard with RAM/tombstone display + Compact button + namespace-scoped Clear, Stress test with Reset/Bypass toggles, Cache inspector, Multi-tenant). Each use case is wired to the same SemanticCache and demonstrates its specific pattern (sentinel responses for dedup, per-language-pair namespaces for translation, per-agent confidence-gated reuse for agent memory, JSON-bundled (answer, contexts) for RAG).
Documentation
- This site, built with MkDocs + Material + mkdocstrings. Auto-generated API reference plus hand-written concepts, guides, per-store deep-dives, and the showcase.
- GitHub Actions workflow auto-deploys to gh-pages on every push to main.
Quality gates
- 689 tests passing on Python 3.10–3.13, Linux + macOS.
- mypy --strict clean (27 source files).
- ruff clean across src/, tests/, and examples/.
- Branch coverage 93% with all optional services (Redis, Postgres, DynamoDB) running.
- Conformance battery parameterized across all 5 store backends.
- Performance regression bars enforced in tests/test_perf.py.
Acknowledgments
The spec and architecture decisions are documented in the project's design history; the v1.0 surface reflects 16 phases of staged implementation against the spec.