Changelog

All notable changes to mneme are documented here. The format is based on Keep a Changelog and the project follows Semantic Versioning.

1.0.0 - 2026-04-29

The first stable release. The public surface in mneme/__init__.py is locked; future minor versions are additive.

Added

Core cache

  • SemanticCache - sync layered cache with exact-match (Layer 1) over normalized SHA-256 hashes and semantic-match (Layer 2) over cosine similarity on L2-normalized embeddings.
  • AsyncSemanticCache - async wrapper that awaits the embedder directly and dispatches blocking store work via asyncio.to_thread. The async API mirrors the sync one.
  • Hit, Stats, Health, StoredEntry - frozen, slotted dataclasses for the public read surface.
  • Embedder, AsyncEmbedder, Index, Store - runtime-checkable Protocols for bring-your-own backends.
  • to_async_embedder / to_sync_embedder - adapter helpers between the two embedder Protocols.
  • SemanticCache.clear() - backend-agnostic whole-cache wipe across all namespaces. Works against any Store implementation; bumps version_counter per cleared namespace; rebuilds the in-memory index empty.
  • SemanticCache.set_similarity_threshold(value) and .similarity_threshold property - adjust the Layer-2 threshold at runtime. The value is validated against the range [-1.0, 1.0]. Affects subsequent get calls only.
  • SemanticCache.compact() - reclaim memory occupied by tombstoned (soft-deleted) index rows after delete/TTL/LRU churn. Returns the count of reclaimed rows. Cheap when there are no tombstones (early-return). Available on both sync and async caches.
  • SemanticCache.vacuum(compact=True) - the default now auto-compacts the index after the TTL sweep so memory is actually released. Pass compact=False to keep the legacy split-call behavior.
  • Stats.index_memory_bytes and Stats.index_tombstone_count - new fields exposing the actual matrix bytes and tombstone count from the in-memory index, so monitoring can detect and alert on RAM drift before it becomes a problem.
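
The two lookup layers described above can be sketched in a few lines of standalone Python. This is an illustration of the idea only, not mneme's implementation: the normalization rule (lowercase, collapsed whitespace), the ToyLayeredCache class, and the letter-count toy_embed embedder are all assumptions made for the example.

```python
import hashlib
import math


def normalize(text):
    # Assumed normalization rule: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def exact_key(text):
    # Layer 1 key: SHA-256 over the normalized prompt.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def toy_embed(text):
    # Hypothetical embedder: 26-dim letter counts. Real embedders are models.
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class ToyLayeredCache:
    """Illustrative two-layer cache; all names here are hypothetical."""

    def __init__(self, embed, similarity_threshold=0.9):
        self.embed = embed
        self.similarity_threshold = similarity_threshold
        self._exact = {}    # sha256 hex digest -> response
        self._vectors = []  # (unit vector, response) pairs

    def put(self, prompt, response):
        self._exact[exact_key(prompt)] = response
        self._vectors.append((l2_normalize(self.embed(prompt)), response))

    def get(self, prompt):
        # Layer 1: exact match on the normalized hash.
        hit = self._exact.get(exact_key(prompt))
        if hit is not None:
            return ("exact", hit)
        # Layer 2: on unit vectors, cosine similarity is a plain dot product.
        query = l2_normalize(self.embed(prompt))
        best_score, best_response = -1.0, None
        for vec, response in self._vectors:
            score = sum(a * b for a, b in zip(vec, query))
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.similarity_threshold:
            return ("semantic", best_response)
        return None
```

Because the stored vectors are already L2-normalized, the semantic layer never needs to divide by norms at query time, which is why the changelog describes the search as a single matrix-vector product.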

Changed

  • max_response_bytes default raised from 1 MB to 4 MB. Modern long-context LLM responses (Claude Opus, GPT-4 with verbose JSON, agent traces) routinely exceed 1 MB. The cap still exists; the default just stops getting in the way. Users who explicitly set the value are unaffected.

Stores (5 backends, one Protocol)

  • MemoryStore - dict-backed; tests, scratch, ephemeral.
  • SQLiteStore (default) - single-file, WAL mode, atomic writes, mode 0o600. Implements snapshot_to / restore_from via SQLite's online backup API.
  • RedisStore - [redis] extra. MULTI/EXEC for atomicity. version_counter incremented in the same pipeline as every write.
  • PostgresStore - [postgres] extra. Schema-scoped, BIGSERIAL ids, BYTEA embeddings, JSONB metadata. version_counter UPDATE in the same transaction as data writes.
  • DynamoDBStore - [dynamodb] extra. Single table + 2 GSIs. TransactWriteItems pairs every data op with a counter Update. Auto-create-on-open opt-in. Snapshot stub directs operators to AWS native backup.

All five satisfy the same Store Protocol and pass the same conformance battery.
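
The recurring pattern across the backends is that the data write and the version_counter bump commit together or not at all. A minimal sketch of that pattern using the standard-library sqlite3 module follows; the table names, column names, and helper functions are hypothetical and do not reflect SQLiteStore's actual schema.

```python
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical two-table layout: cached entries plus a single counter row.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(key TEXT PRIMARY KEY, response TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta "
        "(name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO meta (name, value) VALUES ('version_counter', 0)"
    )
    conn.commit()
    return conn


def put(conn, key, response):
    # `with conn` commits both statements together, or rolls both back
    # if either one raises.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO entries (key, response) VALUES (?, ?)",
            (key, response),
        )
        conn.execute(
            "UPDATE meta SET value = value + 1 WHERE name = 'version_counter'"
        )


def version_counter(conn):
    row = conn.execute(
        "SELECT value FROM meta WHERE name = 'version_counter'"
    ).fetchone()
    return row[0]
```

A reader that sees the counter unchanged can therefore assume no data write has landed since its last check.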

Index backends

  • NumpyIndex - default; in-memory matrix, cosine via M @ q, L2-normalized, geometric capacity growth. Bandwidth-bound exact search; comfortable up to ~500k entries at d=768 on baseline hardware, and scales further at lower dim.
  • HnswIndex - [hnsw] extra. Approximate-NN, sub-millisecond at 1M+ entries. Per-namespace hnswlib indices, configurable M / ef_construction / ef.
  • index_backend="auto" - picks NumpyIndex below 500k entries and HnswIndex above (a heuristic targeting d=768); falls back to NumpyIndex and logs a WARNING when hnswlib is missing. Force the backend explicitly when your dim is far from 768.
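
The "auto" heuristic and the reason it is tied to d=768 can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the behavior at exactly 500k entries (treated here as "above") is a guess, since the changelog does not specify it.

```python
import importlib.util
import logging

logger = logging.getLogger("index_backend_sketch")

# Cut-over point quoted in the changelog; a heuristic targeting d=768.
AUTO_SWITCH_ENTRIES = 500_000


def matrix_bytes(n_entries, dim, itemsize=4):
    # Size of a dense n x dim matrix; itemsize=4 corresponds to float32.
    return n_entries * dim * itemsize


def hnswlib_available():
    # True when the optional hnswlib package can be imported.
    return importlib.util.find_spec("hnswlib") is not None


def pick_backend(n_entries, hnsw_installed=None):
    """Hypothetical stand-in for the index_backend="auto" decision."""
    if hnsw_installed is None:
        hnsw_installed = hnswlib_available()
    if n_entries < AUTO_SWITCH_ENTRIES:
        return "numpy"
    if not hnsw_installed:
        logger.warning("hnswlib is not installed; falling back to the NumPy index")
        return "numpy"
    return "hnsw"
```

At the cut-over point a float32 matrix at d=768 occupies 500,000 × 768 × 4 = 1,536,000,000 bytes, and an exact search reads all of it per query. At d=384 the same entry count is half that size, which is why forcing the backend makes sense when the dimension is far from 768.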

Vector dtypes

  • float32 (default) - fastest, no cast at search time.
  • float16 - 2× memory cut, negligible accuracy drift.
  • int8 - 4× memory cut. Symmetric quantization, scale 127, assumes L2-normalized input.
  • requantize(dtype) - switch dtypes at runtime without rebuilding the store.
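
Symmetric int8 quantization with scale 127 is simple enough to show in full. The sketch below uses plain Python lists instead of NumPy arrays and hypothetical function names; it illustrates the arithmetic, not mneme's code path.

```python
SCALE = 127  # symmetric scale: [-1.0, 1.0] maps onto [-127, 127]


def quantize_int8(unit_vec):
    # Components of an L2-normalized vector lie in [-1, 1], so multiplying
    # by 127 and rounding fits in a signed byte; the clamp guards against
    # inputs that are slightly out of range.
    return [max(-SCALE, min(SCALE, round(x * SCALE))) for x in unit_vec]


def dequantize_int8(quantized):
    return [q / SCALE for q in quantized]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

For the unit vector [0.6, 0.8] the quantized form is [76, 102], and the self-similarity after dequantization is (76² + 102²) / 127² ≈ 1.003 instead of exactly 1.0. The assumption of L2-normalized input matters: without it, components outside [-1, 1] would be clamped and similarity scores would be distorted.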

Multi-process modes

  • single (default) - no coordination, fastest. One owner.
  • stale-tolerant - periodic poll of version_counter + iter_since deltas. Eventually consistent. The right mode for multi-worker apps on one host.
  • mmap-shared - single mmap matrix shared across processes under fcntl.flock (POSIX) / msvcrt.locking (Windows, best-effort). Strong consistency on the same host.
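
The stale-tolerant mode boils down to "compare the counter, then pull only the deltas". The sketch below shows that loop against an in-memory stand-in. ToyStore and StaleTolerantReplica are hypothetical classes; only the names version_counter and iter_since come from the changelog, and their signatures here are assumed.

```python
class ToyStore:
    """In-memory stand-in for a shared store (hypothetical)."""

    def __init__(self):
        self._log = []  # (version, key, value) rows in write order

    def put(self, key, value):
        # Every write gets the next version number.
        self._log.append((len(self._log) + 1, key, value))

    def version_counter(self):
        return len(self._log)

    def iter_since(self, version):
        # Rows written strictly after `version`.
        return [row for row in self._log if row[0] > version]


class StaleTolerantReplica:
    """A worker's local view, refreshed by polling."""

    def __init__(self, store):
        self.store = store
        self.seen_version = 0
        self.local = {}

    def poll(self):
        """Apply any new deltas and return how many were applied."""
        if self.store.version_counter() == self.seen_version:
            return 0  # cheap path: nothing changed since the last poll
        applied = 0
        for version, key, value in self.store.iter_since(self.seen_version):
            self.local[key] = value
            self.seen_version = max(self.seen_version, version)
            applied += 1
        return applied
```

In a real deployment poll() would run on a timer; between polls a worker may serve slightly stale results, which is the "eventually consistent" trade-off named above.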

Multi-tenant

  • Per-namespace get/put. Layer 2 search is namespace-scoped.
  • namespace_quotas - per-namespace LRU caps. Eviction batches at 10% of the cap (min 1).
  • clear_namespace(ns) - wipe one tenant.
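
Per-namespace LRU caps with batched eviction can be sketched with an OrderedDict per tenant. The class and method shapes below are hypothetical, and rounding the 10% batch down (integer division) is an assumption; the changelog only states "10% of the cap (min 1)".

```python
from collections import OrderedDict


def eviction_batch(cap):
    # 10% of the cap, never fewer than one entry. Floor rounding is assumed.
    return max(1, cap // 10)


class NamespaceLRU:
    """Illustrative per-namespace LRU; not mneme's implementation."""

    def __init__(self, quotas):
        self.quotas = quotas  # namespace -> max entries
        self._data = {}       # namespace -> OrderedDict, oldest first

    def put(self, ns, key, value):
        """Store a value and return the keys evicted by this write."""
        entries = self._data.setdefault(ns, OrderedDict())
        entries[key] = value
        entries.move_to_end(key)
        cap = self.quotas.get(ns)
        evicted = []
        if cap is not None and len(entries) > cap:
            for _i in range(eviction_batch(cap)):
                if not entries:
                    break
                old_key, _old_value = entries.popitem(last=False)
                evicted.append(old_key)
        return evicted

    def get(self, ns, key):
        entries = self._data.get(ns)
        if entries is None or key not in entries:
            return None
        entries.move_to_end(key)  # a read refreshes recency
        return entries[key]

    def clear_namespace(self, ns):
        """Wipe one tenant and return how many entries were removed."""
        return len(self._data.pop(ns, {}))
```

Evicting in batches means a namespace that sits at its cap does not pay an eviction on every single write: with a cap of 20, one overflow frees two slots.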

Calibration and migration

  • mneme.tools.calibrate.find_threshold - picks a similarity_threshold against your paraphrase + distractor pairs.
  • mneme.tools.calibrate.precision_recall_curve - full sweep for hand-picking.
  • mneme.tools.migrate.reembed / areembed - re-embed an existing cache through a new embedder when model or dim changes.
  • CLI: python -m mneme.tools.calibrate --help.
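
To make the calibration idea concrete, here is one way a threshold could be chosen from similarity scores of paraphrase pairs (which should hit) and distractor pairs (which should miss). This is a sketch of a common approach, not the algorithm used by mneme.tools.calibrate; the function names and the min_precision parameter are hypothetical.

```python
def precision_recall_at(threshold, paraphrase_scores, distractor_scores):
    # A pair counts as a cache hit when its similarity reaches the threshold.
    tp = sum(s >= threshold for s in paraphrase_scores)
    fp = sum(s >= threshold for s in distractor_scores)
    fn = len(paraphrase_scores) - tp
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(paraphrase_scores, distractor_scores, min_precision=0.95):
    """Lowest observed score that meets the precision target, or None."""
    candidates = sorted(set(paraphrase_scores) | set(distractor_scores))
    # Recall can only fall as the threshold rises, so the first candidate
    # that satisfies the precision target also has the best recall.
    for threshold in candidates:
        precision, _recall = precision_recall_at(
            threshold, paraphrase_scores, distractor_scores
        )
        if precision >= min_precision:
            return threshold
    return None
```

With paraphrase scores [0.95, 0.92, 0.88, 0.81] and distractor scores [0.86, 0.70, 0.40], the sweep settles on 0.88: every lower candidate still admits the 0.86 distractor, and at 0.88 precision is 1.0 with recall 0.75.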

Checkpoint

  • SemanticCache.dumps(dest) and SemanticCache.loads(source, ...) - round-trip a full cache as a single tar.gz archive (manifest + store snapshot + optional vectors). Validates fingerprint + dim before unpacking.
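
The validate-before-unpack behavior can be illustrated with the standard tarfile module. The member names (manifest.json, store.snapshot), the manifest keys, and both helper functions are assumptions made for this sketch; the actual archive layout is not specified in the changelog.

```python
import io
import json
import tarfile


def dump_archive(manifest, snapshot):
    """Pack a manifest dict and snapshot bytes into tar.gz bytes."""
    members = (
        ("manifest.json", json.dumps(manifest).encode("utf-8")),
        ("store.snapshot", snapshot),
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def load_archive(data, expected_fingerprint, expected_dim):
    """Read the manifest first; only touch the snapshot if it matches."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        if manifest.get("fingerprint") != expected_fingerprint:
            raise ValueError("embedder fingerprint mismatch")
        if manifest.get("dim") != expected_dim:
            raise ValueError("embedding dim mismatch")
        snapshot = tar.extractfile("store.snapshot").read()
    return manifest, snapshot
```

Checking the fingerprint and dim up front means an archive produced with a different embedder is rejected before any of its vectors can be mixed into a live cache.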

Metrics

  • metrics_hook - single Callable[[event, fields], None] for fan-out. Fires hit_exact, hit_semantic, miss, put, put_rejected, evict, expire, embedder_failure. Hook exceptions are caught and downgraded to WARNING.
  • PrometheusMetricsHook - [prometheus] extra. Standard mneme_* counters and histograms.
  • OTelMetricsHook - [otel] extra. Same metric shape, OTel meter API.
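
The hook contract (one callable taking an event name and a fields mapping, with failures downgraded to a WARNING) can be sketched like this. The emit and fan_out helpers are hypothetical names for illustration; only the callable shape and the event names come from the changelog.

```python
import logging
from typing import Any, Callable, Mapping

logger = logging.getLogger("metrics_sketch")

MetricsHook = Callable[[str, Mapping[str, Any]], None]


def emit(hook, event, fields):
    # A failing hook must never break a cache operation: catch the
    # exception and log it at WARNING level instead of propagating it.
    if hook is None:
        return
    try:
        hook(event, fields)
    except Exception:
        logger.warning("metrics hook failed for event %r", event, exc_info=True)


def fan_out(*hooks):
    """Combine several hooks into a single callable of the same shape."""

    def combined(event, fields):
        for hook in hooks:
            emit(hook, event, fields)

    return combined
```

Since each hook is wrapped individually inside fan_out, one misbehaving exporter does not prevent the others from receiving the event.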

Examples

  • 7 standalone runnable scripts: quickstart.py, async_quickstart.py, multi_tenant.py, high_dim_quantized.py, custom_store.py, calibration.py, dynamodb_quickstart.py.
  • 4 reference embedder snippets: OpenAI, sentence-transformers, AWS Bedrock (Titan + Cohere), Ollama.
  • Flask showcase under examples/showcase/ - multi-page UI covering all five use cases (Classify, Dedup, Translate, Agent memory, RAG retrieval) against nemotron-3-nano on a DGX Spark. Operational pages: Dashboard (RAM/tombstone display, Compact button, namespace-scoped Clear), Stress test (Reset/Bypass toggles), Cache inspector, and Multi-tenant. Each use case is wired to the same SemanticCache and demonstrates its specific pattern: sentinel responses for dedup, per-language-pair namespaces for translation, per-agent confidence-gated reuse for agent memory, and JSON-bundled (answer, contexts) for RAG.

Documentation

  • This site, built with MkDocs + Material + mkdocstrings. Auto-generated API reference plus hand-written concepts, guides, per-store deep-dives, and the showcase.
  • GitHub Actions workflow auto-deploys to gh-pages on every push to main.

Quality gates

  • 689 tests passing on Python 3.10–3.13, Linux + macOS.
  • mypy --strict clean (27 source files).
  • ruff clean across src/, tests/, and examples/.
  • Branch coverage 93% with all optional services (Redis, Postgres, DynamoDB) running.
  • Conformance battery parameterized across all 5 store backends.
  • Performance regression bars enforced in tests/test_perf.py.

Acknowledgments

The spec and architecture decisions are documented in the project's design history; the v1.0 surface reflects 16 phases of staged implementation against the spec.