mneme
A layered semantic cache for LLM applications.
mneme (Greek: μνήμη, "memory"; pronounced NEE-mee) is an embeddable, in-process Python library that caches LLM completions across paraphrased queries. It pairs an exact-match layer (normalized query hash) with a semantic-match layer (cosine similarity over L2-normalized embeddings) and persists durably to a single SQLite file by default.
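from mneme import SemanticCache

# my_embedder and call_my_llm are placeholders for your own embedder and LLM client.
with SemanticCache(path="cache.db", embedder=my_embedder) as cache:
    hit = cache.get("How do I reset my password?")
    if hit is None:
        # Cache miss: call the LLM and store the answer for next time.
        response = call_my_llm("How do I reset my password?")
        cache.put("How do I reset my password?", response)
    else:
        # Cache hit: reuse the stored response.
        response = hit.response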
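To make the two layers concrete, here is a minimal standalone sketch of the idea, not mneme's actual implementation. The class name `TwoLayerSketch`, the normalization rule (lowercase plus whitespace collapse), the hashed bag-of-words `toy_embed`, and the 0.8 threshold are all assumptions chosen for illustration. The point it shows is that once embeddings are L2-normalized, cosine similarity reduces to a dot product, so the semantic layer is a single matrix-vector product.

```python
from __future__ import annotations

import hashlib
import re
from typing import Optional

import numpy as np


def exact_key(query: str) -> str:
    """Hash of the normalized query (assumed rule: lowercase, collapse whitespace)."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def toy_embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedder: hashed bag of words, L2-normalized."""
    vec = np.zeros(dim, dtype=np.float32)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = float(np.linalg.norm(vec))
    return vec / norm if norm > 0.0 else vec


class TwoLayerSketch:
    """Illustrative two-layer cache: exact hash lookup first, then cosine similarity."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.exact: dict[str, str] = {}
        self.vectors: list[np.ndarray] = []
        self.responses: list[str] = []

    def put(self, query: str, response: str) -> None:
        self.exact[exact_key(query)] = response
        self.vectors.append(toy_embed(query))
        self.responses.append(response)

    def get(self, query: str) -> Optional[str]:
        # Layer 1: dict lookup on the normalized-query hash.
        hit = self.exact.get(exact_key(query))
        if hit is not None:
            return hit
        if not self.vectors:
            return None
        # Layer 2: rows are unit vectors, so the matvec yields cosine similarities.
        sims = np.stack(self.vectors) @ toy_embed(query)
        best = int(np.argmax(sims))
        if float(sims[best]) >= self.threshold:
            return self.responses[best]
        return None


cache = TwoLayerSketch()
cache.put("How do I reset my password?", "Use the reset link.")
print(cache.get("how do I   reset my PASSWORD?"))       # served by the exact layer
print(cache.get("Please, how do I reset my password"))  # served by the semantic layer
```

A real embedder would place paraphrases with little word overlap close together, which this word-hashing toy cannot do. The threshold then becomes the main knob trading hit rate against wrong answers.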
Why mneme
- Cache before you call. A semantic cache turns redundant LLM calls into a microsecond dict lookup or a millisecond NumPy matvec. For chatbots, agent loops, and batch-style scoring jobs, this is the difference between a viable product and one that burns tokens on every paraphrase.
- One required dependency. NumPy. Optional extras for hnswlib, redis, psycopg, boto3, prometheus, opentelemetry. Bring your own embedder, your own LLM client, your own server.
- In-process, no daemon. A library you import, not a service you operate. Persists to a single SQLite file by default; swap in Redis / Postgres / DynamoDB when you need shared state across hosts.
- Strict typing, zero magic. Public surface is a small set of frozen @dataclasses and Protocols. py.typed shipped.
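As a rough illustration of what a frozen-dataclass-plus-Protocol surface looks like in practice, the sketch below uses only the standard library. The names `Hit`, `Embedder`, `embed`, and `LengthEmbedder`, and the `similarity` field, are hypothetical and are not taken from mneme's API; see the API reference for the real definitions.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass
from typing import Protocol, Sequence, runtime_checkable


@dataclass(frozen=True)
class Hit:
    """Hypothetical immutable result record; field names are illustrative."""

    response: str
    similarity: float


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical structural interface: any object with a matching embed() fits."""

    def embed(self, text: str) -> Sequence[float]: ...


class LengthEmbedder:
    """Toy embedder that satisfies Embedder by shape alone, with no inheritance."""

    def embed(self, text: str) -> Sequence[float]:
        return [float(len(text)), 1.0]


embedder = LengthEmbedder()
print(isinstance(embedder, Embedder))  # structural check enabled by runtime_checkable

hit = Hit(response="cached answer", similarity=0.93)
try:
    hit.response = "changed"  # type: ignore[misc]
except FrozenInstanceError:
    print("Hit instances cannot be mutated after construction")
```

The appeal of this style is that "bring your own embedder" needs no base class or registration step: a static type checker verifies the shape, and frozen records cannot be altered by accident after they leave the cache.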
Pick your path
- Get started. Sync and async quickstarts, write your first cached LLM call, choose an embedder.
- Understand the moving parts. The two-layer cache, embedders, quantization, multi-process modes, multi-tenant.
- Pick a store. Memory, SQLite, Redis, Postgres, DynamoDB - same Protocol, five backends.
- API reference. Every public class, method, type, and exception, autogenerated from the source.
- How-to guides. Calibration, checkpoints, re-embed migrations, metrics, custom stores, perf tuning.
- See it live. A Flask showcase classifies real customer-support messages with Nemotron on a local DGX Spark.
- Beyond LLM caching. Five other patterns the same machinery covers - RAG retrieval, translation, dedup, classification, agent memory.
- How is this different? Where mneme makes deliberately different choices than other semantic-cache libraries, and the design philosophies behind them.
Status
v1.0, released 2026. The public surface in mneme/__init__.py is locked; future minor versions are additive.