mneme

A layered semantic cache for LLM applications.

mneme (Greek: μνήμη, "memory"; pronounced NEE-mee) is an embeddable, in-process Python library that caches LLM completions across paraphrased queries. It pairs an exact-match layer (normalized query hash) with a semantic-match layer (cosine similarity over L2-normalized embeddings) and persists durably to a single SQLite file by default.

from mneme import SemanticCache

with SemanticCache(path="cache.db", embedder=my_embedder) as cache:
    hit = cache.get("How do I reset my password?")
    if hit is None:
        response = call_my_llm("How do I reset my password?")
        cache.put("How do I reset my password?", response)
    else:
        response = hit.response
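
To make the two layers concrete, here is a minimal sketch of the lookup order described above: a hash of the normalized query first, then cosine similarity over L2-normalized vectors. It is not mneme's implementation. `TwoLayerSketch`, `toy_embed`, the normalization rule, and the 0.9 threshold are all assumptions chosen for illustration, and the bag-of-letters embedder is only a deterministic stand-in for a real embedding model.

```python
import hashlib

import numpy as np


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share one key."""
    return " ".join(query.lower().split())


def query_key(query: str) -> str:
    """Hash of the normalized query, used by the exact-match layer."""
    return hashlib.sha256(normalize(query).encode("utf-8")).hexdigest()


def toy_embed(text: str) -> np.ndarray:
    """Bag-of-letters counts. A hypothetical stand-in for a real embedder."""
    vec = np.zeros(26, dtype=np.float32)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


class TwoLayerSketch:
    """Illustrative in-memory cache; all names here are hypothetical."""

    def __init__(self, embed, threshold: float = 0.9):
        self._embed = embed
        self._threshold = threshold
        self._exact = {}      # normalized-query hash -> response
        self._vectors = []    # unit-length embeddings, one per entry
        self._responses = []  # responses aligned with self._vectors

    def _unit(self, query: str) -> np.ndarray:
        vec = np.asarray(self._embed(query), dtype=np.float32)
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def put(self, query: str, response: str) -> None:
        self._exact[query_key(query)] = response
        self._vectors.append(self._unit(query))
        self._responses.append(response)

    def get(self, query: str):
        # Layer 1: exact match on the normalized query hash (dict lookup).
        hit = self._exact.get(query_key(query))
        if hit is not None:
            return hit
        if not self._vectors:
            return None
        # Layer 2: on unit vectors, the dot product equals cosine similarity,
        # so one matrix-vector product scores every cached entry.
        sims = np.stack(self._vectors) @ self._unit(query)
        best = int(np.argmax(sims))
        return self._responses[best] if sims[best] >= self._threshold else None


cache = TwoLayerSketch(toy_embed)
cache.put("How do I reset my password?", "Use the reset link.")
```

With this toy embedder, a differently cased or spaced copy of the query hits the exact layer, a close paraphrase such as "How can I reset my password?" clears the threshold in the semantic layer, and an unrelated question misses both.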

Why mneme

Cache before you call. A semantic cache turns redundant LLM calls into a microsecond dict lookup or a millisecond NumPy matvec. For chatbots, agent loops, and batch-style scoring jobs, this is the difference between a viable product and one that burns tokens on every paraphrase.
One required dependency. NumPy is the only hard requirement; optional extras cover hnswlib, redis, psycopg, boto3, prometheus, and opentelemetry. Bring your own embedder, your own LLM client, your own server.
In-process, no daemon. A library you import, not a service you operate. Persists to a single SQLite file by default; swap in Redis / Postgres / DynamoDB when you need shared state across hosts.
Strict typing, zero magic. The public surface is a small set of frozen @dataclasses and Protocols, and the package ships a py.typed marker.
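
As a rough illustration of what "frozen @dataclasses and Protocols" means in practice, the sketch below uses only the standard library. The names `EmbedderLike`, `HitSketch`, and `LengthEmbedder`, the `embed` method, and the `similarity` field are hypothetical and are not taken from mneme's API; only the `response` attribute mirrors the quickstart above.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class EmbedderLike(Protocol):
    """Hypothetical structural type: anything with a matching embed() fits."""

    def embed(self, text: str) -> Sequence[float]: ...


@dataclass(frozen=True)
class HitSketch:
    """Hypothetical immutable result record."""

    response: str
    similarity: float


class LengthEmbedder:
    """Toy embedder. It satisfies EmbedderLike without inheriting from it."""

    def embed(self, text: str) -> Sequence[float]:
        return [float(len(text)), 1.0]


embedder = LengthEmbedder()
hit = HitSketch(response="cached answer", similarity=0.97)
```

Because the Protocol is structural, a bring-your-own embedder needs no base class, and because the dataclass is frozen, assigning to `hit.response` raises `FrozenInstanceError`.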

Pick your path

Get started
Sync and async quickstarts, write your first cached LLM call, choose an embedder.
Getting started

Understand the moving parts
The two-layer cache, embedders, quantization, multi-process modes, multi-tenant.
Concepts

Pick a store
Memory, SQLite, Redis, Postgres, DynamoDB: same Protocol, five backends.
Stores

API reference
Every public class, method, type, and exception, autogenerated from the source.
Reference

How-to guides
Calibration, checkpoints, re-embed migrations, metrics, custom stores, perf tuning.
Guides

See it live
A Flask showcase classifies real customer-support messages with Nemotron on a local DGX Spark.
Showcase

Beyond LLM caching
Five other patterns the same machinery covers: RAG retrieval, translation, dedup, classification, agent memory.
Use cases

How is this different?
Where mneme deliberately makes different choices from other semantic-cache libraries, and the design philosophies behind them.
Differentiation

Status

v1.0, released 2026. The public surface in mneme/__init__.py is locked; future minor versions are additive.

License

Apache 2.0.