
Metrics

mneme doesn't ship with a hard-coded metrics backend. It fires events through a single MetricsHook callable; you wire those to whatever you operate. Two adapters are bundled - Prometheus and OpenTelemetry - but a custom hook is just a function.

The hook signature

MetricsHook = Callable[[str, dict[str, Any]], None]

Two arguments:

  • event: str - one of: hit_exact, hit_semantic, miss, put, put_rejected, evict, expire, embedder_failure.
  • fields: dict - event-specific payload. Always includes namespace. Hits also include similarity (Layer 2 only) and confidence.

The hook fires inline on the operation's thread. Treat it as cheap: any blocking work in the hook serializes against the cache lock.

Custom hook

from mneme import SemanticCache

def my_hook(event: str, fields: dict) -> None:
    print(f"{event} {fields}")

cache = SemanticCache(..., metrics_hook=my_hook)

That's the whole interface. Send events to whatever you want - logs, statsd, internal counters, a queue for async fan-out.
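As one illustration of the "internal counters" option, the sketch below tallies events per namespace in memory. The names `event_counts` and `counting_hook` are made up for this example, and the payloads passed at the bottom are invented stand-ins for what the cache would send.

```python
from collections import Counter
from typing import Any

# In-memory tallies keyed by (event, namespace). Illustrative only: not
# guarded by a lock, so add one if hooks can fire concurrently.
event_counts: Counter = Counter()

def counting_hook(event: str, fields: dict[str, Any]) -> None:
    # "namespace" is always present per the hook contract; .get() just
    # keeps the hook from raising if that ever changes.
    event_counts[(event, fields.get("namespace", "unknown"))] += 1

# Simulate the calls the cache would make (payload values are made up).
counting_hook("hit_exact", {"namespace": "faq", "confidence": 1.0})
counting_hook("miss", {"namespace": "faq"})
counting_hook("hit_exact", {"namespace": "faq", "confidence": 1.0})
```

The hook does a single dictionary update, which keeps it within the "treat it as cheap" budget.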

Failure handling

Hook exceptions are caught and logged at WARNING level. An observability problem never crashes the cache: if your hook raises, the corresponding get / put still succeeds and only the metric is dropped.

This is intentional: a flaky StatsD client should never break user-facing requests.
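The isolation described here can be pictured with a small wrapper. This is a sketch of the general pattern, not mneme's actual implementation; `fire_safely`, `flaky_hook`, and the logger name are hypothetical.

```python
import logging
from typing import Any, Callable

MetricsHook = Callable[[str, dict[str, Any]], None]
logger = logging.getLogger("metrics_demo")  # hypothetical logger name

def fire_safely(hook: MetricsHook, event: str, fields: dict[str, Any]) -> bool:
    """Call the hook; log and swallow any exception. Returns True on success."""
    try:
        hook(event, fields)
        return True
    except Exception:
        # The metric is dropped, but the caller's operation carries on.
        logger.warning("metrics hook failed for event %r", event, exc_info=True)
        return False

def flaky_hook(event: str, fields: dict[str, Any]) -> None:
    raise ConnectionError("statsd unreachable")

delivered = fire_safely(flaky_hook, "miss", {"namespace": "faq"})
```

Here `delivered` is `False` and a warning is logged, while the surrounding code keeps running.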

Prometheus

PrometheusMetricsHook registers the standard mneme_* counters, histograms, and gauge with the default Prometheus registry:

from mneme import SemanticCache
from mneme.adapters.prometheus import PrometheusMetricsHook

cache = SemanticCache(..., metrics_hook=PrometheusMetricsHook())

Metrics exposed:

| Metric | Type | Labels |
|---|---|---|
| mneme_hits_total | Counter | namespace, layer (exact / semantic) |
| mneme_misses_total | Counter | namespace, reason (no_match / below_threshold / embedder_failure) |
| mneme_puts_total | Counter | namespace, result (ok / rejected) |
| mneme_evictions_total | Counter | namespace, reason |
| mneme_expirations_total | Counter | namespace |
| mneme_similarity_score | Histogram | namespace |
| mneme_get_latency_seconds | Histogram | namespace, layer |
| mneme_cache_entries | Gauge | namespace |

Install with mneme[prometheus]. Wire to your scrape endpoint via the standard prometheus_client.start_http_server(...) or a Flask/FastAPI exposition endpoint.

OpenTelemetry

OTelMetricsHook emits the same metrics through the OTel meter:

from mneme import SemanticCache
from mneme.adapters.opentelemetry import OTelMetricsHook
from opentelemetry import metrics

provider = metrics.get_meter_provider()
cache = SemanticCache(..., metrics_hook=OTelMetricsHook(meter=provider.get_meter("mneme")))

Names match Prometheus where convention permits; labels become OTel attributes. Install with mneme[otel]. Configure your OTel exporter (OTLP, Console, etc.) the usual way.

Common patterns

Stack hooks

metrics_hook= takes one callable. To run two (e.g. logs + Prometheus), compose:

def composite(event: str, fields: dict) -> None:
    log_hook(event, fields)
    prom_hook(event, fields)

cache = SemanticCache(..., metrics_hook=composite)
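In the two-line composite, an exception in the first hook means the second never runs for that event. A slightly more defensive variant, sketched here with a hypothetical `compose_hooks` helper, isolates each hook so one failing sink does not starve the others.

```python
import logging
from typing import Any, Callable

MetricsHook = Callable[[str, dict[str, Any]], None]
logger = logging.getLogger("metrics_demo")  # hypothetical logger name

def compose_hooks(*hooks: MetricsHook) -> MetricsHook:
    """Return one hook that calls every given hook, isolating failures."""
    def composite(event: str, fields: dict[str, Any]) -> None:
        for hook in hooks:
            try:
                hook(event, fields)
            except Exception:
                # Log and move on so the remaining hooks still see the event.
                logger.warning("hook %r failed on %r", hook, event, exc_info=True)
    return composite

seen: list[str] = []

def broken(event: str, fields: dict[str, Any]) -> None:
    raise RuntimeError("boom")

def recorder(event: str, fields: dict[str, Any]) -> None:
    seen.append(event)

combined = compose_hooks(broken, recorder)
combined("put", {"namespace": "faq"})
```

After the call, `seen` contains `"put"` even though the first hook raised.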

Add static labels

If you want all events tagged with a service= or region= label:

def tagged_hook(event: str, fields: dict) -> None:
    fields = {**fields, "service": "intent-classifier", "region": "us-east-1"}
    inner_hook(event, fields)
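If several caches need different static labels, the same idea can be packaged as a small factory. `with_static_labels` is a hypothetical helper written for this sketch; it is not part of mneme.

```python
from typing import Any, Callable

MetricsHook = Callable[[str, dict[str, Any]], None]

def with_static_labels(inner: MetricsHook, **labels: str) -> MetricsHook:
    """Wrap `inner` so every event carries the given static labels."""
    def tagged(event: str, fields: dict[str, Any]) -> None:
        # Build a new dict so the caller's payload is never mutated.
        # Static labels win on key collisions, as in the snippet above.
        inner(event, {**fields, **labels})
    return tagged

captured: list[tuple[str, dict[str, Any]]] = []
hook = with_static_labels(
    lambda event, fields: captured.append((event, fields)),
    service="intent-classifier",
    region="us-east-1",
)

original = {"namespace": "faq"}
hook("miss", original)
```

The wrapped hook receives the merged payload, and `original` is left untouched.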

Async-safe fan-out

For very high event rates, push to an asyncio.Queue and drain in a background task:

import asyncio

queue = asyncio.Queue(maxsize=10_000)

def queue_hook(event: str, fields: dict) -> None:
    try:
        queue.put_nowait((event, fields))
    except asyncio.QueueFull:
        pass        # drop on overflow rather than block the cache

async def drain():
    while True:
        event, fields = await queue.get()
        await fan_out(event, fields)

The hook itself stays cheap (one queue put); heavy fan-out runs off the request path. This assumes the hook fires on the event loop's thread, since asyncio.Queue is not thread-safe.
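When the cache is called from worker threads instead, the same drop-on-overflow idea can be sketched with the standard library's thread-safe `queue.Queue` and a background thread. This swaps asyncio for threading. The names `events`, `delivered`, and the `None` shutdown sentinel are choices made for this example, and appending to a list stands in for real fan-out.

```python
import queue
import threading
from typing import Any

events: queue.Queue = queue.Queue(maxsize=10_000)
delivered: list[tuple[str, dict[str, Any]]] = []

def queue_hook(event: str, fields: dict[str, Any]) -> None:
    try:
        events.put_nowait((event, fields))
    except queue.Full:
        pass  # drop on overflow rather than block the cache

def drain() -> None:
    while True:
        item = events.get()      # blocks until an event (or sentinel) arrives
        if item is None:         # sentinel: stop the worker
            break
        delivered.append(item)   # stand-in for the real fan-out call

worker = threading.Thread(target=drain, daemon=True)
worker.start()

queue_hook("put", {"namespace": "faq"})
queue_hook("miss", {"namespace": "faq"})

events.put(None)          # request shutdown once the demo events are queued
worker.join(timeout=5)
```

Because a single worker drains a FIFO queue, events reach the sink in the order the hook received them.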

Counters from stats()

cache.stats() returns a snapshot of the same counters the hooks fire on:

s = cache.stats()
hits = s.hits_exact + s.hits_semantic
total = hits + s.misses  # zero until the first lookup
print(f"hit_rate: {hits / total:.2%}" if total else "hit_rate: n/a")

This is a polling alternative to hooks - useful for occasional inspection but not for sustained observability.
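For occasional inspection, the hit rate over a recent interval is often more informative than the lifetime figure. The sketch below assumes the counters in `stats()` are cumulative, which this page does not state, and uses a hypothetical `Snapshot` dataclass as a stand-in carrying only the three fields used above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Snapshot:
    """Stand-in for the object returned by cache.stats() (assumed shape)."""
    hits_exact: int
    hits_semantic: int
    misses: int

def interval_hit_rate(prev: Snapshot, curr: Snapshot) -> Optional[float]:
    """Hit rate between two snapshots, or None if no lookups happened."""
    hits = (curr.hits_exact - prev.hits_exact) + (curr.hits_semantic - prev.hits_semantic)
    lookups = hits + (curr.misses - prev.misses)
    return hits / lookups if lookups > 0 else None

# Made-up numbers: 40 new hits and 10 new misses between the two polls.
earlier = Snapshot(hits_exact=100, hits_semantic=50, misses=50)
later = Snapshot(hits_exact=130, hits_semantic=60, misses=60)
rate = interval_hit_rate(earlier, later)
```

Returning `None` for an idle interval avoids a division by zero and lets the caller decide how to display "no traffic".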

What to watch in production

| Signal | Why |
|---|---|
| Hit rate by namespace | Drops correlate with embedder regressions or threshold drift |
| embedder_failure rate | Network or quota issues with the embedder; cache silently falling through to LLM |
| similarity_score p50/p99 | Distribution shifts mean your corpus is changing |
| get_latency_seconds p99 | Spikes mean the index is too big for the dtype/backend combo |
| Cache entry count | Growing without bound? Eviction misconfigured |

The showcase dashboard exposes most of these live as a reference UI.

Where to go next