Tools

Operational scripts shipped with mneme. Both have Python APIs and python -m CLIs.

Calibration

mneme.tools.calibrate finds the right similarity_threshold for your embedder + corpus.

find_threshold

find_threshold(paraphrase_pairs: list[tuple[str, str]], distractor_pairs: list[tuple[str, str]], embedder: Embedder, *, target_metric: Literal['f1', 'precision', 'recall'] = 'f1', min_precision: float | None = None, min_recall: float | None = None, grid: list[float] | None = None, vector_dtype: VectorDtype = 'float32') -> CalibrationResult

Pick the threshold that maximizes target_metric subject to the optional min_precision and min_recall constraints.
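The selection rule can be sketched in a few lines of plain Python. This is an illustration, not mneme's implementation: `pick_threshold` is a hypothetical helper that takes an already computed list of `(threshold, precision, recall)` tuples. Breaking ties toward the lowest threshold and raising `ValueError` when no point satisfies the constraints are both assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def pick_threshold(curve, target_metric="f1", min_precision=None, min_recall=None):
    """Hypothetical helper: choose the best feasible point on a PR curve."""
    best = None  # (score, threshold, precision, recall)
    for threshold, precision, recall in curve:
        # Drop points that violate an optional constraint.
        if min_precision is not None and precision < min_precision:
            continue
        if min_recall is not None and recall < min_recall:
            continue
        score = {
            "f1": f1_score(precision, recall),
            "precision": precision,
            "recall": recall,
        }[target_metric]
        # Strict '>' keeps the first (lowest) threshold on ties: an assumption.
        if best is None or score > best[0]:
            best = (score, threshold, precision, recall)
    if best is None:
        # Assumption: the real tool may handle an infeasible request differently.
        raise ValueError("no threshold satisfies the constraints")
    _, threshold, precision, recall = best
    return {
        "threshold": threshold,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }
```

With a curve such as `[(0.5, 0.6, 1.0), (0.7, 0.8, 0.9), (0.9, 1.0, 0.5)]`, the default call selects 0.7, while adding `min_precision=0.95` leaves only 0.9 as a feasible choice.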

precision_recall_curve

precision_recall_curve(paraphrase_pairs: list[tuple[str, str]], distractor_pairs: list[tuple[str, str]], embedder: Embedder, *, grid: list[float] | None = None, vector_dtype: VectorDtype = 'float32') -> list[tuple[float, float, float]]

Sweep grid (default: 0.50 to 0.99 in steps of 0.01) and return one (threshold, precision, recall) tuple per grid point.
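To make the sweep concrete, here is a minimal sketch that works on precomputed similarity scores rather than on an embedder. It is not mneme's code: `sweep_pr_curve` is a hypothetical name, a pair is assumed to count as a match when its similarity is greater than or equal to the threshold, and precision is assumed to be 1.0 when nothing is predicted positive.

```python
def sweep_pr_curve(paraphrase_scores, distractor_scores, grid=None):
    """Hypothetical helper: (threshold, precision, recall) per grid point.

    paraphrase_scores: similarities of pairs that SHOULD match (positives).
    distractor_scores: similarities of pairs that should NOT match (negatives).
    """
    if grid is None:
        # 0.50, 0.51, ..., 0.99; rounding avoids float drift in the grid values.
        grid = [round(0.50 + 0.01 * i, 2) for i in range(50)]
    curve = []
    for threshold in grid:
        # Assumption: a pair is a hit when similarity >= threshold.
        tp = sum(score >= threshold for score in paraphrase_scores)
        fp = sum(score >= threshold for score in distractor_scores)
        fn = len(paraphrase_scores) - tp
        # Assumption: with no positive predictions, precision defaults to 1.0.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        curve.append((threshold, precision, recall))
    return curve
```

Raising the threshold generally trades recall for precision: fewer distractors clear the bar, but so do fewer true paraphrases.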

CalibrationResult dataclass

CalibrationResult(threshold: float, precision: float, recall: float, f1: float, pr_curve: list[tuple[float, float, float]] = list())

CLI

python -m mneme.tools.calibrate --help

Migration

mneme.tools.migrate re-embeds an existing cache through a new embedder when you switch model or dimension.

reembed

reembed(source_path: str | Path, dest_path: str | Path, new_embedder: Embedder, *, batch_size: int = 64, progress: bool = False, namespaces: list[str] | None = None) -> int

Re-embed source_path into dest_path using new_embedder.

Returns the count of migrated entries. Source is not modified. The destination is created (or replaced) at dest_path.
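The batching pattern behind a migration like this can be sketched without touching mneme's storage format. Everything below is an assumption made for illustration: entries are plain dicts with `namespace`, `text`, and `vector` keys, `embed_batch` is a hypothetical callable that maps a list of texts to a list of vectors, and `namespaces` is assumed to act as a filter on which entries are migrated.

```python
def batched(items, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def reembed_entries(source, embed_batch, *, batch_size=64, namespaces=None):
    """Hypothetical sketch: build a new entry list with fresh vectors.

    The source entries are never mutated; each migrated entry is a copy.
    """
    wanted = set(namespaces) if namespaces is not None else None
    selected = (
        entry for entry in source
        if wanted is None or entry["namespace"] in wanted
    )
    dest = []
    for batch in batched(selected, batch_size):
        # One embedder call per batch instead of one per entry.
        vectors = embed_batch([entry["text"] for entry in batch])
        for entry, vector in zip(batch, vectors):
            dest.append({**entry, "vector": vector})
    return dest
```

In this sketch, `len(reembed_entries(...))` plays the role of the migrated-entry count that reembed returns.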

areembed async

areembed(source_path: str | Path, dest_path: str | Path, new_embedder: AsyncEmbedder, *, batch_size: int = 64, progress: bool = False, concurrency: int = 8, namespaces: list[str] | None = None) -> int

Async variant of reembed: new_embedder.embed is awaited, and up to concurrency embedding calls run concurrently via asyncio.gather.
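One common way to cap in-flight work under asyncio.gather is a semaphore. The sketch below shows that pattern only; whether mneme enforces its limit this way is an assumption, and `embed_all` and the `embed` coroutine it accepts are hypothetical names.

```python
import asyncio


async def embed_all(texts, embed, *, concurrency=8):
    """Hypothetical helper: await `embed(text)` for every text,
    with at most `concurrency` calls in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def embed_one(text):
        # Tasks beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await embed(text)

    # gather returns results in the same order as the inputs,
    # regardless of which call finishes first.
    return await asyncio.gather(*(embed_one(text) for text in texts))
```

Because the calls are awaited on a single event loop, the gain comes from overlapping I/O waits (for example, network calls to an embedding service), not from CPU parallelism.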