Tools
Operational scripts shipped with mneme. Both tools expose a Python API and a python -m command-line interface.
Calibration
mneme.tools.calibrate finds a similarity_threshold suited to your embedder and corpus.
find_threshold
find_threshold(paraphrase_pairs: list[tuple[str, str]], distractor_pairs: list[tuple[str, str]], embedder: Embedder, *, target_metric: Literal['f1', 'precision', 'recall'] = 'f1', min_precision: float | None = None, min_recall: float | None = None, grid: list[float] | None = None, vector_dtype: VectorDtype = 'float32') -> CalibrationResult
Pick the threshold that maximizes target_metric subject to the
optional min_precision and min_recall constraints.
precision_recall_curve
precision_recall_curve(paraphrase_pairs: list[tuple[str, str]], distractor_pairs: list[tuple[str, str]], embedder: Embedder, *, grid: list[float] | None = None, vector_dtype: VectorDtype = 'float32') -> list[tuple[float, float, float]]
Sweep the thresholds in grid (default: 0.50 to 0.99 in steps of 0.01) and return one
(threshold, precision, recall) tuple per grid point.
CalibrationResult (dataclass)
CalibrationResult(threshold: float, precision: float, recall: float, f1: float, pr_curve: list[tuple[float, float, float]] = list())
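The = list() default in the rendered signature most likely corresponds to dataclasses.field(default_factory=list). The hypothetical mirror below assumes that. It shows why a factory is used: each instance gets its own pr_curve list instead of sharing a single mutable default. CalibrationResultSketch is an illustrative stand-in, and the field values in the example are toy numbers.

```python
from dataclasses import dataclass, field

@dataclass
class CalibrationResultSketch:
    """Hypothetical stand-in for CalibrationResult (illustration only)."""
    threshold: float
    precision: float
    recall: float
    f1: float
    # Mutable defaults need default_factory; a bare `= []` raises ValueError at class creation.
    pr_curve: list[tuple[float, float, float]] = field(default_factory=list)

a = CalibrationResultSketch(threshold=0.85, precision=0.85, recall=0.90, f1=0.874)
b = CalibrationResultSketch(threshold=0.90, precision=0.95, recall=0.70, f1=0.806)
a.pr_curve.append((0.85, 0.85, 0.90))
print(len(a.pr_curve), len(b.pr_curve))  # 1 0: the lists are independent
```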
CLI
Migration
mneme.tools.migrate re-embeds an existing cache with a new embedder when you switch embedding models or vector dimensions.
reembed
reembed(source_path: str | Path, dest_path: str | Path, new_embedder: Embedder, *, batch_size: int = 64, progress: bool = False, namespaces: list[str] | None = None) -> int
Re-embed source_path into dest_path using new_embedder.
Returns the number of migrated entries. The source is not modified; the
destination is created (or replaced) at dest_path.
areembed (async)
areembed(source_path: str | Path, dest_path: str | Path, new_embedder: AsyncEmbedder, *, batch_size: int = 64, progress: bool = False, concurrency: int = 8, namespaces: list[str] | None = None) -> int
Async variant of reembed: new_embedder.embed is awaited, and up to
concurrency embedding calls are in flight at once via asyncio.gather.
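One common way to get this kind of bounded concurrency is to wrap each call in an asyncio.Semaphore and hand all the coroutines to asyncio.gather. The documentation does not say whether areembed uses a semaphore internally, so the sketch below is an assumption. embed_batches_bounded and toy_async_embed are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

async def embed_batches_bounded(
    batches: list[list[str]],
    embed: Callable[[list[str]], Awaitable[list[list[float]]]],
    concurrency: int = 8,
) -> list[list[list[float]]]:
    """Hypothetical helper: await `embed` on every batch, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(batch: list[str]) -> list[list[float]]:
        # The semaphore caps how many embed calls are in flight at once.
        async with semaphore:
            return await embed(batch)

    # gather schedules every coroutine and returns results in input order.
    return await asyncio.gather(*(run_one(b) for b in batches))

# Toy async embedder that records peak concurrency (illustration only).
stats = {"in_flight": 0, "peak": 0}

async def toy_async_embed(texts: list[str]) -> list[list[float]]:
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate network latency
    stats["in_flight"] -= 1
    return [[float(len(t))] for t in texts]
```

Because gather preserves input order, results can be zipped back onto their source entries even though the calls finish in arbitrary order.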