| feat(cli): add `qmd bench` command for search quality benchmarks
Adds a benchmark harness that measures search quality across backends.
Given a fixture file with queries and expected results, it runs each
query through BM25, vector, hybrid (no rerank), and full pipeline,
then reports precision@k, recall, MRR, F1, and latency.
This is primarily a regression testing tool — users create fixtures
for their own vaults to catch quality regressions after config or
index changes. Ships with an example fixture against the eval-docs
test collection to demonstrate the format.
New files:
src/bench/bench.ts — main runner
src/bench/score.ts — precision, recall, MRR, F1, path matching
src/bench/types.ts — fixture and result types
src/bench/fixtures/ — example fixture
test/bench-score.test.ts — unit tests for scoring (16 tests)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| 1 个月前 |