Mini CLI search engine for your docs, knowledge bases, meeting notes, whatever. Tracks current SOTA approaches while running entirely locally.

QMD - Query Markup Documents

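An on-device search engine for everything you need to remember. Index your Markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for agentic workflows.

QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking, all running locally via node-llama-cpp with GGUF models.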

QMD Architecture

You can read more about QMD's development progress in the CHANGELOG.

Quick Start

# Install globally (Node or Bun)
npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd

# Or run directly
npx @tobilu/qmd ...
bunx @tobilu/qmd ...

# Create collections for your notes, docs, and meeting transcripts
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
qmd collection add ~/work/docs --name docs

# Add context to improve search results. Each piece of context is returned alongside matching sub-documents, and contexts work as a tree. This is QMD's key feature: it lets LLMs make much better contextual choices when selecting documents. Don't sleep on it!
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://meetings "Meeting transcripts and notes"
qmd context add qmd://docs "Work documentation"

# Generate embeddings for semantic search
qmd embed

# Search across everything
qmd search "project timeline"           # Fast keyword search
qmd vsearch "how to deploy"             # Semantic search
qmd query "quarterly planning process"  # Hybrid + reranking (best quality)

# Get a specific document
qmd get "meetings/2024-01-15.md"

# Get a document by docid (shown in search results)
qmd get "#abc123"

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Search within a specific collection
qmd search "API" -c notes

# Export all matches for an agent
qmd search "API" --all --files --min-score 0.3

Using with AI Agents

QMD's --json and --files output formats are designed for agentic workflows:

# Get structured results for an LLM
qmd search "authentication" --json -n 10

# List all relevant files above a threshold
qmd query "error handling" --all --files --min-score 0.4

# Retrieve full document content
qmd get "docs/api-reference.md" --full

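MCP Server

Although the tool works perfectly well when you simply tell your agent to use it on the command line, it also exposes an MCP (Model Context Protocol) server for tighter integration.

Tools exposed:

  • query: search with typed sub-queries (lex/vec/hyde), combined via RRF + reranking
  • get: retrieve a document by path or docid (with fuzzy matching suggestions)
  • multi_get: batch retrieve by glob pattern, comma-separated list, or docids
  • status: index health and collection info

Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json):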

{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}

Claude Code: install the plugin (recommended):

claude plugin marketplace add tobi/qmd
claude plugin install qmd@qmd

Or configure MCP manually in ~/.claude/settings.json:

{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}

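HTTP Transport

By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport: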

# Foreground (Ctrl-C to stop)
qmd mcp --http                    # localhost:8181
qmd mcp --http --port 8080        # custom port

# Background daemon
qmd mcp --http --daemon           # start, writes PID to ~/.cache/qmd/mcp.pid
qmd mcp stop                      # stop via PID file
qmd status                        # shows "MCP: running (PID ...)" when active

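The HTTP server exposes two endpoints:

  • POST /mcp: MCP Streamable HTTP (JSON responses, stateless)
  • GET /health: liveness check with uptime

LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are disposed after 5 minutes of idle time and transparently recreated on the next request (about 1 s penalty; the models remain loaded).

Point any MCP client at http://localhost:8181/mcp to connect.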
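
As a rough illustration, a client could assemble a request for the POST /mcp endpoint along these lines. This is a minimal sketch, assuming the standard MCP JSON-RPC 2.0 `tools/call` envelope; `buildToolCall` and `McpHttpRequest` are hypothetical helpers rather than part of QMD, and whether the server expects an `initialize` exchange first is not covered here.

```typescript
// Hypothetical request shape; not a QMD type.
interface McpHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a JSON-RPC 2.0 "tools/call" request for the HTTP transport.
// The default base URL matches the documented localhost:8181 default.
function buildToolCall(
  toolName: string,
  args: Record<string, unknown>,
  id = 1,
  baseUrl = "http://localhost:8181",
): McpHttpRequest {
  return {
    url: `${baseUrl}/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients advertise both response types.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id,
      method: "tools/call",
      params: { name: toolName, arguments: args },
    }),
  };
}
```

The returned object can be handed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.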

SDK / Library Usage

Use QMD as a library in your own Node.js or Bun applications.

Installation

npm install @tobilu/qmd

Quick Start

import { createStore } from '@tobilu/qmd'

const store = await createStore({
  dbPath: './my-index.sqlite',
  config: {
    collections: {
      docs: { path: '/path/to/docs', pattern: '**/*.md' },
    },
  },
})

const results = await store.search({ query: "authentication flow" })
console.log(results.map(r => `${r.title} (${Math.round(r.score * 100)}%)`))

await store.close()

Store Creation

createStore() accepts three modes:

import { createStore } from '@tobilu/qmd'

// 1. Inline config — no files needed besides the DB
const store = await createStore({
  dbPath: './index.sqlite',
  config: {
    collections: {
      docs: { path: '/path/to/docs', pattern: '**/*.md' },
      notes: { path: '/path/to/notes' },
    },
  },
})

// 2. YAML config file — collections defined in a file
const store2 = await createStore({
  dbPath: './index.sqlite',
  configPath: './qmd.yml',
})

// 3. DB-only — reopen a previously configured store
const store3 = await createStore({ dbPath: './index.sqlite' })

Search

The unified search() method handles both simple queries and pre-expanded structured queries:

// Simple query — auto-expanded via LLM, then BM25 + vector + reranking
const results = await store.search({ query: "authentication flow" })

// With options
const results2 = await store.search({
  query: "rate limiting",
  intent: "API throttling and abuse prevention",
  collection: "docs",
  limit: 5,
  minScore: 0.3,
  explain: true,
})

// Pre-expanded queries — skip auto-expansion, control each sub-query
const results3 = await store.search({
  queries: [
    { type: 'lex', query: '"connection pool" timeout -redis' },
    { type: 'vec', query: 'why do database connections time out under load' },
  ],
  collections: ["docs", "notes"],
})

// Skip reranking for faster results
const fast = await store.search({ query: "auth", rerank: false })

For direct backend access:

// BM25 keyword search (fast, no LLM)
const lexResults = await store.searchLex("auth middleware", { limit: 10 })

// Vector similarity search (embedding model, no reranking)
const vecResults = await store.searchVector("how users log in", { limit: 10 })

// Manual query expansion for full control
const expanded = await store.expandQuery("auth flow", { intent: "user login" })
const results4 = await store.search({ queries: expanded })

Retrieval

// Get a document by path or docid
const doc = await store.get("docs/readme.md")
const byId = await store.get("#abc123")

if (!("error" in doc)) {
  console.log(doc.title, doc.displayPath, doc.context)
}

// Get document body with line range
const body = await store.getDocumentBody("docs/readme.md", {
  fromLine: 50,
  maxLines: 100,
})

// Batch retrieve by glob or comma-separated list
const { docs, errors } = await store.multiGet("docs/**/*.md", {
  maxBytes: 20480,
})

Collections

// Add a collection
await store.addCollection("myapp", {
  path: "/src/myapp",
  pattern: "**/*.ts",
  ignore: ["node_modules/**", "*.test.ts"],
})

// List collections with document stats
const collections = await store.listCollections()
// => [{ name, pwd, glob_pattern, doc_count, active_count, last_modified, includeByDefault }]

// Get names of collections included in queries by default
const defaults = await store.getDefaultCollectionNames()

// Remove / rename
await store.removeCollection("myapp")
await store.renameCollection("old-name", "new-name")

Context

Context adds descriptive metadata that improves search relevance and is returned alongside results:

// Add context for a path within a collection
await store.addContext("docs", "/api", "REST API reference documentation")

// Set global context (applies to all collections)
await store.setGlobalContext("Internal engineering documentation")

// List all contexts
const contexts = await store.listContexts()
// => [{ collection, path, context }]

// Remove context
await store.removeContext("docs", "/api")
await store.setGlobalContext(undefined)  // clear global

Indexing

// Re-index collections by scanning the filesystem
const result = await store.update({
  collections: ["docs"],  // optional — defaults to all
  onProgress: ({ collection, file, current, total }) => {
    console.log(`[${collection}] ${current}/${total} ${file}`)
  },
})
// => { collections, indexed, updated, unchanged, removed, needsEmbedding }

// Generate vector embeddings
const embedResult = await store.embed({
  force: false,           // true to re-embed everything
  chunkStrategy: "auto",  // "regex" (default) or "auto" (AST for code files)
  onProgress: ({ current, total, collection }) => {
    console.log(`Embedding ${current}/${total}`)
  },
})

Types

Key types exported for SDK consumers:

import type {
  QMDStore,            // The store interface
  SearchOptions,       // Options for search()
  LexSearchOptions,    // Options for searchLex()
  VectorSearchOptions, // Options for searchVector()
  HybridQueryResult,   // Search result with score, snippet, context
  SearchResult,        // Result from searchLex/searchVector
  ExpandedQuery,       // Typed sub-query { type: 'lex'|'vec'|'hyde', query }
  DocumentResult,      // Document metadata + body
  DocumentNotFound,    // Error with similarFiles suggestions
  MultiGetResult,      // Batch retrieval result
  UpdateProgress,      // Progress callback info for update()
  UpdateResult,        // Aggregated update result
  EmbedProgress,       // Progress callback info for embed()
  EmbedResult,         // Embedding result
  StoreOptions,        // createStore() options
  CollectionConfig,    // Inline config shape
  IndexStatus,         // From getStatus()
  IndexHealthInfo,     // From getIndexHealth()
} from '@tobilu/qmd'

Utility exports:

import {
  extractSnippet,              // Extract a relevant snippet from text
  addLineNumbers,              // Add line numbers to text
  DEFAULT_MULTI_GET_MAX_BYTES, // Default max file size for multiGet (10KB)
  Maintenance,                 // Database maintenance operations
} from '@tobilu/qmd'

Lifecycle

// Close the store — disposes LLM models and DB connection
await store.close()

The SDK requires an explicit dbPath and assumes no defaults. This makes it safe to embed in any application without side effects.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         QMD Hybrid Search Pipeline                          │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘
                                       │
                        ┌──────────────┴──────────────┐
                        ▼                             ▼
               ┌────────────────┐            ┌────────────────┐
               │ Query Expansion│            │  Original Query│
               │  (fine-tuned)  │            │   (×2 weight)  │
               └───────┬────────┘            └───────┬────────┘
                       │                             │
                       │ 2 alternative queries       │
                       └──────────────┬──────────────┘
                                      │
              ┌───────────────────────┼───────────────────────┐
              ▼                       ▼                       ▼
     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
     │ Original Query  │     │ Expanded Query 1│     │ Expanded Query 2│
     └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
              │                       │                       │
      ┌───────┴───────┐       ┌───────┴───────┐       ┌───────┴───────┐
      ▼               ▼       ▼               ▼       ▼               ▼
  ┌───────┐       ┌───────┐ ┌───────┐     ┌───────┐ ┌───────┐     ┌───────┐
  │ BM25  │       │Vector │ │ BM25  │     │Vector │ │ BM25  │     │Vector │
  │(FTS5) │       │Search │ │(FTS5) │     │Search │ │(FTS5) │     │Search │
  └───┬───┘       └───┬───┘ └───┬───┘     └───┬───┘ └───┬───┘     └───┬───┘
      │               │         │             │         │             │
      └───────┬───────┘         └──────┬──────┘         └──────┬──────┘
              │                        │                       │
              └────────────────────────┼───────────────────────┘
                                       │
                                       ▼
                          ┌───────────────────────┐
                          │   RRF Fusion + Bonus  │
                          │  Original query: ×2   │
                          │  Top-rank bonus: +0.05│
                          │     Top 30 Kept       │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │    LLM Re-ranking     │
                          │  (qwen3-reranker)     │
                          │  Yes/No + logprobs    │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │  Position-Aware Blend │
                          │  Top 1-3:  75% RRF    │
                          │  Top 4-10: 60% RRF    │
                          │  Top 11+:  40% RRF    │
                          └───────────────────────┘

Score Normalization & Fusion

Search Backends

Backend     Raw Score         Conversion          Range
FTS (BM25)  SQLite FTS5 BM25  Math.abs(score)     0 to ~25+
Vector      Cosine distance   1 / (1 + distance)  0.0 to 1.0
Reranker    LLM 0-10 rating   score / 10          0.0 to 1.0
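
The conversions in the table can be written out as three small functions. This is only a sketch of the arithmetic listed above; the function names are hypothetical and are not QMD exports.

```typescript
// SQLite FTS5 bm25() values are negative, with better matches further
// below zero, so the magnitude is used as the score.
function normalizeBm25(rawScore: number): number {
  return Math.abs(rawScore);
}

// Smaller cosine distance means a closer match; distance 0 maps to 1.0.
function normalizeVector(distance: number): number {
  return 1 / (1 + distance);
}

// A 0-10 reranker rating is scaled into the 0.0-1.0 range.
function normalizeRerank(rating: number): number {
  return rating / 10;
}
```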

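Fusion Strategy

The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending:

  1. Query Expansion: original query (weighted ×2) + 1 LLM variation
  2. Parallel Retrieval: each query searches both the FTS and vector indexes
  3. RRF Fusion: combine all result lists using score = Σ(1/(k+rank+1)), where k=60
  4. Top-Rank Bonus: documents ranking #1 in any list get +0.05; those ranking #2-3 get +0.02
  5. Top-K Selection: take the top 30 candidates for reranking
  6. Re-ranking: the LLM scores each document (yes/no with logprob confidence)
  7. Position-Aware Blending:
    • RRF rank 1-3: 75% retrieval score, 25% reranker score (preserves exact matches)
    • RRF rank 4-10: 60% retrieval score, 40% reranker score
    • RRF rank 11+: 40% retrieval score, 60% reranker score (trust the reranker more)

Why this approach: pure RRF can dilute exact matches when the expanded queries do not match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.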
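
Steps 3 to 5 above can be sketched as follows. This is a minimal illustration, not QMD's actual code: `rrfFuse`, `topCandidates`, and `RankedList` are hypothetical names, and it assumes that rank is 0-based, that the ×2 weight multiplies each RRF contribution of the original query, and that the bonus is applied once per document based on its best rank in any list.

```typescript
interface RankedList {
  docIds: string[]; // best match first
  weight: number;   // assumed: 2 for the original query, 1 for expansions
}

function rrfFuse(lists: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  const bestRank = new Map<string, number>();
  for (const list of lists) {
    list.docIds.forEach((id, rank) => {
      // rank is 0-based, so the top hit contributes weight / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + list.weight / (k + rank + 1));
      bestRank.set(id, Math.min(bestRank.get(id) ?? Infinity, rank));
    });
  }
  // Top-rank bonus: +0.05 for a #1 placement, +0.02 for #2-3.
  bestRank.forEach((rank, id) => {
    const bonus = rank === 0 ? 0.05 : rank <= 2 ? 0.02 : 0;
    scores.set(id, (scores.get(id) ?? 0) + bonus);
  });
  return scores;
}

// Keep the highest-scoring candidates for the reranker.
function topCandidates(scores: Map<string, number>, limit = 30): string[] {
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map((entry) => entry[0]);
}
```

Because the original query's list is counted twice, a document that ranks first for it stays ahead of one that ranks first only for an expansion.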

Score Interpretation

Score      Meaning
0.8 - 1.0  Highly relevant
0.5 - 0.8  Moderately relevant
0.2 - 0.5  Somewhat relevant
0.0 - 0.2  Low relevance

Requirements

System Requirements

  • Node.js >= 22
  • Bun >= 1.0.0
  • macOS: Homebrew SQLite (for extension support)
    brew install sqlite

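GGUF Models (via node-llama-cpp)

QMD uses three local GGUF models (downloaded automatically on first use):

Model                            Purpose                       Size
embeddinggemma-300M-Q8_0         Vector embeddings (default)   ~300MB
qwen3-reranker-0.6b-q8_0         Re-ranking                    ~640MB
qmd-query-expansion-1.7B-q4_k_m  Query expansion (fine-tuned)  ~1.1GB

Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.

Custom Embedding Model

Override the default embedding model with the QMD_EMBED_MODEL environment variable. This is useful for multilingual corpora (for example Chinese, Japanese, or Korean), where embeddinggemma-300M has limited coverage.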

# Use Qwen3-Embedding-0.6B for better multilingual (CJK) support
export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"

# After changing the model, re-embed all collections:
qmd embed -f

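Supported model families:

  • embeddinggemma (default): optimized for English, small footprint
  • Qwen3-Embedding: multilingual (119 languages, including CJK), top-ranked on MTEB

Note: after switching embedding models you must re-index with qmd embed -f, because vectors from different models are not compatible. The prompt format is adjusted automatically for each model family.

Installation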

npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd

Development

git clone https://github.com/tobi/qmd
cd qmd
npm install
npm link

Usage

Collection Management

# Create a collection from current directory
qmd collection add . --name myproject

# Create a collection with explicit path and custom glob mask
qmd collection add ~/Documents/notes --name notes --mask "**/*.md"

# List all collections
qmd collection list

# Remove a collection
qmd collection remove myproject

# Rename a collection
qmd collection rename myproject my-project

# List files in a collection
qmd ls notes
qmd ls notes/subfolder

Generate Vector Embeddings

# Embed all indexed documents (900 tokens/chunk, 15% overlap)
qmd embed

# Force re-embed everything
qmd embed -f

# Enable AST-aware chunking for code files (TS, JS, Python, Go, Rust)
qmd embed --chunk-strategy auto

# Also works with query for consistent chunk selection
qmd query "auth flow" --chunk-strategy auto

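AST-aware chunking (--chunk-strategy auto) uses tree-sitter to chunk code files at function, class, and import boundaries instead of arbitrary text positions. This produces higher-quality chunks and better search results for codebases. Markdown and other file types always use regex-based chunking regardless of strategy.

The default strategy is regex (the existing behavior). Use --chunk-strategy auto to opt in to AST-aware chunking. Run qmd status to verify which grammars are available.

Note: tree-sitter grammars are optional dependencies. If they are not installed, --chunk-strategy auto falls back to regex-only chunking automatically. Tested on both Node.js and Bun.

Context Management

Context adds descriptive metadata to collections and paths, helping search understand your content.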

# Add context to a collection (using qmd:// virtual paths)
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs/api "API documentation"

# Add context from within a collection directory
cd ~/notes && qmd context add "Personal notes and ideas"
cd ~/notes/work && qmd context add "Work-related notes"

# Add global context (applies to all collections)
qmd context add / "Knowledge base for my projects"

# List all contexts
qmd context list

# Remove context
qmd context rm qmd://notes/old

Search Commands

┌──────────────────────────────────────────────────────────────────┐
│                        Search Modes                              │
├──────────┬───────────────────────────────────────────────────────┤
│ search   │ BM25 full-text search only                           │
│ vsearch  │ Vector semantic search only                          │
│ query    │ Hybrid: FTS + Vector + Query Expansion + Re-ranking  │
└──────────┴───────────────────────────────────────────────────────┘

# Full-text search (fast, keyword-based)
qmd search "authentication flow"

# Vector search (semantic similarity)
qmd vsearch "how to login"

# Hybrid search with re-ranking (best quality)
qmd query "user authentication"

Options

# Search options
-n <num>           # Number of results (default: 5, or 20 for --files/--json)
-c, --collection   # Restrict search to a specific collection
--all              # Return all matches (use with --min-score to filter)
--min-score <num>  # Minimum score threshold (default: 0)
--full             # Show full document content
--line-numbers     # Add line numbers to output
--explain          # Include retrieval score traces (query, JSON/CLI output)
--index <name>     # Use named index

# Output formats (for search and multi-get)
--files            # Output: docid,score,filepath,context
--json             # JSON output with snippets
--csv              # CSV output
--md               # Markdown output
--xml              # XML output

# Get options
qmd get <file>[:line]  # Get document, optionally starting at line
-l <num>               # Maximum lines to return
--from <num>           # Start from line number

# Multi-get options
-l <num>           # Maximum lines per file
--max-bytes <num>  # Skip files larger than N bytes (default: 10KB)

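Output Format

The default output is a colorized CLI format (respects the NO_COLOR environment variable).

When stdout is a TTY, result paths are rendered as clickable terminal hyperlinks (OSC 8). Clicking a path opens the file in your editor using the editor URI template.

When stdout is not a TTY (for example when piped to another command or redirected to a file), QMD emits plain-text paths with no escape sequences.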
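
The TTY and non-TTY behavior described above can be illustrated with the standard OSC 8 escape sequence. This is a sketch under stated assumptions: `renderPath` is a hypothetical helper, not a QMD function, and the caller is assumed to supply the TTY check and the already expanded editor URI.

```typescript
// Wrap a display path in an OSC 8 hyperlink when writing to a terminal,
// and return it unchanged otherwise.
function renderPath(displayPath: string, targetUri: string, isTTY: boolean): string {
  if (!isTTY) return displayPath; // plain text, no escape sequences
  const ESC = "\u001b";
  const ST = `${ESC}\\`; // string terminator: ESC followed by a backslash
  // Format: ESC ] 8 ; ; URI ST  link text  ESC ] 8 ; ; ST
  return `${ESC}]8;;${targetUri}${ST}${displayPath}${ESC}]8;;${ST}`;
}
```

In a Node.js or Bun process the third argument would typically come from `process.stdout.isTTY`.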

TTY example:

docs/guide.md:42 #a1b2c3
Title: Software Craftsmanship
Context: Work documentation
Score: 93%

This section covers the **craftsmanship** of building
quality software with attention to detail.
See also: engineering principles


notes/meeting.md:15 #d4e5f6
Title: Q4 Planning
Context: Personal notes and ideas
Score: 67%

Discussion about code quality and craftsmanship
in the development process.

Configure the editor link target with QMD_EDITOR_URI (or editor_uri in the config file):

# VS Code (default)
export QMD_EDITOR_URI="vscode://file/{path}:{line}:{col}"

# Cursor
export QMD_EDITOR_URI="cursor://file/{path}:{line}:{col}"

# Zed
export QMD_EDITOR_URI="zed://file/{path}:{line}:{col}"

# Sublime Text
export QMD_EDITOR_URI="subl://open?url=file://{path}&line={line}"

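Template placeholders:

  • {path}: absolute filesystem path (URI-encoded)
  • {line}: 1-based line number
  • {col} or {column}: 1-based column number

Output fields:

  • Path: collection-relative path (e.g., docs/guide.md)
  • Docid: short hash identifier (e.g., #a1b2c3); use with qmd get #a1b2c3
  • Title: extracted from the document (first heading or filename)
  • Context: path context configured via qmd context add
  • Score: color-coded (green for >70%, yellow for >40%, dim otherwise)
  • Snippet: context around the match, with query terms highlighted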
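
To make the editor URI template placeholders concrete, here is one way they could be expanded. It is a sketch only: `expandEditorUri` and `EditorTarget` are hypothetical names, the per-segment encoding is an assumption about what "URI-encoded" means for the path, and the column defaulting to 1 is also an assumption.

```typescript
interface EditorTarget {
  path: string; // absolute filesystem path
  line: number; // 1-based
  col?: number; // 1-based; assumed to default to 1
}

function expandEditorUri(template: string, target: EditorTarget): string {
  // Encode each path segment separately so that "/" separators survive.
  const encodedPath = target.path.split("/").map(encodeURIComponent).join("/");
  const col = String(target.col ?? 1);
  return template
    .replace(/\{path\}/g, encodedPath)
    .replace(/\{line\}/g, String(target.line))
    .replace(/\{(col|column)\}/g, col);
}
```

With the VS Code template, a path containing a space such as `/home/me/my notes/a.md` at line 42 would expand to `vscode://file//home/me/my%20notes/a.md:42:1`.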

Examples

# Get 10 results with minimum score 0.3
qmd query -n 10 --min-score 0.3 "API design patterns"

# Output as markdown for LLM context
qmd search --md --full "error handling"

# JSON output for scripting
qmd query --json "quarterly reports"

# Inspect how each result was scored (RRF + rerank blend)
qmd query --json --explain "quarterly reports"

# Use separate index for different knowledge base
qmd --index work search "quarterly reports"

Index Maintenance

# Show index status and collections with contexts
qmd status

# Re-index all collections
qmd update

# Re-index with git pull first (for remote repos)
qmd update --pull

# Get document by filepath (with fuzzy matching suggestions)
qmd get notes/meeting.md

# Get document by docid (from search results)
qmd get "#abc123"

# Get document starting at line 50, max 100 lines
qmd get notes/meeting.md:50 -l 100

# Get multiple documents by glob pattern
qmd multi-get "journals/2025-05*.md"

# Get multiple documents by comma-separated list (supports docids)
qmd multi-get "doc1.md, doc2.md, #abc123"

# Limit multi-get to files under 20KB
qmd multi-get "docs/*.md" --max-bytes 20480

# Output multi-get as JSON for agent processing
qmd multi-get "docs/*.md" --json

# Clean up cache and orphaned data
qmd cleanup

Data Storage

Index stored in: ~/.cache/qmd/index.sqlite

Schema

collections     -- Indexed directories with name and glob patterns
path_contexts   -- Context descriptions by virtual path (qmd://...)
documents       -- Markdown content with metadata and docid (6-char hash)
documents_fts   -- FTS5 full-text index
content_vectors -- Embedding chunks (hash, seq, pos, 900 tokens each)
vectors_vec     -- sqlite-vec vector index (hash_seq key)
llm_cache       -- Cached LLM responses (query expansion, rerank scores)

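Environment Variables

Variable               Default   Description
XDG_CACHE_HOME         ~/.cache  Cache directory location
QMD_LLAMA_GPU          auto      Force a llama.cpp GPU backend (metal, vulkan, cuda), or set to false to disable GPU
QMD_FORCE_CPU          unset     Set to 1/true to force CPU mode before any CUDA/Vulkan/Metal probing. Equivalent CLI flag: --no-gpu
QMD_EMBED_PARALLELISM  auto      Override embedding/reranking context parallelism (1-8). Defaults to 1 on Windows CUDA because parallel CUDA contexts can crash with a ggml-cuda.cu:98 error; use Vulkan, or raise this value only when the driver is stable.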
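
As an illustration of how two of these variables might be read, consider the sketch below. `resolveIndexPath` and `isCpuForced` are hypothetical helpers, not QMD functions; the environment and home directory are passed in explicitly so the example stays self-contained, and treating an empty XDG_CACHE_HOME as unset is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Default index location: $XDG_CACHE_HOME/qmd/index.sqlite,
// falling back to ~/.cache when the variable is missing or empty.
function resolveIndexPath(env: Env, homeDir: string): string {
  const configured = env["XDG_CACHE_HOME"];
  const cacheHome = configured !== undefined && configured.length > 0
    ? configured
    : `${homeDir}/.cache`;
  return `${cacheHome}/qmd/index.sqlite`;
}

// QMD_FORCE_CPU is documented as accepting 1 or true.
function isCpuForced(env: Env): boolean {
  const value = env["QMD_FORCE_CPU"];
  return value === "1" || value === "true";
}
```

In a real process the first argument would be `process.env`.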

How It Works

Indexing Flow

Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
    │                                                   │              │
    │                                                   │              ▼
    │                                                   │         Generate docid
    │                                                   │         (6-char hash)
    │                                                   │              │
    └──────────────────────────────────────────────────►└──► Store in SQLite
                                                                       │
                                                                       ▼
                                                                  FTS5 Index

Embedding Flow

Documents are split into chunks of about 900 tokens with 15% overlap, using smart boundary detection:

Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                │                           "title | text"        embedBatch()
                │
                └─► Chunks stored with:
                    - hash: document hash
                    - seq: chunk sequence (0, 1, 2...)
                    - pos: character position in original
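
To show what 900-token chunks with 15% overlap look like numerically, here is a simplified sketch. It ignores smart boundary detection, works in token offsets rather than the character positions QMD stores, and assumes each chunk starts 135 tokens before the previous one ends; `chunkWindows` and `ChunkWindow` are hypothetical names.

```typescript
interface ChunkWindow {
  seq: number;   // chunk sequence (0, 1, 2...)
  start: number; // inclusive token offset
  end: number;   // exclusive token offset
}

function chunkWindows(totalTokens: number, chunkSize = 900, overlapRatio = 0.15): ChunkWindow[] {
  const overlap = Math.round(chunkSize * overlapRatio); // 135 tokens at the defaults
  const step = chunkSize - overlap;                     // 765 tokens at the defaults
  const windows: ChunkWindow[] = [];
  let start = 0;
  let seq = 0;
  while (start < totalTokens) {
    const end = Math.min(start + chunkSize, totalTokens);
    windows.push({ seq, start, end });
    if (end === totalTokens) break; // the last chunk reached the end of the document
    start += step;
    seq += 1;
  }
  return windows;
}
```

For a 2000-token document this yields three windows: 0-900, 765-1665, and 1530-2000.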

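Smart Chunking

Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural Markdown break points. This keeps semantic units (sections, paragraphs, code blocks) intact.

Break point scores:

Pattern           Score  Description
# Heading         100    H1, major section
## Heading        90     H2, subsection
### Heading       80     H3
#### Heading      70     H4
##### Heading     60     H5
###### Heading    50     H6
```               80     Code block boundary
--- / ***         60     Horizontal rule
Blank line        20     Paragraph boundary
- item / 1. item  5      List item
Line break        1      Minimal break

Algorithm:

  1. Scan the document for all break points and their scores
  2. When approaching the 900-token target, search a 200-token window before the cutoff
  3. Score each break point: finalScore = baseScore × (1 - (distance/window)² × 0.7)
  4. Cut at the highest-scoring break point

The squared distance decay means a heading 200 tokens back (score ~30) still beats a plain line break at the target (score 1), but a closer heading wins over a distant one.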
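
The scoring formula and cut selection can be sketched as follows. This is an illustration of the arithmetic only, assuming positions are measured in tokens and that only break points at or before the target are considered; `decayedScore`, `bestCut`, and `BreakPoint` are hypothetical names.

```typescript
interface BreakPoint {
  pos: number;   // position of the break point (tokens, assumed unit)
  score: number; // base score from the table above
}

// finalScore = baseScore × (1 - (distance / window)² × 0.7)
function decayedScore(baseScore: number, distance: number, window = 200): number {
  const ratio = distance / window;
  return baseScore * (1 - ratio * ratio * 0.7);
}

// Pick the best break point within `window` tokens before `target`.
// Falls back to a hard cut at `target` when no candidate is in range.
function bestCut(breakPoints: BreakPoint[], target: number, window = 200): number {
  let best = target;
  let bestScore = -Infinity;
  for (const bp of breakPoints) {
    const distance = target - bp.pos;
    if (distance < 0 || distance > window) continue;
    const score = decayedScore(bp.score, distance, window);
    if (score > bestScore) {
      bestScore = score;
      best = bp.pos;
    }
  }
  return best;
}
```

With an H1 at token 700 and a plain line break at token 900, a target of 900 cuts at 700, since 100 × 0.3 = 30 beats 1.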

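Code fence protection: break points inside code blocks are ignored, so code stays together. If a code block exceeds the chunk size, it is kept whole when possible.

AST-aware chunking (code files):

For supported code files, QMD also parses the source with tree-sitter and adds AST-derived break points, which are merged with the regex scores above:

AST Node                                   Score  Languages
Class / interface / struct / impl / trait  100    All
Function / method                          90     All
Type alias / enum                          80     All
Import / use declaration                   60     All

Supports .ts, .tsx, .js, .jsx, .py, .go, and .rs files. Enable with --chunk-strategy auto. Markdown and other file types always use regex chunking.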
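
One plausible way to merge the two sets of break points is to keep the higher score wherever both propose the same position. The sketch below assumes that rule; `mergeBreaks` and `ScoredBreak` are hypothetical names rather than QMD's own.

```typescript
interface ScoredBreak {
  pos: number;   // position in the source text
  score: number; // break point score
}

// Merge regex-derived and AST-derived break points.
// Assumed rule: at a shared position, the higher score wins.
function mergeBreaks(regexBreaks: ScoredBreak[], astBreaks: ScoredBreak[]): ScoredBreak[] {
  const byPos = new Map<number, number>();
  for (const bp of regexBreaks.concat(astBreaks)) {
    byPos.set(bp.pos, Math.max(byPos.get(bp.pos) ?? 0, bp.score));
  }
  return Array.from(byPos.entries())
    .map((entry) => ({ pos: entry[0], score: entry[1] }))
    .sort((a, b) => a.pos - b.pos);
}
```

Under this rule a line break (score 1) that coincides with a function boundary (score 90) is promoted to 90, while regex break points inside comments or docstrings are kept as they are.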

Query Flow (Hybrid)

Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
                │
      ┌─────────┴─────────┐
      ▼                   ▼
   For each query:     FTS (BM25)
      │                   │
      ▼                   ▼
   Vector Search      Ranked List
      │
      ▼
   Ranked List
      │
      └─────────┬─────────┘
                ▼
         RRF Fusion (k=60)
         Original query ×2 weight
         Top-rank bonus: +0.05/#1, +0.02/#2-3
                │
                ▼
         Top 30 candidates
                │
                ▼
         LLM Re-ranking
         (yes/no + logprob confidence)
                │
                ▼
         Position-Aware Blend
         Rank 1-3:  75% RRF / 25% reranker
         Rank 4-10: 60% RRF / 40% reranker
         Rank 11+:  40% RRF / 60% reranker
                │
                ▼
         Final Results
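
The final blending step in the flow above can be sketched as a weighted average whose weights depend on the RRF rank. This is a minimal illustration, assuming both scores are already normalized to the 0-1 range and that rank is 1-based; `rrfWeightForRank` and `blendScore` are hypothetical names.

```typescript
// Share of the final score taken from the RRF (retrieval) score.
function rrfWeightForRank(rank: number): number {
  if (rank <= 3) return 0.75; // ranks 1-3: protect strong retrieval hits
  if (rank <= 10) return 0.6; // ranks 4-10
  return 0.4;                 // ranks 11+: trust the reranker more
}

function blendScore(rank: number, retrievalScore: number, rerankScore: number): number {
  const w = rrfWeightForRank(rank);
  return w * retrievalScore + (1 - w) * rerankScore;
}
```

A top-3 document that the reranker scores at 0 still keeps 75% of its retrieval score, whereas a document at rank 11 or lower depends mostly on the reranker.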

Model Configuration

Models are configured in src/llm.ts as HuggingFace URIs:

const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";

EmbeddingGemma Prompt Format

// For queries
"task: search result | query: {query}"

// For documents
"title: {title} | text: {content}"

Qwen3-Reranker

Uses node-llama-cpp's createRankingContext() and rankAndSort() APIs for cross-encoder reranking. Returns documents sorted by relevance score (0.0 - 1.0).

Qwen3 (Query Expansion)

Used to generate query variations via LlamaChatSession.

License

MIT