
---
license: mit
language:
  - en
base_model: Qwen/Qwen3-1.7B
tags:
  - query-expansion
  - search
  - gguf
  - qwen3
pipeline_tag: text-generation
---

QMD Query Expansion Fine-Tuning

Train small language models to expand search queries for QMD's hybrid retrieval pipeline.

What This Does

Given a raw search query like "auth config", the trained model produces structured expansions:

hyde: Authentication can be configured by setting the AUTH_SECRET environment variable.
lex: authentication configuration
lex: auth settings setup
vec: how to configure authentication settings
vec: authentication configuration options

These feed into QMD's three search backends:

  • lex: lines go to BM25 full-text search (short, keyword-focused)
  • vec: lines go to vector similarity search (natural language phrases)
  • hyde: is a hypothetical document passage for embedding-based retrieval (HyDE technique)
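
To make the routing concrete, here is a minimal sketch of how a consumer might group the model's output by prefix before handing each group to its backend. The function name `route_expansions` and the returned dict shape are assumptions for illustration, not QMD's actual API.

```python
# Prefixes come from the README; everything else here is a hypothetical sketch.
PREFIXES = ("lex", "vec", "hyde")


def route_expansions(output: str) -> dict[str, list[str]]:
    """Group expansion lines by their prefix (lex / vec / hyde)."""
    routed: dict[str, list[str]] = {p: [] for p in PREFIXES}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        prefix, sep, text = line.partition(":")
        # Lines without a known prefix are skipped in this sketch.
        if sep and prefix in PREFIXES:
            routed[prefix].append(text.strip())
    return routed


example = """\
hyde: Authentication can be configured by setting the AUTH_SECRET environment variable.
lex: authentication configuration
lex: auth settings setup
vec: how to configure authentication settings
vec: authentication configuration options
"""

routed = route_expansions(example)
print(routed["lex"])   # candidates for BM25 full-text search
print(routed["vec"])   # candidates for vector similarity search
print(routed["hyde"])  # hypothetical passage to embed
```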

Quick Start

Cloud training via HuggingFace Jobs (no local GPU needed)

# 1. SFT: teach the model the output format (~45 min on A10G, ~$1.50)
hf jobs uv run --flavor a10g-large --secrets HF_TOKEN --timeout 2h jobs/sft.py

# 2. Evaluate against test queries (needs local GPU or use eval job)
uv run eval.py tobil/qmd-query-expansion-1.7B

# 3. Convert to GGUF for local deployment (Ollama, llama.cpp)
uv run convert_gguf.py --size 1.7B

# NOTE: GRPO is currently experimental and lives in finetune/experiments/grpo.
# To run it manually:
#   cd finetune && uv run python experiments/grpo/grpo.py

Local training (if you have a GPU)

uv run train.py sft  --config configs/sft.yaml

# Experimental GRPO
cd finetune && uv run python experiments/grpo/grpo.py

Monitoring HF Jobs

hf jobs ps                           # list running jobs
hf jobs inspect <job-id>             # check status
hf jobs logs <job-id>                # stream logs
hf jobs cancel <job-id>              # cancel a job

Prompt Format

All tools use the same prompt, the Qwen3 chat template with /no_think:

<|im_start|>user
/no_think Expand this search query: {query}<|im_end|>
<|im_start|>assistant

The /no_think directive suppresses Qwen3's chain-of-thought mode, producing direct lex:/vec:/hyde: output without <think> blocks.
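
As a rough illustration, the template above could be filled in from Python as follows. The helper name `build_prompt` is hypothetical, and the trailing newline after the assistant tag is an assumption based on the usual Qwen chat template layout.

```python
# Template text copied from the README; the trailing "\n" is an assumption.
PROMPT_TEMPLATE = (
    "<|im_start|>user\n"
    "/no_think Expand this search query: {query}<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def build_prompt(query: str) -> str:
    """Return the raw prompt string for a single search query."""
    return PROMPT_TEMPLATE.format(query=query.strip())


print(build_prompt("auth config"))
```

Keeping the template in one constant is one way to avoid the prompt drifting between training, evaluation, and inference code.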

File Structure

finetune/
├── reward.py          # Scoring/reward function (single source of truth)
├── train.py           # SFT training entrypoint
├── eval.py            # Generate expansions and score them
├── convert_gguf.py    # GGUF conversion for Ollama/llama.cpp
├── jobs/
│   ├── sft.py         # Self-contained SFT for HuggingFace Jobs
│   ├── eval.py        # Self-contained eval for HuggingFace Jobs
│   └── eval_common.py # Shared eval utilities
├── configs/
│   └── sft.yaml       # SFT hyperparameters for Qwen3-1.7B
├── evals/
│   └── queries.txt    # 31 test queries across 8 categories
├── experiments/
│   └── grpo/          # Experimental GRPO configuration and script (optional)
├── data/              # Training JSONL files (all concatenated for training)
├── dataset/
│   ├── prepare_data.py     # Format for Qwen3 chat template, dedup, split
│   ├── schema.py           # Parse/normalize output format
│   ├── validate_schema.py  # Validate JSONL against schema
│   ├── score_data.py       # Score all examples using reward.py
│   └── analyze_data.py     # Analyze distribution and quality
├── SCORING.md         # Detailed scoring rubric reference
└── README.md          # This file

Training Pipeline

Stage 1: SFT (Supervised Fine-Tuning)

Teaches the model the lex:/vec:/hyde: output format from labeled examples.

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Method | LoRA (rank 16, alpha 32) |
| Target modules | All projection layers (q/k/v/o/gate/up/down) |
| Dataset | ~2,290 examples (train split) |
| Effective batch size | 16 (4 x 4 gradient accumulation) |
| Epochs | 5 |
| Learning rate | 2e-4 (cosine schedule) |

uv run train.py sft --config configs/sft.yaml
uv run train.py sft --config configs/sft.yaml --dry-run  # preview config
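
The batch-size figures in the table above imply a rough optimizer step count. The sketch below works through that arithmetic. It assumes a single GPU, that "4 x 4" means a per-device batch of 4 with 4 accumulation steps, and that a final partial batch counts as a step; the example count is the approximate "~2,290" from the table.

```python
import math

per_device_batch = 4    # assumed reading of "4 x 4 gradient accumulation"
grad_accum_steps = 4
train_examples = 2290   # approximate, from the table
epochs = 5

# Effective batch size = per-device batch x gradient accumulation steps.
effective_batch = per_device_batch * grad_accum_steps
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 16 144 720
```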

Stage 2: (Experimental) GRPO

GRPO is currently treated as experimental and kept under experiments/grpo/. It is not part of the default production path for this repository.

# Optional experimental GRPO run
cd finetune && uv run python experiments/grpo/grpo.py

Evaluation

eval.py generates expansions from a model and scores them against test queries:

# Evaluate an SFT model
uv run eval.py --model tobil/qmd-query-expansion-1.7B-sft

# Evaluate an SFT output dir
uv run eval.py outputs/sft

# Verbose output with deduction details
uv run eval.py tobil/qmd-query-expansion-1.7B -v

# Optional: evaluate GRPO experimental output (if run)
uv run eval.py outputs/grpo

# Save detailed scores to JSON
uv run eval.py tobil/qmd-query-expansion-1.7B -o scores.json

Reward Function

reward.py is the single source of truth for scoring. It is used for evaluation and (optionally) as the GRPO reward signal in the experimental path.

Six scoring dimensions (max 120 without hyde, 140 with):

| Dimension | Points | What It Measures |
|---|---|---|
| Format | 0-30 | Has lex/vec lines, no invalid lines |
| Diversity | 0-30 | Multiple expansion types, diverse content, no query echoes |
| HyDE | 0-20 | Present, 50-200 chars, single line, not repetitive |
| Quality | 0-20 | Lex shorter than vec, natural language, preserves key terms |
| Entity | -45 to +20 | Named entities preserved in lex and vec lines |
| Think bonus | 0-20 | Reward for NOT using <think> mode |
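
Summing the per-dimension maxima from the table reproduces the stated totals of 120 and 140. The snippet below is only a consistency check on the table; the `DIMENSIONS` dict and `max_score` helper are illustrative names, not part of reward.py.

```python
# (min, max) points per dimension, copied from the table above.
DIMENSIONS = {
    "format": (0, 30),
    "diversity": (0, 30),
    "hyde": (0, 20),
    "quality": (0, 20),
    "entity": (-45, 20),
    "think_bonus": (0, 20),
}


def max_score(with_hyde: bool) -> int:
    """Sum the upper bounds, leaving out HyDE when no hyde line is expected."""
    return sum(
        upper
        for name, (_lower, upper) in DIMENSIONS.items()
        if with_hyde or name != "hyde"
    )


print(max_score(with_hyde=False))  # 120
print(max_score(with_hyde=True))   # 140
```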

Hard failures (instant 0.0):

  • Chat template leakage (<|im_start|>, <|im_end|>, etc.)
  • Any line without a valid lex:, vec:, or hyde: prefix

# Self-test the reward function
uv run reward.py
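
The two hard-failure rules listed above can be sketched as a simple pre-check. This is not the code in reward.py: the token list only covers the two tokens the README names (it says "etc." without giving the rest), and ignoring blank lines is an assumption.

```python
# Only the tokens named in the README; the real list may be longer.
TEMPLATE_TOKENS = ("<|im_start|>", "<|im_end|>")
VALID_PREFIXES = ("lex:", "vec:", "hyde:")


def is_hard_failure(output: str) -> bool:
    """Return True if the output would score an instant 0.0 under the two rules."""
    # Rule 1: chat template leakage anywhere in the output.
    if any(token in output for token in TEMPLATE_TOKENS):
        return True
    # Rule 2: any non-blank line lacking a valid prefix (blank lines ignored here).
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    return any(not ln.startswith(VALID_PREFIXES) for ln in lines)


print(is_hard_failure("lex: auth setup\nvec: how to set up auth"))  # False
print(is_hard_failure("Sure! Here are some expansions:\nlex: auth"))  # True
print(is_hard_failure("lex: auth setup<|im_end|>"))                   # True
```

Empty output is not covered by these two rules, so this sketch does not flag it.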

GGUF Conversion

Merges the base model with the SFT adapter and, optionally, a GRPO adapter into a single model, then produces quantized GGUF files for deployment:

# Use preset for 1.7B
uv run convert_gguf.py --size 1.7B

# Custom models
uv run convert_gguf.py --base Qwen/Qwen3-1.7B \
                       --sft tobil/qmd-query-expansion-1.7B-sft \
                       --grpo tobil/qmd-query-expansion-1.7B-grpo \
                       --output tobil/qmd-query-expansion-1.7B-gguf

Using with Ollama

huggingface-cli download tobil/qmd-query-expansion-1.7B-gguf \
    qmd-query-expansion-1.7B-q4_k_m.gguf --local-dir .

echo 'FROM ./qmd-query-expansion-1.7B-q4_k_m.gguf' > Modelfile
ollama create qmd-expand -f Modelfile
ollama run qmd-expand
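
Once the model is created, it can also be queried programmatically. The sketch below builds a request for Ollama's local `/api/generate` endpoint using the prompt format from this README. It assumes Ollama is listening on its default port and that raw mode is appropriate because the Modelfile above sets no template; whether raw mode is actually needed depends on how Ollama handles the GGUF's embedded template. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(query: str, model: str = "qmd-expand") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    prompt = (
        "<|im_start|>user\n"
        f"/no_think Expand this search query: {query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    # raw=True asks Ollama not to apply its own prompt template (assumed to be wanted here).
    payload = {"model": model, "prompt": prompt, "raw": True, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def expand(query: str) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

Calling `expand("auth config")` requires a running Ollama server with the `qmd-expand` model created as shown above.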

Data Pipeline

All JSONL files in data/ are concatenated for training. To prepare the dataset:

# Format for Qwen3 chat template, deduplicate, split train/val
uv run dataset/prepare_data.py

# Validate data quality
just validate
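
For illustration, concatenating the JSONL files and dropping exact duplicates might look like the sketch below. It is not the logic in dataset/prepare_data.py: the exact-record dedup key and the function name `load_examples` are assumptions, and the chat-template formatting and train/val split are left out.

```python
import json
from pathlib import Path


def load_examples(data_dir: str) -> list[dict]:
    """Read every *.jsonl file in data_dir and drop exact duplicate records."""
    seen: set[str] = set()
    examples: list[dict] = []
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                # Canonical JSON as the dedup key (assumed rule: exact match only).
                key = json.dumps(record, sort_keys=True)
                if key not in seen:
                    seen.add(key)
                    examples.append(record)
    return examples
```

Sorting the file list keeps the concatenation order stable across runs, which helps when comparing dataset versions.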

Architecture Notes

The production training approach is currently SFT-only:

  1. SFT establishes format compliance and basic query understanding. It applies LoRA at rank 16 to all projection layers because the model has to learn a new output format from scratch.

  2. GRPO exists as an optional experimental path under experiments/grpo/ and is not in the production training pipeline.

The reward function is entirely rule-based (no LLM judge), which makes it fast, deterministic, and suitable as an RL signal. See SCORING.md for the full rubric.

Training Results (Qwen3-1.7B, v2)

SFT

| Metric | Value |
|---|---|
| Final train loss | 0.472 |
| Final eval loss | 0.304 |
| Token accuracy (train) | 97.4% |
| Token accuracy (eval) | 93.8% |
| Epochs | 5 |
| Hardware | A10G (24 GB VRAM) |

Evaluation Scores

| Model | Average Score | Excellent (30) |
|---|---|---|
| SFT | 92.0% | 30/30 |

GRPO scores are not tracked in this branch; see experiments/grpo/ for historical experimental results.