qmd/finetune/data · qmd1/qmd - AtomGit

Tobi Lütke: Add wall-clock checkpoints and full eval defaults

File	Last commit	Last updated
train	Add wall-clock checkpoints and full eval defaults	2 months ago
fix_hyde_checkpoint.json	Merge origin/main into feat/ast-aware-chunking. Resolve conflicts: combine AST chunking args (filepath, chunkStrategy) with abort signal parameter from #458. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	1 month ago
qmd_expansion_balanced_deduped.jsonl	lots of training stuff	3 months ago
qmd_expansion_diverse_addon.jsonl	lots of training stuff	3 months ago
qmd_expansion_handcrafted.jsonl	lots of training stuff	3 months ago
qmd_expansion_handcrafted_only.jsonl	lots of training stuff	3 months ago
qmd_expansion_lex_phrases_negation.jsonl	finetune: quoted phrases, negation, and entity preservation (#247). Training data: expand lex phrases/negation examples from 12 to 74 with intent field; add 50 personal entity examples (meetings, emails, projects with names). Reward function: detect entities at position 0 (fixes "Bob asked about deploy"); per-entity coverage penalty of -20 per entity absent from all lex+vec; phrase quoting bonus of +3 when lex uses quotes for multi-word terms; expanded stopwords to reduce false-positive entity detection. Eval queries: add 21 test queries for personal entities, quoted phrases, and negation/disambiguation scenarios.	2 months ago
qmd_expansion_locations.jsonl	lots of training stuff	3 months ago
qmd_expansion_people.jsonl	lots of training stuff	3 months ago
qmd_expansion_personal_entities.jsonl	finetune: quoted phrases, negation, and entity preservation (#247). Training data: expand lex phrases/negation examples from 12 to 74 with intent field; add 50 personal entity examples (meetings, emails, projects with names). Reward function: detect entities at position 0 (fixes "Bob asked about deploy"); per-entity coverage penalty of -20 per entity absent from all lex+vec; phrase quoting bonus of +3 when lex uses quotes for multi-word terms; expanded stopwords to reduce false-positive entity detection. Eval queries: add 21 test queries for personal entities, quoted phrases, and negation/disambiguation scenarios.	2 months ago
qmd_expansion_short_nontech.jsonl	lots of training stuff	3 months ago
qmd_expansion_sports.jsonl	data: add 48 sports acronym training examples. Covers UFC, NFL, NBA, NHL, MLB, F1, MLS, IMSA, WEC, NASCAR, PGA, ATP, WTA, FIFA. Fixes query expansion failures like UFC → 'united fighting club'.	2 months ago
qmd_expansion_v3_structured.jsonl	finetune: strict Pydantic schema, one canonical data format. Replace ad-hoc JSON parsing with a strict Pydantic model (TrainingExample with typed OutputPair). All data loading goes through load_examples(), which fails loudly on invalid data. Changes: convert v3_structured.jsonl from "searches" to "output" format; rewrite all consumer scripts (prepare, validate, score, analyze) to load through the Pydantic schema; treat prepared train/val files as ephemeral build artifacts; restore LFM2 and GEPA experiments under experiments/; add pydantic>=2.0 to dependencies.	2 months ago
qmd_only_sampled.jsonl	lots of training stuff	3 months ago
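The commit on `qmd_expansion_v3_structured.jsonl` describes a strict schema (`TrainingExample` with a typed `OutputPair`) and a single `load_examples()` entry point that fails loudly on invalid data. The sketch below illustrates that fail-fast loading pattern, using stdlib dataclasses as a stand-in for the Pydantic model so it runs without dependencies. The field layout is an assumption inferred from the file names and commit messages, not qmd's actual schema: a `query`, an optional `intent`, and `output` as a list of `[kind, text]` pairs with hypothetical kinds `lex`/`vec`/`hyde`. This `load_examples` also takes an iterable of lines, which may not match the real function's signature.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed schema, for illustration only; qmd's real TrainingExample may differ.
ALLOWED_KINDS = {"lex", "vec", "hyde"}        # hypothetical output kinds
ALLOWED_KEYS = {"query", "intent", "output"}  # anything else (e.g. legacy "searches") is rejected


@dataclass(frozen=True)
class OutputPair:
    kind: str  # one of ALLOWED_KINDS
    text: str


@dataclass(frozen=True)
class TrainingExample:
    query: str
    output: List[OutputPair]
    intent: Optional[str] = None


def _parse_record(record: object, lineno: int) -> TrainingExample:
    """Validate one decoded JSON object, raising ValueError on any problem."""
    if not isinstance(record, dict):
        raise ValueError(f"line {lineno}: expected a JSON object")
    unknown = set(record) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"line {lineno}: unexpected keys {sorted(unknown)}")
    query = record.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError(f"line {lineno}: 'query' must be a non-empty string")
    intent = record.get("intent")
    if intent is not None and not isinstance(intent, str):
        raise ValueError(f"line {lineno}: 'intent' must be a string if present")
    raw = record.get("output")
    if not isinstance(raw, list) or not raw:
        raise ValueError(f"line {lineno}: 'output' must be a non-empty list")
    pairs = []
    for item in raw:
        if not (isinstance(item, list) and len(item) == 2
                and all(isinstance(x, str) for x in item)):
            raise ValueError(f"line {lineno}: each output entry must be [kind, text]")
        kind, text = item
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"line {lineno}: unknown output kind {kind!r}")
        pairs.append(OutputPair(kind, text))
    return TrainingExample(query=query, output=pairs, intent=intent)


def load_examples(lines: Iterable[str]) -> List[TrainingExample]:
    """Parse JSONL lines; any malformed line aborts the whole load."""
    examples = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        examples.append(_parse_record(record, lineno))
    return examples
```

With a real file, `with open(path, encoding="utf-8") as f: load_examples(f)` would behave the same way. Raising instead of skipping means a malformed line cannot silently shrink the training set.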
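The reward-function changes listed in the #247 commit can be sketched roughly as follows: entity detection that also considers the first token, a -20 penalty per entity absent from every lex and vec expansion, a +3 bonus when lex quotes a multi-word term, and a stopword list to suppress false-positive entities. Only the two constants come from the commit message. The capitalization heuristic, the stopword list, the helper names `detect_entities` and `score_adjustment`, and awarding the quote bonus once per example are illustrative assumptions, not the project's reward code.

```python
import re
from typing import List

# From the commit message: -20 per entity absent from all lex+vec, +3 for quoted phrases.
ENTITY_PENALTY = -20
QUOTE_BONUS = 3

# Hypothetical stopword list; the project's expanded list is not shown here.
STOPWORDS = {
    "a", "about", "an", "and", "did", "for", "how", "i", "in", "is",
    "my", "of", "on", "the", "to", "what", "when", "where", "who", "why",
}

# A double-quoted run of two or more whitespace-separated tokens, e.g. "deploy pipeline".
QUOTED_MULTIWORD = re.compile(r'"[^"\s]+(?:\s+[^"\s]+)+"')


def detect_entities(query: str) -> List[str]:
    """Treat capitalized, non-stopword tokens as entities, including token 0."""
    tokens = re.findall(r"[A-Za-z][\w'-]*", query)
    return [t for t in tokens if t[0].isupper() and t.lower() not in STOPWORDS]


def score_adjustment(query: str, lex: List[str], vec: List[str]) -> int:
    """Entity-coverage penalty plus phrase-quoting bonus for one expansion."""
    expansions = " ".join(lex + vec).lower()
    score = 0
    for entity in detect_entities(query):
        # Whole-word match so "Bob" is not satisfied by "bobcat".
        if not re.search(rf"\b{re.escape(entity.lower())}\b", expansions):
            score += ENTITY_PENALTY
    # Assumption: the bonus is awarded once, not per quoted phrase.
    if any(QUOTED_MULTIWORD.search(line) for line in lex):
        score += QUOTE_BONUS
    return score
```

Because the stopword check runs on every token, a sentence-initial word such as "What" is filtered out, while a name in first position, as in "Bob asked about deploy", is still counted.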