GRPO (Experimental)
This folder contains the experimental GRPO training path for query expansion. It is not part of the default production pipeline.
Files
- grpo.yaml – experimental GRPO hyperparameters
- grpo.py – standalone GRPO training script
Run
# Recommended default: run from the repo root (adjust the path to your checkout)
cd /home/tobi/qmd
uv run finetune/experiments/grpo/grpo.py
# Or use the unified entrypoint (deprecated in the main pipeline):
uv run train.py grpo --config finetune/experiments/grpo/grpo.yaml
Notes
- The current mainline focuses on SFT-only quality and benchmarks.
- Keep this workflow isolated unless you are explicitly experimenting with reinforcement-learning refinement.
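
For readers unfamiliar with the "reinforcement-learning refinement" mentioned above: GRPO samples several completions per prompt and scores each one relative to the rest of its group, rather than against a learned value function. The sketch below illustrates only that normalization step. It is not taken from grpo.py; the function name, the `eps` constant, and the reward values are hypothetical, and the actual script may compute advantages differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    Illustrative helper (hypothetical name, not from grpo.py).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four expansions sampled from a single query.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean get a positive advantage and are reinforced; those below the mean are discouraged. When every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.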