Multi-Task Evaluation Example
Configuring multi_task.yaml
- eval.defaults defines inference parameters shared by every dataset entry. Override them inside an individual dataset block if needed.
- eval.datasets enumerates the datasets to evaluate. Each entry should specify:
  - name: a short identifier that appears in logs and dashboards.
  - path: the path to the dataset JSONL file.
  - rm_type: which reward function to use for scoring.
  - n_samples_per_eval_prompt: how many candidate completions to generate per prompt.
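As a rough illustration of how these keys might fit together, here is a minimal sketch of a multi_task.yaml. Only eval.defaults, eval.datasets, name, path, rm_type, n_samples_per_eval_prompt, and the ifbench reward type come from this document. The parameter names under defaults (temperature, max_response_len), the file paths, the second dataset, and the rm_type value math are assumptions made for illustration, so check them against the actual example file before use.

```yaml
# Hypothetical sketch, not the shipped configuration.
eval:
  defaults:
    # Assumed inference parameter names, shared by every dataset entry.
    temperature: 0.7
    max_response_len: 4096
  datasets:
    - name: ifbench                     # identifier shown in logs and dashboards
      path: /path/to/ifbench.jsonl      # placeholder path to the dataset JSONL file
      rm_type: ifbench                  # reward function used for scoring
      n_samples_per_eval_prompt: 1      # candidate completions per prompt
    - name: math_eval                   # hypothetical second dataset
      path: /path/to/math_eval.jsonl    # placeholder path
      rm_type: math                     # assumed reward type name
      n_samples_per_eval_prompt: 4
      temperature: 0.0                  # assumed per-dataset override of a default
```

In this sketch the second entry overrides temperature for itself only, while the first entry inherits every value from defaults.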
IFBench Notes
- When ifbench is used, slime/rollout/rm_hub/ifbench.py automatically prepares the scoring environment, so no manual setup is required beyond providing the dataset path.