# MindSpeed-RL/configs/datasets/orca_rlhf.yaml
# MindSpeed-RL: a reinforcement learning acceleration framework for the Ascend ecosystem

# Ii-robot !383 [pytorch][feature] Add DPO Algorithm to MindSpeed-RL
# Commit 95aa302f, created on July 11, 2025

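input: ./dataset/orca_rlhf.jsonl                 # raw dataset in JSON Lines format
tokenizer_name_or_path: ./model_from_hf/Qwen3    # local directory holding the Hugging Face tokenizer files
output_prefix: ./dataset/dpo                     # path prefix for the preprocessed output files
handler_name: AlpacaStylePairwiseHandler         # handler for pairwise (preference) samples
tokenizer_type: HuggingFaceTokenizer
workers: 12                                      # number of preprocessing worker processes
log_interval: 1000                               # interval between progress log messages
prompt_type: qwen3                               # prompt template to apply
seq_length: 4096                                 # maximum sequence length in tokens
map_keys: {"prompt":"question", "query":"", "system":"system"}  # handler field -> dataset column; "query" is left empty here
enable_thinking: true                            # turn on the thinking option of the prompt template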