MindSpeed-LLM/examples/mcore/qwen15/data_convert_qwen15_instruction.sh · MindSpeed-LLM: a distributed training suite for large language models built on the Ascend ecosystem (AtomGit)

i-robot !1759: Add mcore full-parameter fine-tuning scripts for Qwen2-7B/Qwen1.5-4B

Commit de4d0664, created on October 26, 2024

# Change the set_env.sh path to match your actual environment
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Output directory for the preprocessed data (-p: no error if it already exists)
mkdir -p ./finetune_dataset

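# Tokenize the Alpaca-style instruction dataset and write the preprocessed
# fine-tuning data to ./finetune_dataset/ with the file prefix "alpaca"
python ./preprocess_data.py \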
   --input ./dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
   --tokenizer-name-or-path ./model_from_hf/qwen15_hf/ \
   --output-prefix ./finetune_dataset/alpaca \
   --handler-name AlpacaStyleInstructionHandler \
   --tokenizer-type PretrainedFromHF \
   --workers 4 \
   --log-interval 1000  \
   --prompt-type qwen
   # --map-keys '{"prompt":"instruction","query":"input","response":"output"}' # default value, can be omitted; to pass it, add a trailing "\" to the line above
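Your parquet file may use column names other than the Alpaca defaults. In that case, the `--map-keys` value must map the handler's `prompt`, `query` and `response` fields to those columns, as the commented-out line shows. The sketch below is minimal and assumes column names contain no quotes or backslashes. `build_map_keys` is a hypothetical helper, not part of MindSpeed-LLM. It assembles that JSON string so you don't have to hand-type it inside single quotes. The column names in the example invocation are likewise illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MindSpeed-LLM: print the JSON value for
# preprocess_data.py's --map-keys option. With no arguments it reproduces the
# default mapping shown in the commented-out line of the script.
# Assumes column names contain no double quotes or backslashes (no JSON escaping).
build_map_keys() {
    local prompt_col="${1:-instruction}"   # column holding the instruction
    local query_col="${2:-input}"          # column holding the optional input
    local response_col="${3:-output}"      # column holding the expected answer
    printf '{"prompt":"%s","query":"%s","response":"%s"}' \
        "$prompt_col" "$query_col" "$response_col"
}

# Example invocation (paths as in the script above; column names are hypothetical):
# python ./preprocess_data.py \
#     --input ./dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
#     --tokenizer-name-or-path ./model_from_hf/qwen15_hf/ \
#     --output-prefix ./finetune_dataset/alpaca \
#     --handler-name AlpacaStyleInstructionHandler \
#     --tokenizer-type PretrainedFromHF \
#     --workers 4 \
#     --log-interval 1000 \
#     --prompt-type qwen \
#     --map-keys "$(build_map_keys question context answer)"
```

Quoting the command substitution in double quotes keeps the JSON as a single argument, and the shell never has to parse the inner double quotes.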