MindSpeed-LLM/mindspeed_llm/training/tokenizer · Ascend/MindSpeed-LLM - AtomGit

ascend-robot feat(pytorch): add DeepSeek4 fine-tuning template

File	Last commit	Last updated
__init__.py	!2470 [core-llm][dskv3] mtp loss scaler and fix expert bias dtype Merge pull request !2470 from shengjy/mtp_loss_scaler	1 year ago
magistral_tokenizer.py	[pytorch][feature] magistral-small pretrain/sft Co-authored-by: EVA <jingsiyu1@huawei.com> # message auto-generated for no-merge-commit merge: !3468 merge pr_3392 into master [pytorch][feature] magistral-small pretrain/sft Created-by: EVA1 Commit-by: EVA1;EVA Merged-by: ascend-robot Description: 1. Adds support for the magistral-small tokenizer type; 2. Adds support for the magistral-small-2506 model: dataset format conversion, weight conversion, fine-tuning, and pretraining; 3. Accuracy has been aligned, with an SFT relative error below 0.1%. See merge request: Ascend/MindSpeed-LLM!3468	6 months ago
tokenizer.py	feat(pytorch): add DeepSeek4 fine-tuning template Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com> # message auto-generated for no-merge-commit merge: !4436 merge dsv4 into master feat(pytorch): add DeepSeek4 fine-tuning template Created-by: HANHU1CHEN Commit-by: HanhuiChen Merged-by: ascend-robot Description: ## What this PR does / why we need it? Adds a fine-tuning template for the DeepSeek4 model series to support its specific prompt format, including thinking mode, tool calling (DSML format), and reasoning effort control. ## Does this PR introduce any user-facing change? Yes: users can now select `--prompt-type deepseek4` to fine-tune DeepSeek4 models. Two new behaviors are also exposed: - `--enable-thinking` controls thinking vs chat mode - `--reasoning-effort {max,high}` inserts a max-effort instruction prefix; only valid when thinking is enabled - `--drop-thinking` controls whether reasoning content is kept in each turn ## How was this patch tested? Tested with byte-level alignment against the official encoding_dsv4 script. See merge request: Ascend/MindSpeed-LLM!4436	25 days ago
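
The flag constraints described in the tokenizer.py commit message can be illustrated with a small, self-contained sketch. This is not the MindSpeed-LLM implementation: the flag names are taken from the commit message, while the helper names (`build_parser`, `validate_args`, `parse`), the flag types, and the defaults are assumptions made for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser. Flag names come from the commit message;
    # types and defaults are assumptions, not the project's real definitions.
    parser = argparse.ArgumentParser(description="DeepSeek4 fine-tuning flags (sketch)")
    parser.add_argument("--prompt-type", type=str, default=None)
    parser.add_argument("--enable-thinking", action="store_true")
    parser.add_argument("--reasoning-effort", choices=["max", "high"], default=None)
    parser.add_argument("--drop-thinking", action="store_true")
    return parser


def validate_args(args: argparse.Namespace) -> argparse.Namespace:
    # The commit message says --reasoning-effort is only valid when
    # thinking is enabled, so reject the combination that violates this.
    if args.reasoning_effort is not None and not args.enable_thinking:
        raise ValueError("--reasoning-effort requires --enable-thinking")
    return args


def parse(argv):
    # Parse a list of command-line tokens and apply the cross-flag check.
    return validate_args(build_parser().parse_args(argv))
```

Under these assumptions, `parse(["--prompt-type", "deepseek4", "--enable-thinking", "--reasoning-effort", "max"])` is accepted, while passing `--reasoning-effort` without `--enable-thinking` raises a `ValueError`.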