File | Last commit | Last updated
!1998 rename: repo package name from modellink to mindspeed_llm. Merge pull request !1998 from MeiFei/master-package-rename | 1 year ago
!2712 [pytorch][feature] upgrading Megatron to r0.12.1. Merge pull request !2712 from yanzhixiao/llm-0.12.0-0526 | 11 months ago
[pytorch][ckpt] add save checkpoint lazy in ckpt | 4 months ago
Co-authored-by: qyzqyz <quyueze@h-partners.com>
# message auto-generated for no-merge-commit merge:
!3995 merge master into master
Created-by: qyzqyz
Commit-by: qyzqyz
Merged-by: ascend-robot
Description: add save checkpoint lazy in ckpt
See merge request: Ascend/MindSpeed-LLM!3995
feat(pytorch): add DeepSeek4 fine-tuning template | 25 days ago
Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com>
# message auto-generated for no-merge-commit merge:
!4436 merge dsv4 into master
Created-by: HANHU1CHEN
Commit-by: HanhuiChen
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Adds a fine-tuning template for the DeepSeek4 model series to support its specific prompt format, including thinking mode, tool calling (DSML format), and reasoning effort control.
## Does this PR introduce any user-facing change?
Yes. Users can now select --prompt-type deepseek4 to fine-tune DeepSeek4 models. The following options are also exposed:
- --enable-thinking controls thinking vs chat mode
- --reasoning-effort {max,high} inserts a max-effort instruction prefix; only valid when thinking is enabled
- --drop-thinking controls whether reasoning content is kept in each turn
## How was this patch tested?
Tested with byte-level alignment against the official encoding_dsv4 script.
See merge request: Ascend/MindSpeed-LLM!4436
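
The constraint in the description above (--reasoning-effort is only valid when thinking is enabled) can be illustrated with a small, standalone argparse sketch. This is not the MindSpeed-LLM implementation: the flag names are taken from the PR description, but treating --enable-thinking and --drop-thinking as boolean switches, the defaults, and the helper names `build_parser` and `parse_and_validate` are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags named in the PR description."""
    parser = argparse.ArgumentParser(description="illustrative sketch only")
    parser.add_argument("--prompt-type", default=None)
    # Assumption: modeled here as boolean switches; the real flags may take values.
    parser.add_argument("--enable-thinking", action="store_true")
    parser.add_argument("--drop-thinking", action="store_true")
    # The two choices come from the PR description: {max,high}.
    parser.add_argument("--reasoning-effort", choices=["max", "high"], default=None)
    return parser


def parse_and_validate(argv):
    """Parse argv and reject --reasoning-effort when thinking is not enabled."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.reasoning_effort is not None and not args.enable_thinking:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--reasoning-effort requires --enable-thinking")
    return args


if __name__ == "__main__":
    args = parse_and_validate(
        ["--prompt-type", "deepseek4", "--enable-thinking", "--reasoning-effort", "max"]
    )
    print(args)
```

Checking the dependency between the two flags once, right after parsing, keeps the rule in one place rather than scattering it through template code; whether the actual project validates it this way is not stated in the source.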