MindSpeed-LLM/examples/mcore · Ascend/MindSpeed-LLM - AtomGit

ascend-robot: feat(pytorch): feat add deepseek-v4-flash sft train script

File	Last commit	Last updated
deepseek2_lite	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
deepseek3	docs: fix md (!4375, merge of docs into master; created and committed by cjy840282 <chenjingyi9@huawei.com>, merged by ascend-robot). What and why: fixes documents. User-facing change: improved readability. Testing: AIDD scan.	1 month ago
deepseek32	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
deepseek4_flash	feat(pytorch): feat add deepseek-v4-flash sft train script (!4478, merge of feature-dsv4-flash-sft into master; created and committed by tcund89 <tcund@126.com>, merged by ascend-robot). What and why: adds a DeepSeek-V4-Flash SFT training script. User-facing change: users can use this script to run SFT training on DeepSeek-V4-Flash with custom data. Testing: SFT training was completed on 8 A3 servers, with the training parameters configured in the script. Dataset: tatsu-lab/alpaca. Loss: ![mindspeed_loss.png](https://raw.gitcode.com/user-images/assets/7623105/09c44605-5fcc-4698-aef8-2944096f84b8/mindspeed_loss.png 'mindspeed_loss.png') GradNorm: ![mindspeed_grad_norm.png](https://raw.gitcode.com/user-images/assets/7623105/180e6923-dc71-4f17-994d-3df77072cb8a/mindspeed_grad_norm.png 'mindspeed_grad_norm.png')	8 days ago
gemma2	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
glm45-air	feat(torch): add GLM-4.5 scripts (!4369, merge of GLM-4.5-new into master; created and committed by cjy840282 <chenjingyi9@huawei.com>, merged by ascend-robot). What and why: adds GLM-4.5 scripts. User-facing change: supports GLM-4.5 LoRA fine-tuning. Testing: vLLM inference is normal.	1 month ago
glm45	fix(pytorch): GLM-4.5 support enable-hf2mg-convert (!4381, merge of glm into master; created and committed by cjy840282 <chenjingyi9@huawei.com>, merged by ascend-robot). What and why: GLM-4.5 now supports enable-hf2mg-convert. User-facing change: simplifies the fine-tuning process. Testing: hf2mg-convert runs without errors.	1 month ago
glm5	feat(pytorch): add GLM5 poc script (!4457, merge of glm into master; created by HANHU1CHEN, committed by HanhuiChen <chenhanhui1@h-partners.com>, merged by ascend-robot). What and why: adds a GLM-5 script for POC. User-facing change: none. Testing: the script was tested with the full-parameter model on 32 nodes.	21 days ago
internlm3	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
kimi2	feat: Added a high-performance script for POC testing of Kimi2 (!4485, merge of master into master; created and committed by downtiser <wangchenyang52@huawei.com>, merged by ascend-robot). What and why: adds a high-performance script for POC testing of Kimi2. User-facing change and testing: not described; the merge request template prompts were left unfilled.	13 days ago
ling_v2	fix(pytorch): ckpt convert V2 bug fix (!4463, merge of ckpt_fix into master; created and committed by cjy840282 <chenjingyi9@huawei.com>, merged by ascend-robot). What and why: fixes a bug in ckpt convert V2 and deletes the args --spec/--use-mcore-models/--tokenizer-model/--params-dtype. User-facing change: ckpt convert V2 runs without errors. Testing: validated in the test environment.	19 days ago
llama2	fix(megatron): PT_Mcore_Llama2_70B_Qlora_8P_Perf_BF16 performance (!4472, merge of CI into master; created and committed by cjy840282 <chenjingyi9@huawei.com>, merged by ascend-robot). What and why: addresses performance degradation in the test case PT_Mcore_Llama2_70B_Qlora_8P_Perf_BF16. User-facing change: Nightly-CI. Testing: performance compliance.	18 days ago
llama31	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
longcat	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
magistral	fix(pytorch): ckpt convert V2 bug fix (!4463, merge of ckpt_fix into master; created and committed by cjy840282 <chenjingyi9@huawei.com>, merged by ascend-robot). What and why: fixes a bug in ckpt convert V2 and deletes the args --spec/--use-mcore-models/--tokenizer-model/--params-dtype. User-facing change: ckpt convert V2 runs without errors. Testing: validated in the test environment.	19 days ago
mamba2	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
phi35	fix(pytorch): ckpt convert V2 bug fix (!4463, merge of ckpt_fix into master; created and committed by cjy840282 <chenjingyi9@huawei.com>, merged by ascend-robot). What and why: fixes a bug in ckpt convert V2 and deletes the args --spec/--use-mcore-models/--tokenizer-model/--params-dtype. User-facing change: ckpt convert V2 runs without errors. Testing: validated in the test environment.	19 days ago
plm	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
qwen25	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
qwen3	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
qwen3_coder_next	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
qwen3_moe	fix(python): fix the performance fluctuations of qwen3-8b (!4395, merge of update-qwen3-sh into master; created by yanzhixiao23, committed by yanzhixiao <yanzhixiao@h-partners.com>, merged by ascend-robot). What and why: fixes the performance fluctuations of qwen3-8b. User-facing change: NA. Testing: the bug is fixed.	1 month ago
qwen3_next	fix(pytorch):add ckpt-format argument to scripts (!4371, merge of add_ckpt_torch_dist_argument_for_shells into master; created and committed by z__y <z4t155664@163.com>, merged by ascend-robot). What and why: explicitly adds ckpt-format torch to all repository scripts to support the asynchronous checkpoint saving feature. User-facing change: none; only internal script parameters are adjusted to maintain existing behavior. Testing: tests confirm that asynchronous checkpoint saving works correctly and that the original torch format checkpoint behavior is preserved.	1 month ago
seed_oss	fix: modify the seed-oss fine-tuning script (!4462, merge of master into master; created and committed by EVA1 <jingsiyu1@huawei.com>, merged by ascend-robot). Description: modifies the seed-oss fine-tuning script to add weight conversion parameters and dataset conversion parameters.	20 days ago