Last commit · Last updated

chore(fsdp2): develop longcat-flash-lite model in fsdp2
Merge request !4344: merge br_master_longcat_flash_lite_fsdp2 into master
Co-authored-by: guihaowen666 <guihaowen@huawei.com>
Created-by: guihaowen666 | Commit-by: guihaowen666 | Merged-by: ascend-robot
What this PR does / why we need it? Develops the longcat-flash-lite model in FSDP2.
Does this PR introduce any user-facing change? No. This is new model development with no user-facing change.
How was this patch tested? Ran an inference task and checked that the model can carry on normal conversations.
See merge request: Ascend/MindSpeed-LLM!4344
Last updated: 3 hours ago

feat(pytorch): add deepseek-v4-flash sft train script
Merge request !4478: merge feature-dsv4-flash-sft into master
Co-authored-by: tcund89 <tcund@126.com>
Created-by: tcund89 | Commit-by: tcund89 | Merged-by: ascend-robot
What this PR does / why we need it? Adds a DeepSeek-V4-Flash SFT training script.
Does this PR introduce any user-facing change? Users can use this script to run SFT training of DeepSeek-V4-Flash on their own data.
How was this patch tested? SFT training was completed on 8 A3 servers. The training parameters are configured in the script.
Dataset: tatsu-lab/alpaca
Loss: ![mindspeed_loss.png](https://raw.gitcode.com/user-images/assets/7623105/09c44605-5fcc-4698-aef8-2944096f84b8/mindspeed_loss.png 'mindspeed_loss.png')
GradNorm: ![mindspeed_grad_norm.png](https://raw.gitcode.com/user-images/assets/7623105/180e6923-dc71-4f17-994d-3df77072cb8a/mindspeed_grad_norm.png 'mindspeed_grad_norm.png')
See merge request: Ascend/MindSpeed-LLM!4478
Last updated: 8 days ago

docs(pytorch): fix documentation error
Merge request !4402: merge master into master
Co-authored-by: LQ1206 <liuqian164@h-partners.com>
Created-by: LQ1206 | Commit-by: LQ1206 | Merged-by: ascend-robot
What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue.
Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-LLM!4402
Last updated: 1 month ago

fix(pytorch): add ckpt-format argument to scripts
Merge request !4371: merge add_ckpt_torch_dist_argument_for_shells into master
Co-authored-by: z__y <z4t155664@163.com>
Created-by: z__y | Commit-by: z__y | Merged-by: ascend-robot
What this PR does / why we need it? This PR explicitly sets ckpt-format to torch in every script in the repository to support the asynchronous checkpoint saving feature.
Does this PR introduce any user-facing change? No. The change only adjusts internal script parameters to preserve existing behavior. There are no user-facing API or usage changes.
How was this patch tested? Tests confirm that asynchronous checkpoint saving works correctly and that the original torch-format checkpoint behavior is preserved.
See merge request: Ascend/MindSpeed-LLM!4371
Last updated: 1 month ago