File | Last commit | Last updated
!3232 [pytorh][refactor]refactor tp-2d. Merge pull request !3232 from jwhk/master | 8 months ago
feat(pytorch): support deepseekv4_flash in mcore backend. Co-authored-by: dingzicha1997<dingzilin@huawei.com>. Merge request Ascend/MindSpeed-LLM!4420 (geneva2 into master), created and committed by dingzicha1997, merged by ascend-robot | 1 month ago
!1998 rename: repo package name from modellink to mindspeed_llm. Merge pull request !1998 from MeiFei/master-package-rename | 1 year ago
[pytorch][bugfix] baichaun2 no-fa-adapt. Co-authored-by: jzh6229<jiangzhihui4@huawei.com> | 7 months ago
feat(pytorch): support deepseekv4_flash in mcore backend. Co-authored-by: dingzicha1997<dingzilin@huawei.com>. Merge request Ascend/MindSpeed-LLM!4420 (geneva2 into master), created and committed by dingzicha1997, merged by ascend-robot | 1 month ago
fix(torch): add recomputation of actual_seq_len for TND. Co-authored-by: yanzhixiao<yanzhixiao@h-partners.com>. Merge request Ascend/MindSpeed-LLM!4285 (bugfix-tnd into master), created by yanzhixiao23, committed by yanzhixiao, merged by ascend-robot. Description: adds recomputation of actual_seq_len for tuning; user-facing change: NA; testing: known bug fixed | 2 months ago
[pytorch][feature]PLM-1.8B pretrain/sft. Co-authored-by: EVA1<jingsiyu1@huawei.com>. Merge request Ascend/MindSpeed-LLM!3637 (master into master), created and committed by EVA1, merged by ascend-robot. Description: 1. PLM-1.8B model support: dataset format conversion, weight conversion, fine-tuning, and pretraining; 2. accuracy is aligned, with SFT relative error below 0.1% | 6 months ago
refactor(pytorch): update deepseek4 shell. Co-authored-by: dingzicha1997<dingzilin@huawei.com>. Merge request Ascend/MindSpeed-LLM!4423 (master into master), created and committed by dingzicha1997, merged by ascend-robot | 1 month ago
feat(pytorch): support deepseekv4_flash in mcore backend. Co-authored-by: dingzicha1997<dingzilin@huawei.com>. Merge request Ascend/MindSpeed-LLM!4420 (geneva2 into master), created and committed by dingzicha1997, merged by ascend-robot | 1 month ago
[pytorch][model]longcat model fix. Co-authored-by: guihaowen666<guihaowen@huawei.com>. Merge request Ascend/MindSpeed-LLM!4251 (br_master_longcat_fix into master), created and committed by guihaowen666, merged by ascend-robot. Description: longcat model fix | 3 months ago
refactor(pytorch): update deepseek4 shell. Co-authored-by: dingzicha1997<dingzilin@huawei.com>. Merge request Ascend/MindSpeed-LLM!4423 (master into master), created and committed by dingzicha1997, merged by ascend-robot | 1 month ago