MindSpeed-LLM/mindspeed_llm/core/transformer · Ascend/MindSpeed-LLM - AtomGit

ascend-robot: refactor(pytorch): update deepseek4 shell

File	Last commit	Last updated
custom_layers	!3232 [pytorh][refactor]refactor tp-2d Merge pull request !3232 from jwhk/master	8 months ago
moe	feat(pytorch): support deepseekv4_flash in mcore backend Co-authored-by: dingzicha1997<dingzilin@huawei.com> Created-by: dingzicha1997 Commit-by: dingzicha1997 Merged-by: ascend-robot See merge request: Ascend/MindSpeed-LLM!4420 (merge geneva2 into master)	1 month ago
__init__.py	!1998 rename: repo package name from modellink to mindspeed_llm Merge pull request !1998 from MeiFei/master-package-rename	1 year ago
alibi_attention.py	[pytorch][bugfix] baichaun2 no-fa-adapt Co-authored-by: jzh6229<jiangzhihui4@huawei.com>	7 months ago
attention.py	feat(pytorch): support deepseekv4_flash in mcore backend Co-authored-by: dingzicha1997<dingzilin@huawei.com> Created-by: dingzicha1997 Commit-by: dingzicha1997 Merged-by: ascend-robot See merge request: Ascend/MindSpeed-LLM!4420 (merge geneva2 into master)	1 month ago
custom_dot_product_attention.py	fix(torch): add recomputation of actual_seq_len for TND Co-authored-by: yanzhixiao<yanzhixiao@h-partners.com> Created-by: yanzhixiao23 Commit-by: yanzhixiao Merged-by: ascend-robot Description: What this PR does: add recomputation of actual_seq_len for tuning. User-facing change: NA. How tested: known bug fixed. See merge request: Ascend/MindSpeed-LLM!4285 (merge bugfix-tnd into master)	2 months ago
mlp.py	[pytorch][feature]PLM-1.8B pretrain/sft Co-authored-by: EVA1<jingsiyu1@huawei.com> Created-by: EVA1 Commit-by: EVA1 Merged-by: ascend-robot Description: 1. PLM-1.8B model support: dataset format conversion, weight conversion, fine-tuning, and pretraining; 2. Accuracy is aligned, with SFT relative error below one part in a thousand. See merge request: Ascend/MindSpeed-LLM!3637 (merge master into master)	6 months ago
multi_token_prediction.py	refactor(pytorch): update deepseek4 shell Co-authored-by: dingzicha1997<dingzilin@huawei.com> Created-by: dingzicha1997 Commit-by: dingzicha1997 Merged-by: ascend-robot See merge request: Ascend/MindSpeed-LLM!4423 (merge master into master)	1 month ago
transformer_block.py	feat(pytorch): support deepseekv4_flash in mcore backend Co-authored-by: dingzicha1997<dingzilin@huawei.com> Created-by: dingzicha1997 Commit-by: dingzicha1997 Merged-by: ascend-robot See merge request: Ascend/MindSpeed-LLM!4420 (merge geneva2 into master)	1 month ago
transformer_config.py	[pytorch][model]longcat model fix Co-authored-by: guihaowen666<guihaowen@huawei.com> Created-by: guihaowen666 Commit-by: guihaowen666 Merged-by: ascend-robot Description: longcat model fix See merge request: Ascend/MindSpeed-LLM!4251 (merge br_master_longcat_fix into master)	3 months ago
transformer_layer.py	refactor(pytorch): update deepseek4 shell Co-authored-by: dingzicha1997<dingzilin@huawei.com> Created-by: dingzicha1997 Commit-by: dingzicha1997 Merged-by: ascend-robot See merge request: Ascend/MindSpeed-LLM!4423 (merge master into master)	1 month ago