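Huawei Ascend's multimodal large-model suite for large-scale distributed training, supporting multimodal generation and multimodal understanding.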

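Introduction

MindSpeed MM is Ascend's multimodal large-model suite for large-scale distributed training. It supports training of mainstream multimodal large models and aims to provide an end-to-end multimodal training solution for Huawei Ascend chips. It includes preset mainstream models, data engineering, distributed training and acceleration, and pre-training, fine-tuning, post-training, and online inference tasks.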

Roadmap

📅 The roadmap is updated continuously in the MindSpeed MM RoadMap. You are welcome to use that link to join the discussion and submit requests.

Community meetings

For the schedule of MindSpeed TC and SIG meetings, see the Ascend meeting center.

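Join us

To exchange development experience, share usage tips, and get project updates promptly, we have created an official MindSpeed MM WeChat group.

Whether you are already using this project or simply have ideas to share, you are welcome to join 👋

How to join:

Scan the QR code to join the WeChat group directly (each QR code is valid for 7 days and is refreshed regularly; group 1 has reached the limit for joining by QR code, so please join group 2)
Add the Ascend open-source assistant (昇腾开源小助手) on WeChat to get the group link and join the MindSpeed MM community group

MindSpeed MM community group

Ascend open-source assistant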

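Directory structure

├─bridge          # online weight conversion via mbridge
├─checkpoint      # offline weight conversion tools
├─ci              # Continuous Integration
├─docs            # project documentation
│  └─zh           # Chinese documentation
├─examples        # preset models: model configs, dataset configs, training scripts, inference scripts, etc.
├─mindspeed_mm    # core code
├─scripts         # scripts
├─sources         # images and videos
├─tests           # test code
│  ├─st           # system test cases
│  └─ut           # unit test cases
├─UserGuide       # user guide
└─verl_plugin     # verl plugin module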

Showcase

Text-to-video: Wan 2.2 T2V

Prompt: Ultra HD, 4K, cinematic composition, low contrast ratio, low saturation, cool tone; The queen wears an iron crown and rides on the dragon over the city. She holds a big flag that shows:" MindSpeed MM".

Text-to-video: OpensoraPlan 1.5 T2V

Prompt: A fluffy white rabbit with soft, velvety fur and twitching pink nose sits curiously near a rustic wooden fence, surrounded by a lush garden of vibrant wildflowers and tall grasses swaying gently in the breeze. The rabbit's large, expressive eyes scan the environment, reflecting the golden hues of the setting sun. As it nibbles on a patch of clover, its ears perk up at the distant sound of chirping birds. The fence, weathered and covered in patches of moss, adds a charming, pastoral backdrop to this serene scene, capturing the essence of a peaceful countryside moment.

Prompt: A majestic Berlin tower stands tall against the night sky, its structure bathed in a mesmerizing array of vibrant lights, casting a kaleidoscope of colors across the cityscape. The tower's intricate architectural details are highlighted by the illumination, creating a stunning contrast against the deep indigo sky. As the camera pans upward, the lights shift, revealing a dynamic play of shadows and hues that dance across the tower's surface. The surrounding city lights twinkle in harmony, enhancing the tower's grandeur and creating a breathtaking visual symphony that captures the essence of Berlin's vibrant nightlife.

Text-to-image: Qwen-Image -> image editing: Flux.1-Kontext

Prompt for generation: A coffee shop entrance features a chalkboard sign reading "MindSpeed Coffee 😊 $2 per cup," with a neon light displaying "MindSpeed MM". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "Welcome to use MindSpeed MM". Ultra HD, 4K, cinematic composition. (Qwen-Image)

Prompt for editing: Change the decoration of the coffee shop to a modern style with white painting. (Flux.1-Kontext)

Understanding model: Qwen2VL

Input image for both models:

Input text for both models: Please describe the image shortly

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

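Input text for Qwen2VL: 请用中文简短描述这张照片 ("Please briefly describe this photo in Chinese")

Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上，背景是连绵的山脉和茂密的森林。天空多云，整体色调偏冷，给人一种宁静和自然的感觉。 (English gloss: The picture shows a wooden dock extending over a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of tranquility and nature.)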

Version notes

MindSpeed MM supports Ascend training hardware such as the Atlas 800T A2. The software version compatibility table is as follows:

MindSpeed MM version	MindSpeed version	Megatron version	PyTorch version	torch_npu version	CANN version	Python version
master (in development)	master (in development)	Core 0.12.1	2.7.1	in development	in development	Python3.10
26.0.0 (commercial)	26.0.0_core_r0.12.1	Core 0.12.1	2.7.1	26.0.0	9.0.0	Python3.10
2.3.0 (commercial)	2.3.0_core_r0.12.1	Core 0.12.1	2.6.0, 2.7.1	7.3.0	8.5.0	Python3.10
2.2.0 (commercial)	2.2.0_core_r0.12.1	Core 0.12.1	2.6.0, 2.7.1	7.2.0	8.3.RC1	Python3.10
2.1.0 (commercial)	2.1.0_core_r0.8.0	Core 0.8.0	2.1.0, 2.6.0	7.1.0	8.2.RC1	Python3.8, Python3.10
2.0.0 (commercial)	2.0.0_core_r0.8.0	Core 0.8.0	2.1.0	7.0.0	8.1.RC1	Python3.8, Python3.10
1.0.0 (commercial)	1.0.0_core_r0.6.0	Core 0.6.0	2.1.0	6.0.0	8.0.0	Python3.8, Python3.10

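Note

"In development" refers to a version that is still under active development. Because its features are still being iterated on and optimized, compatibility risks or instability may remain even when its dependencies are released commercial versions. For stable use, prefer an officially released commercial version.

For more details, see the version compatibility table.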
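
The commercial rows of the table above can be encoded as data and used to sanity-check an environment before installing. The following is only a minimal sketch: the dictionary keys and the `check_environment` helper are illustrative names, not part of MindSpeed MM, and the version strings are transcribed from the table as shown here.

```python
# Commercial releases transcribed from the compatibility table above.
# Key names ("pytorch", "cann", ...) are illustrative, not an official schema.
COMPAT = {
    "26.0.0": {"mindspeed": "26.0.0_core_r0.12.1", "megatron": "Core 0.12.1",
               "pytorch": ("2.7.1",), "torch_npu": "26.0.0",
               "cann": "9.0.0", "python": ("3.10",)},
    "2.3.0": {"mindspeed": "2.3.0_core_r0.12.1", "megatron": "Core 0.12.1",
              "pytorch": ("2.6.0", "2.7.1"), "torch_npu": "7.3.0",
              "cann": "8.5.0", "python": ("3.10",)},
    "2.2.0": {"mindspeed": "2.2.0_core_r0.12.1", "megatron": "Core 0.12.1",
              "pytorch": ("2.6.0", "2.7.1"), "torch_npu": "7.2.0",
              "cann": "8.3.RC1", "python": ("3.10",)},
    "2.1.0": {"mindspeed": "2.1.0_core_r0.8.0", "megatron": "Core 0.8.0",
              "pytorch": ("2.1.0", "2.6.0"), "torch_npu": "7.1.0",
              "cann": "8.2.RC1", "python": ("3.8", "3.10")},
    "2.0.0": {"mindspeed": "2.0.0_core_r0.8.0", "megatron": "Core 0.8.0",
              "pytorch": ("2.1.0",), "torch_npu": "7.0.0",
              "cann": "8.1.RC1", "python": ("3.8", "3.10")},
    "1.0.0": {"mindspeed": "1.0.0_core_r0.6.0", "megatron": "Core 0.6.0",
              "pytorch": ("2.1.0",), "torch_npu": "6.0.0",
              "cann": "8.0.0", "python": ("3.8", "3.10")},
}


def check_environment(mm_version, pytorch, python):
    """Return a list of mismatches against the table (empty list means OK)."""
    row = COMPAT.get(mm_version)
    if row is None:
        raise KeyError(f"unknown MindSpeed MM version: {mm_version}")
    problems = []
    if pytorch not in row["pytorch"]:
        problems.append(f"PyTorch {pytorch} not in {row['pytorch']}")
    if python not in row["python"]:
        problems.append(f"Python {python} not in {row['python']}")
    return problems


print(check_environment("26.0.0", "2.7.1", "3.10"))  # matches the table
print(check_environment("2.0.0", "2.7.1", "3.10"))   # PyTorch mismatch
```

The sketch checks only PyTorch and Python because those are the two columns with more than one allowed value; the remaining columns can be read directly from the matching row.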

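Installation

For detailed installation steps, see the installation guide. The qwen3vl and wan2.2 models currently support one-click installation; see the one-click installation instructions for usage.

Quick start

MindSpeed MM uses the Qwen2.5-VL-3B and Wan2.1-T2V-1.3B models as examples to help developers quickly get the preset models running efficiently on Ascend NPUs. For detailed steps, see the quick start guide.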

Features and Models

Overview of Supported Features

Model \ Feature	TP	TP-SP	VPP	PP	CP	Distributed Optimizer	Recomputation	LoRA	RL	FSDP2
Magistral-Small-2509							✔	✔		✔
InternVL3.5-30B							✔			✔
Qwen3-VL-8B							✔			✔
Qwen3-VL-30B							✔			✔
Wan2.2					CP (Ulysses)		✔			✔
OpenSoraPlan1.5-T2V	✔	✔					✔
Wan2.1					CP (Ulysses)	✔	✔	✔		✔
HunyuanVideo	✔	✔			CP (Ulysses)	✔	✔	✔
HunyuanVideo1.5						✔	✔			✔
CogVideoX series-T2V	✔	✔			CP (Ulysses)	✔	✔	✔
CogVideoX series-I2V	✔	✔			CP (Ulysses)	✔	✔	✔
OpensoraPlan1.3-T2V	✔	✔	✔	✔	CP (Ulysses)	✔	✔
OpensoraPlan1.3-I2V	✔	✔	✔	✔	CP (Ulysses)	✔	✔
GLM-4.1V				✔		✔	✔
Qwen2VL-2B	✔	✔		✔	CP (Ulysses)	✔	✔	✔
Qwen2VL-7B	✔	✔		✔	CP (Ulysses)	✔	✔	✔
Qwen2VL-72B	✔	✔		✔	CP (Ulysses)	✔	✔	✔	DPO
Qwen2.5VL-3B	✔	✔		✔		✔	✔		GRPO
Qwen2.5VL-7B	✔	✔		✔		✔	✔		GRPO
Qwen2.5VL-32B	✔	✔		✔		✔	✔		GRPO
Qwen2.5VL-72B	✔	✔		✔		✔	✔	✔
Qwen2.5Omni-7B	✔			✔		✔		✔
Qwen3-Omni							✔			✔
InternVL3-8B	✔	✔	✔	✔	CP (Ring)	✔	✔
InternVL3-78B	✔	✔	✔	✔	CP (Ring)	✔	✔

Notes:

TP: Tensor Parallel
TP-SP: Tensor Parallel with Sequence Parallel
VPP: Virtual Pipeline Parallel
PP: Pipeline Parallel
DSP: Dynamic Sequence Parallel
CP (Ulysses): Context Parallel based on DeepSpeed Ulysses sequence parallelism
CP (Ring), CP (Ring Attention): Context Parallel with Ring Attention
Distributed Optimizer: Zero Redundancy Optimizer (ZeRO)
Recomputation: activation recomputation, which recomputes activations during the backward pass instead of storing them
LoRA: Low-Rank Adaptation
RL: Reinforcement Learning
FSDP2: Fully Sharded Data Parallel, second generation
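To show how the parallel dimensions in the list above combine, here is a minimal sketch of the usual Megatron-style bookkeeping, in which the total device count factors as TP × PP × CP × DP. This convention is assumed rather than taken from this README, it ignores expert parallelism, and the function name `data_parallel_size` is hypothetical. VPP and TP-SP do not appear in the product because, under the same convention, they reuse the pipeline and tensor groups rather than consuming extra devices.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel size implied by world_size = TP * PP * CP * DP.

    Assumes the Megatron-style factorization and no expert parallelism.
    """
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # One node with 8 devices, split into TP=2 and PP=2, leaves DP=2.
    print(data_parallel_size(8, tp=2, pp=2))
```

A configuration whose TP × PP × CP product does not divide the device count is rejected up front, which is typically the first thing to check when a launch script fails during process-group initialization.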

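Compatible Versions and Supported Models

【Measured performance of the current version (hardware: Atlas 900 A2 PODc)】

For every supported model in the list below, the model's README provides usage instructions covering the detailed training, inference, and fine-tuning workflows.

Hyperlinks in the Model column point to each model's folder; hyperlinks in the Parameters column point to the model's community resources.

Certification: 【Pass】 marks models that have passed testing; 【Test】 marks models that are still under test.

Throughput units: Samples per Second (SPS), Frames per Second (FPS), and Tokens per Second (TPS).

(Note: SPS and FPS figures are cluster throughput; TPS figures are per-card throughput.)

Average sequence length is the average sequence length of the dataset used during performance testing, computed as a frequency-weighted average over the observed sequence lengths.

Affinity scenarios adjust a small number of structures or parameters so that a model is better suited to Ascend and performs better.

A3 refers to the Atlas A3 training series hardware.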
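The two definitions above (frequency-weighted average sequence length, and cluster versus per-card throughput) can be sketched as follows. This is an illustration only: the function names are hypothetical, reading a cluster entry such as 1x8 as nodes × devices per node is an assumption, and dividing cluster throughput evenly across devices assumes uniform load. How the figures in the table were actually measured is not specified here.

```python
from collections import Counter
from typing import Iterable


def average_sequence_length(lengths: Iterable[int]) -> float:
    """Frequency-weighted average: sum(length * count) / sum(count)."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sequence length is required")
    return sum(length * count for length, count in counts.items()) / total


def per_device_throughput(cluster_throughput: float, nodes: int, devices_per_node: int) -> float:
    """Split a cluster-level figure evenly across devices (uniform-load assumption)."""
    if nodes < 1 or devices_per_node < 1:
        raise ValueError("nodes and devices_per_node must be positive")
    return cluster_throughput / (nodes * devices_per_node)


if __name__ == "__main__":
    # Two samples of length 512, one of 1024, and one of 2048.
    print(average_sequence_length([512, 512, 1024, 2048]))
    # A cluster figure of 8.24 SPS on an assumed 1 node x 8 devices.
    print(per_device_throughput(8.24, nodes=1, devices_per_node=8))
```

Keeping the two scopes explicit matters when comparing rows: an SPS or FPS value and a TPS value in the table are not directly comparable without converting to the same scope.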

MindSpeed MM Model List
Task category	Model	Parameters	Task	Cluster	Precision	NPU performance	Reference performance	Average sequence length	Certification
Multimodal generation
	Lumina-mGPT 2.0	7B	Fine-tuning	1x8	BF16	8.24 (SPS)	8.79 (SPS)	1024	【Pass】
	OpenSoraPlan1.5	8.5B	Pre-training	1x8	BF16	0.83 (SPS)	/	/	【Contributed by Peking University】
	Wan2.2-T2V	5B	Pre-training	1x4 (A3)	BF16	3.18 (SPS)	2.93 (SPS)	/	【Test】
	Wan2.2-T2V	A14B	Pre-training	1x8 (A3)	BF16	0.710 (SPS)	0.292 (SPS)	/	【Test】
	Wan2.2-TI2V	5B	Pre-training	1x4 (A3)	BF16	3.18 (SPS)	2.93 (SPS)	/	【Test】
	Wan2.2-I2V	A14B	Pre-training	1x8 (A3)	BF16	0.671 (SPS)	0.294 (SPS)	/	【Test】
	Wan2.1-T2V	1.3B	Pre-training	1x8	BF16	0.918 (SPS)	1.04 (SPS)	/	【Pass】
		1.3B	LoRA fine-tuning	1x8	BF16	0.954 (SPS)	1.042 (SPS)	/	【Pass】
		14B	Pre-training	1x8	BF16	0.160 (SPS)	0.160 (SPS)	/	【Pass】
		14B	LoRA fine-tuning	1x8	BF16	0.179 (SPS)	0.174 (SPS)	/	【Pass】
	Wan2.1-I2V	1.3B	Pre-training	1x8	BF16	0.76 (SPS)	/	/	【Pass】
		14B	Pre-training	1x8	BF16	0.130 (SPS)	/	/	【Pass】
		14B	LoRA fine-tuning	1x8	BF16	0.179 (SPS)	0.173 (SPS)	/	【Pass】
	Self-Forcing	1.3B	DMD distillation	1x8	BF16	0.225 (FPS)	0.282 (FPS)	/	【Test】
	HunyuanVideo-T2V	13B	Pre-training	1x8	BF16	0.171 (SPS)	0.181 (SPS)	/	【Pass】
	HunyuanVideo-I2V	13B	Pre-training	1x8	BF16	0.164 (SPS)	0.202 (SPS)	/	【Pass】
	HunyuanVideo1.5-T2V	8B	Pre-training	1x8	BF16	/	/	/	【Pass】
	OpenSora 1.0	5.5B	Pre-training	1x8	BF16	3.18 (SPS)	2.04 (SPS)	/	【Pass】
	OpenSora 1.2	5.2B	Pre-training	1x8	BF16	7.31 (SPS)	8.15 (SPS)	/	【Test】
	OpenSora 2.0-T2V	11B	Pre-training	1x8	BF16	1.33 (SPS)	1.46 (SPS)	/	【Pass】
	OpenSoraPlan 1.2	8.7B	Pre-training	1x8	BF16	0.42 (SPS)	0.37 (SPS)	/	【Pass】
	OpenSoraPlan 1.3-T2V	8.6B	Pre-training	1x8	BF16	1.29 (SPS)	1.27 (SPS)	/	【Pass】
	OpenSoraPlan 1.3-I2V	8.6B	Pre-training	1x8	BF16	1.17 (SPS)	1.15 (SPS)	/	【Pass】
	WFVAE	0.18B	Pre-training	1x8	BF16	23.860 (SPS)	26.091 (SPS)	/	【Pass】
	CogVideoX-T2V	5B	Pre-training	1x8	BF16	1.14 (SPS)	1.00 (SPS)	6976	【Pass】
	CogVideoX-I2V	5B	Pre-training	1x8	BF16	1.13 (SPS)	0.84 (SPS)	6976	【Pass】
	CogVideoX 1.5-T2V	5B	Pre-training	1x8	BF16	1.44 (SPS)	1.75 (SPS)	6976	【Pass】
	CogVideoX 1.5-T2V	5B	LoRA fine-tuning	1x8	BF16	2.76 (SPS)	2.64 (SPS)	/	【Pass】
	CogVideoX 1.5-I2V	5B	Pre-training	1x8	BF16	1.43 (SPS)	1.44 (SPS)	6976	【Pass】
	CogVideoX 1.5-I2V	5B	LoRA fine-tuning	1x8	BF16	2.33 (SPS)	2.04 (SPS)	/	【Pass】
	Qihoo-T2X	1.1B	Inference	1x1	BF16	/	/	/	【Contributed by Qihoo 360】
	SDXL	3.5B	Pre-training	1x8	BF16	29.92 (FPS)	30.65 (FPS)	/	【Pass】
	SDXL	3.5B	Pre-training	1x8	FP16	28.51 (FPS)	30.23 (FPS)	/	【Pass】
	SD3	2B	Full-parameter fine-tuning	1x8	BF16	16.09 (FPS)	16.01 (FPS)	/	【Pass】
	SD3.5	8.1B	Full-parameter fine-tuning	1x8	BF16	26.20 (FPS)	28.33 (FPS)	/	【Pass】
	SD3.5	8.1B	LoRA fine-tuning	1x8	FP16	47.93 (FPS)	47.95 (FPS)	/	【Pass】
	Flux	12B	Full-parameter fine-tuning	1x8	BF16	55.23 (FPS)	53.65 (FPS)	/	【Pass】
	Flux2-T2I	32B	Full-parameter fine-tuning	1x8	BF16	1.28 (FPS)	1.24 (FPS)	/	【Test】
	Flux2-I2I	32B	Full-parameter fine-tuning	1x8	BF16	0.61 (FPS)	0.60 (FPS)	/	【Test】
	Flux-Kontext	12B	Full-parameter fine-tuning	1x8	BF16	1.97 (FPS)	2.00 (FPS)	/	【Pass】
	Sana	1.6B	LoRA fine-tuning	1x8	BF16	28.7 (FPS)	32.8 (FPS)	/	【Pass】
	HiDream	17B	LoRA fine-tuning	1x8	BF16	18.37 (FPS)	19.61 (FPS)	/	【Pass】
	Kolors	2.6B	Inference	1x1	FP16	/	/	/	【Test】
	Qwen-Image	27B	LoRA fine-tuning	1x8	BF16	23.02 (FPS)	21.54 (FPS)	/	【Pass】
	Qwen-Image-Edit	27B	LoRA fine-tuning	1x8	BF16	20.59 (FPS)	17.47 (FPS)	/	【Test】
Multimodal understanding
	GLM-4.1V	9B	Fine-tuning	1x8	BF16	1074.64 (TPS)	908.49 (TPS)	707	【Pass】
	DeepSeek-OCR	3B	Fine-tuning	1x8	BF16	1327.694 (TPS)	/	/	【Test】
	LLaVA 1.5	7B	Full-parameter fine-tuning	1x8	BF16	3632.31 (TPS)	3757.98 (TPS)	602	【Test】
	InternVL 2.0	2B	Fine-tuning	1x8	BF16	7653.12 (TPS)	5089.99 (TPS)	1813	【Pass】
		8B	Fine-tuning	1x8	BF16	2914.39 (TPS)	2492.87 (TPS)	1813	【Pass】
		26B	Fine-tuning	1x8	BF16	750.12 (TPS)	738.79 (TPS)	1813	【Pass】
		76B	Full-parameter fine-tuning	8x16	BF16	214 (TPS)	191 (TPS)	1813	【Pass】
	InternVL 2.5	78B	Fine-tuning	8x8	BF16	228.33	/	1896	【Test】
	InternVL 3.0	8B	Fine-tuning	1x8	BF16	2344.58 (TPS)	2211.93 (TPS)	2653	【Pass】
	InternVL 3.0	78B	Fine-tuning	4x8 (A3)	BF16	228.82 (TPS)	283.15 (TPS)	1932	【Pass】
	InternVL 3.5	30B	Fine-tuning	1x8 (A3)	BF16	52.76 (TPS)	47.73 (TPS)	201	【Test】
	Qwen2-VL	2B	Fine-tuning	1x8	BF16	2941.17 (TPS)	3004.04 (TPS)	689	【Pass】
		7B	Fine-tuning	1x8	BF16	1143.74 (TPS)	1004.22 (TPS)	689	【Pass】
		72B	Fine-tuning	4x8 (A3)	BF16	261.25 (TPS)	257.63 (TPS)	689	【Pass】
	Qwen2.5-VL	3B	Fine-tuning	1x8	BF16	2047.19 (TPS)	1876.66 (TPS)	689	【Pass】
		7B	Fine-tuning	1x8	BF16	1620.87 (TPS)	1091.20 (TPS)	689	【Pass】
		32B	Fine-tuning	2x8	BF16	257.50 (TPS)	/	689	【Pass】
		72B	Fine-tuning	4x8 (A3)	BF16	322.96 (TPS)	256.28 (TPS)	689	【Pass】
	Qwen3-VL	8B	Fine-tuning	1x8	BF16	146.54 (TPS)	129.71 (TPS)	179	【Test】
		30B	Fine-tuning	1x8 (A3)	BF16	179.57 (TPS)	/	185	【Test】
		235B	Fine-tuning	16x8 (A3)	BF16	598.05 (TPS)	/	16116	【Test】
	Qwen2.5-Omni	7B	Fine-tuning	1x8	BF16	575.01 (TPS)	534.28 (TPS)	296	【Pass】
	Qwen3-Omni	30B	Fine-tuning	2x4 (A3)	BF16	131.3 (TPS)	16.4 (TPS)	288	【Test】
	Magistral-Small-2509	24B	Fine-tuning	1x8	BF16	1.843 (SPS)	1.185 (SPS)	/	【Test】
Speech recognition	Whisper	1.5B	Pre-training	1x8	BF16	93.38 (SPS)	109.23 (SPS)	/	【Test】
Speech generation	CosyVoice3	0.5B	Pre-training	1x8	BF16	290.91 (SPS)	326.11 (SPS)	24	【Test】

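Large language models (dense models, sparse models, and state space models) are maintained separately in MindSpeed-LLM. To train large language models, visit the MindSpeed-LLM repository for detailed usage instructions. MindSpeed-LLM currently supports the following mainstream models: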

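Model type	Model	Parameters (download link)	Script location	Sequence length	Training backend	Cluster size	Supported version	Contributor	Certification
Dense models	Aquila	7B	aquila	2K	Legacy	1x8	2.0.0	【GTS】	【Pass】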
	Aquila2	7B	aquila2	2K	Legacy	1x8	2.0.0	【GTS】	【Pass】
	Aquila2	34B	aquila2	4K	Legacy	2x8	2.0.0	【GTS】	【Pass】
	Baichuan	7B	baichuan	4K	Legacy	1x8	2.0.0	【GTS】	【Pass】
	Baichuan	13B	baichuan	4K	Legacy	1x8	2.0.0	【GTS】	【Pass】
	Baichuan2	7B	baichuan2	4K	Legacy	1x8	2.0.0	【Ascend】	【Pass】
	Baichuan2	13B	baichuan2	4K	Mcore	1x8	2.3.0	【Ascend】	【Pass】
	Bloom	7B1	bloom	2K	Legacy	1x8	2.0.0	【Ascend】	【Pass】
	Bloom	176B	bloom	2K	Legacy	12x8	2.0.0	【Ascend】	【Pass】
	ChatGLM3	6B	chatglm3	8K	Mcore	1x8	2.3.0	【Ascend】	【Pass】
				32K	Mcore	1x8	2.3.0	【Ascend】	【Pass】
				64K	Mcore	2x8	2.3.0	【Ascend】	【Pass】
	GLM4	9B	glm4	8K	Mcore	1x8	2.3.0	【GTS】	【Pass】
	GLM4	9B	glm4	32K	Mcore	2x8	2.3.0	【GTS】	【Pass】
	CodeLlama	34B	codellama	4K	Mcore	2x8	2.2.0	【GTS】	【Pass】
	InternLM	7B	intern	2K	Legacy	1x8	2.0.0	【Ascend】	【Pass】
	InternLM	65B	intern	2K	Legacy	4x8	2.0.0	【Ascend】	【Pass】
	InternLM2	20B	internlm2	4K	Mcore	1x8	2.2.0	【GTS】	【Pass】
	InternLM2	20B	internlm2	32K	Mcore	1x8	2.2.0	【GTS】	【Pass】
	InternLM2.5	1.8B	internlm25	32K	Mcore	1x8	2.3.0	【GTS】	【Pass】
		7B		32K	Mcore	1x8	2.3.0	【GTS】	【Pass】
		20B		32K	Mcore	2x8	2.3.0	【GTS】	【Test】
	InternLM3	8B	internlm3	8K	Mcore	1x8		【Ascend】	【Pass】
	LLaMA	7B	llama	2K	Legacy	1x8	2.0.0	【Ascend】	【Pass】
		13B		2K	Legacy	1x8	2.0.0	【Ascend】	【Pass】
		33B		2K	Legacy	4x8	2.0.0	【Ascend】	【Pass】
		65B		2K	Legacy	4x8	2.0.0	【Ascend】	【Pass】
	LLaMA2	7B	llama2	4K	Mcore	1x8		【NAIE】	【Pass】
		13B		4K	Mcore	1x8		【NAIE】	【Pass】
		34B		4K	Mcore	2x8	2.3.0	【GTS】	【Pass】
		70B		4K	Mcore	4x8		【GTS】	【Pass】
		70B		128K	Mcore	8x8		【Ascend】	【Pass】
	LLaMA3	8B	llama3	8K	Mcore	1x8	2.3.0	【GTS】	【Pass】
	LLaMA3	70B	llama3	8K	Mcore	4x8	2.3.0	【GTS】	【Pass】
	LLaMA3.1	8B	llama31	8K	Mcore	1x8	2.3.0	【GTS】	【Pass】
		8B		128K	Mcore	4x8	2.3.0	【GTS】	【Pass】
		50B		128K	Mcore	8x8	2.3.0	【Ascend】	【Pass】
		70B		8K	Mcore	4x8	2.3.0	【GTS】	【Pass】
		70B		128K	Mcore	24x8	2.3.0	【Ascend】	【Pass】
		200B		8K	Mcore	8x8	2.3.0	【Ascend】	【Pass】
		405B		8K	Mcore	8x8		【Ascend】	【Pass】
		405B		128K	Mcore	36x8	2.3.0	【Ascend】	【Pass】
	LLaMA3.2	1B	llama32	8K	Mcore	1x8	2.3.0	【GTS】	【Pass】
	LLaMA3.2	3B	llama32	8K	Mcore	1x8	2.3.0	【GTS】	【Pass】
	LLaMA3.3	70B-Instruct	llama33	8K	Mcore	4x8	2.3.0	【GTS】	【Pass】
	Qwen	7B	qwen	8K	Legacy	1x8	2.0.0	【GTS】	【Pass】
		14B		2K	Legacy	1x8	2.0.0	【GTS】	【Pass】
		72B		8K	Legacy	16x8	2.0.0	【GTS】	【Pass】
	Qwen1.5	0.5B	qwen15	8K	Mcore	1x8	2.2.0	【GTS】	【Pass】
		1.8B		8K	Mcore	1x8		【GTS】	【Pass】
		4B		8K	Mcore	1x8		【GTS】	【Pass】
		7B		8K	Mcore	1x8		【GTS】	【Pass】
		14B		8K	Mcore	1x8		【GTS】	【Pass】
		32B		8K	Mcore	4x8		【GTS】	【Pass】
		72B		8K	Mcore	8x8		【GTS】	【Pass】
		110B		8K	Mcore	8x8		【GTS】	【Pass】
	CodeQwen1.5	7B		8K	Mcore	1x8		【GTS】	【Pass】
	Qwen2	0.5B	qwen2	4K	Mcore	1x8	2.2.0	【GTS】	【Pass】
		0.5B		32K	Mcore	1x8		【GTS】	【Pass】
		1.5B		4K	Mcore	1x8		【GTS】	【Pass】
		1.5B		32K	Mcore	1x8		【GTS】	【Pass】
		7B		4K	Mcore	1x8		【GTS】	【Pass】
		7B		32K	Mcore	1x8		【GTS】	【Pass】
		72B		4K	Mcore	4x8		【GTS】	【Pass】
		72B		32K	Mcore	16x8		【Ascend】	【Pass】
	Qwen2.5	0.5B	qwen25	32K	Mcore	1x8	2.3.0	【GTS】	【Pass】
		1.5B		32K	Mcore	1x8	2.3.0	【GTS】	【Pass】
		3B		32K	Mcore	1x8	2.3.0	【GTS】	【Pass】
		7B		32K	Mcore	1x8	2.3.0	【Ascend】	【Pass】
		14B		32K	Mcore	2x8	2.3.0	【GTS】	【Pass】
		32B		32K	Mcore	4x8	2.3.0	【GTS】	【Pass】
		72B		32K	Mcore	16x8		【GTS】	【Pass】
	Qwen3	0.6B	qwen3	4K	Mcore	1x8		【Ascend】	【Pass】
		1.7B		4K	Mcore	1x8		【Ascend】	【Pass】
		4B		4K	Mcore	1x8		【Ascend】	【Pass】
		8B		4K	Mcore	1x8		【Ascend】	【Pass】
		14B		4K	Mcore	1x8		【Ascend】	【Pass】
		32B		4K	Mcore	2x8		【Ascend】	【Pass】
		32B	qwen3	4K	FSDP2	1x16		【Ascend】	【Test】
	QwQ	32B	qwq	4K	Mcore	1x8	2.2.0	【GTS】	【Test】
	Qwen2.5-Math	1.5B	qwen25_math	4K	Mcore	1x8	2.2.0	【GTS】	【Pass】
		7B		4K	Mcore	1x8		【GTS】	【Pass】
		72B		4K	Mcore	4x8		【GTS】	【Test】
	CodeQwen2.5	7B	qwen25_coder	8K	Mcore	1x8	2.2.0	【China Mobile Cloud】	【Test】
	Yi	9B	yi	4K	Legacy	1x4	2.0.0	【OpenMind】	【Test】
	Yi	34B	yi	4K	Mcore	2x8	2.2.0	【GTS】	【Pass】
	Yi1.5	6B	yi15	4K	Mcore	1x8	2.2.0	【GTS】	【Pass】
		9B		4K	Mcore	1x8		【GTS】	【Pass】
		34B		4K	Mcore	2x8		【GTS】	【Test】
	Mistral	7B	mistral	32K	Mcore	1x8	2.2.0	【NAIE】	【Pass】
	Gemma	2B	gemma	8K	Mcore	1x8	2.2.0	【GTS】	【Pass】
	Gemma	7B	gemma	8K	Mcore	1x8	2.2.0	【GTS】	【Pass】
	Gemma2	9B	gemma2	8K	Mcore	1x8		【GTS】	【Pass】
	Gemma2	27B	gemma2	8K	Mcore	2x8		【GTS】	【Pass】
	MiniCPM	2B	minicpm	4K	Mcore	1x8	2.2.0	【NAIE】	【Pass】
	MiniCPM3	4B	minicpm3	32K	Mcore	1x8	2.2.0	【GTS】	【Test】
	Phi3.5	mini-instruct	phi35	4K	Mcore	1x8		【GTS】	【Test】
	DeepSeek-Math	7B	deepseek_math	4K	Mcore	1x8	2.2.0	【Ascend】	【Test】
	DeepSeek-R1-Distill-Qwen	1.5B	deepseek_r1_distill_qwen	4K	Mcore	1x8	2.2.0	【Ascend】	【Pass】
		7B		4K	Mcore	1x8		【Ascend】	【Pass】
		14B		4K	Mcore	1x8		【Ascend】	【Pass】
		32B		8K	Mcore	2x8		【Ascend】	【Pass】
	DeepSeek-R1-Distill-LLaMA	8B	deepseek_r1_distill_llama	8K	Mcore	1x8	2.2.0	【Ascend】	【Pass】
	DeepSeek-R1-Distill-LLaMA	70B	deepseek_r1_distill_llama	8K	Mcore	4x8	2.2.0	【Ascend】	【Pass】
	Seed-OSS	36B	seed_oss	2K	Mcore	1x8		【Ascend】	【Test】
	Magistral	24B	magistral	4K	Mcore	1x8		【Ascend】	【Test】
	PLM	1.8B	plm	2K	Mcore	1x8		【Ascend】	【Test】
Sparse models	Qwen3	30B-A3B	qwen3_moe	4K	Mcore	2x8		【Ascend】	【Pass】
		30B-A3B	qwen3_moe	4K	FSDP2	1x16		【Ascend】	【Test】
		235B-A22B	qwen3_moe	4K	Mcore	16x16		【Ascend】	【Pass】
		235B-A22B	qwen3_moe	4K	FSDP2	16x16		【Ascend】	【Test】
	Qwen3-Next	80B-A3B	qwen3_next	16K	Mcore	4x16		【Ascend】	【Pass】
	Qwen3-Next	80B-A3B	qwen3_next	16K	FSDP2	4x16		【Ascend】	【Test】
	Qwen3-Coder-Next	80B-A3B	qwen3_coder_next	16K	Mcore	4x16		【Ascend】	【Test】
	Qwen2	57B-A14B	qwen2_moe	4K	Mcore	8x8	2.2.0	【GTS】	【Pass】
	Grok-1	40B	grok-1	8K	Mcore	4x8	2.0.0	【GTS】	【Pass】
	Mixtral	8x7B	mixtral	32K	Mcore	8x8	2.2.0	【Ascend】	【Pass】
		8x22B		32K	Mcore	8x8		【NAIE】	【Pass】
		8x22B		64K	Mcore	8x8		【NAIE】	【Test】
	DeepSeek-V2	236B	deepseek2	8K	Mcore	20x8	2.2.0	【Ascend】	【Pass】
	DeepSeek-V2-coder	236B	deepseek2_coder	8K	Mcore	20x8	2.2.0	【Ascend】	【Test】
	DeepSeek-V2-Lite	16B	deepseek2_lite	8K	Mcore	1x8		【Ascend】	【Pass】
	DeepSeek-V2.5	236B	deepseek25	8K	Mcore	20x8	2.2.0	【NAIE】	【Test】
	DeepSeek-V3	671B	deepseek3	4K	Mcore	64x8		【Ascend】	【Pass】
	DeepSeek-V3.2	671B	deepseek3.2	4K	Mcore	32x16		【Ascend】	【Test】
	MiniCPM	8x2B	minicpm	4K	Mcore	1x8	2.2.0	【NAIE】	【Test】
	Ling-mini-2.0	16B	ling_v2	4K	Mcore	1x8		【Ascend】	【Test】
	Ring	1T	ling_v2	32K	Mcore	32x8		【Ascend】	【Test】
	Phi3.5	MoE-instruct	phi35	4K	Mcore	2x8		【GTS】	【Test】
	Hunyuan	389B	hunyuanLarge	8K	Mcore	8x8	2.3.0	【Ascend】	【Pass】
	GPT4	MoE-175B	gpt4	128K	Mcore	8x8	2.3.0	【Ascend】	【Pass】
	GLM4.5	MoE-106B	glm45-moe	4K	Mcore	8x8		【Ascend】	【Test】
	GLM5	MoE-744B	glm5	4K	Mcore	32x16		【Ascend】	【Test】
	Step3.5-Flash	MoE-196B	step35	4K	FSDP2	12x16		【Ascend】	【Test】
	LongCat	MoE-560B	longcat	4K	Mcore	8x16		【Ascend】	【Test】
	GPT-OSS	MoE-20B	gpt_oss	4K	FSDP2	1x16		【Ascend】	【Test】
State space models	Mamba2	2.7B	mamba2	4K	Mcore	1x8		【Ascend】	【Test】
	Mamba2	8B	mamba2	4K	Mcore	1x8		【Ascend】	【Test】
	Mamba2Hybrid	8B	mamba2	4K	Mcore	1x8		【Ascend】	【Test】

Common Parameter Descriptions

The parameters used when running the MindSpeed MM suite are explained in the README.

Feature Roadmap

【New model】 JanusPro
【Model feature】 CogVideoX: PP
【Model feature】 OpensoraPlan1.3: CP (Ring Attention)
【Model feature】 Qwen2VL: VPP, CP (Ulysses & Ring Attention)
【Model feature】 InternVL2: TP, CP (Ulysses & Ring Attention)
【Base feature】 Hetero-parallel

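Tools

Ascend Profiling Collection Tool

MindSpeed MM integrates the Ascend profiling collection tool to support analysis of model runtime behavior. Based on its configuration, the tool collects key information such as operators and device memory. It supports both static and dynamic collection modes, so developers can choose the mode that fits their scenario when analyzing model bottlenecks.

For details, see the profiling section of the README.

MindStudio Insight Performance Analysis Tool

For performance tuning in large-model cluster scenarios, we recommend MindStudio Insight, a visual tuning tool. It provides visualizations including a Timeline view, communication analysis, and compute-time analysis, helping users identify potential performance bottlenecks and decide how to eliminate or reduce them.

For installation and usage, see the "MindStudio Insight Operation Guide".

Feature Extraction for Sora-like Models

MindSpeed MM supports extracting and saving video and text features.

For details, see the "Feature Extraction for Sora-like Models" section of the README.

Memory Snapshot Extraction

MindSpeed MM integrates the Ascend memory snapshot collection tool to support analysis of model runtime behavior.

For details, see the "Memory Snapshot Extraction" section of the README.

TensorBoard Usage

MindSpeed MM supports TensorBoard.

For details, see the "TensorBoard Usage" section of the README.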

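Version Maintenance

MindSpeed MM versions go through the following five maintenance phases:

Status	Duration	Description
Planning	1-3 months	Features are planned
Development	3 months	Features are developed
Maintained	6-12 months	All resolved issues are merged and releases are published. The maintenance policy depends on the version type: regular versions are maintained for 6 months and long-term support versions for 12 months
Unmaintained	0-3 months	All resolved issues are merged; there are no dedicated maintainers and no releases
End of life (EOL)	N/A	The branch no longer accepts any changes

Maintenance policy for released MindSpeed MM versions:

MindSpeed MM version	Maintenance policy	Current status	Release date	Next status
26.0.0	Regular	Maintained	2026/03/30	Expected to be unmaintained from 2026/09/30
2.3.0	Regular	Maintained	2025/12/30	Expected to be unmaintained from 2026/06/30
2.2.0	Regular	Unmaintained	2025/09/30	Expected to be unmaintained from 2026/03/30
2.1.0	Regular	Unmaintained	2025/06/30	Expected to be unmaintained from 2025/12/30
2.0.0	Regular	Unmaintained	2025/03/30	Expected to be unmaintained from 2025/09/30
1.0.0	Regular	Unmaintained	2024/12/30	Expected to be unmaintained from 2025/06/30
1.0.RC3	Regular	Unmaintained	2024/09/30	Expected to be unmaintained from 2025/03/30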
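The dates in the table above follow from the stated policy: a regular version leaves maintenance 6 months after release and a long-term support version 12 months after. A minimal sketch of that arithmetic follows; the names `add_months` and `maintenance_end` and the policy keys are hypothetical, and clamping to the last day of a shorter month is an assumption that none of the listed release dates actually exercises.

```python
import calendar
from datetime import date

# Maintenance windows in months, taken from the phase table above.
MAINTENANCE_MONTHS = {"regular": 6, "lts": 12}


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def maintenance_end(release_date: date, policy: str = "regular") -> date:
    """Return the date from which a release is expected to be unmaintained."""
    return add_months(release_date, MAINTENANCE_MONTHS[policy])


if __name__ == "__main__":
    # 26.0.0 was released on 2026/03/30 as a regular version.
    print(maintenance_end(date(2026, 3, 30)))
```

Applied to the release dates listed for 26.0.0, 2.3.0, and 1.0.RC3, this reproduces the dates in the "Next status" column.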

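FAQ

Security Statement

MindSpeed MM Security Statement

Disclaimer

To MindSpeed MM users

The models provided by MindSpeed MM are for non-commercial use only.
For each model, the MindSpeed MM platform only suggests, for reference, datasets that can be used for training. Huawei does not provide any datasets. If you train with these datasets, take particular care to comply with each dataset's License; Huawei bears no responsibility for any infringement dispute arising from your use of a dataset.
If you find any problem (including but not limited to functional or compliance problems) while using MindSpeed MM models, please submit an issue on Gitcode. We will review and resolve it promptly.
The third-party open-source software that MindSpeed MM depends on, such as Megatron, is provided and maintained by third-party communities, and fixes for problems caused by that software depend on the contributions and feedback of those communities. You should understand that the MindSpeed MM repository does not guarantee that problems in the third-party open-source software itself will be fixed, nor that all of its vulnerabilities and errors will be tested and corrected.

To dataset owners

If you do not want your dataset to be mentioned in connection with the models in MindSpeed MM, or you want the description of your dataset to be updated, please submit an issue on Gitcode. We will remove or update the description of your dataset as requested in your issue. Thank you sincerely for your understanding of and contributions to MindSpeed MM.

License Statement

For models provided by Ascend MindSpeed MM, if a License exists in the model directory, that License prevails. If no License exists in the model directory, the model is licensed under the Apache 2.0 license; the license text is available in the LICENSE file in the root directory of Ascend MindSpeed MM. Documents in the docs directory are licensed under CC-BY 4.0; see the documentation LICENSE for details.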