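MindSpeed-MM: A Multimodal Large-Model Training Suite for Ascend Chips

Huawei Ascend's multimodal large-model suite for large-scale distributed training, supporting multimodal generation and multimodal understanding.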


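Introduction


MindSpeed MM is an Ascend multimodal large-model suite for large-scale distributed training. It supports training of mainstream multimodal large models and aims to provide an end-to-end multimodal training solution for Huawei Ascend chips. It includes preset mainstream models, data engineering, distributed training and acceleration, and pre-training, fine-tuning, post-training, and online inference tasks.

Roadmap


📅 The roadmap is updated dynamically in the MindSpeed MM RoadMap. You are welcome to join the discussion there and raise requests.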

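Community Meetings


Join Us


To exchange development experience, share usage tips, and receive project updates promptly, we have created the official MindSpeed MM WeChat group.

Whether you are already using this project or have ideas of your own, you are welcome to join 👋

How to join:

  1. Scan the QR code to join the WeChat group directly (each QR code is valid for 7 days and is refreshed regularly; group 1 has reached the scan-to-join member limit, so please join group 2)
  2. Add the Ascend open-source assistant to get the group link and join the MindSpeed MM community group
MindSpeed MM community group
MindSpeed MM community group 2
Ascend open-source assistant
Ascend assistant (WeChat)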

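Directory Structure

The key directories are listed below; for a detailed description, see the directory overview.

├─bridge          # mbridge online weight conversion
├─checkpoint      # offline weight conversion tools
├─ci              # continuous integration
├─docs            # project documentation
│  └─zh           # Chinese documentation
├─examples        # preset models: model configs, dataset configs, training scripts, inference scripts, etc.
├─mindspeed_mm    # core code
├─scripts         # scripts
├─sources         # images and videos
├─tests           # test code
│  ├─st           # system test cases
│  └─ut           # unit test cases
├─UserGuide       # user guide
└─verl_plugin     # verl plugin module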

Latest News


  • [Feb. 16, 2026]: 🚀 MindSpeed MM supports the Qwen3.5 model based on FSDP2 [Prototype]
  • [Feb. 14, 2026]: 🚀 MindSpeed MM supports CosyVoice3 model training based on FSDP2
  • [Feb. 13, 2026]: 🚀 MindSpeed MM supports the Kimi-K2.5 model based on FSDP2 [Prototype]
  • [Feb. 12, 2026]: 🚀 MindSpeed MM supports a HunyuanVideo1.5 model training demo based on FSDP2 [Prototype]
  • [Feb. 03, 2026]: 🚀 MindSpeed MM supports a DeepseekOCR2 model training demo based on FSDP2 [Prototype]
  • [Jan. 29, 2026]: 🎉 The MindSpeed MM image is now available in the Ascend image repository
  • [Jan. 29, 2026]: 🚀 MindSpeed MM supports the Qwen3-TTS model based on FSDP2 [Prototype]
  • [Jan. 28, 2026]: 🚀 MindSpeed MM supports the Magistral-Small-2509 model based on FSDP2 [Prototype]
  • [Jan. 08, 2026]: 🚀 MindSpeed MM supports the FLUX.2 model [Prototype]
  • [Dec. 25, 2025]: 🎉 The user manual is online! Try it at: https://mindspeed-mm.readthedocs.io/zh-cn/latest/
  • [Dec. 03, 2025]: 🚀 MindSpeed MM supports a Glm4.5v model training demo based on FSDP2 [Prototype]
  • [Dec. 02, 2025]: 🚀 MindSpeed MM supports Self-Forcing DMD distillation based on Wan2.1-1.3B [Prototype]
  • [Nov. 27, 2025]: 🚀 MindSpeed MM supports the Qwen3VL-235B model based on fully shard
  • [Nov. 20, 2025]: 🚀 MindSpeed MM supports the Qwen3-Omni model based on FSDP2
  • [Nov. 19, 2025]: 🚀 MindSpeed MM supports the Qwen Image and Qwen Image Edit models [Prototype]
  • [Nov. 13, 2025]: 🚀 MindSpeed MM supports the InternVL3.5-30B model based on FSDP2
  • [Nov. 06, 2025]: 🚀 MindSpeed MM supports a DeepseekOCR model training demo based on FSDP2 [Prototype]
  • [Oct. 31, 2025]: 🚀 MindSpeed MM supports the Qwen3VL-8B/30B models based on fully shard
  • [Oct. 22, 2025]: 🚀 MindSpeed MM supports the Wan2.2 model series based on fully shard
  • [Sep. 08, 2025]: 🚀 MindSpeed MM supports the FLUX.1-Kontext model
  • [Sep. 08, 2025]: 🚀 MindSpeed MM supports DanceGRPO reinforcement learning training for FLUX
  • [Sep. 03, 2025]: 🎉 Reinforcement learning is live! MindSpeed MM supports GRPO training for Qwen2.5VL 7B/32B
  • [Aug. 15, 2025]: 🤝 MindSpeed MM natively supports the Lumina-mGPT 2.0 model
  • [Jul. 29, 2025]: 🌴 MindSpeed MM supports core 0.12.1
  • [Jul. 10, 2025]: 🚀 MindSpeed MM supports the InternVL3-8B/78B models
  • [Jul. 02, 2025]: ⚡ MindSpeed MM provides day-0 support for the GLM-4.1V model
  • [Jun. 30, 2025]: 🌴 MindSpeed MM 2.1.0 released
  • [Jun. 25, 2025]: 🚀 MindSpeed MM supports the HiDream-I1 model
  • [Jun. 05, 2025]: 🚀 MindSpeed MM supports the Qwen2.5Omni-7B model
  • [Jun. 05, 2025]: 🤝 MindSpeed MM natively supports the OpenSoraPlan 1.5 model
  • [Apr. 03, 2025]: 🚀 MindSpeed MM supports the Qwen2.5VL-32B model
  • [Mar. 27, 2025]: 🚀 MindSpeed MM supports the Wan2.1-1.3B/14B models
  • [Mar. 26, 2025]: 🚀 MindSpeed MM supports the Qwen2.5VL-3B/7B/72B models
  • [Feb. 20, 2025]: 🚀 MindSpeed MM supports the InternVL2.5-78B model
  • [Feb. 18, 2025]: 🚀 MindSpeed MM supports the HunyuanVideo model
  • [Feb. 17, 2025]: 🔥 MindSpeed MM supports Mindspeed-Core & Megatron 0.8.0
  • [Feb. 15, 2025]: 🚀 MindSpeed MM supports the Sana model
  • [Jan. 24, 2025]: 🚀 MindSpeed MM supports the CogVideoX 1.5 model
  • [Dec. 30, 2024]: 🌴 MindSpeed MM 1.0.0 released
  • [Dec. 16, 2024]: 🤝 MindSpeed MM natively supports the Qihoo-T2X model
  • [Dec. 03, 2024]: 🚀 MindSpeed MM supports the SD3.5 model
  • [Nov. 30, 2024]: 🎉 MindSpeed MM supports multimodal understanding evaluation
  • [Nov. 22, 2024]: 🚀 MindSpeed MM supports the CogVideoX model
  • [Nov. 06, 2024]: 🚀 MindSpeed MM supports the FLUX model
  • [Oct. 30, 2024]: 🤝 MindSpeed MM natively supports the OpenSoraPlan 1.3 model
  • [Oct. 21, 2024]: 🚀 MindSpeed MM supports the InternVL2 and Qwen2VL models
  • [Oct. 16, 2024]: 🌱 First MindSpeed MM release, 1.0.RC3, published

Note: Prototype features have not been fully validated and may be unstable or contain bugs; beta denotes non-commercial features.

Showcase


Text-to-video: Wan 2.2 T2V

Prompt: Ultra HD, 4K, cinematic composition, low contrast ratio, low saturation, cool tone; The queen wears an iron crown and rides on the dragon over the city. She holds a big flag that shows:" MindSpeed MM".

Text-to-video: OpensoraPlan 1.5 T2V

Prompt: A fluffy white rabbit with soft, velvety fur and twitching pink nose sits curiously near a rustic wooden fence, surrounded by a lush garden of vibrant wildflowers and tall grasses swaying gently in the breeze. The rabbit's large, expressive eyes scan the environment, reflecting the golden hues of the setting sun. As it nibbles on a patch of clover, its ears perk up at the distant sound of chirping birds. The fence, weathered and covered in patches of moss, adds a charming, pastoral backdrop to this serene scene, capturing the essence of a peaceful countryside moment.

Prompt: A majestic Berlin tower stands tall against the night sky, its structure bathed in a mesmerizing array of vibrant lights, casting a kaleidoscope of colors across the cityscape. The tower's intricate architectural details are highlighted by the illumination, creating a stunning contrast against the deep indigo sky. As the camera pans upward, the lights shift, revealing a dynamic play of shadows and hues that dance across the tower's surface. The surrounding city lights twinkle in harmony, enhancing the tower's grandeur and creating a breathtaking visual symphony that captures the essence of Berlin's vibrant nightlife.

Text-to-image: Qwen-Image -> Image editing: Flux.1-Kontext

Prompt for generation: A coffee shop entrance features a chalkboard sign reading "MindSpeed Coffee 😊 $2 per cup," with a neon light displaying "MindSpeed MM". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "Welcome to use MindSpeed MM". Ultra HD, 4K, cinematic composition. (Qwen-Image)

Prompt for editing: Change the decoration of the coffee shop to a modern style with white painting. (Flux.1-Kontext)

Understanding model: Qwen2VL

Input image for both models:

Input text for both models: Please describe the image shortly

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

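Input text for Qwen2VL: 请用中文简短描述这张照片 (English: Please briefly describe this photo in Chinese)

Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上,背景是连绵的山脉和茂密的森林。天空多云,整体色调偏冷,给人一种宁静和自然的感觉。 (English: This image shows a wooden dock extending over a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of tranquility and nature.)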

Version Notes


MindSpeed MM supports Ascend training hardware such as the Atlas 800T A2. The software version compatibility table is as follows:

MindSpeed MM version | MindSpeed version | Megatron version | PyTorch version | torch_npu version | CANN version | Python version
master (in development) | master (in development) | Core 0.12.1 | 2.7.1 | in development | in development | Python3.10
26.0.0 (commercial) | 26.0.0_core_r0.12.1 | Core 0.12.1 | 2.7.1 | 26.0.0 | 9.0.0 | Python3.10
2.3.0 (commercial) | 2.3.0_core_r0.12.1 | Core 0.12.1 | 2.6.0, 2.7.1 | 7.3.0 | 8.5.0 | Python3.10
2.2.0 (commercial) | 2.2.0_core_r0.12.1 | Core 0.12.1 | 2.6.0, 2.7.1 | 7.2.0 | 8.3.RC1 | Python3.10
2.1.0 (commercial) | 2.1.0_core_r0.8.0 | Core 0.8.0 | 2.1.0, 2.6.0 | 7.1.0 | 8.2.RC1 | Python3.8, Python3.10
2.0.0 (commercial) | 2.0.0_core_r0.8.0 | Core 0.8.0 | 2.1.0 | 7.0.0 | 8.1.RC1 | Python3.8, Python3.10
1.0.0 (commercial) | 1.0.0_core_r0.6.0 | Core 0.6.0 | 2.1.0 | 6.0.0 | 8.0.0 | Python3.8, Python3.10

Note

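“In development” refers to a version that is currently under active development. Because its features are still being iterated on and optimized, compatibility risks or instability may remain even when its dependencies are released commercial versions. For stable use, prefer an officially released commercial version.

For more details, see the version compatibility table.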
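
As an illustration of how the compatibility table above can be consulted programmatically, the following is a minimal Python sketch. The `Release` type and the `matching_releases` helper are hypothetical names introduced here and are not part of MindSpeed MM; the rows are copied from the commercial releases in the table, and the in-development `master` row is left out because its torch_npu and CANN versions are not pinned.

```python
from typing import List, NamedTuple, Tuple


class Release(NamedTuple):
    """One commercial row of the compatibility table (hypothetical helper type)."""
    mm_version: str
    torch_versions: Tuple[str, ...]
    torch_npu_version: str
    cann_version: str
    python_versions: Tuple[str, ...]


# Values copied from the table above; "master" is omitted on purpose.
RELEASES = [
    Release("26.0.0", ("2.7.1",), "26.0.0", "9.0.0", ("3.10",)),
    Release("2.3.0", ("2.6.0", "2.7.1"), "7.3.0", "8.5.0", ("3.10",)),
    Release("2.2.0", ("2.6.0", "2.7.1"), "7.2.0", "8.3.RC1", ("3.10",)),
    Release("2.1.0", ("2.1.0", "2.6.0"), "7.1.0", "8.2.RC1", ("3.8", "3.10")),
    Release("2.0.0", ("2.1.0",), "7.0.0", "8.1.RC1", ("3.8", "3.10")),
    Release("1.0.0", ("2.1.0",), "6.0.0", "8.0.0", ("3.8", "3.10")),
]


def matching_releases(torch_version: str, python_version: str) -> List[str]:
    """Return the MindSpeed MM versions whose row lists both given versions."""
    return [
        release.mm_version
        for release in RELEASES
        if torch_version in release.torch_versions
        and python_version in release.python_versions
    ]


if __name__ == "__main__":
    # Which commercial releases list PyTorch 2.7.1 together with Python 3.10?
    print(matching_releases("2.7.1", "3.10"))
```

The lookup only matches the PyTorch and Python columns; a real environment check would also need to compare the installed torch_npu and CANN versions against the same row.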

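Installation


For detailed installation steps, see the installation guide. The qwen3vl and wan2.2 models currently support one-click installation; see the one-click installation guide for usage.

Quick Start


MindSpeed MM uses the Qwen2.5-VL-3B and Wan2.1-T2V-1.3B models as examples to help developers quickly get preset models running efficiently on Ascend NPUs. For the detailed steps, see the quick start guide.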

Features and Models


Overview of Supported Features

Model \ Feature TP TP-SP VPP PP CP Distributed Optimizer Recomputation LoRA RL FSDP2
Magistral-Small-2509
InternVL3.5-30B
Qwen3-VL-8B
Qwen3-VL-30B
Wan2.2 CP (Ulysses)
OpenSoraPlan1.5-T2V
Wan2.1 CP (Ulysses)
HunyuanVideo CP (Ulysses)
HunyuanVideo1.5
CogVideoX series-T2V CP (Ulysses)
CogVideoX series-I2V CP (Ulysses)
OpensoraPlan1.3-T2V CP (Ulysses)
OpensoraPlan1.3-I2V CP (Ulysses)
GLM-4.1V
Qwen2VL-2B CP (Ulysses)
Qwen2VL-7B CP (Ulysses)
Qwen2VL-72B CP (Ulysses) DPO
Qwen2.5VL-3B GRPO
Qwen2.5VL-7B GRPO
Qwen2.5VL-32B GRPO
Qwen2.5VL-72B
Qwen2.5Omni-7B
Qwen3-Omni
InternVL3-8B CP (Ring)
InternVL3-78B CP (Ring)

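Notes:


Compatible Versions and Supported Models

【Measured performance of the current version (hardware: Atlas 900 A2 PODc)】

For each supported model listed below, the model's README provides usage instructions covering the detailed training, inference, and fine-tuning workflows.

Hyperlinks in the Model column point to each model's folder; hyperlinks in the Parameters column point to the model's community resources.

In the Certification column, 【Pass】 marks models that have passed testing and 【Test】 marks models still under test.

Throughput units: Samples per Second (SPS), Frames per Second (FPS), and Tokens per Second (TPS).

(Note: SPS and FPS are reported as cluster throughput; TPS is reported as per-device throughput.)

Average sequence length is the mean sequence length of the dataset used in the performance test, computed as an average weighted by how often each sequence length occurs.

An affinity scenario means adjusting a small number of structures or parameters so that the model is better suited to Ascend and performs better.

A3 refers to Atlas A3 training series hardware.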
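
The two conventions in the notes above, the frequency-weighted average sequence length and per-device TPS, can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and reading a cluster entry such as "4x8 (A3)" as nodes x devices per node is an assumption that the notes themselves do not state.

```python
from collections import Counter
from typing import Iterable


def average_sequence_length(lengths: Iterable[int]) -> float:
    """Average of the sequence lengths, weighted by how often each length occurs."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences provided")
    return sum(length * count for length, count in counts.items()) / total


def total_devices(cluster: str) -> int:
    """Parse a cluster entry such as '4x8 (A3)'.

    Assumption: the entry means <nodes>x<devices per node>.
    """
    nodes, devices_per_node = cluster.split()[0].split("x")
    return int(nodes) * int(devices_per_node)


def cluster_tps(per_device_tps: float, cluster: str) -> float:
    """Scale a per-device TPS figure to the whole cluster (same assumption as above)."""
    return per_device_tps * total_devices(cluster)


if __name__ == "__main__":
    # Made-up lengths purely for illustration.
    print(average_sequence_length([512, 512, 1024, 2048]))
    print(cluster_tps(257.50, "2x8"))
```

Scaling TPS by the device count is what makes it comparable with the SPS and FPS columns, which are already cluster-level figures.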

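MindSpeed MM Model List
Category | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Average sequence length | Certification
Multimodal generation | Lumina-mGPT 2.0 | 7B | Fine-tuning | 1x8 | BF16 | 8.24 (SPS) | 8.79 (SPS) | 1024 | 【Pass】
Multimodal generation | OpenSoraPlan1.5 | 8.5B | Pretraining | 1x8 | BF16 | 0.83 (SPS) | / | / | 【Contributed by Peking University】
Multimodal generation | Wan2.2-T2V | 5B | Pretraining | 1x4 (A3) | BF16 | 3.18 (SPS) | 2.93 (SPS) | / | 【Test】
Multimodal generation | Wan2.2-T2V | A14B | Pretraining | 1x8 (A3) | BF16 | 0.710 (SPS) | 0.292 (SPS) | / | 【Test】
Multimodal generation | Wan2.2-TI2V | 5B | Pretraining | 1x4 (A3) | BF16 | 3.18 (SPS) | 2.93 (SPS) | / | 【Test】
Multimodal generation | Wan2.2-I2V | A14B | Pretraining | 1x8 (A3) | BF16 | 0.671 (SPS) | 0.294 (SPS) | / | 【Test】
Multimodal generation | Wan2.1-T2V | 1.3B | Pretraining | 1x8 | BF16 | 0.918 (SPS) | 1.04 (SPS) | / | 【Pass】
Multimodal generation | Wan2.1-T2V | 1.3B | LoRA fine-tuning | 1x8 | BF16 | 0.954 (SPS) | 1.042 (SPS) | / | 【Pass】
Multimodal generation | Wan2.1-T2V | 14B | Pretraining | 1x8 | BF16 | 0.160 (SPS) | 0.160 (SPS) | / | 【Pass】
Multimodal generation | Wan2.1-T2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.174 (SPS) | / | 【Pass】
Multimodal generation | Wan2.1-I2V | 1.3B | Pretraining | 1x8 | BF16 | 0.76 (SPS) | / | / | 【Pass】
Multimodal generation | Wan2.1-I2V | 14B | Pretraining | 1x8 | BF16 | 0.130 (SPS) | / | / | 【Pass】
Multimodal generation | Wan2.1-I2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.173 (SPS) | / | 【Pass】
Multimodal generation | Self-Forcing | 1.3B | DMD distillation | 1x8 | BF16 | 0.225 (FPS) | 0.282 (FPS) | / | 【Test】
Multimodal generation | HunyuanVideo-T2V | 13B | Pretraining | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | / | 【Pass】
Multimodal generation | HunyuanVideo-I2V | 13B | Pretraining | 1x8 | BF16 | 0.164 (SPS) | 0.202 (SPS) | / | 【Pass】
Multimodal generation | HunyuanVideo1.5-T2V | 8B | Pretraining | 1x8 | BF16 | / | / | / | 【Pass】
Multimodal generation | OpenSora 1.0 | 5.5B | Pretraining | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | / | 【Pass】
Multimodal generation | OpenSora 1.2 | 5.2B | Pretraining | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | / | 【Test】
Multimodal generation | OpenSora 2.0-T2V | 11B | Pretraining | 1x8 | BF16 | 1.33 (SPS) | 1.46 (SPS) | / | 【Pass】
Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pretraining | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | / | 【Pass】
Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pretraining | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | / | 【Pass】
Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pretraining | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | / | 【Pass】
Multimodal generation | WFVAE | 0.18B | Pretraining | 1x8 | BF16 | 23.860 (SPS) | 26.091 (SPS) | / | 【Pass】
Multimodal generation | CogVideoX-T2V | 5B | Pretraining | 1x8 | BF16 | 1.14 (SPS) | 1.00 (SPS) | 6976 | 【Pass】
Multimodal generation | CogVideoX-I2V | 5B | Pretraining | 1x8 | BF16 | 1.13 (SPS) | 0.84 (SPS) | 6976 | 【Pass】
Multimodal generation | CogVideoX 1.5-T2V | 5B | Pretraining | 1x8 | BF16 | 1.44 (SPS) | 1.75 (SPS) | 6976 | 【Pass】
Multimodal generation | CogVideoX 1.5-T2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.76 (SPS) | 2.64 (SPS) | / | 【Pass】
Multimodal generation | CogVideoX 1.5-I2V | 5B | Pretraining | 1x8 | BF16 | 1.43 (SPS) | 1.44 (SPS) | 6976 | 【Pass】
Multimodal generation | CogVideoX 1.5-I2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.33 (SPS) | 2.04 (SPS) | / | 【Pass】
Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | / | 【Contributed by Qihoo 360】
Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | / | 【Pass】
Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | / | 【Pass】
Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | / | 【Pass】
Multimodal generation | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | / | 【Pass】
Multimodal generation | SD3.5 | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | / | 【Pass】
Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | / | 【Pass】
Multimodal generation | Flux2-T2I | 32B | Full-parameter fine-tuning | 1x8 | BF16 | 1.28 (FPS) | 1.24 (FPS) | / | 【Test】
Multimodal generation | Flux2-I2I | 32B | Full-parameter fine-tuning | 1x8 | BF16 | 0.61 (FPS) | 0.60 (FPS) | / | 【Test】
Multimodal generation | Flux-Kontext | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 1.97 (FPS) | 2.00 (FPS) | / | 【Pass】
Multimodal generation | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | / | 【Pass】
Multimodal generation | HiDream | 17B | LoRA fine-tuning | 1x8 | BF16 | 18.37 (FPS) | 19.61 (FPS) | / | 【Pass】
Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | / | 【Test】
Multimodal generation | Qwen-Image | 27B | LoRA fine-tuning | 1x8 | BF16 | 23.02 (FPS) | 21.54 (FPS) | / | 【Pass】
Multimodal generation | Qwen-Image-Edit | 27B | LoRA fine-tuning | 1x8 | BF16 | 20.59 (FPS) | 17.47 (FPS) | / | 【Test】
Multimodal understanding | GLM-4.1V | 9B | Fine-tuning | 1x8 | BF16 | 1074.64 (TPS) | 908.49 (TPS) | 707 | 【Pass】
Multimodal understanding | DeepSeek-OCR | 3B | Fine-tuning | 1x8 | BF16 | 1327.694 (TPS) | / | / | 【Test】
Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 3632.31 (TPS) | 3757.98 (TPS) | 602 | 【Test】
Multimodal understanding | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 7653.12 (TPS) | 5089.99 (TPS) | 1813 | 【Pass】
Multimodal understanding | InternVL 2.0 | 8B | Fine-tuning | 1x8 | BF16 | 2914.39 (TPS) | 2492.87 (TPS) | 1813 | 【Pass】
Multimodal understanding | InternVL 2.0 | 26B | Fine-tuning | 1x8 | BF16 | 750.12 (TPS) | 738.79 (TPS) | 1813 | 【Pass】
Multimodal understanding | InternVL 2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | 1813 | 【Pass】
Multimodal understanding | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | 228.33 | / | 1896 | 【Test】
Multimodal understanding | InternVL 3.0 | 8B | Fine-tuning | 1x8 | BF16 | 2344.58 (TPS) | 2211.93 (TPS) | 2653 | 【Pass】
Multimodal understanding | InternVL 3.0 | 78B | Fine-tuning | 4x8 (A3) | BF16 | 228.82 (TPS) | 283.15 (TPS) | 1932 | 【Pass】
Multimodal understanding | InternVL 3.5 | 30B | Fine-tuning | 1x8 (A3) | BF16 | 52.76 (TPS) | 47.73 (TPS) | 201 | 【Test】
Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 2941.17 (TPS) | 3004.04 (TPS) | 689 | 【Pass】
Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 1143.74 (TPS) | 1004.22 (TPS) | 689 | 【Pass】
Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | 689 | 【Pass】
Multimodal understanding | Qwen2.5-VL | 3B | Fine-tuning | 1x8 | BF16 | 2047.19 (TPS) | 1876.66 (TPS) | 689 | 【Pass】
Multimodal understanding | Qwen2.5-VL | 7B | Fine-tuning | 1x8 | BF16 | 1620.87 (TPS) | 1091.20 (TPS) | 689 | 【Pass】
Multimodal understanding | Qwen2.5-VL | 32B | Fine-tuning | 2x8 | BF16 | 257.50 (TPS) | / | 689 | 【Pass】
Multimodal understanding | Qwen2.5-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 322.96 (TPS) | 256.28 (TPS) | 689 | 【Pass】
Multimodal understanding | Qwen3-VL | 8B | Fine-tuning | 1x8 | BF16 | 146.54 (TPS) | 129.71 (TPS) | 179 | 【Test】
Multimodal understanding | Qwen3-VL | 30B | Fine-tuning | 1x8 (A3) | BF16 | 179.57 (TPS) | / | 185 | 【Test】
Multimodal understanding | Qwen3-VL | 235B | Fine-tuning | 16x8 (A3) | BF16 | 598.05 (TPS) | / | 16116 | 【Test】
Multimodal understanding | Qwen2.5-Omni | 7B | Fine-tuning | 1x8 | BF16 | 575.01 (TPS) | 534.28 (TPS) | 296 | 【Pass】
Multimodal understanding | Qwen3-Omni | 30B | Fine-tuning | 2x4 (A3) | BF16 | 131.3 (TPS) | 16.4 (TPS) | 288 | 【Test】
Multimodal understanding | Magistral-Small-2509 | 24B | Fine-tuning | 1x8 | BF16 | 1.843 (SPS) | 1.185 (SPS) | / | 【Test】
Speech recognition | Whisper | 1.5B | Pretraining | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | / | 【Test】
Speech generation | CosyVoice3 | 0.5B | Pretraining | 1x8 | BF16 | 290.91 (SPS) | 326.11 (SPS) | 24 | 【Test】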
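
One way to read the two performance columns in the table above is as a ratio of the NPU figure to the reference figure. The Python sketch below shows that arithmetic on three rows copied from the table; the `relative_throughput` helper is a hypothetical name used for illustration, and the comparison is only meaningful when both figures use the same unit (SPS, FPS, or TPS).

```python
from typing import List, Tuple

# (model and task, NPU performance, reference performance), copied from the table above.
ROWS: List[Tuple[str, float, float]] = [
    ("Qwen2.5-VL 7B fine-tuning (TPS)", 1620.87, 1091.20),
    ("Qwen2-VL 2B fine-tuning (TPS)", 2941.17, 3004.04),
    ("Wan2.1-T2V 14B pretraining (SPS)", 0.160, 0.160),
]


def relative_throughput(npu: float, reference: float) -> float:
    """Ratio of NPU throughput to reference throughput; above 1.0 means the NPU figure is higher."""
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return npu / reference


if __name__ == "__main__":
    for name, npu, reference in ROWS:
        print(f"{name}: {relative_throughput(npu, reference):.2f}x")
```

Rows whose reference column is "/" have no reference figure and cannot be compared this way.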

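Large language models (dense, sparse, and state space models) are maintained separately in MindSpeed-LLM. To train large language models, visit the MindSpeed-LLM repository for detailed usage instructions. MindSpeed-LLM currently supports the following mainstream models:

Model type | Model | Download link | Script location | Sequence length | Training backend | Cluster size | Supported version | Contributor | Certification
Dense models Aquila 7B aquila 2K Legacy 1x8 2.0.0 【GTS】 【Pass】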
Aquila2 7B aquila2 2K Legacy 1x8 2.0.0 【GTS】 【Pass】
34B 4K Legacy 2x8 2.0.0 【GTS】 【Pass】
Baichuan 7B baichuan 4K Legacy 1x8 2.0.0 【GTS】 【Pass】
13B 4K Legacy 1x8 2.0.0 【GTS】 【Pass】
Baichuan2 7B baichuan2 4K Legacy 1x8 2.0.0 【Ascend】 【Pass】
13B 4K Mcore 1x8 2.3.0 【Ascend】 【Pass】
Bloom 7B1 bloom 2K Legacy 1x8 2.0.0 【Ascend】 【Pass】
176B 2K Legacy 12x8 2.0.0 【Ascend】 【Pass】
ChatGLM3 6B chatglm3 8K Mcore 1x8 2.3.0 【Ascend】 【Pass】
32K Mcore 1x8 2.3.0 【Ascend】 【Pass】
64K Mcore 2x8 2.3.0 【Ascend】 【Pass】
GLM4 9B glm4 8K Mcore 1x8 2.3.0 【GTS】 【Pass】
32K Mcore 2x8 【GTS】 【Pass】
CodeLlama 34B codellama 4K Mcore 2x8 2.2.0 【GTS】 【Pass】
InternLM 7B intern 2K Legacy 1x8 2.0.0 【Ascend】 【Pass】
65B 2K Legacy 4x8 2.0.0 【Ascend】 【Pass】
InternLM2 20B internlm2 4K Mcore 1x8 2.2.0 【GTS】 【Pass】
32K Mcore 1x8 2.2.0 【GTS】 【Pass】
InternLM2.5 1.8B internlm25 32K Mcore 1x8 2.3.0 【GTS】 【Pass】
7B 32K Mcore 1x8 2.3.0 【GTS】 【Pass】
20B 32K Mcore 2x8 2.3.0 【GTS】 【Test】
InternLM3 8B internlm3 8K Mcore 1x8 【Ascend】 【Pass】
LLaMA 7B llama 2K Legacy 1x8 2.0.0 【Ascend】 【Pass】
13B 2K Legacy 1x8 2.0.0 【Ascend】 【Pass】
33B 2K Legacy 4x8 2.0.0 【Ascend】 【Pass】
65B 2K Legacy 4x8 2.0.0 【Ascend】 【Pass】
LLaMA2 7B llama2 4K Mcore 1x8 【NAIE】 【Pass】
13B 4K Mcore 1x8 【NAIE】 【Pass】
34B 4K Mcore 2x8 2.3.0 【GTS】 【Pass】
70B 4K Mcore 4x8 【GTS】 【Pass】
128K Mcore 8x8 【Ascend】 【Pass】
LLaMA3 8B llama3 8K Mcore 1x8 2.3.0 【GTS】 【Pass】
70B 8K Mcore 4x8 2.3.0 【GTS】 【Pass】
LLaMA3.1 8B llama31 8K Mcore 1x8 2.3.0 【GTS】 【Pass】
128K Mcore 4x8 2.3.0 【GTS】 【Pass】
50B 128K Mcore 8x8 2.3.0 【Ascend】 【Pass】
70B 8K Mcore 4x8 2.3.0 【GTS】 【Pass】
128K Mcore 24x8 2.3.0 【Ascend】 【Pass】
200B 8K Mcore 8x8 2.3.0 【Ascend】 【Pass】
405B 8K Mcore 8x8 【Ascend】 【Pass】
128K Mcore 36x8 2.3.0 【Ascend】 【Pass】
LLaMA3.2 1B llama32 8K Mcore 1x8 2.3.0 【GTS】 【Pass】
3B 8K Mcore 1x8 2.3.0 【GTS】 【Pass】
LLaMA3.3 70B-Instruct llama33 8K Mcore 4x8 2.3.0 【GTS】 【Pass】
Qwen 7B qwen 8K Legacy 1x8 2.0.0 【GTS】 【Pass】
14B 2K Legacy 1x8 2.0.0 【GTS】 【Pass】
72B 8K Legacy 16x8 2.0.0 【GTS】 【Pass】
Qwen1.5 0.5B qwen15 8K Mcore 1x8 2.2.0 【GTS】 【Pass】
1.8B 8K Mcore 1x8 【GTS】 【Pass】
4B 8K Mcore 1x8 【GTS】 【Pass】
7B 8K Mcore 1x8 【GTS】 【Pass】
14B 8K Mcore 1x8 【GTS】 【Pass】
32B 8K Mcore 4x8 【GTS】 【Pass】
72B 8K Mcore 8x8 【GTS】 【Pass】
110B 8K Mcore 8x8 【GTS】 【Pass】
CodeQwen1.5 7B 8K Mcore 1x8 【GTS】 【Pass】
Qwen2 0.5B qwen2 4K Mcore 1x8 2.2.0 【GTS】 【Pass】
32K Mcore 1x8 【GTS】 【Pass】
1.5B 4K Mcore 1x8 【GTS】 【Pass】
32K Mcore 1x8 【GTS】 【Pass】
7B 4K Mcore 1x8 【GTS】 【Pass】
32K Mcore 1x8 【GTS】 【Pass】
72B 4K Mcore 4x8 【GTS】 【Pass】
32K Mcore 16x8 【Ascend】 【Pass】
Qwen2.5 0.5B qwen25 32K Mcore 1x8 2.3.0 【GTS】 【Pass】
1.5B 32K Mcore 1x8 2.3.0 【GTS】 【Pass】
3B 32K Mcore 1x8 2.3.0 【GTS】 【Pass】
7B 32K Mcore 1x8 2.3.0 【Ascend】 【Pass】
14B 32K Mcore 2x8 2.3.0 【GTS】 【Pass】
32B 32K Mcore 4x8 2.3.0 【GTS】 【Pass】
72B 32K Mcore 16x8 【GTS】 【Pass】
Qwen3 0.6B qwen3 4K Mcore 1x8 【Ascend】 【Pass】
1.7B 4K Mcore 1x8 【Ascend】 【Pass】
4B 4K Mcore 1x8 【Ascend】 【Pass】
8B 4K Mcore 1x8 【Ascend】 【Pass】
14B 4K Mcore 1x8 【Ascend】 【Pass】
32B 4K Mcore 2x8 【Ascend】 【Pass】
32B qwen3 4K FSDP2 1x16 【Ascend】 【Test】
QwQ 32B qwq 4K Mcore 1x8 2.2.0 【GTS】 【Test】
Qwen2.5-Math 1.5B qwen25_math 4K Mcore 1x8 2.2.0 【GTS】 【Pass】
7B 4K Mcore 1x8 【GTS】 【Pass】
72B 4K Mcore 4x8 【GTS】 【Test】
CodeQwen2.5 7B qwen25_coder 8K Mcore 1x8 2.2.0 【China Mobile Cloud】 【Test】
Yi 9B yi 4K Legacy 1x4 2.0.0 【OpenMind】 【Test】
34B 4K Mcore 2x8 2.2.0 【GTS】 【Pass】
Yi1.5 6B yi15 4K Mcore 1x8 2.2.0 【GTS】 【Pass】
9B 4K Mcore 1x8 【GTS】 【Pass】
34B 4K Mcore 2x8 【GTS】 【Test】
Mistral 7B mistral 32K Mcore 1x8 2.2.0 【NAIE】 【Pass】
Gemma 2B gemma 8K Mcore 1x8 2.2.0 【GTS】 【Pass】
7B 8K Mcore 1x8 【GTS】 【Pass】
Gemma2 9B gemma2 8K Mcore 1x8 【GTS】 【Pass】
27B 8K Mcore 2x8 【GTS】 【Pass】
MiniCPM 2B minicpm 4K Mcore 1x8 2.2.0 【NAIE】 【Pass】
MiniCPM3 4B minicpm3 32K Mcore 1x8 2.2.0 【GTS】 【Test】
Phi3.5 mini-instruct phi35 4K Mcore 1x8 【GTS】 【Test】
DeepSeek-Math 7B deepseek_math 4K Mcore 1x8 2.2.0 【Ascend】 【Test】
DeepSeek-R1-Distill-Qwen 1.5B deepseek_r1_distill_qwen 4K Mcore 1x8 2.2.0 【Ascend】 【Pass】
7B 4K Mcore 1x8 【Ascend】 【Pass】
14B 4K Mcore 1x8 【Ascend】 【Pass】
32B 8K Mcore 2x8 【Ascend】 【Pass】
DeepSeek-R1-Distill-LLaMA 8B deepseek_r1_distill_llama 8K Mcore 1x8 2.2.0 【Ascend】 【Pass】
70B 8K Mcore 4x8 【Ascend】 【Pass】
Seed-OSS 36B seed_oss 2K Mcore 1x8 【Ascend】 【Test】
Magistral 24B magistral 4K Mcore 1x8 【Ascend】 【Test】
PLM 1.8B plm 2K Mcore 1x8 【Ascend】 【Test】
Sparse models Qwen3 30B-A3B qwen3_moe 4K Mcore 2x8 【Ascend】 【Pass】
qwen3_moe 4K FSDP2 1x16 【Ascend】 【Test】
235B-A22B qwen3_moe 4K Mcore 16x16 【Ascend】 【Pass】
qwen3_moe 4K FSDP2 16x16 【Ascend】 【Test】
Qwen3-Next 80B-A3B qwen3_next 16K Mcore 4x16 【Ascend】 【Pass】
qwen3_next 16K FSDP2 4x16 【Ascend】 【Test】
Qwen3-Coder-Next 80B-A3B qwen3_coder_next 16K Mcore 4x16 【Ascend】 【Test】
Qwen2 57B-A14B qwen2_moe 4K Mcore 8x8 2.2.0 【GTS】 【Pass】
Grok-1 40B grok-1 8K Mcore 4x8 2.0.0 【GTS】 【Pass】
Mixtral 8x7B mixtral 32K Mcore 8x8 2.2.0 【Ascend】 【Pass】
8x22B 32K Mcore 8x8 【NAIE】 【Pass】
64K Mcore 8x8 【NAIE】 【Test】
DeepSeek-V2 236B deepseek2 8K Mcore 20x8 2.2.0 【Ascend】 【Pass】
DeepSeek-V2-coder 236B deepseek2_coder 8K Mcore 20x8 2.2.0 【Ascend】 【Test】
DeepSeek-V2-Lite 16B deepseek2_lite 8K Mcore 1x8 【Ascend】 【Pass】
DeepSeek-V2.5 236B deepseek25 8K Mcore 20x8 2.2.0 【NAIE】 【Test】
DeepSeek-V3 671B deepseek3 4K Mcore 64x8 【Ascend】 【Pass】
DeepSeek-V3.2 671B deepseek3.2 4K Mcore 32x16 【Ascend】 【Test】
MiniCPM 8x2B minicpm 4K Mcore 1x8 2.2.0 【NAIE】 【Test】
Ling-mini-2.0 16B ling_v2 4K Mcore 1x8 【Ascend】 【Test】
Ring 1T 32K Mcore 32x8 【Ascend】 【Test】
Phi3.5 MoE-instruct phi35 4K Mcore 2x8 【GTS】 【Test】
Hunyuan 389B hunyuanLarge 8K Mcore 8x8 2.3.0 【Ascend】 【Pass】
GPT4 MoE-175B gpt4 128K Mcore 8x8 2.3.0 【Ascend】 【Pass】
GLM4.5 MoE-106B glm45-moe 4K Mcore 8x8 【Ascend】 【Test】
GLM5 MoE-744B glm5 4K Mcore 32x16 【Ascend】 【Test】
Step3.5-Flash MoE-196B step35 4K FSDP2 12x16 【Ascend】 【Test】
LongCat MoE-560B longcat 4K Mcore 8x16 【Ascend】 【Test】
GPT-OSS MoE-20B gpt_oss 4K FSDP2 1x16 【Ascend】 【Test】
State space models Mamba2 2.7B mamba2 4K Mcore 1x8 【Ascend】 【Test】
8B 4K Mcore 1x8 【Ascend】 【Test】
Mamba2Hybrid 8B mamba2 4K Mcore 1x8 【Ascend】 【Test】

Common Parameter Reference

The parameters used when running the MindSpeed MM suite are explained in the README.

Roadmap


  • 【New model】 JanusPro
  • 【Model feature】 CogVideoX: PP
  • 【Model feature】 OpensoraPlan1.3: CP (Ring Attention)
  • 【Model feature】 Qwen2VL: VPP, CP (Ulysses & Ring Attention)
  • 【Model feature】 InternVL2: TP, CP (Ulysses & Ring Attention)
  • 【Base feature】 Hetero-parallel

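Tools


Ascend Profiling Collection Tool

MindSpeed MM integrates the Ascend profiling collection tool to support analysis of model runtime behavior. The tool collects key information such as operators and device memory according to its configuration, and supports both dynamic and static collection modes, which developers can choose between to suit their scenario when analyzing model bottlenecks.

For details, see the profiling section of the README.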

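MindStudio Insight Performance Analysis Tool

For performance tuning of large models in cluster scenarios, we recommend MindStudio Insight, a visual tuning tool. It provides visualizations including a Timeline view, communication analysis, and computation time, helping users identify potential performance bottlenecks and decide how to eliminate or reduce them.

For installation and usage, see the MindStudio Insight Operation Guide.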

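Feature Extraction for Sora-like Models

MindSpeed MM supports extracting and saving video and text features.

For details, see the "Feature Extraction for Sora-like Models" section of the README.

Memory Snapshot Extraction

MindSpeed MM integrates the Ascend memory snapshot collection tool to support analysis of model runtime behavior.

For details, see the "Memory Snapshot Extraction" section of the README.

Using TensorBoard

MindSpeed MM supports TensorBoard.

For details, see the "Using TensorBoard" section of the README.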

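Version Maintenance


MindSpeed MM versions go through the following five maintenance phases:

Status | Duration | Description
Planning | 1-3 months | Plan features
Development | 3 months | Develop features
Maintained | 6-12 months | Merge all resolved issues and release versions. The maintenance policy differs by MindSpeed MM version: regular versions are maintained for 6 months and long-term support versions for 12 months
Unmaintained | 0-3 months | Merge all resolved issues; no dedicated maintainers and no version releases
End of life (EOL) | N/A | The branch no longer accepts any changes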

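Maintenance policy for released MindSpeed MM versions:

MindSpeed MM version | Maintenance policy | Current status | Release date | Next status | EOL date
26.0.0 | Regular version | Maintained | 2026/03/30 | Expected to be unmaintained from 2026/09/30 |
2.3.0 | Regular version | Maintained | 2025/12/30 | Expected to be unmaintained from 2026/06/30 |
2.2.0 | Regular version | Unmaintained | 2025/09/30 | Expected to be unmaintained from 2026/03/30 |
2.1.0 | Regular version | Unmaintained | 2025/06/30 | Expected to be unmaintained from 2025/12/30 |
2.0.0 | Regular version | Unmaintained | 2025/03/30 | Expected to be unmaintained from 2025/09/30 |
1.0.0 | Regular version | Unmaintained | 2024/12/30 | Expected to be unmaintained from 2025/06/30 |
1.0.RC3 | Regular version | Unmaintained | 2024/09/30 | Expected to be unmaintained from 2025/03/30 |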
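
The dates in the table above follow from the maintenance periods stated earlier: 6 months for regular versions and 12 months for long-term support versions, counted from the release date. The Python sketch below reproduces that arithmetic. The function names and the "regular"/"lts" keys are hypothetical, and clamping to the last day of a shorter month is an assumption made here, since the source does not say how such dates are handled.

```python
import calendar
from datetime import date

# Maintenance periods from the phase table: regular = 6 months, long-term support = 12 months.
MAINTENANCE_MONTHS = {"regular": 6, "lts": 12}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of a shorter month (assumption)."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def maintenance_end(release_date: date, policy: str = "regular") -> date:
    """Date from which a release is expected to become unmaintained."""
    return add_months(release_date, MAINTENANCE_MONTHS[policy])


if __name__ == "__main__":
    # Release dates of 26.0.0, 2.3.0, and 1.0.RC3 from the table above.
    for released in (date(2026, 3, 30), date(2025, 12, 30), date(2024, 9, 30)):
        print(released.isoformat(), "->", maintenance_end(released).isoformat())
```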

FAQ


For frequently asked questions, see the FAQ.

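Related Resources


  1. A multimodal suite for large-scale distributed training
  2. Powered by Ascend's computing power, Open-Sora Plan achieves cinematic-quality video generation
  3. MindSpeed MM supports mainstream multimodal understanding models with substantial performance gains
  4. Trained natively on Ascend: Sun Yat-sen University and 360 jointly build Qihoo-T2X, a new paradigm for multimodal tasks
  5. Running the SOTA video generation model Wan2.1 with Ascend MindSpeed MM
  6. SOTA multimodal understanding model out of the box: MindSpeed MM best practices for Qwen2.5-VL
  7. Joint first release: running the Open-Sora Plan V1.5 model with Ascend MindSpeed MM
  8. Supported at open-source launch: running GLM-4.1V-Thinking, the latest multimodal understanding model, with Ascend MindSpeed MM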

Security Statement


MindSpeed MM Security Statement

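Disclaimer


To MindSpeed MM users

  1. The models provided by MindSpeed MM are for your non-commercial use only.
  2. For each model, the MindSpeed MM platform only suggests datasets that can be used for training; Huawei does not provide any datasets. If you use these datasets for training, take particular care to comply with each dataset's license. Huawei assumes no liability for any infringement disputes arising from your use of the datasets.
  3. If you find any problem while using MindSpeed MM models (including but not limited to functional or compliance problems), please submit an issue on Gitcode and we will review and resolve it promptly.
  4. The third-party open-source software that MindSpeed MM depends on, such as Megatron, is provided and maintained by third-party communities, and fixes for problems caused by that software depend on the contributions and feedback of those communities. Please understand that the MindSpeed MM repository does not guarantee that problems in the third-party open-source software itself will be fixed, nor that all vulnerabilities and errors in it will be tested and corrected.

To dataset owners

If you do not want your dataset to be mentioned in connection with the models in MindSpeed MM, or want to update how the models in MindSpeed MM describe your dataset, please submit an issue on Gitcode. We will remove or update the description of your dataset as requested in your issue. Thank you sincerely for your understanding of and contribution to MindSpeed MM.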

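License Statement

For models provided by Ascend MindSpeed MM, if a License exists in the model directory, that License prevails. If no License exists in the model directory, the model is licensed under the Apache 2.0 license; the license text is in the LICENSE file in the Ascend MindSpeed MM root directory. Documents under the docs directory are licensed under CC-BY 4.0; see the documentation LICENSE for details.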

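Contributing


1. Reporting issues

  • If you find a problem, first check the repository's issues list for similar problems or solutions.

  • If your problem is not in the existing issues list, submit a new issue with a clear problem description, reproduction steps, and environment information.

2. Code contribution workflow

To submit code changes, follow these brief steps:

  • Develop and commit on your personal branch, then open a Pull Request (PR) against this repository;

  • Register in our SIG regular-meeting PR review request list, following the established format, and attend the corresponding review meeting on time;

  • Revise according to the review comments and update the PR;

  • After the PR passes review, comment compile in the comment section to trigger the gating pipeline (CI);

  • Once the PR passes CI and has enough labels, a repository Committer performs the final review and merges it into the development branch.

Thank you for your participation and contributions! We look forward to advancing the project together with you.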

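Acknowledgments


MindSpeed MM is jointly contributed by the following Huawei departments and Ascend ecosystem partners:

Huawei:

  • Computing Product Line
  • Public Development Department
  • 2012 Laboratories
  • Huawei Cloud

Ecosystem partners:

  • 360 AI Research
  • Peking University OpenSoraPlan team
  • WeChat Technical Architecture Department, Infrastructure Center
  • JD Retail Jiushu R&D Technology Department

Thanks for every PR from the community; contributions to MindSpeed MM are welcome.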