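MindSpeed-MM: A Multimodal Large-Model Training Suite for Ascend Chips

Huawei Ascend's multimodal large-model suite for large-scale distributed training, supporting multimodal generation and multimodal understanding.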



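Introduction

MindSpeed MM is Ascend's multimodal large-model suite for large-scale distributed training. It supports training of mainstream multimodal large models and aims to provide an end-to-end multimodal training solution for Huawei Ascend chips, including preset mainstream models, data engineering, distributed training and acceleration, and pretraining, fine-tuning, post-training, and online inference tasks.

Roadmap

📈 The roadmap is updated continuously in the MindSpeed MM RoadMap. The community is welcome to use it to discuss plans and submit requests.

Join Us

To exchange development experience, share usage tips, and get project updates promptly, we have created an official MindSpeed MM WeChat group.

Whether you are already using this project or have new ideas to share, you are welcome to join 👋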

Latest News

  • [Nov. 20, 2025]: 🚀 MindSpeed MM supports the Qwen3-Omni model based on FSDP2
  • [Nov. 19, 2025]: 🚀 MindSpeed MM supports the Qwen Image and Qwen Image Edit models 【Prototype】
  • [Nov. 13, 2025]: 🚀 MindSpeed MM supports the InternVL3.5-30B model based on FSDP2
  • [Nov. 06, 2025]: 🚀 MindSpeed MM supports a DeepseekOCR model training demo based on FSDP2 【Prototype】
  • [Oct. 31, 2025]: 🚀 MindSpeed MM supports the Qwen3VL-8B/30B models based on fully shard 【Prototype】
  • [Oct. 22, 2025]: 🚀 MindSpeed MM supports the Wan2.2 series models based on fully shard
  • [Sep. 08, 2025]: 🚀 MindSpeed MM supports the FLUX.1-Kontext model
  • [Sep. 03, 2025]: 🎉 Reinforcement learning is live! MindSpeed MM supports Qwen2.5VL 7B/32B GRPO training
  • [Aug. 15, 2025]: 🤝 MindSpeed MM natively supports the Lumina-mGPT 2.0 model
  • [Jul. 29, 2025]: 🌴 MindSpeed MM supports core 0.12.1
  • [Jul. 10, 2025]: 🚀 MindSpeed MM supports the InternVL3-8B/78B models
  • [Jul. 02, 2025]: ⚡ MindSpeed MM provides day-0 support for the GLM-4.1V model
  • [Jun. 30, 2025]: 🌴 MindSpeed MM version 2.1.0 released
  • [Jun. 25, 2025]: 🚀 MindSpeed MM supports the HiDream-I1 model
  • [Jun. 05, 2025]: 🚀 MindSpeed MM supports the Qwen2.5Omni-7B model
  • [Jun. 05, 2025]: 🤝 MindSpeed MM natively supports the OpenSoraPlan 1.5 model
  • [Apr. 03, 2025]: 🚀 MindSpeed MM supports the Qwen2.5VL-32B model
  • [Mar. 27, 2025]: 🚀 MindSpeed MM supports the Wan2.1-1.3B/14B models
  • [Mar. 26, 2025]: 🚀 MindSpeed MM supports the Qwen2.5VL-3B/7B/72B models
  • [Feb. 20, 2025]: 🚀 MindSpeed MM supports the InternVL2.5-78B model
  • [Feb. 18, 2025]: 🚀 MindSpeed MM supports the HunyuanVideo model
  • [Feb. 17, 2025]: 🔥 MindSpeed MM supports Mindspeed-Core & Megatron 0.8.0
  • [Feb. 15, 2025]: 🚀 MindSpeed MM supports the Sana model
  • [Jan. 24, 2025]: 🚀 MindSpeed MM supports the CogVideoX 1.5 model
  • [Dec. 30, 2024]: 🌴 MindSpeed MM version 1.0.0 released
  • [Dec. 16, 2024]: 🤝 MindSpeed MM natively supports the Qihoo-T2X model
  • [Dec. 03, 2024]: 🚀 MindSpeed MM supports the SD3.5 model
  • [Nov. 30, 2024]: 🎉 MindSpeed MM supports multimodal understanding evaluation
  • [Nov. 22, 2024]: 🚀 MindSpeed MM supports the CogVideoX model
  • [Nov. 06, 2024]: 🚀 MindSpeed MM supports the FLUX model
  • [Oct. 30, 2024]: 🤝 MindSpeed MM natively supports the OpenSoraPlan 1.3 model
  • [Oct. 21, 2024]: 🚀 MindSpeed MM supports the InternVL2 and Qwen2VL models
  • [Oct. 16, 2024]: 🌱 MindSpeed MM first version 1.0.RC3 released

Note: Prototype features have not been fully validated and may be unstable or contain bugs; beta denotes features not intended for commercial use.

Showcase

Text-to-video: Wan 2.2 T2V

Prompt: Ultra HD, 4K, cinematic composition, low contrast ratio, low saturation, cool tone; The queen wears an iron crown and rides on the dragon over the city. She holds a big flag that shows:" MindSpeed MM".

Text-to-video: OpensoraPlan 1.5 T2V

Prompt: A fluffy white rabbit with soft, velvety fur and twitching pink nose sits curiously near a rustic wooden fence, surrounded by a lush garden of vibrant wildflowers and tall grasses swaying gently in the breeze. The rabbit's large, expressive eyes scan the environment, reflecting the golden hues of the setting sun. As it nibbles on a patch of clover, its ears perk up at the distant sound of chirping birds. The fence, weathered and covered in patches of moss, adds a charming, pastoral backdrop to this serene scene, capturing the essence of a peaceful countryside moment.

Prompt: A majestic Berlin tower stands tall against the night sky, its structure bathed in a mesmerizing array of vibrant lights, casting a kaleidoscope of colors across the cityscape. The tower's intricate architectural details are highlighted by the illumination, creating a stunning contrast against the deep indigo sky. As the camera pans upward, the lights shift, revealing a dynamic play of shadows and hues that dance across the tower's surface. The surrounding city lights twinkle in harmony, enhancing the tower's grandeur and creating a breathtaking visual symphony that captures the essence of Berlin's vibrant nightlife.

Text-to-image: Qwen-Image -> image editing: Flux.1-Kontext

Prompt for generation: A coffee shop entrance features a chalkboard sign reading "MindSpeed Coffee 😊 $2 per cup," with a neon light displaying "MindSpeed MM". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "Welcome to use MindSpeed MM". Ultra HD, 4K, cinematic composition. (Qwen-Image)

Prompt for editing: Change the decoration of the coffee shop to a modern style with white painting. (Flux.1-Kontext)

Understanding model: Qwen2VL

Input image for both models:

Input text for both models: Please describe the image shortly

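Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

Input text for Qwen2VL: 请用中文简短描述这张照片 (English: "Please briefly describe this photo in Chinese")

Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上,背景是连绵的山脉和茂密的森林。天空多云,整体色调偏冷,给人一种宁静和自然的感觉。 (English: "This image shows a wooden dock extending onto a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of tranquility and nature.")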

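Version Notes

MindSpeed MM supports Ascend training hardware such as the Atlas 800T A2. The software version compatibility table is as follows:

| MindSpeed MM version | MindSpeed version | Megatron version | PyTorch version | torch_npu version | CANN version | Python version |
| --- | --- | --- | --- | --- | --- | --- |
| master (mainline) | master | Core 0.12.1 | 2.6.0, 2.7.1 | In development | In development | Python3.10 |
| 2.3.0 (commercial) | 2.3.0_core_r0.12.1 | Core 0.12.1 | 2.6.0, 2.7.1 | 7.3.0 | 8.5.0 | Python3.10 |
| 2.2.0 (commercial) | 2.2.0_core_r0.12.1 | Core 0.12.1 | 2.6.0, 2.7.1 | 7.2.0 | 8.3.RC1 | Python3.10 |
| 2.1.0 (commercial) | 2.1.0_core_r0.8.0 | Core 0.8.0 | 2.1.0, 2.6.0 | 7.1.0 | 8.2.RC1 | Python3.8, Python3.10 |
| 2.0.0 (commercial) | 2.0.0_core_r0.8.0 | Core 0.8.0 | 2.1.0 | 7.0.0 | 8.1.RC1 | Python3.8, Python3.10 |
| 1.0.0 (commercial) | 1.0.0_core_r0.6.0 | Core 0.6.0 | 2.1.0 | 6.0.0 | 8.0.0 | Python3.8, Python3.10 |

For more details, see the version compatibility table.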
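The compatibility table above can be turned into a small lookup for sanity-checking an environment before installation. The following is a minimal sketch, not part of MindSpeed MM: the names `Stack`, `COMPAT`, and `check_environment` are hypothetical, the rows are transcribed from the released versions in the table, and the master row is omitted because its torch_npu and CANN versions are still in development.

```python
from typing import Dict, List, NamedTuple


class Stack(NamedTuple):
    """One row of the compatibility table (hypothetical helper type)."""
    mindspeed: str
    megatron: str
    pytorch: List[str]
    torch_npu: str
    cann: str
    python: List[str]


# Rows transcribed from the compatibility table (released versions only).
COMPAT: Dict[str, Stack] = {
    "2.3.0": Stack("2.3.0_core_r0.12.1", "Core 0.12.1", ["2.6.0", "2.7.1"], "7.3.0", "8.5.0", ["3.10"]),
    "2.2.0": Stack("2.2.0_core_r0.12.1", "Core 0.12.1", ["2.6.0", "2.7.1"], "7.2.0", "8.3.RC1", ["3.10"]),
    "2.1.0": Stack("2.1.0_core_r0.8.0", "Core 0.8.0", ["2.1.0", "2.6.0"], "7.1.0", "8.2.RC1", ["3.8", "3.10"]),
    "2.0.0": Stack("2.0.0_core_r0.8.0", "Core 0.8.0", ["2.1.0"], "7.0.0", "8.1.RC1", ["3.8", "3.10"]),
    "1.0.0": Stack("1.0.0_core_r0.6.0", "Core 0.6.0", ["2.1.0"], "6.0.0", "8.0.0", ["3.8", "3.10"]),
}


def check_environment(mm_version: str, pytorch: str, python: str) -> List[str]:
    """Return a list of mismatches; an empty list means the combination is in the table."""
    stack = COMPAT.get(mm_version)
    if stack is None:
        return [f"unknown MindSpeed MM version: {mm_version}"]
    problems = []
    if pytorch not in stack.pytorch:
        problems.append(f"PyTorch {pytorch} not listed for {mm_version}; expected one of {stack.pytorch}")
    if python not in stack.python:
        problems.append(f"Python {python} not listed for {mm_version}; expected one of {stack.python}")
    return problems


if __name__ == "__main__":
    for line in check_environment("2.3.0", "2.1.0", "3.8") or ["environment matches the table"]:
        print(line)
```

The check only covers PyTorch and Python because those are the columns with more than one accepted value; torch_npu and CANN could be compared against the single listed value in the same way.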

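Installation

For MindSpeed MM installation details, see the installation guide.

Quick Start

MindSpeed MM uses the Qwen2.5-VL-3B and Wan2.1-T2V-1.3B models as examples to help developers get the preset models running efficiently on Ascend NPUs. For the detailed steps, see the quick start guide.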

Features and Models

Overview of supported features

Model \ Feature TP TP-SP VPP PP CP Distributed Optimizer Recomputation LoRA RL FSDP2
InternVL3.5-30B
Qwen3-VL-8B
Qwen3-VL-30B
Wan2.2 CP (Ulysses)
OpenSoraPlan1.5-T2V
Wan2.1 CP (Ulysses)
HunyuanVideo CP (Ulysses)
CogVideoX series-T2V CP (Ulysses)
CogVideoX series-I2V CP (Ulysses)
OpensoraPlan1.3-T2V CP (Ulysses)
OpensoraPlan1.3-I2V CP (Ulysses)
GLM-4.1V
Qwen2VL-2B CP (Ulysses)
Qwen2VL-7B CP (Ulysses)
Qwen2VL-72B CP (Ulysses) DPO
Qwen2.5VL-3B GRPO
Qwen2.5VL-7B GRPO
Qwen2.5VL-32B GRPO
Qwen2.5VL-72B
Qwen2.5Omni-7B
Qwen3-Omni
InternVL3-8B CP (Ring)
InternVL3-78B CP (Ring)
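
The parallelism columns in the overview above (TP, PP, CP) each split the model or the sequence across devices, and whatever is left of the cluster is used for data parallelism. The sketch below illustrates that relationship under the usual Megatron-style convention that the device count equals TP × PP × CP × DP. The function name `data_parallel_size` is hypothetical and the numbers are illustrative only; they are not configurations taken from this README.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Derive the data-parallel degree from the device count and the
    tensor (tp), pipeline (pp), and context (cp) parallel degrees.

    Assumes world_size == tp * pp * cp * dp, ignoring expert parallelism.
    """
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # Illustrative values: 32 devices with tp=4, pp=2, cp=2 leave dp=2.
    print(data_parallel_size(32, tp=4, pp=2, cp=2))
```

VPP does not appear in the product because it interleaves pipeline stages on the same devices rather than adding devices.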

Notes:


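Compatible Versions and Supported Models

【Measured performance of the current version (hardware: Atlas 900 A2 PODc)】

For each supported model listed below, the model's README provides usage instructions covering the detailed training, inference, and fine-tuning workflows.

Hyperlinks in the Model column point to each model's folder; hyperlinks in the Parameters column point to the model's community resources.

For certification, 【Pass】 marks models that have passed testing and 【Test】 marks models under test.

SPS is Samples per Second; FPS is Frames per Second; TPS is Tokens per Second.

(Note: SPS and FPS here report cluster throughput; TPS reports per-device throughput.)

The average sequence length is that of the dataset used during performance testing, computed as a weighted average over the frequency of each sequence length.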
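
The two conventions above (frequency-weighted average sequence length, and TPS reported per device) can be sketched as follows. This is a minimal illustration with made-up inputs; the function names are hypothetical, and reading a cluster entry such as "1x8" as nodes × devices per node is an assumption, not something the README states.

```python
from collections import Counter
from typing import Iterable


def average_sequence_length(lengths: Iterable[int]) -> float:
    """Weighted average of sequence lengths, weighting each distinct
    length by how often it occurs in the dataset."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences given")
    return sum(length * n for length, n in counts.items()) / total


def cluster_tps(per_device_tps: float, nodes: int, devices_per_node: int) -> float:
    """Scale a per-device TPS figure to the whole cluster.

    Assumes the cluster column reads as nodes x devices per node.
    """
    return per_device_tps * nodes * devices_per_node


if __name__ == "__main__":
    # Made-up dataset: two sequences of 512 tokens, one of 1024, one of 2048.
    print(average_sequence_length([512, 512, 1024, 2048]))
    # Made-up throughput: 100 tokens/s per device on one node of 8 devices.
    print(cluster_tps(100.0, nodes=1, devices_per_node=8))
```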

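Affinity scenarios adjust a small number of structures or parameters so that the model fits Ascend better and performs better.

A3 refers to Atlas A3 training series hardware.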

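MindSpeed MM model list

| Category | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Average sequence length | Certification |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Multimodal generation | Lumina-mGPT 2.0 | 7B | Fine-tuning | 1x8 | BF16 | 8.24 (SPS) | 8.79 (SPS) | 1024 | 【Pass】 |
| | OpenSoraPlan1.5 | 8.5B | Pretraining | 1x8 | BF16 | 0.83 (SPS) | / | / | 【Contributed by Peking University】 |
| | Wan2.2-T2V | 5B | Pretraining | 1x4 (A3) | BF16 | 3.18 (SPS) | 2.93 (SPS) | / | 【Test】 |
| | | A14B | Pretraining | 1x8 (A3) | BF16 | 0.710 (SPS) | 0.292 (SPS) | / | 【Test】 |
| | Wan2.2-TI2V | 5B | Pretraining | 1x4 (A3) | BF16 | 3.18 (SPS) | 2.93 (SPS) | / | 【Test】 |
| | Wan2.2-I2V | A14B | Pretraining | 1x8 (A3) | BF16 | 0.671 (SPS) | 0.294 (SPS) | / | 【Test】 |
| | Wan2.1-T2V | 1.3B | Pretraining | 1x8 | BF16 | 0.918 (SPS) | 1.04 (SPS) | / | 【Pass】 |
| | | 1.3B | LoRA fine-tuning | 1x8 | BF16 | 0.954 (SPS) | 1.042 (SPS) | / | 【Pass】 |
| | | 14B | Pretraining | 1x8 | BF16 | 0.160 (SPS) | 0.160 (SPS) | / | 【Pass】 |
| | | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.174 (SPS) | / | 【Pass】 |
| | Wan2.1-I2V | 1.3B | Pretraining | 1x8 | BF16 | 0.76 (SPS) | / | / | 【Pass】 |
| | | 14B | Pretraining | 1x8 | BF16 | 0.130 (SPS) | / | / | 【Pass】 |
| | | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.173 (SPS) | / | 【Pass】 |
| | HunyuanVideo-T2V | 13B | Pretraining | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | / | 【Pass】 |
| | HunyuanVideo-I2V | 13B | Pretraining | 1x8 | BF16 | 0.164 (SPS) | 0.202 (SPS) | / | 【Pass】 |
| | OpenSora 1.0 | 5.5B | Pretraining | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | / | 【Pass】 |
| | OpenSora 1.2 | 5.2B | Pretraining | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | / | 【Test】 |
| | OpenSora 2.0-T2V | 11B | Pretraining | 1x8 | BF16 | 1.33 (SPS) | 1.46 (SPS) | / | 【Pass】 |
| | OpenSoraPlan 1.2 | 8.7B | Pretraining | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | / | 【Pass】 |
| | OpenSoraPlan 1.3-T2V | 8.6B | Pretraining | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | / | 【Pass】 |
| | OpenSoraPlan 1.3-I2V | 8.6B | Pretraining | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | / | 【Pass】 |
| | WFVAE | 0.18B | Pretraining | 1x8 | BF16 | 23.860 (SPS) | 26.091 (SPS) | / | 【Pass】 |
| | CogVideoX-T2V | 5B | Pretraining | 1x8 | BF16 | 1.14 (SPS) | 1.00 (SPS) | 6976 | 【Pass】 |
| | CogVideoX-I2V | 5B | Pretraining | 1x8 | BF16 | 1.13 (SPS) | 0.84 (SPS) | 6976 | 【Pass】 |
| | CogVideoX 1.5-T2V | 5B | Pretraining | 1x8 | BF16 | 1.44 (SPS) | 1.75 (SPS) | 6976 | 【Pass】 |
| | | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.76 (SPS) | 2.64 (SPS) | / | 【Pass】 |
| | CogVideoX 1.5-I2V | 5B | Pretraining | 1x8 | BF16 | 1.43 (SPS) | 1.44 (SPS) | 6976 | 【Pass】 |
| | | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.33 (SPS) | 2.04 (SPS) | / | 【Pass】 |
| | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | / | 【Contributed by Qihoo 360】 |
| | SDXL | 3.5B | Pretraining | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | / | 【Pass】 |
| | | 3.5B | Pretraining | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | / | 【Pass】 |
| | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | / | 【Pass】 |
| | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | / | 【Pass】 |
| | | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | / | 【Pass】 |
| | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | / | 【Pass】 |
| | Flux-Kontext | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 1.97 (FPS) | 2.00 (FPS) | / | 【Pass】 |
| | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | / | 【Pass】 |
| | HiDream | 17B | LoRA fine-tuning | 1x8 | BF16 | 18.37 (FPS) | 19.61 (FPS) | / | 【Pass】 |
| | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | / | 【Test】 |
| | Qwen-Image | 27B | LoRA fine-tuning | 1x8 | BF16 | 23.02 (FPS) | 21.54 (FPS) | / | 【Pass】 |
| | Qwen-Image-Edit | 27B | LoRA fine-tuning | 1x8 | BF16 | 20.59 (FPS) | 17.47 (FPS) | / | 【Test】 |
| Multimodal understanding | GLM-4.1V | 9B | Fine-tuning | 1x8 | BF16 | 1074.64 (TPS) | 908.49 (TPS) | 707 | 【Pass】 |
| | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 3632.31 (TPS) | 3757.98 (TPS) | 602 | 【Test】 |
| | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 7653.12 (TPS) | 5089.99 (TPS) | 1813 | 【Pass】 |
| | | 8B | Fine-tuning | 1x8 | BF16 | 2914.39 (TPS) | 2492.87 (TPS) | 1813 | 【Pass】 |
| | | 26B | Fine-tuning | 1x8 | BF16 | 750.12 (TPS) | 738.79 (TPS) | 1813 | 【Pass】 |
| | | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | 1813 | 【Pass】 |
| | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | 228.33 | / | 1896 | 【Test】 |
| | InternVL 3.0 | 8B | Fine-tuning | 1x8 | BF16 | 2344.58 (TPS) | 2211.93 (TPS) | 2653 | 【Pass】 |
| | | 78B | Fine-tuning | 4x8 (A3) | BF16 | 228.82 (TPS) | 283.15 (TPS) | 1932 | 【Pass】 |
| | InternVL 3.5 | 30B | Fine-tuning | 1x8 (A3) | BF16 | 52.76 (TPS) | 47.73 (TPS) | 201 | 【Test】 |
| | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 2941.17 (TPS) | 3004.04 (TPS) | 689 | 【Pass】 |
| | | 7B | Fine-tuning | 1x8 | BF16 | 1143.74 (TPS) | 1004.22 (TPS) | 689 | 【Pass】 |
| | | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | 689 | 【Pass】 |
| | Qwen2.5-VL | 3B | Fine-tuning | 1x8 | BF16 | 2047.19 (TPS) | 1876.66 (TPS) | 689 | 【Pass】 |
| | | 7B | Fine-tuning | 1x8 | BF16 | 1620.87 (TPS) | 1091.20 (TPS) | 689 | 【Pass】 |
| | | 32B | Fine-tuning | 2x8 | BF16 | 257.50 (TPS) | / | 689 | 【Pass】 |
| | | 72B | Fine-tuning | 4x8 (A3) | BF16 | 322.96 (TPS) | 256.28 (TPS) | 689 | 【Pass】 |
| | Qwen3-VL | 8B | Fine-tuning | 1x8 | BF16 | 146.54 (TPS) | 129.71 (TPS) | 179 | 【Test】 |
| | | 30B | Fine-tuning | 1x8 (A3) | BF16 | 179.57 (TPS) | / | 185 | 【Test】 |
| | | 235B | Fine-tuning | 16x8 (A3) | BF16 | 598.05 (TPS) | / | 16116 | 【Test】 |
| | Qwen2.5-Omni | 7B | Fine-tuning | 1x8 | BF16 | 575.01 (TPS) | 534.28 (TPS) | 296 | 【Pass】 |
| | Qwen3-Omni | 30B | Fine-tuning | 2x4 (A3) | BF16 | 131.3 (TPS) | 16.4 (TPS) | 288 | 【Test】 |
| Speech recognition | Whisper | 1.5B | Pretraining | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | / | 【Test】 |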
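
To compare the NPU and reference columns of the model list above, each cell can be parsed into a number and a unit and then divided. The sketch below is one hedged way to do that; `parse_metric` and `relative_performance` are hypothetical helpers, and the handling of "/" (no value) and of cells without a unit follows how those cells appear in the list rather than any documented format.

```python
import re
from typing import Optional, Tuple

# A number, optionally followed by a unit in parentheses, e.g. "0.83 (SPS)" or "1074.64(TPS)".
_METRIC = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(?:\((SPS|FPS|TPS)\))?\s*$")


def parse_metric(cell: str) -> Optional[Tuple[float, Optional[str]]]:
    """Parse a performance cell into (value, unit); return None for '/' or anything else."""
    match = _METRIC.match(cell)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def relative_performance(npu_cell: str, reference_cell: str) -> Optional[float]:
    """Ratio of NPU to reference throughput, or None when either value is missing."""
    npu = parse_metric(npu_cell)
    ref = parse_metric(reference_cell)
    if npu is None or ref is None or ref[0] == 0:
        return None
    if npu[1] and ref[1] and npu[1] != ref[1]:
        raise ValueError(f"unit mismatch: {npu[1]} vs {ref[1]}")
    return npu[0] / ref[0]


if __name__ == "__main__":
    # InternVL 2.0 2B row: NPU 7653.12 (TPS) against reference 5089.99 (TPS).
    print(round(relative_performance("7653.12 (TPS)", "5089.99 (TPS)"), 2))
    # OpenSoraPlan1.5 row has no reference value, so no ratio is produced.
    print(relative_performance("0.83 (SPS)", "/"))
```

A ratio above 1 means the NPU figure is higher than the reference figure for that row; rows measured on different cluster sizes are not directly comparable for SPS and FPS, which are cluster-level numbers.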

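Common Parameter Descriptions

The parameters used to run the MindSpeed MM suite are explained in the README.

Planned Features

  • 【New model】 JanusPro
  • 【Model feature】 CogVideoX: PP
  • 【Model feature】 OpensoraPlan1.3: CP (Ring Attention)
  • 【Model feature】 Qwen2VL: VPP, CP (Ulysses & Ring Attention)
  • 【Model feature】 InternVL2: TP, CP (Ulysses & Ring Attention)
  • 【Base feature】 Hetero-parallel

Tools

Ascend profiling collection tool

MindSpeed MM integrates the Ascend profiling collection tool for analyzing how a model runs. Following its configuration, the tool collects key information such as operators and device memory. It supports both dynamic and static collection modes, which helps developers locate model bottlenecks; choose the mode that suits your scenario.

For details, see the profiling section of the README.

MindStudio Insight performance analysis tool

For performance tuning of large models on clusters, we recommend MindStudio Insight, a visual tuning tool. It provides visualizations including a Timeline view, communication analysis, and compute time, so that users can analyze potential performance bottlenecks and decide how to eliminate or reduce them.

For installation and usage, see the "MindStudio Insight Operation Guide".

Feature extraction for Sora-type models

MindSpeed MM supports extracting and saving video and text features.

For details, see the "Feature extraction for Sora-type models" section of the README.

Memory snapshot extraction

MindSpeed MM integrates the Ascend memory snapshot collection tool for analyzing how a model runs.

For details, see the "Memory snapshot extraction" section of the README.

Using TensorBoard

MindSpeed MM supports TensorBoard.

For details, see the "Using TensorBoard" section of the README.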

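Version Maintenance

MindSpeed MM versions go through the following five maintenance stages:

| Status | Duration | Description |
| --- | --- | --- |
| Planning | 1-3 months | Plan features |
| Development | 3 months | Develop features |
| Maintained | 6-12 months | Merge all resolved issues and publish releases. Different MindSpeed MM versions follow different maintenance policies: regular releases are maintained for 6 months and long-term support releases for 12 months |
| Unmaintained | 0-3 months | Merge all resolved issues; no dedicated maintainers and no releases |
| End of life (EOL) | N/A | The branch no longer accepts any changes |

Maintenance policy for released MindSpeed MM versions:

| MindSpeed MM version | Maintenance policy | Current status | Release date | Next status | EOL date |
| --- | --- | --- | --- | --- | --- |
| 2.3.0 | Regular release | Maintained | 2025/12/30 | Expected to be unmaintained from 2026/06/30 | |
| 2.2.0 | Regular release | Maintained | 2025/09/30 | Expected to be unmaintained from 2026/03/30 | |
| 2.1.0 | Regular release | Unmaintained | 2025/06/30 | Expected to be unmaintained from 2025/12/30 | |
| 2.0.0 | Regular release | Unmaintained | 2025/03/30 | Expected to be unmaintained from 2025/09/30 | |
| 1.0.0 | Regular release | Unmaintained | 2024/12/30 | Expected to be unmaintained from 2025/06/30 | |
| 1.0.RC3 | Regular release | Unmaintained | 2024/09/30 | Expected to be unmaintained from 2025/03/30 | |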
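
The dates in the table above are consistent with adding the maintenance period (6 months for a regular release, 12 months for a long-term support release) to the release date. A minimal sketch of that arithmetic follows; the helper names and the `"regular"` / `"lts"` keys are hypothetical, and clamping to the last day of a shorter month is an assumption that none of the listed dates actually exercises.

```python
import calendar
from datetime import date

# Maintenance periods in months, as stated in the stage table (keys are hypothetical labels).
MAINTENANCE_MONTHS = {"regular": 6, "lts": 12}


def add_months(start: date, months: int) -> date:
    """Add whole calendar months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def maintenance_end(release: date, policy: str = "regular") -> date:
    """Date from which a release is expected to become unmaintained."""
    return add_months(release, MAINTENANCE_MONTHS[policy])


if __name__ == "__main__":
    # 2.3.0 was released on 2025/12/30 as a regular release.
    print(maintenance_end(date(2025, 12, 30)))
```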

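FAQ

For frequently asked questions, see: FAQ

Related Resources

  1. A multimodal suite for large-scale distributed training
  2. Powered by Ascend compute, Open-Sora Plan achieves cinema-grade video generation
  3. MindSpeed MM supports mainstream multimodal understanding large models, with greatly improved performance!
  4. Natively trained on Ascend! Sun Yat-sen University and 360 jointly build Qihoo-T2X, a new paradigm for multimodal tasks
  5. Exploring the SOTA video generation model Wan2.1 with Ascend MindSpeed MM
  6. A SOTA multimodal understanding model out of the box: MindSpeed MM best practices for Qwen2.5-VL
  7. Joint first release: exploring the Open-Sora Plan V1.5 model with Ascend MindSpeed MM
  8. Supported as soon as it is open-sourced! Exploring GLM-4.1V-Thinking, the latest multimodal understanding model, with Ascend MindSpeed MM

Security Statement

MindSpeed MM Security Statement

Disclaimer

To MindSpeed MM users

  1. The models provided by MindSpeed MM are for your non-commercial use only.
  2. For each model, the MindSpeed MM platform only suggests, for reference, datasets that can be used for training. Huawei does not provide any datasets. If you train with these datasets, take particular care to comply with the corresponding dataset licenses. Huawei assumes no liability for any infringement dispute arising from your use of the datasets.
  3. If you find any problem (including but not limited to functional or compliance problems) while using MindSpeed MM models, please submit an issue on Gitcode and we will review and resolve it promptly.
  4. Megatron and the other third-party open-source software that MindSpeed MM depends on are provided and maintained by third-party communities. Fixes for problems caused by third-party open-source software depend on the contributions and feedback of those communities. You should understand that the MindSpeed MM repository does not guarantee fixes for problems in the third-party open-source software itself, nor does it guarantee that all vulnerabilities and errors in third-party open-source software will be tested and corrected.

To dataset owners

If you do not want your dataset to be mentioned in the models in MindSpeed MM, or want the description of your dataset in the MindSpeed MM models updated, please submit an issue on Gitcode. We will remove or update the description of your dataset as requested in your issue. We sincerely appreciate your understanding of and contribution to MindSpeed MM.

License statement

For models provided by Ascend MindSpeed MM, if a License exists in the model directory, that License governs. If no License exists in the model directory, the model is licensed under the Apache 2.0 license, whose text is available in the root directory of Ascend MindSpeed MM.

Acknowledgments

MindSpeed MM is jointly contributed by the following Huawei departments and Ascend ecosystem partners:

Huawei:

  • Computing Product Line
  • Public Development Department
  • 2012 Laboratories
  • Huawei Cloud

Ecosystem partners:

  • 360 AI Research
  • Peking University OpenSoraPlan team
  • WeChat Technical Architecture Department, Infrastructure Center

Thanks for every PR from the community; contributions to MindSpeed MM are welcome.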