Ascend Large Model Acceleration Library

Introduction

MindSpeed Core is a large-model acceleration library for Huawei Ascend devices.

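Training large models is a complex process that involves many techniques and challenges. One major difficulty is the large amount of device memory it requires, which puts heavy pressure on individual accelerator cards. To keep computation going when a single card does not have enough memory, the industry developed third-party large-model acceleration libraries such as Megatron and DeepSpeed. They partition the model, the input data, and other components across multiple cards, and then aggregate the results through collective communication.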

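Ascend provides the MindSpeed Core acceleration library so that customers can quickly migrate their large-model workloads to Ascend devices. It supports Ascend-specific algorithms and works out of the box.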

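In addition, MindSpeed Core serves as the foundation for acceleration suites for large language models, multimodal models, and reinforcement learning:

📝 Large language model library: MindSpeed LLM
🖼️ Multimodal model library: MindSpeed MM
🖥️ Reinforcement learning acceleration library: MindSpeed RL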

Community Meetings

For the schedule of MindSpeed TC and SIG meetings, see the Ascend Meeting Center.

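Version Information

The currently recommended software compatibility matrix is as follows:

Software	Version
MindSpeed Core branch	26.0.0_core_r0.12.1
Mcore version	0.12.1
CANN version	9.0.0
PyTorch version	2.7.1
torch_npu version	26.0.0
Python version	Python 3.10.x

For more details, see the version compatibility table.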
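Before installing, it can help to compare the local Python packages against the matrix above. The sketch below is a hypothetical helper, not a MindSpeed tool. It assumes the distribution names torch and torch_npu and reads versions via importlib.metadata. CANN is not a Python package, and Megatron-LM is used from a source checkout, so both must be verified separately.

```python
from __future__ import annotations

import sys
from importlib import metadata

# Recommended versions copied from the table above (hypothetical helper data).
# CANN and Megatron-LM are not checked here: neither is a pip-installed package in this flow.
RECOMMENDED = {
    "python": "3.10",
    "torch": "2.7.1",
    "torch_npu": "26.0.0",
}


def _components(version: str) -> list[str]:
    # Drop a local suffix such as "+cpu", then split into dot-separated components.
    return version.split("+", 1)[0].split(".")


def matches(installed: str, recommended: str) -> bool:
    """True if `installed` agrees with every component that `recommended` specifies.

    "3.10.14" matches "3.10", while "2.7.10" does not match "2.7.1".
    """
    rec = _components(recommended)
    return _components(installed)[: len(rec)] == rec


def installed_versions() -> dict[str, str | None]:
    # Python version from the running interpreter; packages from installed metadata.
    found: dict[str, str | None] = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for dist in ("torch", "torch_npu"):
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[dist] = None  # not installed
    return found


def mismatches(installed: dict[str, str | None]) -> dict[str, tuple[str | None, str]]:
    # Map each out-of-spec item to (installed version, recommended version).
    return {
        name: (installed.get(name), want)
        for name, want in RECOMMENDED.items()
        if installed.get(name) is None or not matches(installed[name], want)
    }


if __name__ == "__main__":
    print(mismatches(installed_versions()) or "environment matches the recommended versions")
```

Component-wise prefix matching is used instead of a plain string prefix so that a patch release such as 2.7.10 is not mistaken for 2.7.1.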

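Installation

After pulling the MindSpeed Core source code, install it from the command line with pip install -e MindSpeed, run from the directory that contains the MindSpeed checkout. For instructions on installing a specific MindSpeed Core branch and its dependencies, see the deployment document.

Obtain Megatron-LM and switch to the core_v0.12.1 version, for example: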

git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout core_v0.12.1

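Quick Start

Overview

With MindSpeed Core, adding a single line of code is enough to run Megatron-LM on Ascend training devices. You can then enable MindSpeed's acceleration features as described in the Features section.

Procedure

Using the GPT model as an example: in the Megatron-LM directory, edit pretrain_gpt.py and add the line import mindspeed.megatron_adaptor directly below import torch, as follows: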

  import torch
  import mindspeed.megatron_adaptor # newly added line
  from functools import partial
  from contextlib import nullcontext
  import inspect

For detailed steps, see the Quick Start guide.
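Import-time monkey patching is one way a single import line can retarget a framework. The sketch below illustrates that general technique only and is not MindSpeed's implementation. The module toy_megatron, the function npu_fused_softmax, and the PATCHES registry are all hypothetical. The sketch also suggests one plausible reason the adaptor import sits near the top of pretrain_gpt.py: code that grabbed a reference to the original function before patching keeps using it.

```python
import importlib
import sys
import types

# Hypothetical stand-in for an upstream framework module (not the real megatron package).
upstream = types.ModuleType("toy_megatron")
upstream.fused_softmax = lambda x: "cuda kernel"
sys.modules["toy_megatron"] = upstream


def npu_fused_softmax(x):
    # Hypothetical NPU-friendly replacement.
    return "npu kernel"


# Hypothetical patch registry: (module name, attribute name, replacement).
PATCHES = [("toy_megatron", "fused_softmax", npu_fused_softmax)]


def apply_patches(patches):
    # Replace each attribute on the already-imported (or newly imported) module.
    for module_name, attr, replacement in patches:
        module = importlib.import_module(module_name)
        setattr(module, attr, replacement)


# A reference taken BEFORE patching still points at the original function...
early_ref = upstream.fused_softmax
apply_patches(PATCHES)
# ...while imports and lookups made AFTER patching see the replacement.
from toy_megatron import fused_softmax

print(early_ref(None), "|", fused_softmax(None))  # cuda kernel | npu kernel
```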

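Acceleration Feature Levels

MindSpeed Core acceleration features are organized into three levels. To choose which optimization level is enabled, set the --optimization-level {level} argument in the launch script. The following values are supported:

Level	Name	Description
0	Basic compatibility	Provides basic functional adaptation of the Megatron-LM framework to NPUs.
1	Affinity enhancement🔥	Builds on L0 and enables some fused operators and Ascend-affinity computation rewrites.
2	Acceleration features enabled🔥🔥	Default. Builds on L0 and L1 and enables a richer set of acceleration features, which are usually turned on through their own arguments; see the "Features" section.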
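The levels are cumulative, and level 2 is the default. The sketch below models that behavior with argparse. It is an illustrative parser, not MindSpeed's own argument registration, and LEVEL_CAPABILITIES is a hypothetical paraphrase of the table above.

```python
import argparse

# Hypothetical paraphrase of the level table above.
LEVEL_CAPABILITIES = {
    0: "basic Megatron-LM functional adaptation for NPU",
    1: "fused operators and Ascend-affinity computation rewrites",
    2: "richer acceleration features, each enabled by its own argument",
}


def build_parser():
    # Illustrative only: MindSpeed registers its own arguments internally.
    parser = argparse.ArgumentParser()
    parser.add_argument("--optimization-level", type=int, choices=[0, 1, 2], default=2)
    return parser


def enabled_capabilities(level):
    # Cumulative: level N includes everything from levels 0..N.
    return [LEVEL_CAPABILITIES[i] for i in range(level + 1)]


args = build_parser().parse_args(["--optimization-level", "1"])
print(args.optimization_level, enabled_capabilities(args.optimization_level))
```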

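Features

MindSpeed features are grouped into the following modules: Megatron feature support, parallelism strategy features, memory optimization features, affinity computation features, communication optimization features, Mcore MoE features, key scenario features, multimodal features, and other features. The Released column indicates whether a feature has been commercially released; prototype features are not commercially released.

Each feature's introduction describes its applicable scenarios and usage. In general, you can enable a feature by adding the corresponding arguments to the launch script. 🛰️
MindSpeed acceleration features support only mcore. Megatron has promoted mcore as its main branch since v0.6.0, and it is the default branch in the current version. 🛰️
Large-model training currently relies mainly on the bf16 data type. Unless otherwise stated, the features below are in principle also compatible with fp16. If you run into problems with other data types, please submit an issue and we will respond promptly. 🛰️
Note❗: After Megatron core_r0.9.0, the alltoall dispatcher was reworked and the original alltoall dispatcher was renamed alltoall_seq. For how each branch supports the MindSpeed MoE features, see the individual feature descriptions.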

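Megatron Feature Support

Feature	Description	Released
Megatron Data Parallelism	link	✅
Megatron Tensor Parallelism	link	✅
Megatron Pipeline Parallelism	link	✅
Megatron Virtual Pipeline Parallelism	link	✅
Megatron Distributed Optimizer	link	✅
Megatron Sequence Parallelism	link	✅
Megatron Asynchronous DDP	link	✅
Megatron Weight Update Communication Overlap	link	✅
Megatron Recomputation	link	✅
Megatron Distributed Weights	link	✅
Megatron Fully Sharded Parallelism	link	✅
Megatron Transformer Engine	link	✅
Megatron Multi-head Latent Attention	link	✅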

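Parallelism Strategy Features

Feature	Description	Released
Ascend Ulysses Long-Sequence Parallelism	link	✅
Ascend Ring Attention Long-Sequence Parallelism	link	✅
Ascend Double Ring Attention Long-Sequence Parallelism	link	✅
Ascend Hybrid Long-Sequence Parallelism	link	✅
Ascend Custom No-Op Layer	link	✅
Ascend DualPipeV	link	✅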

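Memory Optimization Features

Feature	Description	Released
Ascend Activation Function Recomputation	link	✅
Ascend Independent Pipeline Scheduling for Recomputation	link	✅
Ascend Mask Unification	link	✅
Ascend BF16 Parameter Copy Reuse	link	✅
Ascend swap_attention	link	✅
Ascend Norm Recomputation	link	✅
Ascend Adaptive HCCL Buffer	link	✅
Ascend Swap Optimizer	link	✅
Virtual Optimizer	link	✅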

Affinity Computation Features

Feature	Description	Released
Ascend rms_norm Fused Operator	link	✅
Ascend swiglu Fused Operator	link	✅
Ascend rotary_embedding Fused Operator	link	✅
Ascend flash attention	link	✅
Ascend MoE Token Permute and Unpermute Fused Operator	link	✅
Ascend npu_matmul_add_fp32 Gradient Accumulation Fused Operator	link	✅
Ascend Computation-Communication Parallelism Optimization	link	❌
Ascend MC2	link	❌
Ascend fusion_attention_v2	link	❌

Communication Optimization Features

Feature	Description	Released
Ascend Gloo Checkpoint-to-Disk Optimization	link	✅
Ascend High-Dimensional Tensor Parallelism	link	✅

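Mcore MoE Features

Feature	Description	Released
Ascend Megatron MoE GMM	link	✅
Ascend Megatron MoE Allgather Dispatcher Performance Optimization	link	✅
Ascend Megatron MoE Alltoall Dispatcher Performance Optimization	link	✅
Ascend Megatron MoE TP-Extended EP	link	✅
Megatron MoE Communication Overlap Optimization (alltoall dispatcher branch)	link	❌
Megatron MoE Communication Overlap Optimization (allgather dispatcher branch)	link	✅
Ascend Shared Experts	link	✅
1F1B Overlap	link	✅
Expert Parallel Dynamic Load Balancing (Data-Parameter Mutual Seeking)	link	✅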

Key Scenario Features

Feature	Description	Released
Ascend EOD Reset Training Scenario	link	✅
Ascend alibi	link	❌

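Multimodal Features

Feature	Description	Released
Ascend fused ema adamw Optimizer	link	❌
Ascend PP Support for Dynamic Shapes	link	✅
Ascend PP Support for Multi-Parameter Passing	link	✅
Ascend PP Support for Multi-Parameter Passing and Dynamic Shapes	link	✅
Ascend Non-Aligned Linear Layer	link	✅
Ascend Non-Aligned Ulysses Long-Sequence Parallelism	link	✅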

Other Features

Feature	Description	Released
Ascend TFLOPS Calculation	link	✅
Ascend Auto Settings Automatic Parallel Strategy Search System	link	❌
Ascend Deterministic Computation	link	❌
Ascend MindStudio Training Tools Precision Comparison	link	❌

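Custom Operators

Ascend custom training operators are provided uniformly as torch_npu APIs. The APIs below are expected to no longer be maintained from Q4 2025, so please prefer the custom operators provided by torch_npu. For new requirements or problems, submit an issue and we will reply as soon as possible.

Some custom operators are exposed as public interfaces. For how public interfaces are defined, see the public interface statement in the MindSpeed security statement. For details of each external interface, see the manual linked for the corresponding operator below.

Operator	Description	Released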
npu_dropout_add_layer_norm	link	✅
npu_rotary_position_embedding	link	✅
fusion_attention	link	✅
rms_norm	link	✅
swiglu	link	✅
npu_mm_all_reduce_add_rms_norm	link	✅
npu_mm_all_reduce_add_rms_norm_	link	✅
npu_gmm	link	✅
npu_grouped_mat_mul_all_reduce	link	✅
npu_ring_attention_update	link	✅
npu_matmul_add_fp32	link	✅
npu_groupmatmul_add_fp32	link	✅
npu_apply_fused_ema_adamw	link	❌
lcal_coc	link	❌
ffn	link	❌
npu_all_to_all_all_gather_bmm	link	❌
npu_bmm_reduce_scatter_all_to_all	link	❌
quant_gmm	link	❌
npu_apply_fused_adamw_v2	link	✅

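Branch Maintenance Policy

🛠️ MindSpeed version branches go through the following maintenance stages:

Status	Duration	Description
Planning 🕐	1-3 months	Features are planned.
Development 🕔	3 months	Features are developed.
Maintained 🕚	6-12 months	All resolved issues are merged and versions are released. Different MindSpeed versions follow different maintenance policies: regular releases are maintained for 6 months and long-term support releases for 12 months.
Unmaintained 🕛	0-3 months	All resolved issues are merged, but there are no dedicated maintainers and no releases.
End of Life (EOL) 🚫	N/A	The branch no longer accepts any changes.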

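🛠️ MindSpeed version maintenance policy:

MindSpeed version	Maintenance policy	Current status	Release date	Next status
26.0.0_core_r0.12.1	Regular release	Maintained	2026/3/30	Expected to be unmaintained from 2026/9/30
2.3.0_core_r0.12.1	Regular release	Maintained	2025/12/30	Expected to be unmaintained from 2026/6/30
2.2.0_core_r0.12.1	Regular release	Maintenance ended	2025/09/30	Unmaintained from 2026/3/30
2.1.0_core_r0.12.1	Regular release	Maintenance ended	2025/06/30	Unmaintained from 2025/12/30
2.1.0_core_r0.8.0	Regular release	Maintenance ended	2025/06/30	Unmaintained from 2025/12/30
2.0.0_core_r0.8.0	Regular release	Maintenance ended	2025/03/30	Unmaintained from 2025/9/30
1.0.0_core_r0.7.0	Regular release	Maintenance ended	2024/12/30	Unmaintained from 2025/6/30
1.0.0_core_r0.6.0	Regular release	Maintenance ended	2024/12/30	Unmaintained from 2025/6/30
1.0.RC3_core_r0.7.0	Regular release	Maintenance ended	2024/09/30	Unmaintained from 2025/3/30
1.0.RC3_core_r0.6.0	Regular release	Maintenance ended	2024/09/30	Unmaintained from 2025/3/30
1.0.RC2	Regular release	Maintenance ended	2024/06/30	Unmaintained from 2024/12/30
1.0.RC1	Regular release	Maintenance ended	2024/03/30	Unmaintained from 2024/9/30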
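The "unmaintained from" dates follow from the release date plus the maintenance window stated for each policy: 6 months for regular releases and 12 months for long-term support releases. Here is a small sketch of that calendar arithmetic. The helper names and the policy keys are hypothetical. The day is clamped so that, for example, a release on the 31st does not produce an invalid date.

```python
import calendar
from datetime import date

# Maintenance windows from the branch maintenance policy (months after release).
MAINTENANCE_MONTHS = {"regular": 6, "lts": 12}


def add_months(d, months):
    # Move forward by whole months, clamping the day to the target month's length.
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def unmaintained_from(release, policy="regular"):
    # First day of the "Unmaintained" stage for a branch released on `release`.
    return add_months(release, MAINTENANCE_MONTHS[policy])


print(unmaintained_from(date(2026, 3, 30)))  # 2026-09-30
```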

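FAQ

Symptom	Description
Data helpers data preprocessing error ❗	data_helpers data preprocessing error
Torch extensions compilation hangs ❗	Torch extensions hang
grad norm becomes nan in long-run stability tests with Megatron 0.7.0 ❗	grad_norm_nan
Gloo connection setup fails with Gloo connectFullMesh failed with ... ❗	hccl-replace-gloo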

Technical Articles

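Security Statement

⚠️ MindSpeed Security Statement

Disclaimer

To MindSpeed Users

All content provided by MindSpeed is for your non-commercial use only.
The models and datasets referenced in MindSpeed test cases and example files are used by the platform for functional testing only, and Huawei does not provide any model weights or datasets. If you use this data for training, make sure you comply with the licenses of the corresponding models and datasets. Huawei assumes no liability for any infringement disputes arising from your use of these models and datasets.
If you find any problem while using MindSpeed (including but not limited to functional or compliance problems), please submit an issue on Gitee and we will review and resolve it promptly.
Third-party open-source software that MindSpeed depends on, such as Megatron, is provided and maintained by the respective third-party communities, and fixes for problems caused by that software depend on contributions and feedback from those communities. You should understand that the MindSpeed repository does not guarantee fixes for problems in third-party open-source software itself, nor does it guarantee that all vulnerabilities and errors in that software will be tested or corrected.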