File    Last commit    Last updated
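feat(ut/qos/torch): add UTs and fix bugs missed in the code
Co-authored-by: Klayyy <wanglei886@h-partners.com>
# message auto-generated for no-merge-commit merge:
!3309 merge master into master
Created-by: Klayyy
Commit-by: Klayyy
Merged-by: ascend-robot
Description:
1. Add feature UTs for the AI QoS feature.
2. Self-review the code while adding the UTs and fix the bugs found:
2.1 Correct the name used to call torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options().
2.2 Improve the messages of the raise ValueError statements in qos_feature.py.
2.3 In qos.py, in the code that assigns priorities for the minimum-conflict combination, remove duplicated code and unused imports, and add the comma that was missing in _PARALLEL_TYPES.
2.4 In qos.py, fix the handling that should apply to SDMA QoS but mistakenly used RoCE.
3. Add H2D QoS support for the PCIe asynchronous channel, and add a new set_h2d_qos interface for the DCMI interface that can be called from Python.
4. Update the DCMI interface calls in the aiQos Readme and add the build instructions for the DCMI interface SO.
See merge request: Ascend/MindSpeed!3309
Last updated: 2 months ago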
update megatron unit_test
Co-authored-by: RyanWang1022 <wangzhiwei108@huawei.com>
# message auto-generated for no-merge-commit merge:
!3038 merge master into master
Created-by: RyanWang1022
Commit-by: RyanWang1022
Merged-by: ascend-robot
Description: update megatron unit_test
See merge request: Ascend/MindSpeed!3038
Last updated: 6 months ago
hamilton attention implementation in te
Co-authored-by: Xiaoda Zhang <zhangxiaoda@huawei.com>
# message auto-generated for no-merge-commit merge:
!3430 merge add-HA-implement-on-new-master into master
Created-by: Xiaoda_zhang
Commit-by: Xiaoda Zhang
Merged-by: ascend-robot
Description:

What this PR does / why we need it?

This PR implements Hamilton attention (HA) on top of the existing MindSpeed code (see https://github.com/infinigence/HamiltonAttention), including both the forward and backward passes.

Advantage of HA: ring attention uses only a single intra-node link, whereas HA uses the whole intra-node full-mesh network. This relieves the communication bottleneck that can appear in ring attention when communication is not hidden by computation.

This PR implements CP for both the SBH and TND formats, and correctness has been verified with UTs. The performance gains were measured on the WAN and Qwen3-vl models:

WAN2.2, seq_len=18K

| | ring attn (send/recv) | HA (4 rings) (alltoall) |
|--|--|--|
| Time per communication operator | 3.6ms | 1.2ms |
| Total core attention time (forward) | 33.5ms | 16ms |
| Total core attention time (backward) | 45.9ms | 28.9ms |
| E2E time per iteration | 8.3s | 6.0s |

WAN2.2, seq_len=37K

| | ring attn (send/recv) | HA (4 rings) (alltoall) |
|--|--|--|
| Time per communication operator | 8.6ms | 3.8ms |
| Total core attention time (forward) | 65.6ms | 45.3ms |
| Total core attention time (backward) | 106.7ms | 90.8ms |
| E2E time per iteration | 15.5s | 12.9s |

Qwen3-vl, TND format, seq_len=1024 per image, 62 images, total seq_len=62K, seq_len=7936 after CP splitting

| | ring attn (send/recv) | HA (4 rings) (alltoall) |
|--|--|--|
| Time per communication operator | 2.1ms | 1ms |
| Total core attention time (forward) | 104ms | 104ms |
| Total core attention time (backward) | 34ms | 14ms |

Qwen3-vl, TND format, seq_len=4096 per image, 62 images, total seq_len=248K, seq_len=31744 after CP splitting

| | ring attn (send/recv) | HA (4 rings) (alltoall) |
|--|--|--|
| Time per communication operator | 8.9ms | 2.6ms |
| Total core attention time (forward) | 112ms | 112ms |
| Total core attention time (backward) | 135ms | 86ms |

Does this PR introduce any user-facing change?

To enable HA, users must set the enable_ha parameter and pass in the in_mapping_list/out_mapping_list used by HA, which describe how the multiple rings send and receive data, as well as the permute_index needed to reassemble the individual sequences in the TND format.

How was this patch tested?

Correctness has been verified with UTs.

See merge request: Ascend/MindSpeed!3430
Last updated: 24 days ago
fix docs error
Co-authored-by: Keilo_W <wangkaiyu11@h-partners.com>
# message auto-generated for no-merge-commit merge:
!3450 merge master into master
Created-by: Keilo_W
Commit-by: Keilo_W
Merged-by: ascend-robot
Description:
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?
See merge request: Ascend/MindSpeed!3450
Last updated: 26 days ago
Distributed test framework
Last updated: 2 years ago
fix [Issue] the use_custom_fsdp feature is incompatible with dualpipev
Co-authored-by: lizhi <lizhi166@huawei.com>
# message auto-generated for no-merge-commit merge:
!3080 merge bug-master into master
Created-by: liz_hw
Commit-by: lizhi
Merged-by: ascend-robot
Description: fix [Issue] the use_custom_fsdp feature is incompatible with dualpipev
See merge request: Ascend/MindSpeed!3080
Last updated: 5 months ago
feature(ut): uses third-party verification for mc2 UT.
Co-authored-by: Muu <koimuu@163.com>
# message auto-generated for no-merge-commit merge:
!3247 merge feature_mc2 into master
Created-by: Muuyo
Commit-by: Muu
Merged-by: ascend-robot
Description:
Uses third-party verification.
Input value range: [-5, 5]
Verification document: https://wiki.huawei.com/domains/76578/wiki/233229/WIKI2026013110011445
See merge request: Ascend/MindSpeed!3247
Last updated: 3 months ago