
Tests Usage

  1. Install MindSpeed in editable mode from the root of the MindSpeed repository

    pip install -e .
    
  2. Copy the entire tests_extend directory to the root of the Megatron-LM repository

    cp -r tests_extend {PATH_TO_MEGATRON_LM}
    
  3. Run a single test file with pytest from the Megatron-LM root directory

    cd {PATH_TO_MEGATRON_LM}
    pytest tests_extend/unit_tests/megatron/test_distrib_optimizer.py
    
  4. Run the whole test suite from the Megatron-LM root directory

    cd {PATH_TO_MEGATRON_LM}
    pytest tests_extend
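The four steps above can be chained into a single helper. The following is a minimal Bash sketch. It assumes the file is sourced from the MindSpeed repository root and that pip and pytest are on PATH. The names run_step and run_extended_tests and the DRY_RUN switch are hypothetical and are not part of MindSpeed.

```shell
# Hypothetical helper: chains the four steps above.
# Source this file from the MindSpeed repository root, then call run_extended_tests.

# Print the command instead of executing it when DRY_RUN=1.
run_step() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Usage: run_extended_tests PATH_TO_MEGATRON_LM [TEST_PATH]
# TEST_PATH is relative to the Megatron-LM root and defaults to the whole suite.
# The body runs in a subshell, so `set -e` and `cd` do not leak into the caller.
run_extended_tests() (
  set -euo pipefail
  megatron_dir="${1:?usage: run_extended_tests PATH_TO_MEGATRON_LM [TEST_PATH]}"
  test_path="${2:-tests_extend}"

  # Step 1: install MindSpeed in editable mode from the current directory.
  run_step pip install -e .
  # Step 2: copy tests_extend into the Megatron-LM root.
  run_step cp -r tests_extend "$megatron_dir"
  # Steps 3 and 4: run pytest from the Megatron-LM root.
  run_step cd "$megatron_dir"
  run_step pytest "$test_path"
)
```

For example, run_extended_tests /path/to/Megatron-LM runs the whole suite, as in step 4. Passing tests_extend/unit_tests/megatron/test_distrib_optimizer.py as the second argument mirrors step 3. Prefixing the call with DRY_RUN=1 prints the commands without running them. The function body is a subshell rather than braces, so the working directory of the calling shell is never changed.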