File  Last commit  Last updated
!2254 refactor: fused_rope, level1(l1) Merge pull request !2254 from 邓佳/refactor_rope 1 year ago
!2726 [mindspore][master] adapt to megatron0.12.1 for mindspeed-mm Merge pull request !2726 from wangshuangling/master 9 months ago
[auto settings] fix auto settings feature. Co-authored-by: mhlinoer <lvmuheng@h-partners.com>. Merge request Ascend/MindSpeed!2926 (master into master); created by mhlinoer, merged by ascend-robot. 7 months ago
perf(verl ckpt): ckpt load and save acceleration. Co-authored-by: 李鸣沼 <lmztju@126.com>. Merge request Ascend/MindSpeed!3074 (verl_load_and_save_ckpt into master); created by lmztju, committed by 李鸣沼 and l30057177, merged by ascend-robot. Description: Test scenario 1: verl with the Megatron backend, dapo-qwen3-30b, loading and saving checkpoints on two 910A3 machines with local storage (screenshot: https://raw.gitcode.com/user-images/assets/7404741/cea9c052-2550-4d2c-9dd7-bcfe0cb33d99/image.png). 5 months ago
fix: fix the initialization error of compress-optimizer. Co-authored-by: NingGuangyou <ningguangyou@h-partners.com>. Merge request Ascend/MindSpeed!3319 (master into master); created by NingGuangyou, merged by ascend-robot. 2 months ago
!2650 Add TE class and args modification to support verl Merge pull request !2650 from Jializheng/args 10 months ago
fix: te ulysses. Co-authored-by: clc2025 <chenlucong@huawei.com>. Merge request Ascend/MindSpeed!3349 (fix_te_ulysses into master); created by clc2025, merged by ascend-robot. 2 months ago
!2738 [feat.] Support the Megatron Custom FSDP feature Merge pull request !2738 from yuqi/custom_fsdp 9 months ago
!2734 feat: balanced moe Merge pull request !2734 from 邓佳/core_r0.12.1_bm_v2 8 months ago
Bugfix: do not set enable_gloo_process_groups when disable-gloo-groups is not enabled. Co-authored-by: fishhhqi <moeyfishyq@outlook.com>. Merge request Ascend/MindSpeed!2887 (gloo_fix2 into master); created by fishhhqi, merged by ascend-robot. 7 months ago
!2777 [Bugfix] Fix dist_train Merge pull request !2777 from yuqi/dist_train_fix 9 months ago
[Bugfix] Fix Megatron checkpoint saving and loading compatibility for the torch_dcp format. Co-authored-by: 林明哲 <linmingzhe3@huawei.com>. Merge request Ascend/MindSpeed!3077 (fix1202 into master); created by LinMingZhe, committed by 林明哲, merged by ascend-robot. 5 months ago
fix: NPU datadump level: L0 & mix. Co-authored-by: yulelanmei <huangyijie8@huawei.com>. Merge request Ascend/MindSpeed!3351 (master into master); created by yulelanmei, merged by ascend-robot. Description: --npu-datadump did not yet support the L0 and mix dump levels, so this change extends it to cover them. User-facing change: N/A. Tested by enabling --npu-datadump with level set to L0 or mix in config.json. Test record: https://wiki.huawei.com/domains/148330/wiki/296621/WIKI2026032510543405. 2 months ago
feat(triton): sort_chunks_by_idx. Co-authored-by: guofanfeng <guofanfeng1@huawei.com>. Merge request Ascend/MindSpeed!2997 (master into master); created by guofanfeng23, committed by guofanfeng, merged by ascend-robot. Description: integrates the sort_chunks_by_idx Triton operator. Operator validation results: https://wiki.huawei.com/domains/152732/wiki/307991/WIKI202511219117266. 5 months ago
!2230 Feature completion Merge pull request !2230 from 刘哲续/master 1 year ago
!1967 fix coc demo, import error and feature validate logic Merge pull request !1967 from wangyuansheng8/master 1 year ago
feat: fp8_reuse_quant_w. Co-authored-by: Jia_Austin <dengjia6@huawei.com>. Merge request Ascend/MindSpeed!3358 (feat_fp8_reuse_quant_w into master); created by Jia_Austin, merged by ascend-robot. 2 months ago
quant fp8 optimizer 6 months ago
docs: bm. Co-authored-by: Jia_Austin <dengjia6@huawei.com>. Merge request Ascend/MindSpeed!3226 (bm into master); created by Jia_Austin, merged by ascend-robot. 4 months ago
SwapOptimizer to support fp32 weights. Co-authored-by: JialiZheng <jializheng@huawei.com>. Merge request Ascend/MindSpeed!3095 (swap_optimizer into master); created by JialiZheng1, committed by JialiZheng, merged by ascend-robot. 5 months ago
[test] Add fsdp2 feature to ST. Co-authored-by: yulelanmei <huangyijie8@huawei.com>. Merge request Ascend/MindSpeed!3086 (add_st into master); created by yulelanmei, merged by ascend-robot. Description: 1. Adds to the ST cases the acceleration features, fsdp2 included, that were missing and need to be handed over for testing on A5. 2. Replaces acl.get_soc_name() with torch_npu.npu.get_device_name() throughout; the latter works on A2/A3/A5, while the former is not supported on A5. 3. Fixes the multi_parameter_pipeline case and deletes a duplicate case file. 5 months ago
feat: add w4a16 quant. Co-authored-by: xusiyang <xusiyang2@huawei.com>. Merge request Ascend/MindSpeed!3334 (master into master); created by weixin_44492126, committed by xusiyang and weixin_44492126, merged by ascend-robot. Description: QAT now supports W4A16 fake quantization. User-facing details: docs/zh/features/qat_quant.md. Tested by adding the argument "--qat-scheme w4a16-mxf4", which enables fake quantization. 2 months ago
feat(ut/qos/torch): add UTs and fix bugs missed in the code. Co-authored-by: Klayyy <wanglei886@h-partners.com>. Merge request Ascend/MindSpeed!3309 (master into master); created by Klayyy, merged by ascend-robot. Description: 1. Adds feature UTs for the AI QoS feature. 2. Fixes bugs found by self-review while adding the UTs: 2.1 corrects the call name of torch_npu._C._distributed_c10d.ProcessGroupHCCL.Options(); 2.2 improves the ValueError messages raised in qos_feature.py; 2.3 in qos.py, in the priority assignment for the minimum-conflict combination, removes duplicated code and unused imports and adds a missing comma in _PARALLEL_TYPES; 2.4 in qos.py, the handling that should apply to sdma qos mistakenly used roce. 3. Adds H2D QoS use of the PCIe asynchronous channel and a new set_h2d_qos DCMI interface that can be called from Python. 4. Updates the aiQos README on calling the DCMI interface and adds build instructions for the DCMI interface SO. 2 months ago
!2675 Add once warning and args check Merge pull request !2675 from Jializheng/master 10 months ago
fix: mc2 validate args. Co-authored-by: clc2025 <chenlucong@huawei.com>. Merge request Ascend/MindSpeed!3402 (26q1 into 26.0.0_core_r0.12.1); created by clc2025, merged by ascend-robot. Related issue: DTS2026040735953. 1 month ago
!2635 Security: file path validation/permissions Merge pull request !2635 from glhyy/secmaster 10 months ago
refactor: rename --sub-seq-length to --fix-sub-seq-length for clarity. Co-authored-by: Jia_Austin <dengjia6@huawei.com>. Merge request Ascend/MindSpeed!3231 (sub_seq into master); created by Jia_Austin, merged by ascend-robot. 4 months ago
feat: add w4a16 quant. Co-authored-by: xusiyang <xusiyang2@huawei.com>. Merge request Ascend/MindSpeed!3334 (master into master); created by weixin_44492126, committed by xusiyang and weixin_44492126, merged by ascend-robot. Description: QAT now supports W4A16 fake quantization. User-facing details: docs/zh/features/qat_quant.md. Tested by adding the argument "--qat-scheme w4a16-mxf4", which enables fake quantization. 2 months ago
!2112 MindSpeed L0 reconstruction Merge pull request !2112 from Jializheng/master 1 year ago
!2650 Add TE class and args modification to support verl Merge pull request !2650 from Jializheng/args 10 months ago