File | Last commit | Last updated
Add MOE expert load balancing. Co-authored-by: zhanggaolu2 <252028123@qq.com>. Merge request Ascend/MindSpeed!2845 (expert_loadbalance2master into master); created and committed by zhanggaolu2, merged by ascend-robot. Updated 7 months ago.
feat: add custom pp layout. Co-authored-by: wuweiqiang24 <wuweiqiang11@huawei.com>. Merge request Ascend/MindSpeed!3496 (add_pp_layout into master); created and committed by wuweiqiang24, merged by ascend-robot. Description: adds the pipeline-model-parallel-layout feature, which supports a custom layer layout for each PP stage. Verification link: https://wiki.huawei.com/domains/137239/wiki/268925/WIKI2026052611233549 Issue: https://gitcode.com/Ascend/MindSpeed/issues/166 Updated 11 hours ago.
!376 mcore moe optimization. Merge pull request !376 from Jializheng/master. Updated 1 year ago.
【feat】adaptor for mc2 moe. Co-authored-by: EX_mitsu <yangjie409@h-partners.com>, libaokui <libaokui@huawei.com>. Merge request Ascend/MindSpeed!3123 (master into master); created by EX_mitsuX, committed by EX_mitsuX, EX_mitsu, and libaokui, merged by ascend-robot. Self-verification results: https://wiki.huawei.com/domains/160198/wiki/327183/WIKI202601079697698 Updated 4 months ago.
perf(fp8): enhance te. Co-authored-by: Muu <koimuu@163.com>. Merge request Ascend/MindSpeed!3064 (feature_fix into master); created by Muuyo, committed by Muu, merged by ascend-robot. Description: (1) introduce low-precision recomputation; (2) free the memory of unused quant tensors and scales after mxfp8 mm; (3) refactor te linear to abstract the dw flow; (4) extract the gmm op; (5) add gemm_gradient_accumulation_fusion to GMMFunction; (6) support the argument --moe-router-dtype fp8 to enable low-precision computation for topK routing; (7) remove the extra transpose introduced in mxfp8 mm; (8) GMM add uses high precision only. Reference: https://wiki.huawei.com/domains/76578/wiki/233229/WIKI202512189479523 Updated 5 months ago.
!1949 010_temp. Merge pull request !1949 from yangjie/master. Updated 1 year ago.
fix(quant): only hif8 add dst_type_max args. Co-authored-by: Muu <koimuu@163.com>. Merge request Ascend/MindSpeed!3514 (fix-hif8-tensorwise into master); created by Muuyo, committed by Muu, merged by ascend-robot. Updated 3 days ago.
refactor(moe): cache gradient quantization in TensorwiseGMMFunction with quant_grad classmethod. Co-authored-by: Muu <koimuu@163.com>. Merge request Ascend/MindSpeed!3503 (dev into master); created by Muuyo, committed by Muu, merged by ascend-robot. [Self-verification report](https://wiki.huawei.com/domains/76578/wiki/233229/WIKI2026052711238962) Updated 5 days ago.
refactor(moe): decouple GMM context from autograd Function ctx with GmmContext. Co-authored-by: Muu <koimuu@163.com>. Merge request Ascend/MindSpeed!3495 (dev into master); created by Muuyo, committed by Muu, merged by ascend-robot. Description: (1) in Tensorwise GMM, x and grad use pertensor quantization, and weight uses pertoken quantization (tensor quantization grouped along the n axis); (2) fix mlp overlap: multiple parameters use the same gmm ctx; (3) add a ctx parameter to GMM Function op_dw. [Self-verification document](https://wiki.huawei.com/domains/76578/wiki/233229/WIKI2026052611220020) Updated 7 days ago.
fix: remove import acl. Co-authored-by: yulelanmei <huangyijie8@huawei.com>. Merge request Ascend/MindSpeed!3148 (add_st into master); created and committed by yulelanmei, merged by ascend-robot. Description: (1) import acl was left behind when acl.get_soc_name() was replaced with torch_npu.npu.get_device_name(), and is now removed; (2) the remaining import acl statements under other MOE modules are also removed so that they do not block related features on A5. Updated 4 months ago.
!2425 rm: unused code. Merge pull request !2425 from 邓佳/master_rm_v2. Updated 11 months ago.
!2500 【feat】Provide core0.12.1 alltoall_overlap & zero memory update. Merge pull request !2500 from yangjie/master. Updated 10 months ago.
feat: fp8_reuse_quant_w. Co-authored-by: Jia_Austin <dengjia6@huawei.com>. Merge request Ascend/MindSpeed!3358 (feat_fp8_reuse_quant_w into master); created and committed by Jia_Austin, merged by ascend-robot. Description: unfilled merge request template (no details provided). Updated 2 months ago.
fix: remove import acl. Co-authored-by: yulelanmei <huangyijie8@huawei.com>. Merge request Ascend/MindSpeed!3148 (add_st into master); created and committed by yulelanmei, merged by ascend-robot. Description: (1) import acl was left behind when acl.get_soc_name() was replaced with torch_npu.npu.get_device_name(), and is now removed; (2) the remaining import acl statements under other MOE modules are also removed so that they do not block related features on A5. Updated 4 months ago.
add moe fix router v6. Updated 4 months ago.
!2176 fix bug of tp_extend_ep. Merge pull request !2176 from yangcheng/master. Updated 1 year ago.
!1949 010_temp. Merge pull request !1949 from yangjie/master. Updated 1 year ago.
!2233 [feat] moe_overlap with Megatron core_r0.10.0. Merge pull request !2233 from yangjie/master. Updated 1 year ago.