File | Last commit | Last updated
doc: adjust doc. Co-authored-by: liutongtong27 <liutongtong15@h-partners.com>. Merge !3305 (master_menutest into master). Created-by: liutongtong27. Commit-by: liutongtong27. Merged-by: ascend-robot. See merge request: Ascend/MindSpeed!3305 | 2 months ago
feat: fp8_reuse_quant_w. Co-authored-by: Jia_Austin <dengjia6@huawei.com>. Merge !3358 (feat_fp8_reuse_quant_w into master). Created-by: Jia_Austin. Commit-by: Jia_Austin. Merged-by: ascend-robot. See merge request: Ascend/MindSpeed!3358 | 2 months ago
fix: mc2 validate args. Co-authored-by: clc2025 <chenlucong@huawei.com>. Merge !3402 (26q1 into 26.0.0_core_r0.12.1). Created-by: clc2025. Commit-by: clc2025. Merged-by: ascend-robot. Related issue: DTS2026040735953. See merge request: Ascend/MindSpeed!3402 | 1 month ago
fix: Decoupling FSDP basic capabilities from NPU. Co-authored-by: zhyebin01 <zhangyebin@h-partners.com>. Merge !3336 (master into master). Created-by: zhyebin01. Commit-by: zhyebin01. Merged-by: ascend-robot. Description: decouples the basic FSDP capabilities from the NPU. User-facing change: No. Testing: pipeline test passed. See merge request: Ascend/MindSpeed!3336 | 2 months ago
fix: NPU datadump level: L0 & mix. Co-authored-by: yulelanmei <huangyijie8@huawei.com>. Merge !3351 (master into master). Created-by: yulelanmei. Commit-by: yulelanmei. Merged-by: ascend-robot. Description: --npu-datadump does not currently support the L0 and mix dump levels, so the feature needs to be extended. User-facing change: N/A. Testing: enable --npu-datadump and set level to L0 or mix in config.json. Test record: https://wiki.huawei.com/domains/148330/wiki/296621/WIKI2026032510543405. See merge request: Ascend/MindSpeed!3351 | 2 months ago
[fix]:solve the problem of chunk_bwd_dqkwg triton's time degradation. Co-authored-by: LinShua <707894133@qq.com>. Merge !3379 (26.0.0_core_r0.12.1_dqkwg_triton_time into 26.0.0_core_r0.12.1). Created-by: LinShua. Commit-by: LinShua. Merged-by: ascend-robot. Description: fixes the variable-length performance regression introduced by the fixed-length performance optimization of the chunk_bwd_dqkwg operator. chunk_bwd_kernel_dqkwg timings: original logic, fixed length 26158 us, variable length 33450 us; after the earlier optimization, fixed length 18855 us, variable length 89597 us; this PR, fixed length 18769 us, variable length 33509 us. User-facing change: NA. Testing: see PR "[fix]:solve the UT of GDN triton." See merge request: Ascend/MindSpeed!3379 | 1 month ago
[modify][mindspore] register patchs for coalescing_manager. Co-authored-by: weixin_47897441 <wuyouqi1@h-partners.com>. Merge !3233 (master-0129 into master). Created-by: weixin_47897441. Commit-by: weixin_47897441. Merged-by: ascend-robot. Description: the coalescing_manager-related patches were recently removed from megtron_basic.py. Because MSA currently lacks support for the communication operators associated with coalescing_manager, the removed patches are migrated to MindSpore so that models such as qwen3vl launch normally. See merge request: Ascend/MindSpeed!3233 | 3 months ago
remove deprecated code 2. Co-authored-by: 赵一帆 <zhaoyifan15@huawei.com>. Merge !2972 (master into master). Created-by: zhao-yifan27. Commit-by: 赵一帆. Merged-by: ascend-robot. Description: removes deprecated code: the deprecated code under mindspeed/model and the deprecated ampipe code under mindspeed/moe. See merge request: Ascend/MindSpeed!2972 | 6 months ago
remove deprecated code 2. Co-authored-by: 赵一帆 <zhaoyifan15@huawei.com>. Merge !2972 (master into master). Created-by: zhao-yifan27. Commit-by: 赵一帆. Merged-by: ascend-robot. Description: removes deprecated code: the deprecated code under mindspeed/model and the deprecated ampipe code under mindspeed/moe. See merge request: Ascend/MindSpeed!2972 | 6 months ago
!872 [feature: conv3d supports parallelism along the depth dimension]. Merge pull request !872 from Shitong Li/master_new_conv3d | 1 year ago
feat(smart-swap): simplify the use of smart-swap. Co-authored-by: ChenDonYY <caichendong2@huawei.com>. Merge !2833 (master into master). Created-by: ChenDonYY. Commit-by: ChenDonYY. Merged-by: ascend-robot. Description: fix: simplify the use of smart-swap

1. The experiments compare loss accuracy, mean throughput, and memory usage before and after enabling the feature. The relative error of the loss over 2000 steps must stay within 2%.

- Dense model test case: tests_extend/system_tests/feature_tests/coc.sh
- Throughput comparison: swap0 recompute1: 80.7; swap0 recompute0: 87.7; swap1 recompute0: 87.0
- Memory comparison (MB):

| Configuration | Rank | allocated | max allocated | reserved | max reserved |
| --- | --- | --- | --- | --- | --- |
| swap0 recompute1 | 0 | 15604.52587890625 | 27669.36279296875 | 30404.0 | 30404.0 |
| swap0 recompute1 | 1 | 15604.52587890625 | 27669.36279296875 | 30404.0 | 30404.0 |
| swap0 recompute1 | 4 | 16116.654296875 | 25036.85986328125 | 26344.0 | 26344.0 |
| swap0 recompute1 | 5 | 16116.654296875 | 25036.85986328125 | 26344.0 | 26344.0 |
| swap0 recompute0 | 0 | 15604.52587890625 | 35925.6298828125 | 37984.0 | 37984.0 |
| swap0 recompute0 | 1 | 15604.52587890625 | 35925.6298828125 | 37984.0 | 37984.0 |
| swap0 recompute0 | 4 | 16116.654296875 | 33549.12744140625 | 35164.0 | 35164.0 |
| swap0 recompute0 | 5 | 16116.654296875 | 33549.12744140625 | 35164.0 | 35164.0 |
| swap1 recompute0 | 0 | 15672.38427734375 | 28631.20361328125 | 36132.0 | 36132.0 |
| swap1 recompute0 | 1 | 15672.38427734375 | 28631.20361328125 | 36132.0 | 36132.0 |
| swap1 recompute0 | 4 | 16188.48046875 | 29610.9287109375 | 33732.0 | 33732.0 |
| swap1 recompute0 | 5 | 16188.48046875 | 29610.9287109375 | 33732.0 | 33732.0 |

- Loss comparison: ![coc_swap_compare.PNG](https://raw.gitcode.com/user-images/assets/7404741/bba011fd-8710-497b-9ace-19cac98111d9/coc_swap_compare.PNG 'coc_swap_compare.PNG')

- MoE model test case: tests_extend/system_tests/feature_tests/deepseek_mla.sh
- Throughput comparison: swap0: 55.2; swap1: 56.0
- Memory comparison (MB):

| Configuration | Rank | allocated | max allocated | reserved | max reserved |
| --- | --- | --- | --- | --- | --- |
| swap0 | 0 | 16443.3466796875 | 26676.16259765625 | 32442.0 | 32442.0 |
| swap0 | 4 | 25676.61572265625 | 36900.34814453125 | 43500.0 | 43500.0 |
| swap1 | 0 | 16518.9033203125 | 27864.86279296875 | 32240.0 | 32240.0 |
| swap1 | 4 | 25781.51123046875 | 38881.0888671875 | 41112.0 | 41112.0 |

- Loss comparison: ![deepseek_mla_swap_compare.PNG](https://raw.gitcode.com/user-images/assets/7404741/9212a78b-f179-419b-9761-b8b8deb128f3/deepseek_mla_swap_compare.PNG 'deepseek_mla_swap_compare.PNG')

2. Example of integrating custom cpp operators (for example, atb): see docs/features/smart_swap.md.

See merge request: Ascend/MindSpeed!2833 | 5 months ago
docs:fix docs/zh mistakes. Co-authored-by: Keilo_W <wangkaiyu11@h-partners.com>. Merge !3318 (master into master). Created-by: Keilo_W. Commit-by: Keilo_W. Merged-by: ascend-robot. Description: corrects some comments and code that had been changed by mistake. See merge request: Ascend/MindSpeed!3318 | 2 months ago
quant fp8 optimizer | 6 months ago
!716 perf: gpt_dataset and initialize in megatron. Merge pull request !716 from 邓佳/dev_patch | 1 year ago
feat: fp8_reuse_quant_w. Co-authored-by: Jia_Austin <dengjia6@huawei.com>. Merge !3358 (feat_fp8_reuse_quant_w into master). Created-by: Jia_Austin. Commit-by: Jia_Austin. Merged-by: ascend-robot. See merge request: Ascend/MindSpeed!3358 | 2 months ago
!2238 refactor: tokenizer refactoring. Merge pull request !2238 from wangruiqi/master | 1 year ago
!2114 Refactor the noop layer for pipeline parallel. Merge pull request !2114 from liurong1995/feature_noop | 1 year ago
!2827 fix: fix get_full_args. Merge pull request !2827 from YE ZHENYUAN/master0910 | 8 months ago
Support TransformerEngine. Co-authored-by: MingzhenWang <wangmingzhen4@huawei.com>, Muu <koimuu@163.com>, x30061065 <xuyuanhui3@h-partners.com>, 耿瑞良 <gengruiliang@huawei.com>. Merge !2947 (lingqu_master into master). Created-by: mingzhenwang. Commit-by: mingzhenwang;Muu;MingzhenWang;x30061065;耿瑞良. Merged-by: ascend-robot. Description: 1. Supports the TELinear layer. 2. Supports FP8 computation (quantmatmul/gmm). 3. Supports multiple data types (FP8/HiF8). 4. Supports multiple quantization strategies (delayed/tensorwise/blockwise/mxfp8). 5. The TELinear layer supports communication-computation fusion. See merge request: Ascend/MindSpeed!2947 | 6 months ago
[Bugfix] Fix Megatron checkpoint saving&loading compatibility for torch_dcp format. Co-authored-by: 林明哲 <linmingzhe3@huawei.com>. Merge !3077 (fix1202 into master). Created-by: LinMingZhe. Commit-by: 林明哲. Merged-by: ascend-robot. See merge request: Ascend/MindSpeed!3077 | 5 months ago
!2228 add MindSpeedFeaturesManager. Merge pull request !2228 from Jializheng/master | 1 year ago
!2308 Adaptation core_r0.12.0. Merge pull request !2308 from 邓佳/core_r0.12.0_dev | 1 year ago
!2675 Add once warning and args check. Merge pull request !2675 from Jializheng/master | 10 months ago
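【bugfix!!!】fbov COC&share_expert_sync fix. Co-authored-by: EX_mitsu <yangjie409@h-partners.com>. Merge !3005 (master into master). Created-by: EX_mitsuX. Commit-by: EX_mitsuX;EX_mitsu. Merged-by: ascend-robot. Description: fixes a bug where enabling COC with TP1 was not checked (an unintended scenario; enabling COC with TP1 should bring no benefit). Readjusts the compute streams and adds waits and synchronization to fix a synchronization problem that can occur when computation runs too fast (problem scenario: COC, the permute fused operator, and shared experts enabled together; the problem disappears with launch_blocking). Fixes the TE detection that was no longer compatible. Fixes missing attributes on some TE modules. See merge request: Ascend/MindSpeed!3005 | 6 months ago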
fix:Add mindspeed config to subclass of transformer config. Co-authored-by: JialiZheng <jializheng@huawei.com>. Merge !3284 (master into master). Created-by: JialiZheng1. Commit-by: JialiZheng. Merged-by: ascend-robot. See merge request: Ascend/MindSpeed!3284 | 2 months ago
Add MoE expert load balancing. Co-authored-by: zhanggaolu2 <252028123@qq.com>. Merge !2845 (expert_loadbalance2master into master). Created-by: zhanggaolu2. Commit-by: zhanggaolu2. Merged-by: ascend-robot. See merge request: Ascend/MindSpeed!2845 | 7 months ago
fix: fix the alltoall_seq token dispatcher Nan bug. Co-authored-by: guofanfeng <guofanfeng1@huawei.com>. Merge !3249 (bug_fix into master). Created-by: guofanfeng23. Commit-by: guofanfeng. Merged-by: ascend-robot. Description: fixes the NaN bug in the alltoall_seq token dispatcher. Details: https://wiki.huawei.com/domains/152732/wiki/307991/WIKI2026020210028614. See merge request: Ascend/MindSpeed!3249 | 3 months ago
!1825 Long-sequence support for BNSD and arguments rework. Merge pull request !1825 from YE ZHENYUAN/master | 1 year ago