File / Last commit / Last updated
!2757 adjust hash implementation Merge pull request !2757 from wangyuansheng8/master 9 months ago
!2718 verl recompute adaptor Merge pull request !2718 from glhyy/msverl 9 months ago
!2635 security: file path validation/permissions Merge pull request !2635 from glhyy/secmaster 10 months ago
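feat: iterative upgrade of the memory compression feature Co-authored-by: NingGuangyou<ningguangyou@h-partners.com> # message auto-generated for no-merge-commit merge: !3173 merge master into master feat: iterative upgrade of the memory compression feature Created-by: NingGuangyou Commit-by: NingGuangyou Merged-by: ascend-robot Description: This PR evolves the original feature, which compressed MLP-module activations on the first node, into per-transformer-layer activation compression on every node plus compression of the AdamW first- and second-order moments. The new functionality is added while the original feature and its usage remain unchanged. See the readme for details. See merge request: Ascend/MindSpeed!3173 4 months ago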
!2264 fix: virtual optimizer bug fix and update doc Merge pull request !2264 from Kingsleyandher/master 1 year ago
fix: TE + recompute_norm refix Co-authored-by: yulelanmei<huangyijie8@huawei.com> # message auto-generated for no-merge-commit merge: !3325 merge master into master fix: TE + recompute_norm refix Created-by: yulelanmei Commit-by: yulelanmei Merged-by: ascend-robot Description: What this PR does / why we need it? Refix TE + recompute_norm. Does this PR introduce any user-facing change? No. How was this patch tested? Tested using MindSpeed-Core ST cases and the LLM+core case. See merge request: Ascend/MindSpeed!3325 2 months ago
quant fp8 optimizer 6 months ago
feat(smart-swap): simplify the use of smart-swap Co-authored-by: ChenDonYY<caichendong2@huawei.com> # message auto-generated for no-merge-commit merge: !2833 merge master into master feat(smart-swap): simplify the use of smart-swap Created-by: ChenDonYY Commit-by: ChenDonYY Merged-by: ascend-robot Description: fix: simplify the use of smart-swap
1. The experiments compare loss accuracy, mean throughput, and memory usage before and after enabling the feature. The relative error in loss over 2000 steps must be within 2%.
- Dense model test case: tests_extend/system_tests/feature_tests/coc.sh
  - Throughput comparison:
    - swap0 recompute1: 80.7
    - swap0 recompute0: 87.7
    - swap1 recompute0: 87.0
  - Memory comparison:
    - swap0 recompute1
      - [Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: 27669.36279296875 | reserved: 30404.0 | max reserved: 30404.0
      - [Rank 1] memory (MB) | allocated: 15604.52587890625 | max allocated: 27669.36279296875 | reserved: 30404.0 | max reserved: 30404.0
      - [Rank 4] memory (MB) | allocated: 16116.654296875 | max allocated: 25036.85986328125 | reserved: 26344.0 | max reserved: 26344.0
      - [Rank 5] memory (MB) | allocated: 16116.654296875 | max allocated: 25036.85986328125 | reserved: 26344.0 | max reserved: 26344.0
    - swap0 recompute0
      - [Rank 0] memory (MB) | allocated: 15604.52587890625 | max allocated: 35925.6298828125 | reserved: 37984.0 | max reserved: 37984.0
      - [Rank 1] memory (MB) | allocated: 15604.52587890625 | max allocated: 35925.6298828125 | reserved: 37984.0 | max reserved: 37984.0
      - [Rank 4] memory (MB) | allocated: 16116.654296875 | max allocated: 33549.12744140625 | reserved: 35164.0 | max reserved: 35164.0
      - [Rank 5] memory (MB) | allocated: 16116.654296875 | max allocated: 33549.12744140625 | reserved: 35164.0 | max reserved: 35164.0
    - swap1 recompute0
      - [Rank 0] memory (MB) | allocated: 15672.38427734375 | max allocated: 28631.20361328125 | reserved: 36132.0 | max reserved: 36132.0
      - [Rank 1] memory (MB) | allocated: 15672.38427734375 | max allocated: 28631.20361328125 | reserved: 36132.0 | max reserved: 36132.0
      - [Rank 4] memory (MB) | allocated: 16188.48046875 | max allocated: 29610.9287109375 | reserved: 33732.0 | max reserved: 33732.0
      - [Rank 5] memory (MB) | allocated: 16188.48046875 | max allocated: 29610.9287109375 | reserved: 33732.0 | max reserved: 33732.0
  - Loss comparison: ![coc_swap_compare.PNG](https://raw.gitcode.com/user-images/assets/7404741/bba011fd-8710-497b-9ace-19cac98111d9/coc_swap_compare.PNG 'coc_swap_compare.PNG')
- MoE model test case: tests_extend/system_tests/feature_tests/deepseek_mla.sh
  - Throughput comparison:
    - swap0: 55.2
    - swap1: 56.0
  - Memory comparison:
    - swap0
      - [Rank 0] memory (MB) | allocated: 16443.3466796875 | max allocated: 26676.16259765625 | reserved: 32442.0 | max reserved: 32442.0
      - [Rank 4] memory (MB) | allocated: 25676.61572265625 | max allocated: 36900.34814453125 | reserved: 43500.0 | max reserved: 43500.0
    - swap1
      - [Rank 0] memory (MB) | allocated: 16518.9033203125 | max allocated: 27864.86279296875 | reserved: 32240.0 | max reserved: 32240.0
      - [Rank 4] memory (MB) | allocated: 25781.51123046875 | max allocated: 38881.0888671875 | reserved: 41112.0 | max reserved: 41112.0
  - Loss comparison: ![deepseek_mla_swap_compare.PNG](https://raw.gitcode.com/user-images/assets/7404741/9212a78b-f179-419b-9761-b8b8deb128f3/deepseek_mla_swap_compare.PNG 'deepseek_mla_swap_compare.PNG')
2. An example of integrating custom C++ operators (for example, atb). See docs/features/smart_swap.md.
See merge request: Ascend/MindSpeed!2833 5 months ago
!2718 verl recompute adaptor Merge pull request !2718 from glhyy/msverl 9 months ago
!359 change ascendspeed to mindspeed Merge pull request !359 from 邓佳/master 1 year ago
perf(fp8): enhance te Co-authored-by: Muu<koimuu@163.com> # message auto-generated for no-merge-commit merge: !3064 merge feature_fix into master perf(fp8): enhance te Created-by: Muuyo Commit-by: Muu Merged-by: ascend-robot Description: 1. Introduce low-precision recomputation. 2. After mxfp8 mm, free the device memory held by unused quant tensors and scales. 3. Refactor TE linear to abstract the dw flow. 4. Extract the gmm op. 5. Introduce gemm_gradient_accumulation_fusion in GMMFunction. 6. Support the parameter (--moe-router-dtype fp8) to enable low-precision computation for topK routing. 7. Remove the extra transpose operation introduced in mxfp8 mm. 8. GMM add uses high precision only. https://wiki.huawei.com/domains/76578/wiki/233229/WIKI202512189479523 See merge request: Ascend/MindSpeed!3064 5 months ago