File | Last commit | Last updated
!2370 fix compatibility issue between disable gloo and ema optim. Merge pull request !2370 from wangyuansheng8/master | 11 months ago
fix: fix low precision optimizer mxfp8 precision loss. Co-authored-by: Tngbuko<tangxiong5@huawei.com>. Auto-generated message for no-merge-commit merge: !3512 merge feature/low_precision_optimizer into master. Created-by: Tngbuko. Commit-by: Tngbuko. Merged-by: ascend-robot.
Description:
## Fix for the MXFP8 low-precision optimizer failing to converge
Background: with the low-precision optimizer enabled via --quant-state mxfp8, the training loss no longer matched the baseline.
Fixes:
1. Keep precision-sensitive layers in high-precision FP32.
2. Introduce a k-scaling strategy to reduce quantization error.
3. Use different MXFP8 types for the first and second moments, according to their characteristics.
4. Fix known code bugs.
Accuracy verification after the fix:
Before the fix, training did not converge and the loss could not be aligned even within 20 training steps.
![修复前.png](https://raw.gitcode.com/user-images/assets/7404741/b8d78706-e836-4b38-b075-525362f51a11/修复前.png '修复前.png')
Training for 300 steps on an A5 machine: with quant-states fp8 enabled, the mean relative error against the bf16 baseline is 0.09%, below 0.1% (one in a thousand) (left figure); with quant-states mxfp8 enabled, the mean relative error is 0.1% (one in a thousand) (right figure).
![1.png](https://raw.gitcode.com/user-images/assets/7404741/2e9cc835-4f33-4e60-a099-19b454031e85/1.png '1.png')![2.png](https://raw.gitcode.com/user-images/assets/7404741/ec4914e5-eb78-4f37-b942-c72442675a87/2.png '2.png')
End-to-end memory savings verification:
![显存收益.png](https://raw.gitcode.com/user-images/assets/7404741/0dad55fd-2d71-49d8-87cb-abcc1b463a5c/显存收益.png '显存收益.png')
See merge request: Ascend/MindSpeed!3512 | 3 days ago
feat: add mcore muon. Co-authored-by: wuweiqiang24<1005334931@qq.com>. Co-authored-by: wuweiqiang24<wuweiqiang11@huawei.com>. Auto-generated message for no-merge-commit merge: !3442 merge add_mcore_muon into master. Created-by: wuweiqiang24. Commit-by: wuweiqiang24. Merged-by: ascend-robot. Description: 1. Add the Muon optimizer feature. 2. Adapt Muon for compatibility with TP. 3. Adapt Muon for compatibility with Zero. See merge request: Ascend/MindSpeed!3442 | 8 days ago
feat: SwapMuon add save/load ckpt support. Co-authored-by: JialiZheng<jializheng@huawei.com>. Auto-generated message for no-merge-commit merge: !3518 merge master into master. Created-by: JialiZheng1. Commit-by: JialiZheng. Merged-by: ascend-robot. Description: SwapMuon add save/load ckpt support. RFC: https://gitcode.com/Ascend/MindSpeed/issues/164 See merge request: Ascend/MindSpeed!3518 | 12 hours ago
SwapOptimizer to support fp32 weights. Co-authored-by: JialiZheng<jializheng@huawei.com>. Auto-generated message for no-merge-commit merge: !3095 merge swap_optimizer into master. Created-by: JialiZheng1. Commit-by: JialiZheng. Merged-by: ascend-robot. Description: SwapOptimizer to support fp32 weights. See merge request: Ascend/MindSpeed!3095 | 5 months ago
!2264 fix: virtual optimizer bug fix and update doc. Merge pull request !2264 from Kingsleyandher/master | 1 year ago
!2112 MindSpeed L0 reconstruction. Merge pull request !2112 from Jializheng/master | 1 year ago
!2205 API replacement: replace torch_npu.npu_apply_adam_w with torch._fused_adamw. Merge pull request !2205 from wangruiqi/master | 1 year ago
fix duplicate all-gather bug in overlap-param-gather. Co-authored-by: 赵一帆<zhaoyifan15@huawei.com>. Auto-generated message for no-merge-commit merge: !2941 merge master into master. Created-by: zhao-yifan27. Commit-by: 赵一帆. Merged-by: ascend-robot. Description: same as https://gitcode.com/Ascend/MindSpeed/pull/2940 See merge request: Ascend/MindSpeed!2941 | 7 months ago