File    Last commit    Last updated
quant fp8 optimizer 6 months ago
quant fp8 optimizer 6 months ago
quant fp8 optimizer 6 months ago
sync 6 months ago
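fix: fix low precision optimizer mxfp8 precision loss
Co-authored-by: Tngbuko<tangxiong5@huawei.com>
# message auto-generated for no-merge-commit merge: !3512 merge feature/low_precision_optimizer into master
fix: fix low precision optimizer mxfp8 precision loss
Created-by: Tngbuko
Commit-by: Tngbuko
Merged-by: ascend-robot
Description:
## Fix the MXFP8 low-precision optimizer failing to converge
Background: with the low-precision optimizer enabled via --quant-state mxfp8, training accuracy no longer matches the baseline.
Measures taken:
1. Keep precision-sensitive layers in high-precision FP32.
2. Introduce a k-scaling strategy to reduce quantization error.
3. Use different MXFP8 types for the first- and second-order moments, according to their characteristics.
4. Fix known code bugs.
Accuracy verification after the fix:
Before the fix, training did not converge and could not be aligned with the baseline even over 20 training steps.
![修复前.png](https://raw.gitcode.com/user-images/assets/7404741/b8d78706-e836-4b38-b075-525362f51a11/修复前.png '修复前.png')
After training for 300 steps on an A5 machine: with quant-states fp8 enabled, the mean relative error against the bf16 baseline is 0.09%, below 0.1% (one part per thousand) (left figure); with quant-states mxfp8 enabled, the mean relative error is 0.1% (one part per thousand) (right figure).
![1.png](https://raw.gitcode.com/user-images/assets/7404741/2e9cc835-4f33-4e60-a099-19b454031e85/1.png '1.png')![2.png](https://raw.gitcode.com/user-images/assets/7404741/ec4914e5-eb78-4f37-b942-c72442675a87/2.png '2.png')
End-to-end memory savings verification:
![显存收益.png](https://raw.gitcode.com/user-images/assets/7404741/0dad55fd-2d71-49d8-87cb-abcc1b463a5c/显存收益.png '显存收益.png')
See merge request: Ascend/MindSpeed!3512
3 days ago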
debug 6 months ago
debug 6 months ago
Low-precision optimizer: add README
Co-authored-by: w30064656<wangzhuangzhuang8@h-partners.com>
# message auto-generated for no-merge-commit merge: !3067 merge master into master
Low-precision optimizer: add README
Created-by: w30064656
Commit-by: w30064656
Merged-by: ascend-robot
Description: Add README and fix bugs
See merge request: Ascend/MindSpeed!3067
5 months ago