| File | Last commit | Last updated |
|---|---|---|
| | feat(FP8): Support W4A8 recipe Co-authored-by: mingzhenwang<wangmingzhen4@huawei.com> # message auto-generated for no-merge-commit merge: !73 merge w4a8-hybrid-linear into main. Created-by: mingzhenwang Commit-by: mingzhenwang Merged-by: ascend-robot. Description: 1. deepseekv4 QAT scheme: W4A8 in the forward pass, W8A8 MXFP8 in the backward pass. 2. Difference 1: deepseekv4 uses W4A8 only in the MoE layers; TE, as a base low-precision capability, uses W4A8 for both linear and grouped_linear. 3. Difference 2: deepseekv4 quantizes to MXFP4 and dequantizes back to MXFP8; an MXFP8@MXFP4 fake-quantization operator currently stands in for that flow. See merge request: Ascend/TransformerEngineNPU!73 | 5 days ago |
| | refactor(quantized_tensor): replace QuantizedTensorView with tensor tuples Co-authored-by: Muu<koimuu@163.com> | 19 days ago |
| | refactor(fp8): fill fp8 vacancy Co-authored-by: Muu<koimuu@163.com> | 2 months ago |
| | refactor(quantized_tensor): replace QuantizedTensorView with tensor tuples Co-authored-by: Muu<koimuu@163.com> | 19 days ago |
| | refactor(quantized_tensor): replace QuantizedTensorView with tensor tuples Co-authored-by: Muu<koimuu@163.com> | 19 days ago |
| | feat(FP8): Support W4A8 recipe Co-authored-by: mingzhenwang<wangmingzhen4@huawei.com> # message auto-generated for no-merge-commit merge: !73 merge w4a8-hybrid-linear into main. Created-by: mingzhenwang Commit-by: mingzhenwang Merged-by: ascend-robot. Description: 1. deepseekv4 QAT scheme: W4A8 in the forward pass, W8A8 MXFP8 in the backward pass. 2. Difference 1: deepseekv4 uses W4A8 only in the MoE layers; TE, as a base low-precision capability, uses W4A8 for both linear and grouped_linear. 3. Difference 2: deepseekv4 quantizes to MXFP4 and dequantizes back to MXFP8; an MXFP8@MXFP4 fake-quantization operator currently stands in for that flow. See merge request: Ascend/TransformerEngineNPU!73 | 5 days ago |