| Add GroupedMatmul (MxScaling) example
Co-authored-by: init__zhb__<zhanghaobo6@huawei.com>
Co-authored-by: ayj<ayj0815@163.com>
# message auto-generated for no-merge-commit merge:
!605 merge pr_582 into master
Add GroupedMatmul (MxScaling) example
Created-by: init__zhb__
Commit-by: init__zhb__;ayj
Merged-by: cann-robot
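Description: ## Description
- Content: add the Ascend 950 MX grouped matmul (Slice M) operator example 55_ascend950_mx_grouped_matmul_slice_m, which supports MX FP8/FP4 elements with float8_e8m0 scale factors
- Changes: 8 files, +1813 lines (new kernel, example code, CMake, test cases, etc.)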
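
For reviewers unfamiliar with the operation, the following is a minimal host-side reference sketch of a Slice-M grouped matmul with MX block scaling. It is not the kernel added in this PR. The function names, the tensor layouts, the block size of 32 along K, and the use of already-decoded float elements (instead of packed FP8/FP4 bits) are all assumptions made for illustration.

```python
import math

BLOCK = 32  # assumed number of K elements that share one scale factor


def e8m0_to_float(code):
    """Decode an exponent-only 8-bit scale as 2 ** (code - 127).

    The reserved NaN encoding (255) is not handled in this sketch.
    """
    return math.ldexp(1.0, code - 127)


def mx_grouped_matmul_slice_m(a, a_scale, b, b_scale, group_sizes):
    """Hypothetical reference for a grouped matmul sliced along M.

    a:           M x K element values (already decoded to float)
    a_scale:     M x ceil(K / BLOCK) e8m0 codes
    b:           G x K x N element values, one weight matrix per group
    b_scale:     G x ceil(K / BLOCK) x N e8m0 codes
    group_sizes: number of consecutive rows of `a` owned by each group
    """
    out = []
    row = 0
    for g, rows in enumerate(group_sizes):
        n_dim = len(b[g][0])
        for m in range(row, row + rows):
            acc = [0.0] * n_dim
            for k in range(len(a[m])):
                blk = k // BLOCK
                # Dequantize the A element with its per-block scale.
                a_val = a[m][k] * e8m0_to_float(a_scale[m][blk])
                for n in range(n_dim):
                    b_val = b[g][k][n] * e8m0_to_float(b_scale[g][blk][n])
                    acc[n] += a_val * b_val
            out.append(acc)
        row += rows
    return out
```

Each group multiplies its own slice of rows of `a` by its own weight matrix `b[g]`, so the output keeps the M x N shape of a single matmul while the weights change from group to group.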
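## Related Issue
<!-- If this PR resolves a specific issue, provide the issue link here. -->
## Reason
<!-- Describe the purpose of this change, the problem it solves, etc.; it should match the type label. -->
## Testing
Accuracy tests pass for (FP8E5M2, FP8E3M4, FP4E2M1) * (trans, no_trans)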
```
...
[Index: 198 / ] Running 55_ascend950_mx_grouped_matmul_slice_m with parameters: 20,500,128,302 (repeat=1)
[Index: 199 / ] Running 55_ascend950_mx_grouped_matmul_slice_m with parameters: 20,400,109,1009 (repeat=1)
[Index: 200 / ] Running 55_ascend950_mx_grouped_matmul_slice_m with parameters: 20,160,244,752 (repeat=1)
Writing back to csv: ./eva_rec/qtfp4_e2m1_tb0/log/55_ascend950_mx_grouped_matmul_slice_m_run_qtfp4_e2m1_tb0_20260516185337.csv
Writing log back to: ./eva_rec/qtfp4_e2m1_tb0/log/55_ascend950_mx_grouped_matmul_slice_m_run_qtfp4_e2m1_tb0_20260516185337.log
Kernel 55_ascend950_mx_grouped_matmul_slice_m ran successfully
[1/2] Element=fp4_e2m1 transB=0 ... DONE (see ./eva_rec/qtfp4_e2m1_tb0/log/)
...
[Index: 199 / ] Running 55_ascend950_mx_grouped_matmul_slice_m with parameters: 20,400,109,1009 (repeat=1)
[Index: 200 / ] Running 55_ascend950_mx_grouped_matmul_slice_m with parameters: 20,160,244,752 (repeat=1)
Writing back to csv: ./eva_rec/qtfp4_e2m1_tb1/log/55_ascend950_mx_grouped_matmul_slice_m_run_qtfp4_e2m1_tb1_20260516191527.csv
Writing log back to: ./eva_rec/qtfp4_e2m1_tb1/log/55_ascend950_mx_grouped_matmul_slice_m_run_qtfp4_e2m1_tb1_20260516191527.log
Kernel 55_ascend950_mx_grouped_matmul_slice_m ran successfully
[2/2] Element=fp4_e2m1 transB=1 ... DONE (see ./eva_rec/qtfp4_e2m1_tb1/log/)
============================================================
Result: 2 PASS, 0 FAIL (total 2)
Output under: ./eva_rec/
============================================================
```
Bugfix for the round-up alignment of the fp4 gmOffsetB (between groups): PASS
```bash
$ python examples/55_ascend950_mx_grouped_matmul_slice_m/gen_data_compare.py 2 64 32 3 4 --quant_type=float4_e2m1fn_x2 --trans_b=0
------计算golden------
npu op run log =
------ 计算相对误差 -----
------ 综合精度指标 ------
result: [ 3.75 -0.15625 -1.875 ... 0.484375 -1.875 -0.8671875], golden: [ 3.75 -0.15625 -1.875 ... 0.484375 -1.875 -0.8671875]
npu mare = 2.0000, upgrade mare = 2.000000
npu mere = 0.9786, upgrade mere = 0.978606
npu rmse = 3.2752, upgrade rmse = 3.275185
------ 开始比较 ------
比较结果:Compare success
$ python examples/55_ascend950_mx_grouped_matmul_slice_m/gen_data_compare.py 2 64 32 3 4 --quant_type=float4_e2m1fn_x2 --trans_b=1
------计算golden------
npu op run log =
------ 计算相对误差 -----
------ 综合精度指标 ------
result: [ 0.125 -0.046875 -0.625 ... -0.453125 0.75 0.5 ], golden: [ 0.125 -0.046875 -0.625 ... 0.09375 0.375 -0.84375 ]
npu mare = 5.5000, upgrade mare = 2.000000
npu mere = 1.0406, upgrade mere = 0.998397
npu rmse = 4.0842, upgrade rmse = 4.062459
------ 开始比较 ------
比较结果:Compare success
```
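
The three indicators printed above can be read with the help of the sketch below. It assumes that `mare` is the maximum relative error, `mere` is the mean relative error, and `rmse` is the root-mean-square error between the result and the golden data. The exact formulas, the epsilon, and the pass thresholds used by gen_data_compare.py are not shown in this PR, so `precision_metrics` is a hypothetical helper rather than the script's implementation.

```python
import math


def precision_metrics(result, golden, eps=1e-7):
    """Return (mare, mere, rmse) under the assumed definitions.

    eps guards against division by zero where golden is 0; its value
    here is an assumption, not taken from the comparison script.
    """
    rel = [abs(r - g) / (abs(g) + eps) for r, g in zip(result, golden)]
    mare = max(rel)             # maximum relative error
    mere = sum(rel) / len(rel)  # mean relative error
    sq = sum((r - g) ** 2 for r, g in zip(result, golden))
    rmse = math.sqrt(sq / len(rel))  # root-mean-square error
    return mare, mere, rmse
```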
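## Documentation Updates
<!-- If this PR includes documentation updates, note them here. For example: updated README.md. -->
## Type Labels
<!-- [x] means selected -->
- [ ] Bug fix
- [x] New feature
- [ ] Performance optimization
- [ ] Documentation update
- [ ] Other, please describe: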
See merge request: cann/catlass!605 | 14 days ago |