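npu_matmul_add_fp32 external interface (forward only)
Inputs:
- x: required input; data type float16 or bf16
- weight: required input; data type float16 or bf16
- C: required input; data type float32
Outputs:
- output: required output; data type float32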
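The lists above give data types only. The shapes used in the example suggest that, for x of shape (M, K) and weight of shape (M, N), C and the result have shape (K, N), matching x.T @ weight + C. A minimal sketch of that shape check in plain Python follows; the helper name expected_output_shape is hypothetical and not part of MindSpeed, and the shape rule is an inference from the example rather than a documented constraint.

```python
def expected_output_shape(x_shape, weight_shape, c_shape):
    """Return the shape of x.T @ weight + C, or raise ValueError on a mismatch.

    Hypothetical helper: the rule is inferred from the example shapes,
    not taken from the operator's documentation.
    """
    m_x, k = x_shape
    m_w, n = weight_shape
    # x.T has shape (K, M), so the rows of x and weight must agree.
    if m_x != m_w:
        raise ValueError(f"row counts differ: x has {m_x}, weight has {m_w}")
    # The addend C must match the (K, N) product exactly.
    if tuple(c_shape) != (k, n):
        raise ValueError(f"C has shape {tuple(c_shape)}, expected {(k, n)}")
    return (k, n)


# Shapes taken from the example in this document.
print(expected_output_shape((4096, 8192), (4096, 8192), (8192, 8192)))
# prints (8192, 8192)
```

Running a check like this before the call can turn a shape mistake into a readable error on the host side.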
Example
import torch
import torch_npu
from mindspeed.ops.npu_matmul_add import npu_matmul_add_fp32
x = torch.rand((4096, 8192), dtype=torch.float16).npu()
weight = torch.rand((4096, 8192), dtype=torch.float16).npu()
C = torch.rand((8192, 8192), dtype=torch.float32).npu()
# Computation with separate operators
product = torch.mm(x.T, weight)
result = product + C
# Computation with the fused operator
npu_matmul_add_fp32(weight, x, C)
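
For checking small cases by hand, the unfused steps above (x.T @ weight, then + C) can be written out in plain Python. This is only a reference sketch on nested lists: the function name matmul_add_reference is hypothetical, it does not call the NPU operator, and it assumes the fused call is meant to produce the same values as the unfused steps, which the example implies but does not state.

```python
def matmul_add_reference(x, weight, c):
    """Compute x.T @ weight + c on nested lists (hypothetical reference helper).

    x is M x K, weight is M x N, c is K x N; the result is K x N.
    """
    m, k, n = len(x), len(x[0]), len(weight[0])
    if len(weight) != m or len(c) != k or any(len(row) != n for row in c):
        raise ValueError("incompatible shapes for x.T @ weight + c")
    result = []
    for i in range(k):          # row i of x.T is column i of x
        row = []
        for j in range(n):
            acc = c[i][j]       # start from the addend
            for r in range(m):
                acc += x[r][i] * weight[r][j]
            row.append(acc)
        result.append(row)
    return result


x = [[1, 2], [3, 4]]
weight = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
print(matmul_add_reference(x, weight, c))
# prints [[27, 31], [39, 45]]
```

On real tensors the same comparison would normally use a tolerance rather than exact equality, since the inputs are half precision.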