文件最后提交记录最后更新时间
sip仓补充torch调用方式 Co-authored-by: lishuai<lishuai279@huawei.com>2 个月前
sip仓补充torch调用方式 Co-authored-by: lishuai<lishuai279@huawei.com>2 个月前
sip仓补充torch调用方式 Co-authored-by: lishuai<lishuai279@huawei.com>2 个月前
modified md files(for readability improvement) Co-authored-by: ”fwx1169505“<fengying27@h-partners.com> # message auto-generated for no-merge-commit merge: !77 merge master into master modified md files(for readability improvement) Created-by: fengying555682 Commit-by: ”fwx1169505“ Merged-by: cann-robot Description: ## 描述 <!--在这里详细描述你的改动,包括改动的原因和所采取的方法。--> ## 关联的Issue <!-- 如果这个PR是为了解决特定的Issue,请在这里提供Issue链接。例如:关联Issue #123--> ## 测试 <!--描述进行了哪些测试来验证你的改动。包括但不限于二级冒烟、算子泛化等。--> ## 文档更新 <!--如果这个PR包含文档的更新,请在这里指出。例如:更新了README.md文件。--> ## 类型标签 <!-- [x] 表示选中 --> - [ ] Bug修复 - [ ] 新特性 - [ ] 性能优化 - [ ] 文档更新 - [ ] 任务 - [ ] 其他,请描述: See merge request: cann/sip!771 个月前
sip仓补充torch调用方式 Co-authored-by: lishuai<lishuai279@huawei.com>2 个月前
README.md

torch_sip - PyTorch 扩展用于 AscendSiP

这是一个使用 Torch Library (TORCH_LIBRARY) 为 AscendSiP 操作提供 Python 绑定的 PyTorch 扩展。

目录结构

sip_pta/
├── torch_sip/
│   └── __init__.py          # Python 接口
├── csrc/
│   └── base
│       └── asd_mul.cpp      # C++ 实现及 TORCH_LIBRARY 注册
├── test/
│   └── base
│        └── asd_mul.py      # python接口测试脚本
└── setup.py                 # 构建脚本

构建方法

前置要求

  • C++ 编译器 (g++, gcc >= 10.3.0)
  • Python >= 3.8
  • 支持 NPU 的 PyTorch
  • 昇腾 CANN 工具包 >= 8.5.0
  • AscendSiP 库 (asdsip)
  • 领域加速库公共组件 ascend-boost-comm

构建步骤

配置环境

  1. 环境变量: export ASDSIP_HOME_PATH=asdsip安装目录,如不配置默认为/usr/local/Ascend/asdsip/latest
  2. 环境变量: export BOOST_COMM_PATH=ascend-boost-comm头文件目录, 如不配置默认为 ../3rdparty/ascend-boost-comm/src/include
  3. 环境变量: export USE_NINJA=ON,可选配置, 是否启用ninja构建
  4. CANN配置: 如 source /opt/xxx/ascend-toolkit/set_env.sh
  5. conda环境激活: conda activate your_env 需要安装torch_npu
cd sip_pta
python3 setup.py build bdist_wheel
pip install --force-reinstall --no-deps ./dist/torch_sip-<version>-<python_tag>-<platform_tag>.whl

这将执行以下操作:

  1. 执行构建
  2. 生成 torch_sip-0.1.0-cp39-cp39-linux_aarch64.whl
  3. 覆盖安装到当前环境

使用方法

import torch
import torch_sip

# 在 NPU 上创建张量
x = torch.randn(10, 10).npu()
y = torch.randn(10, 10).npu()

# 调用 asd_mul 操作
result = torch_sip.asd_mul(x, y)

print(result)  # 逐元素乘法结果

脚本运行方法

  1. export LD_LIBRARY_PATH=/usr/local/Ascend/asdsip/latest/lib:$LD_LIBRARY_PATH 配置sip算子so文件目录
  2. source xxx/ascend-toolkit/set_env.sh 配置CANN环境
  3. conda activate your_env 环境激活
  4. python asd_mul.py 执行脚本

实现细节

Torch Library

此实现使用 Torch Library (TORCH_LIBRARY):

优势:

  • 更好地集成到 PyTorch 的算子系统
  • 自动分发到 NPU 后端
  • 可与 torch.compile() 和其他 PyTorch 优化一起使用
  • 比 pybind11 更简单 - 无需手动 Python 绑定代码

关键文件:

base类算子:

  1. csrc/base/asd_mul.cpp - 完整实现

    namespace sip_pta {
    at::Tensor asdMul(const at::Tensor& x, const at::Tensor& y)
    {
    c10_npu::NPUGuard guard(x.device());
    
        at::Tensor z = at::empty_like(x);
    
        auto sip_stream = c10_npu::getCurrentNPUStream().stream(false);
        int64_t n = x.numel();
        uint8_t* workspace_ptr = nullptr;
        EXEC_FUNC(AsdSip::asdMul, n, x, y, z, sip_stream, workspace_ptr);
        return z;
    }
    TORCH_LIBRARY_FRAGMENT(torch_sip, m) { m.def("asd_mul(Tensor x, Tensor y) -> Tensor"); }
    
    TORCH_LIBRARY_IMPL(torch_sip, PrivateUse1, m) { m.impl("asd_mul", &asdMul); }
    } // namespace sip_pta
    
  2. torch_sip/init.py - Python 接口

    def asd_mul(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
         return torch.ops.torch_sip.asd_mul(x, y)
    

fft类算子: - 完整实现

  1. csrc/fft/asd_fft_c2c.cpp - 完整实现

    namespace sip_pta {
    at::Tensor asdFftC2C(const at::Tensor& in, bool isForward)
    {
        c10_npu::NPUGuard guard(in.device());
    
        at::Tensor input = in.is_contiguous() ? in : in.contiguous();
        at::Tensor output = at::empty_like(input);
        auto selfShape = input.sizes();
        TORCH_CHECK(selfShape.size() >= sip_pta::DIM_2 && selfShape.size() <= sip_pta::DIM_4,
                    "Input tensor must have 2, 3 or 4 dimensions, but got ", selfShape.size());
        op_api::FFTParam param; // 准备FFTParam
    
        if (isForward) {
            param.direction = AsdSip::ASCEND_FFT_FORWARD;
        } else {
            param.direction = AsdSip::ASCEND_FFT_INVERSE;
        }
        param.batchSize = selfShape[0];
        param.fftType = AsdSip::ASCEND_FFT_C2C;
        param.dimType = AsdSip::asdFft1dDimType::ASCEND_FFT_HORIZONTAL;
        if (selfShape.size() == sip_pta::DIM_2) {
            // 1D
            param.fftXSize = selfShape[sip_pta::DIM_1];
        } else if (selfShape.size() == sip_pta::DIM_3) {
            // 2D
            param.fftXSize = selfShape[sip_pta::DIM_1];
            param.fftYSize = selfShape[sip_pta::DIM_2];
        } else if (selfShape.size() == sip_pta::DIM_4) {
            // 3D
            param.fftXSize = selfShape[sip_pta::DIM_1];
            param.fftYSize = selfShape[sip_pta::DIM_2];
            param.fftZSize = selfShape[sip_pta::DIM_3];
        }
        EXEC_FFT_FUNC(AsdSip::asdFftExecC2C, param, input, output); 
        // 使用宏EXEC_FFT_FUNC, 参数依次为算子入口函数, FFTParam, 算子入口函数的入参(handle除外, 严格按顺序排序)
        return output;
    }
    
    // 2. 算子注册 (Fragment)
    TORCH_LIBRARY_FRAGMENT(torch_sip, m)
    {
        m.def("asd_fft_c2c(Tensor input, bool is_forward) -> Tensor", &asdFftC2C);
    }
    TORCH_LIBRARY_IMPL(torch_sip, PrivateUse1, m) { m.impl("asd_fft_c2c", &asdFftC2C); }
    } // namespace sip_pta
    
  2. torch_sip/init.py - Python 接口

    def asd_fft_c2c(input_tensor: torch.Tensor, is_forward: bool = True) -> torch.Tensor:
         return torch.ops.torch_sip.asd_fft_c2c(input_tensor, is_forward)
    

blas类算子: - 完整实现

  1. csrc/blas/asd_blas_ctrmv.cpp - 完整实现

    namespace sip_pta {
    at::Tensor asdBlasCtrmv(const at::Tensor& A, at::Tensor& x, int64_t uplo, int64_t trans, int64_t diag)
    {
    // 基本数据类型校验
    // 根据算子定义,目前仅支持 ComplexFloat 单精度复数
    TORCH_CHECK(A.scalar_type() == at::kComplexFloat && x.scalar_type() == at::kComplexFloat,
    "asd_blas_ctrmv: Tensors must be ComplexFloat (Complex64).");
    
    // 维度校验
    // 矩阵必须是方阵,向量长度必须与矩阵维度匹配
    TORCH_CHECK(A.dim() == sip_pta::DIM_2 && A.size(0) == A.size(1),
    "asd_blas_ctrmv: Tensor A must be a 2D square matrix.");
    TORCH_CHECK(x.dim() == sip_pta::DIM_1 && x.size(0) == A.size(0),
    "asd_blas_ctrmv: Tensor x length must match matrix A dimension.");
    
    c10_npu::NPUGuard guard(A.device());
    // 提取张量维度和属性参数
    int64_t n = A.size(0);
    int64_t lda = n;
    int64_t incx = 1;
    
    // 转换枚举参数至底层类型
    AsdSip::asdBlasFillMode_t uplo_val = static_cast<AsdSip::asdBlasFillMode_t>(uplo);
    AsdSip::asdBlasOperation_t trans_val = static_cast<AsdSip::asdBlasOperation_t>(trans);
    AsdSip::asdBlasDiagType_t diag_val = static_cast<AsdSip::asdBlasDiagType_t>(diag);
    
    // 构建传递给 getBlasHandle 的 Plan 参数列表, 按顺序添加AsdSip::asdBlasMakeCtrmvPlan除handle外所有参数
    std::vector<int64_t> planParam = {static_cast<int64_t>(uplo_val), n};
    
    // 定义 Plan 构建函数
    auto makePlan = [uplo_val, n](AsdSip::asdBlasHandle handle) {
    AsdSip::asdBlasMakeCtrmvPlan(handle, uplo_val, n);
    };
    
    // 执行底层算子, 
    EXEC_BLAS_FUNC(AsdSip::asdBlasCtrmv, makePlan, planParam,
    uplo_val, trans_val, diag_val, n, A, lda, x, incx);
    // EXEC_BLAS_FUNC, 参数依次为算子入口函数, makePlan函数, planParam, 算子入口函数的入参(handle除外, 严格按顺序排序)
    // CTRMV 为原地修改算子,直接返回被修改后的 x
    return x;
    }
    
    // 算子注册模块
    TORCH_LIBRARY_FRAGMENT(torch_sip, m)
    {
    // x 为原地更新,使用 Tensor(a!) 标识可变引用
    m.def("asd_blas_ctrmv(Tensor A, Tensor(a!) x, int uplo, int trans, int diag) -> Tensor(a!)");
    }
    
    TORCH_LIBRARY_IMPL(torch_sip, PrivateUse1, m)
    {
    m.impl("asd_blas_ctrmv", &asdBlasCtrmv);
    }
    }
    
  2. torch_sip/init.py - Python 接口

    def asd_blas_ctrmv(mat_a, vec_x, uplo="L", trans="N", diag="N"):
    """
    Args:
      mat_a: (N, N) 复数三角矩阵
      vec_x: (N,) 复数向量
      uplo: "L" (下三角), "U" (上三角)
      trans: "N" (无转置), "T" (转置) 或 "C" (共轭转置)
      diag: "N" (非单位对角) 或 "U" (单位对角,对角线元素视为1)
    """
    uplo_map = {"L": 0, "U": 1}
    trans_map = {"N": 0, "T": 1, "C": 2}
    diag_map = {"N": 0, "U": 1}
    
    u_val = uplo_map.get(uplo.upper(), 0)
    t_val = trans_map.get(trans.upper(), 0)
    d_val = diag_map.get(diag.upper(), 0)
    
    return torch.ops.torch_sip.asd_blas_ctrmv(mat_a, vec_x, u_val, t_val, d_val)
    

添加新算子

要添加新算子(例如 asd_add):

  1. 添加到 csrc/asd_mul.cpp(或创建新文件):

    at::Tensor asd_add(const at::Tensor &x, const at::Tensor &y) {
        // 实现代码
    }
    
    TORCH_LIBRARY(TorchSip, m) {
        m.def("asd_add(Tensor x, Tensor y) -> Tensor");
    }
    
    TORCH_LIBRARY_IMPL(TorchSip, NPU, m) {
        m.impl("asd_add", &asd_add);
    }
    
  2. 添加到 torch_sip/init.py:

    def asd_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return torch.ops.TorchSip.asd_add(x, y)
    

故障排除

构建错误

  • "torch/extension.h not found":确保已安装支持 NPU 的 PyTorch
  • "asdsip library not found":在 setup.py 中设置正确的库路径
  • "g++ not found":安装 g++ 编译器

运行时错误

  • "Device mismatch":确保张量在 NPU 设备上
  • "Shape mismatch":检查输入张量形状是否兼容
  • "torch.ops.TorchSip not found":确保扩展已构建并加载