
NPU Ops Transformer

npu_ops_transformer is a high-performance operator extension library designed for Ascend NPU. It uses Just-In-Time (JIT) compilation to bridge PyTorch functional interfaces with the ACLNN library.

Build & Installation

Prerequisites

  • OS: Linux
  • Python: 3.8+
  • Compiler: GCC 9.4.0+
  • Frameworks:
    • PyTorch>=2.6.0
    • torch_npu (matching your PyTorch version)
  • Toolkit: Ascend CANN Toolkit
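
Before building, it can help to check the Python-side prerequisites programmatically. The following is a minimal sketch using only the standard library. The helper names (parse_version, meets_minimum, installed_version, check_environment) are hypothetical, and the distribution names "torch" and "torch_npu" are assumptions about how the packages are registered in your environment. GCC and the CANN Toolkit are not checked here.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found, minimum):
    """Compare two version strings numerically, ignoring suffixes like '+cpu'."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.6" and "2.6.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    torch_version = installed_version("torch")  # assumed distribution name
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif not meets_minimum(torch_version, "2.6.0"):
        problems.append(f"PyTorch {torch_version} is older than 2.6.0")
    if installed_version("torch_npu") is None:  # assumed distribution name
        problems.append("torch_npu is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

An empty result only means the Python packages look plausible. It does not confirm that torch_npu matches the installed PyTorch version.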

Installation Steps

  1. Install Dependencies:

    python3 -m pip install -r requirements.txt
    
  2. Build the Wheel:

    # -n: non-isolated build (uses existing environment)
    python3 -m build --wheel -n
    
  3. Install Package:

    python3 -m pip install dist/*.whl --force-reinstall --no-deps
    

Quick Start

Once the package is installed, NPU-accelerated operators can be invoked directly through the library's ops namespace (for example, npu_ops_transformer.ops.abs).

import torch
import torch_npu
import npu_ops_transformer

# Initialize data on NPU
x = torch.randn(10, 32, dtype=torch.float32).npu()

# Call the custom NPU operator
# This triggers JIT compilation on the first call
npu_result = npu_ops_transformer.ops.abs(x)

# Verify against CPU ATen implementation
cpu_x = x.cpu()
cpu_result = torch.ops.aten.abs(cpu_x)

assert torch.allclose(cpu_result, npu_result.cpu(), rtol=1e-6)
print("Verification successful!")
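
Because the first call triggers JIT compilation, it is expected to take noticeably longer than later calls. The sketch below shows one way to observe this with a small timing helper. The names time_calls and make_fake_op are hypothetical, and the fake operator only simulates a one-time build cost with a sleep so that the example runs without an NPU.

```python
import time


def time_calls(fn, repeats=3, sync=None):
    """Time `repeats` consecutive calls of `fn` and return durations in seconds.

    `sync` is an optional zero-argument callable run after each call so that
    asynchronous device work has finished before the clock stops.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    return durations


def make_fake_op(build_seconds=0.05):
    """Return a stand-in operator whose first call pays a one-time build cost."""
    built = False

    def fake_op():
        nonlocal built
        if not built:
            time.sleep(build_seconds)  # simulated JIT compilation
            built = True
        return 42

    return fake_op


if __name__ == "__main__":
    first, *rest = time_calls(make_fake_op())
    print(f"first call: {first:.4f}s")
    print("later calls:", ", ".join(f"{d:.4f}s" for d in rest))
```

On a real device, the same helper could be used as time_calls(lambda: npu_ops_transformer.ops.abs(x), sync=torch.npu.synchronize), assuming torch_npu exposes torch.npu.synchronize in your installation.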

Developer Guide: Adding a New Operator

To implement a new operator (e.g. abs), you need to provide two components: a C++ kernel wrapper and a Python JIT builder.

1. C++ Backend (ops/csrc/<OP_NAME>.cpp)

This file bridges PyTorch tensors to the ACLNN C-API.

#include <torch/extension.h>
#include "aclnn_common.h"

/**
 * @brief ACLNN Wrapper for aclnnAbs
 * @param x Input Tensor (on NPU)
 * @return Result Tensor
 */
at::Tensor npu_abs(const at::Tensor &x)
{
    // 1. Manually allocate output tensor (standard PyTorch practice)
    at::Tensor y = at::empty_like(x);

    // 2. Launch ACLNN kernel using the helper macro
    ACLNN_CMD(aclnnAbs, x, y);

    return y;
}

// Bind the C++ function to Python module
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m)
{
    m.def("npu_abs", &npu_abs, "abs");
}

2. Python Frontend (ops/<OP_NAME>.py)

This file manages the JIT compilation logic and registers the operator into the PyTorch Dispatcher.

import torch
import torch_npu
from torch.library import impl
from npu_ops_transformer.op_builder.builder import OpBuilder
from npu_ops_transformer.op_builder.builder import AS_LIBRARY

class AbsOpBuilder(OpBuilder):
    def __init__(self):
        super().__init__("abs")

    def sources(self):
        """Path to C++ source code."""
        return ['ops/csrc/abs.cpp']

    def schema(self) -> str:
        """PyTorch operator signature."""
        return "abs(Tensor x) -> Tensor"

    def register_meta(self):
        """
        Registers the Meta implementation (Shape/Dtype inference).
        Essential for Autograd and FakeTensor support.
        """
        @impl(AS_LIBRARY, self.name, "Meta")
        def abs_meta(x):
            return torch.empty_like(x)

# Instantiate the builder
abs_op_builder = AbsOpBuilder()

@impl(AS_LIBRARY, abs_op_builder.name, "PrivateUse1")
def abs(x):
    """
    Dispatcher implementation for NPU.
    'PrivateUse1' is the dispatch key for custom NPU backends.
    """
    op_module = abs_op_builder.load()  # Compiles/loads the .so file
    return op_module.npu_abs(x)
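
The dispatcher implementation above calls load() on every invocation. Whether OpBuilder.load() caches its result internally is not stated here. If it does not, one option is to memoize the loaded module on the Python side. The following is a minimal sketch of that idea. StubBuilder and cached_loader are hypothetical names, and the stub returns a plain namespace instead of a compiled extension so that the example runs anywhere.

```python
import functools
import types


class StubBuilder:
    """Hypothetical stand-in for an OpBuilder; counts how often load() runs."""

    def __init__(self, name):
        self.name = name
        self.load_calls = 0

    def load(self):
        # A real builder would compile and import a shared library here.
        self.load_calls += 1
        return types.SimpleNamespace(npu_abs=abs)


def cached_loader(builder):
    """Return a zero-argument function that calls builder.load() at most once."""

    @functools.lru_cache(maxsize=None)
    def load_once():
        return builder.load()

    return load_once


stub_builder = StubBuilder("abs")
load_abs_module = cached_loader(stub_builder)


def abs_impl(x):
    """Look up the (cached) module and forward the call to its kernel wrapper."""
    return load_abs_module().npu_abs(x)


if __name__ == "__main__":
    print(abs_impl(-3), abs_impl(-7))
    print("load() calls:", stub_builder.load_calls)
```

The design choice here is to keep the cache outside the builder, so the builder class itself stays unchanged.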

Technical Notes

| Component     | Responsibility                                                                  |
|---------------|---------------------------------------------------------------------------------|
| OpBuilder     | Handles JIT compilation of C++ source using ninja.                              |
| Meta Dispatch | Allows PyTorch to know the output shape/type without running NPU code.          |
| PrivateUse1   | The specific backend key PyTorch uses to route NPU-specific operations.         |
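
To make the roles of the Meta and PrivateUse1 keys more concrete, here is a deliberately simplified toy model of key-based dispatch. It is not PyTorch's dispatcher and uses no PyTorch APIs. ToyTensor, register, and dispatch are hypothetical names introduced only for illustration. The point is that the Meta kernel produces shape and dtype information without touching data, while the device kernel performs the actual computation.

```python
from collections import namedtuple

# Minimal stand-in for a tensor: only the fields this example needs.
ToyTensor = namedtuple("ToyTensor", ["shape", "dtype", "device", "data"])

_KERNELS = {}  # maps (op_name, dispatch_key) to a callable


def register(op_name, dispatch_key):
    """Decorator that records a kernel for an operator under a dispatch key."""
    def decorator(fn):
        _KERNELS[(op_name, dispatch_key)] = fn
        return fn
    return decorator


def dispatch(op_name, tensor):
    """Pick a kernel from the tensor's device and call it."""
    key = "Meta" if tensor.device == "meta" else "PrivateUse1"
    try:
        kernel = _KERNELS[(op_name, key)]
    except KeyError:
        raise NotImplementedError(f"{op_name} has no kernel for {key}") from None
    return kernel(tensor)


@register("abs", "Meta")
def abs_meta(x):
    # Shape/dtype inference only: no data is read or produced.
    return ToyTensor(x.shape, x.dtype, "meta", None)


@register("abs", "PrivateUse1")
def abs_device(x):
    # Stand-in for the real device kernel.
    return ToyTensor(x.shape, x.dtype, x.device, [abs(v) for v in x.data])


if __name__ == "__main__":
    meta_in = ToyTensor((3,), "float32", "meta", None)
    real_in = ToyTensor((3,), "float32", "npu", [-1.0, 2.0, -3.0])
    print(dispatch("abs", meta_in))
    print(dispatch("abs", real_in))
```

In PyTorch itself, the same split is what lets tracing tools propagate shapes through an operator without executing the NPU kernel.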