AIKG-Triton

Triton Introduction

Triton is a Python-based language and compiler for writing high-performance AI kernels, optimized for deep learning workloads. It supports both Huawei Ascend NPUs and NVIDIA GPUs, and currently serves as the backend for AIKG Triton kernel generation.

Reference Code

The following ReLU kernel (relu_op) was generated by AIKG:

# ref: tests/resources/relu_op/relu_op_triton.py
import triton
import triton.language as tl


@triton.jit
def relu_kernel(
    x_ptr,  # Input pointer
    output_ptr,  # Output pointer
    n_elements,  # Total number of elements
    BLOCK_SIZE: tl.constexpr,  # Number of elements processed per block
):
    # Get the program ID
    pid = tl.program_id(axis=0)
    # Calculate the starting position for this block
    block_start = pid * BLOCK_SIZE
    # Create offsets
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    # Create a mask to ensure no out-of-bounds access
    mask = offsets < n_elements

    # Load input data
    x = tl.load(x_ptr + offsets, mask=mask)

    # Perform ReLU: max(0, x)
    output = tl.maximum(x, 0.0)

    # Store the result
    tl.store(output_ptr + offsets, output, mask=mask)
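
To make the indexing in the kernel above easier to follow, here is a minimal CPU-only sketch that mimics it in plain Python. It is not part of AIKG and does not use Triton: the names `cdiv`, `relu_program`, and `relu_emulated` are hypothetical helpers introduced for illustration. Each call to `relu_program` stands in for one program instance (one value of `pid`), and the `if offset < n_elements` check plays the role of `mask`.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: how many blocks of size b are needed to cover a elements."""
    return (a + b - 1) // b


def relu_program(x, out, n_elements, block_size, pid):
    """CPU stand-in for one program instance of relu_kernel (illustrative only)."""
    block_start = pid * block_size
    for lane in range(block_size):  # plays the role of tl.arange(0, BLOCK_SIZE)
        offset = block_start + lane
        if offset < n_elements:  # the mask: lanes past the end do nothing
            out[offset] = max(x[offset], 0.0)


def relu_emulated(x, block_size=4):
    """Run every program instance sequentially and return (output, grid size)."""
    n_elements = len(x)
    out = [0.0] * n_elements
    grid = cdiv(n_elements, block_size)  # one program per block
    for pid in range(grid):
        relu_program(x, out, n_elements, block_size, pid)
    return out, grid


if __name__ == "__main__":
    data = [-2.0, -1.0, 0.0, 1.0, 2.0, -3.0]
    result, grid = relu_emulated(data, block_size=4)
    print(grid, result)  # 2 [0.0, 0.0, 0.0, 1.0, 2.0, 0.0]
```

With 6 elements and a block size of 4, two programs are launched, and the second one has two lanes masked off. On a real device the programs run in parallel rather than in a loop, and `BLOCK_SIZE` must be a power of two because `tl.arange` requires it.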

Supported Backends

Huawei Atlas A2 Training Series Triton Backend Dependencies

Please refer to: https://gitee.com/ascend/triton-ascend

NVIDIA GPU Triton Backend Dependencies

Please refer to: https://github.com/triton-lang/triton
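
After installing the dependencies for your backend, a quick check that the Python modules can be found may save some debugging time. The sketch below uses only the standard library. It assumes the installed backend is exposed under the module name `triton`, which may not match every setup; `find_missing` is a hypothetical helper, and you can pass any other module names your environment needs.

```python
import importlib.util


def find_missing(modules=("triton",)):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only confirms that a module is discoverable. It does not verify that a device or driver is available.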