Gggppff24 docs: add api extension documentation for al and bl

to_buffer

1. Hardware Background

to_buffer converts a tl.tensor object into a bl.buffer object specific to Ascend hardware. It is the core conversion interface between tensors and hardware memory buffers.

2. Interface Definition

Python
def to_buffer(
    tensor: tl.tensor,
    space: address_space = None,
    bind_buffer: buffer = None,
    _builder=None,
) -> buffer:
    ...

3. Parameter Description

Parameter	Type	Required	Description
tensor	tl.tensor	Yes	The input tensor to convert into a buffer.
space	bl.address_space	No	The Ascend hardware address space in which the target buffer resides. Defaults to None.
bind_buffer	bl.buffer	No	An existing buffer to bind the tensor to directly. Defaults to None.
_builder	-	No (internal)	Passed automatically by the compiler; do not set it in user code.

4. Return Value

Returns the bl.buffer object corresponding to the input tensor.
If a bind_buffer parameter is provided, the function returns that bound buffer directly.
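
The return rule above can be illustrated with a plain-Python model. This is only a sketch of the documented behavior, not the Ascend implementation: FakeTensor, FakeBuffer, and to_buffer_model are hypothetical stand-ins, chosen so the snippet runs without Triton or Ascend hardware.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for tl.tensor (tracks the shape only)."""
    shape: Tuple[int, ...]


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for bl.buffer."""
    shape: Tuple[int, ...]
    space: Optional[str] = None


def to_buffer_model(tensor, space=None, bind_buffer=None):
    """Model of the documented return rule only; not the real to_buffer."""
    if bind_buffer is not None:
        # A bound buffer is returned as-is.
        return bind_buffer
    # Otherwise a buffer corresponding to the input tensor is produced.
    return FakeBuffer(shape=tensor.shape, space=space)


t = FakeTensor(shape=(32, 2, 4))
existing = FakeBuffer(shape=(32, 2, 4), space="UB")

fresh = to_buffer_model(t, space="L1")
bound = to_buffer_model(t, bind_buffer=existing)
print(fresh.space)
print(bound is existing)
```

In this model the caller gets back the very object it passed as bind_buffer, which is the property the second sentence of this section describes.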

5. Constraints

The interface follows the same constraint rules as bl.allocate_local_buffer.
The space argument must be one of the memory regions supported by Ascend hardware: UB, L1, L0A, L0B, or L0C.
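
As a rough illustration of the address-space constraint, the following host-side sketch rejects any name outside the five regions listed above. AddressSpaceName and parse_address_space are hypothetical helpers written for this example; they are not part of bl or al, and the real check is performed by the compiler.

```python
from enum import Enum


class AddressSpaceName(Enum):
    """Hypothetical mirror of the address spaces listed in this section."""
    UB = "UB"
    L1 = "L1"
    L0A = "L0A"
    L0B = "L0B"
    L0C = "L0C"


def parse_address_space(name):
    """Return the matching member, or raise ValueError for an unknown name."""
    try:
        return AddressSpaceName[name]
    except KeyError:
        valid = ", ".join(member.name for member in AddressSpaceName)
        raise ValueError(
            f"unsupported address space {name!r}; expected one of: {valid}"
        ) from None


print(parse_address_space("L0A"))
```

The lookup is case-sensitive in this sketch, so "ub" is rejected just like an unknown region such as "L2".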

6. Complete Example

Basic Usage (Kernel Definition + Compilation Verification)

Python
import triton
import triton.language as tl
from triton.compiler import ASTSource
import triton.extension.buffer.language as bl
import triton.language.extra.cann.extension as al

# Get the current hardware compilation target
target = triton.runtime.driver.active.get_current_target()


@triton.jit
def to_buffer_kernel():
    # 1. Basic conversion: no address space specified
    a = tl.full((32, 2, 4), 0, dtype=tl.int64)
    a_buf = bl.to_buffer(a)
    # 2. Convert and specify the UB address space
    b = tl.full((32, 2, 4), 0, dtype=tl.int64)
    b_buf = bl.to_buffer(b, al.ascend_address_space.UB)
    # 3. Convert and specify the L1 address space
    c = tl.full((32, 2, 4), 0, dtype=tl.int64)
    c_buf = bl.to_buffer(c, al.ascend_address_space.L1)
    # 4. Convert and specify the L0A address space
    d = tl.full((32, 2, 4), 0, dtype=tl.int64)
    d_buf = bl.to_buffer(d, al.ascend_address_space.L0A)
    # 5. Convert and specify the L0B address space
    e = tl.full((32, 2, 4), 0, dtype=tl.int64)
    e_buf = bl.to_buffer(e, al.ascend_address_space.L0B)
    # 6. Convert and specify the L0C address space
    f = tl.full((32, 2, 4), 0, dtype=tl.int64)
    f_buf = bl.to_buffer(f, al.ascend_address_space.L0C)


# Compilation test function
def test_to_buffer():
    src = ASTSource(
        fn=to_buffer_kernel,
        constants={},
        signature={},
    )
    # Compile the kernel (to validate API legality)
    triton.compile(src=src, target=target)
    print("✅ to_buffer API compilation verified successfully")


if __name__ == "__main__":
    test_to_buffer()
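
Each tensor in the basic example has shape (32, 2, 4) and dtype int64. When choosing an address space it can help to know how many bytes such a tensor occupies. The helper below is a small host-side sketch for that arithmetic; tensor_nbytes and DTYPE_BYTES are hypothetical names, not part of Triton, and the capacity of each Ascend memory region is not covered here.

```python
import math

# Byte widths for a few dtype names; int64 is the dtype used in the basic example.
DTYPE_BYTES = {
    "int8": 1,
    "int16": 2,
    "int32": 4,
    "int64": 8,
    "float16": 2,
    "float32": 4,
}


def tensor_nbytes(shape, dtype):
    """Number of bytes for a dense tensor of the given shape and dtype name."""
    return math.prod(shape) * DTYPE_BYTES[dtype]


# 32 * 2 * 4 = 256 elements, 8 bytes each
print(tensor_nbytes((32, 2, 4), "int64"))  # 2048
```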

Advanced Usage (Compilation + IR Printing)

Python
# Compile and print Triton IR (recommended for debugging)
# Reuses ASTSource, target, and to_buffer_kernel from the basic example.
def test_to_buffer_print_ir():
    src = ASTSource(
        fn=to_buffer_kernel,
        constants={},
        signature={},
    )
    # Enable IR dumping
    compile_options = {"dump_ir": True, "optimization_level": 0}
    compiled_kernel = triton.compile(src=src, target=target, options=compile_options)
    print("\n📄 Kernel IR dump complete")


if __name__ == "__main__":
    test_to_buffer_print_ir()

7. Key Notes

to_buffer is the core entry point for converting between tensors and hardware buffers.
It supports manual selection of any Ascend hardware address space (UB, L1, L0A, L0B, L0C).
It supports binding to an existing buffer for fine-grained memory management.
It can only be called inside kernel functions decorated with @triton.jit.