Gggppff24
docs: add api extension documentation for al and bl

to_buffer

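1. Hardware background

Converts a tl.tensor object into a bl.buffer object, the buffer type specific to Ascend hardware. It is the core interface for converting a tensor into a hardware memory buffer.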

2. Interface definition

Python
def to_buffer(
    tensor: tl.tensor,
    space: address_space = None,
    bind_buffer: buffer = None,
    _builder=None
) -> buffer:

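3. Parameters

Parameter	Type	Required	Description
tensor	tl.tensor	Yes	Input tensor to convert into a buffer
space	bl.address_space	No	Ascend hardware address space in which the target buffer resides
bind_buffer	bl.buffer	No	Existing buffer to bind the tensor to directly
_builder	-	Internal	Passed automatically by the compiler; users should not set it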

4. Return value

Returns the bl.buffer object corresponding to the input tensor.
If bind_buffer is passed, returns that bound buffer itself.
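
The return rule above can be modeled in plain Python. This is a minimal sketch, not the library's implementation: FakeTensor, FakeBuffer, and to_buffer_model are hypothetical stand-ins for tl.tensor, bl.buffer, and bl.to_buffer, and address spaces are represented as plain strings. It only mirrors the documented behavior that a supplied bind_buffer is returned as-is; what happens when both space and bind_buffer are given is not documented here and is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for tl.tensor: only shape and dtype are modeled."""
    shape: Tuple[int, ...]
    dtype: str


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for bl.buffer."""
    shape: Tuple[int, ...]
    dtype: str
    space: Optional[str] = None  # None means no address space was specified


def to_buffer_model(tensor: FakeTensor,
                    space: Optional[str] = None,
                    bind_buffer: Optional[FakeBuffer] = None) -> FakeBuffer:
    """Model of the documented return rule (assumption: not the real lowering)."""
    if bind_buffer is not None:
        # Documented behavior: the bound buffer itself is returned.
        return bind_buffer
    # Otherwise a buffer corresponding to the input tensor is returned.
    return FakeBuffer(shape=tensor.shape, dtype=tensor.dtype, space=space)


t = FakeTensor(shape=(32, 2, 4), dtype="int64")
fresh = to_buffer_model(t, space="UB")            # new buffer derived from t
existing = FakeBuffer(shape=(32, 2, 4), dtype="int64", space="L1")
bound = to_buffer_model(t, bind_buffer=existing)  # same object as `existing`
```

The identity check `bound is existing` is the point of the sketch: binding does not create a second buffer.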

5. Constraints

The same constraints apply as for bl.allocate_local_buffer.
The address space argument must be one of the memory regions supported by Ascend hardware (UB/L1/L0A/L0B/L0C).
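
The address space constraint can be checked before a kernel is written. The following is a small sketch under stated assumptions: AddressSpaceName and parse_address_space are hypothetical helpers introduced for illustration and are not part of bl or al; they only encode the five region names listed above.

```python
from enum import Enum


class AddressSpaceName(Enum):
    """Names of the memory regions listed in the constraint (hypothetical helper)."""
    UB = "UB"
    L1 = "L1"
    L0A = "L0A"
    L0B = "L0B"
    L0C = "L0C"


def parse_address_space(name: str) -> AddressSpaceName:
    """Return the matching member, or raise ValueError for an unknown name."""
    try:
        return AddressSpaceName[name]
    except KeyError:
        valid = "/".join(member.name for member in AddressSpaceName)
        raise ValueError(
            f"unsupported address space {name!r}; expected one of {valid}"
        ) from None


selected = parse_address_space("L0A")
```

A name such as "L0" is rejected because only the specific regions L0A, L0B, and L0C are listed.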

6. Complete usage examples

Basic usage (kernel definition + compile check)

Python
import triton
import triton.language as tl
from triton.compiler import ASTSource
import triton.extension.buffer.language as bl
import triton.language.extra.cann.extension as al

# Get the current hardware compile target
target = triton.runtime.driver.active.get_current_target()

@triton.jit
def to_buffer_kernel():
    # 1. Basic conversion: no address space specified
    a = tl.full((32, 2, 4), 0, dtype=tl.int64)
    a_buf = bl.to_buffer(a)
    # 2. Convert and place in the UB address space
    b = tl.full((32, 2, 4), 0, dtype=tl.int64)
    b_buf = bl.to_buffer(b, al.ascend_address_space.UB)
    # 3. Convert and place in the L1 address space
    c = tl.full((32, 2, 4), 0, dtype=tl.int64)
    c_buf = bl.to_buffer(c, al.ascend_address_space.L1)
    # 4. Convert and place in the L0A address space
    d = tl.full((32, 2, 4), 0, dtype=tl.int64)
    d_buf = bl.to_buffer(d, al.ascend_address_space.L0A)
    # 5. Convert and place in the L0B address space
    e = tl.full((32, 2, 4), 0, dtype=tl.int64)
    e_buf = bl.to_buffer(e, al.ascend_address_space.L0B)
    # 6. Convert and place in the L0C address space
    f = tl.full((32, 2, 4), 0, dtype=tl.int64)
    f_buf = bl.to_buffer(f, al.ascend_address_space.L0C)

# Compile-only test function
def test_to_buffer():
    src = ASTSource(
        fn=to_buffer_kernel,
        constants={},
        signature={},
    )
    # Compile the kernel to check that the interface calls are valid
    triton.compile(src=src, target=target)
    print("✅ to_buffer compile check passed")

if __name__ == "__main__":
    test_to_buffer()

Advanced usage (compile + print IR)

Python
# Compile and print the Triton IR (recommended for debugging)
# Reuses to_buffer_kernel, ASTSource, and target from the basic example above
def test_to_buffer_print_ir():
    src = ASTSource(
        fn=to_buffer_kernel,
        constants={},
        signature={},
    )
    # Enable IR printing
    compile_options = {"dump_ir": True, "optimization_level": 0}
    compiled_kernel = triton.compile(src=src, target=target, options=compile_options)
    print("\n📄 Kernel IR printing finished")

if __name__ == "__main__":
    test_to_buffer_print_ir()

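7. Key points

This interface is the core entry point for converting a tensor into a hardware buffer.
The Ascend hardware address space can be specified explicitly (UB/L1/L0A/L0B/L0C).
An existing buffer can be bound, which allows fine-grained memory management.
It can only be used inside kernel functions decorated with @triton.jit.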
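
Why the call is restricted to JIT functions can be illustrated with the _builder parameter: the compiler supplies it, so a plain Python call arrives without one. The sketch below is modeled on the guard pattern upstream Triton uses for its language builtins; whether bl.to_buffer is wrapped the same way is an assumption, and jit_only, to_buffer_stub, and FakeBuilder are hypothetical names that do not touch Triton at all.

```python
from functools import wraps


def jit_only(fn):
    """Reject calls that lack the compiler-injected `_builder` argument."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if kwargs.get("_builder") is None:
            raise ValueError(
                f"{fn.__name__} must be called from a @triton.jit function "
                "(`_builder` is supplied by the compiler)"
            )
        return fn(*args, **kwargs)
    return wrapper


@jit_only
def to_buffer_stub(tensor, space=None, bind_buffer=None, _builder=None):
    """Hypothetical stub: returns a description instead of emitting IR."""
    return ("to_buffer", tensor, space, bind_buffer)


class FakeBuilder:
    """Placeholder for the IR builder object the compiler would pass in."""


# Inside a JIT function the compiler fills in `_builder`; simulated here by hand.
ok = to_buffer_stub("a", space="UB", _builder=FakeBuilder())

# A plain Python call has no builder, so the guard raises.
try:
    to_buffer_stub("a")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

This also shows why the parameter table marks _builder as internal: user code never passes it.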