0_launch_kernel
Description
This sample demonstrates how to run the Add custom operator using the CANN Runtime kernel loading and execution interfaces. It covers binary loading, kernel function handle retrieval, parameter assembly, task dispatch, Stream synchronization, and result verification. The sample supports two parameter organization modes: simple and placeholder. When run, it generates the input data, executes the Kernel, and verifies the output.
Product Support
This sample supports the following products:
| Product | Supported |
|---|---|
| Ascend 950PR/Ascend 950DT | Yes |
| Atlas A3 training series products/Atlas A3 inference series products | Yes |
| Atlas A2 training series products/Atlas A2 inference series products | Yes |
Build and Run
- Download the sample code to an environment with the CANN software installed, then switch to the sample directory.
cd ${git_clone_path}/example/2_advanced_features/kernel/0_launch_kernel
- Set environment variables.
# Replace ${install_root} with the CANN installation root directory; the default installation location is /usr/local/Ascend
source ${install_root}/cann/set_env.sh
export ASCEND_INSTALL_PATH=${install_root}/cann
# Replace ${ascend_name} with the Ascend AI processor model: take the Name field reported by npu-smi info and remove the spaces
export SOC_VERSION=${ascend_name}
# Replace ${cmake_path} with the directory containing ascendc.cmake, for example ${install_root}/cann/aarch64-linux/tikcpp/ascendc_kernel_cmake
export ASCENDC_CMAKE_DIR=${cmake_path}
If these environment variables are not set beforehand, run.sh attempts to detect the CANN installation automatically from ASCEND_INSTALL_PATH, ASCEND_HOME_PATH, $HOME/Ascend/cann, /usr/local/Ascend/cann, and /opt/Ascend/cann, and to determine SOC_VERSION and ASCENDC_CMAKE_DIR. If automatic detection fails, set the variables manually using the commands above.
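A fallback search over the candidate locations listed above could be sketched roughly as follows. This is not the actual run.sh logic: find_cann_root is a hypothetical helper, the search order simply follows the order in which the locations are listed, and treating the presence of set_env.sh as the sign of a valid installation is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from run.sh): print the first candidate
# directory that contains set_env.sh, or return 1 if none does.
find_cann_root() {
    local dir
    for dir in "$@"; do
        # Skip empty entries, e.g. when an environment variable is unset.
        if [ -n "$dir" ] && [ -f "$dir/set_env.sh" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidates in the order listed in this README; unset variables expand to "".
if cann_root=$(find_cann_root \
        "${ASCEND_INSTALL_PATH:-}" \
        "${ASCEND_HOME_PATH:-}" \
        "${HOME:-}/Ascend/cann" \
        "/usr/local/Ascend/cann" \
        "/opt/Ascend/cann"); then
    echo "Detected CANN at: $cann_root"
else
    echo "CANN not detected; set ASCEND_INSTALL_PATH manually." >&2
fi
```

Passing the candidates as arguments keeps the helper independent of the environment, so the same function can be exercised against temporary directories.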
Data generation and result verification in this sample depend on numpy. Make sure numpy is installed in the Python environment before executing run.sh.
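One way to confirm the numpy dependency before running the sample is a quick import check. The sketch below assumes the interpreter is available as python3; has_python_module is a hypothetical helper and is not part of the sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given Python module can be imported.
# Assumes the interpreter is invoked as python3.
has_python_module() {
    python3 -c "import $1" >/dev/null 2>&1
}

if has_python_module numpy; then
    echo "numpy is available."
else
    echo "numpy not found; install it first, for example: python3 -m pip install numpy" >&2
fi
```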
- Run the following command to execute the sample.
# mode can be simple or placeholder; defaults to simple if not specified
bash run.sh -r simple
In simple mode, the Kernel's pointer-type parameters are Device memory addresses that the user has allocated and copied data into beforehand. In placeholder mode, the Runtime transfers the data corresponding to placeholder parameters to the Device side at Kernel launch time.
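The mode selection described above (a -r option that accepts simple or placeholder and defaults to simple) could be handled along these lines. This is only an illustration of the documented behavior, not the contents of run.sh; parse_mode is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve the run mode from "-r <mode>" arguments.
# Prints the mode on success; returns non-zero for unsupported input.
parse_mode() {
    local mode="simple" opt
    local OPTIND=1   # reset so the function can be called more than once
    while getopts "r:" opt; do
        case "$opt" in
            r) mode="$OPTARG" ;;
            *) return 2 ;;
        esac
    done
    case "$mode" in
        simple|placeholder)
            printf '%s\n' "$mode"
            ;;
        *)
            echo "Unsupported mode: $mode (expected simple or placeholder)" >&2
            return 1
            ;;
    esac
}

# Example: no option given, so the default applies.
echo "Selected mode: $(parse_mode)"
```

Validating the mode before building avoids spending time on compilation only to fail on an unrecognized argument.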
CANN RUNTIME API
Key features and interfaces in this sample:
- Initialization
  - Call the aclInit interface to initialize configuration.
  - Call the aclFinalize interface to deinitialize.
- Device Management
  - Call the aclrtSetDevice interface to specify the Device used for computation.
  - Call the aclrtResetDeviceForce interface to forcibly reset the current computation Device and reclaim Device resources.
- Stream Management
  - Call the aclrtCreateStream interface to create a Stream.
  - Call the aclrtSynchronizeStream interface to block until the tasks in the Stream have finished executing.
  - Call the aclrtDestroyStreamForce interface to forcibly destroy the Stream.
- Memory Management
  - Call the aclrtMallocHost interface to allocate Host memory.
  - Call the aclrtMalloc interface to allocate Device memory.
  - Call the aclrtFreeHost interface to release Host memory.
  - Call the aclrtFree interface to release Device memory.
- Data Transfer
  - Call the aclrtMemcpy interface to transfer data between Host and Device.
- Kernel Loading and Execution
  - Call the aclrtBinaryLoadFromFile interface to load and parse the operator binary from a file.
  - Call the aclrtBinaryGetFunction interface to get the kernel function handle.
  - Call the aclrtKernelArgsInit interface to initialize the parameter list based on the kernel function handle.
  - Call the aclrtKernelArgsAppend interface to append parameters to the parameter list.
  - Call the aclrtKernelArgsAppendPlaceHolder interface to append placeholder parameters.
  - Call the aclrtKernelArgsGetPlaceHolderBuffer interface to get the Host memory address corresponding to a placeholder parameter.
  - Call the aclrtKernelArgsFinalize interface to mark parameter assembly as complete.
  - Call the aclrtLaunchKernelWithConfig interface to dispatch the Kernel computation task.
  - Call the aclrtBinaryUnLoad interface to unload the operator binary.
Sample Output
Configuring CMake...
Building...
...
[INFO] Kernel launch sample runs in simple mode.
[INFO] Run the launch_kernel sample successfully.
... output/output_z.bin
... output/golden.bin
error ratio: 0.0000, tolerance: 0.0010
[SUCCESS] result correct