The APIs provided by the msKL tool can call the tiling function in the msOpGen project and user-defined Kernel functions. It also provides a series of autotune APIs to help developers efficiently perform code replacement, compilation, execution, and performance comparison for multiple tuning points.
Must be filled in based on the implementation of the tiling function, for example, AddCustom, MatmulLeakyreluCustom, etc. This is the sole basis for the msKL tool to locate the tiling function. Refer to the lib_path parameter for the lookup logic.
Data Type: str.
NOTE:
If an operator of the same type (op_type) has been previously deployed in CANN, and the user modifies the tiling function and recompiles it, the operator must be redeployed in the CANN environment.
inputs
Input
Optional parameter.
Fills in tensor information in the order of the Kernel function input arguments. If a parameter is not used, pass None as a placeholder in the corresponding position.
Data Type: list. Each element must be a tensor or list[tensor]. If format or ori_format is not explicitly specified in inputs_info, all tensors default to ND format.
outputs
Input
Optional parameter.
Fills in tensor information in the order of the Kernel function input arguments. If a parameter is not used, pass None as a placeholder in the corresponding position.
Data Type: list. Each element must be a tensor or list[tensor]. If format or ori_format is not explicitly specified in outputs_info, all tensors default to ND format.
inputs_info
Input
Optional parameter.
Fills in the description (info) of each input tensor in the order of the Kernel function input arguments. If a parameter is not used, pass an empty dict or None as a placeholder in the corresponding position.
Data Type: list. The data type of elements in the inputs_info parameter is dict or list[dict]. The description of each dict element is as follows:
ori_shape: Original dimension information of the input tensor.
shape: Dimension information of the input tensor at runtime.
dtype: Data type of the input tensor. For details, refer to "AI CPU API > Data Type Description > DataType" in the TBE&AI CPU Operator Development API.
ori_format: Original data layout format of the input tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
format: Data layout format of the input tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
data_path: File path of the input tensor's bin file in value dependency scenarios.
This input parameter has a constraint relationship with inputs:
When inputs is a tensor, inputs_info must be a dict.
When inputs is list[tensor], inputs_info must be list[dict].
When inputs is None, each element of inputs_info must contain at least [shape, dtype].
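The three constraints above can be checked before calling the tool. The following is a minimal sketch in plain Python, not part of msKL: the helper name check_inputs_info is hypothetical, any object that is not a list or None is treated as a tensor, and positions where either side is a None placeholder are skipped, which is an assumption about how placeholders interact with these rules.

```python
def check_inputs_info(inputs, inputs_info):
    """Return a list of constraint violations; an empty list means consistent.

    Hypothetical helper, not an msKL API. It only mirrors the three rules
    stated for inputs and inputs_info.
    """
    problems = []
    if inputs is None:
        # Rule 3: without tensors, every info entry needs at least shape and dtype.
        for i, info in enumerate(inputs_info or []):
            entries = info if isinstance(info, list) else [info]
            for entry in entries:
                missing = {"shape", "dtype"} - set(entry or {})
                if missing:
                    problems.append(f"inputs_info[{i}] is missing {sorted(missing)}")
        return problems
    if inputs_info is None:
        return problems
    if len(inputs) != len(inputs_info):
        problems.append("inputs and inputs_info must have the same length")
    for i, (tensor, info) in enumerate(zip(inputs, inputs_info)):
        if tensor is None or info is None:
            # Assumption: a None placeholder on either side is not checked.
            continue
        if isinstance(tensor, list):
            # Rule 2: list[tensor] pairs with list[dict].
            if not (isinstance(info, list) and all(isinstance(d, dict) for d in info)):
                problems.append(f"inputs_info[{i}] must be list[dict]")
        elif not isinstance(info, dict):
            # Rule 1: a single tensor pairs with a single dict.
            problems.append(f"inputs_info[{i}] must be a dict")
    return problems
```

Returning a list of messages rather than raising on the first violation makes it easier to report every mismatch at once.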
outputs_info
Input
Optional parameter.
Stores output information. If a parameter is not used, pass an empty dict as a placeholder in the corresponding position.
Data Type: list. The data type of elements in the outputs_info parameter is dict or list[dict]. The description of each dict element is as follows:
ori_shape: Original dimension information of the output tensor.
shape: Dimension information of the output tensor.
dtype: Data type of the output tensor. For details, refer to "AI CPU API > Data Type Description > DataType" in the TBE&AI CPU Operator Development API.
ori_format: Original data layout format of the output tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
format: Data layout format of the output tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
data_path: Reserved parameter, does not take effect.
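To make the dict layout concrete, the sketch below builds one info entry with the documented keys. The helper name make_tensor_info is hypothetical, the dtype is written as the string "float16" purely for illustration (the accepted representation is defined by the DataType reference), and falling back from ori_shape to shape and from ori_format to format is an assumption of this sketch, not documented behavior.

```python
def make_tensor_info(shape, dtype, fmt="ND", ori_shape=None, ori_format=None):
    """Build one info dict with the keys described for inputs_info/outputs_info.

    Hypothetical helper, not an msKL API.
    """
    return {
        # Assumption: when no original shape is given, reuse the runtime shape.
        "ori_shape": list(ori_shape) if ori_shape is not None else list(shape),
        "shape": list(shape),
        "dtype": dtype,
        # Assumption: when no original format is given, reuse the format.
        "ori_format": ori_format if ori_format is not None else fmt,
        "format": fmt,  # ND is the documented default
    }


# One entry per Kernel output argument, in argument order.
outputs_info = [make_tensor_info([8, 2048], "float16")]
```

data_path is left out here because it is a reserved key for outputs_info.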
Path to the liboptiling.so file generated by compiling the msOpGen project. It can be located by running find . -name 'liboptiling.so' in the project directory. The msKL tool looks for the user's tiling function first among the deployed operators and then in this .so file.
Data Type: str.
soc_version
Input
Optional parameter.
Type of the Ascend AI Processor.
NOTE:
Products other than Atlas A3 Training Series Products/Atlas A3 Inference Series Products: Run the npu-smi info command on the server where the Ascend AI Processor is installed and read the Chip Name field. The value to configure is Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, configure Ascendxxxyy. Where Ascendxxxyy appears as part of a path in a code sample, use the lowercase form ascendxxxyy.
Atlas A3 Training Series Products/Atlas A3 Inference Series Products: Execute the npu-smi info -t board -i id -c
Returns
Parameter Name
Description
blockdim
The number of cores configured by the user's tiling function.
Data Type: int.
workspace_size
This value is the workspace size requested by the user plus 78,643,200 Bytes reserved by the msKL tool.
Data Type: int.
workspace
The workspace memory allocated by the msKL tool for the user, with a size of workspace_size.
Data Type: numpy.array.
tiling_data
Stores tiling_data for calling the Kernel function.
Data Type: numpy.array.
tiling_key
The tiling_key configured by the user's tiling function. If not set by the user, the msKL tool defaults it to 0.
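The relationship between the returned workspace_size and the size requested by the user's tiling function is simple arithmetic, sketched below. The constant and function names are hypothetical. The only figure taken from the description above is the 78,643,200 Bytes reserved by the msKL tool, which equals 75 * 1024 * 1024.

```python
# Reserved by the msKL tool according to the workspace_size description.
MSKL_RESERVED_WORKSPACE_BYTES = 78_643_200  # 75 * 1024 * 1024


def user_workspace_bytes(workspace_size):
    """Recover the size requested by the user's tiling function.

    Hypothetical helper, not an msKL API: workspace_size is the value
    returned by the tool, which already includes the reserved bytes.
    """
    if workspace_size < MSKL_RESERVED_WORKSPACE_BYTES:
        raise ValueError("workspace_size is smaller than the reserved size")
    return workspace_size - MSKL_RESERVED_WORKSPACE_BYTES
```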
Path to the operator's kernel.o file. It can be found by executing the find . -name '*.o' command in the project directory.
Data Type: str.
kernel_type
Input
Optional parameter.
Operator type. Can be set to vec, cube, or mix.
If this parameter is not configured, the msKL tool tries to obtain the type itself and may fail, so setting it explicitly is recommended.
Data Type: str.
tiling_key
Input
Optional parameter.
The tiling_key used when calling the user's Kernel function. If this parameter is not configured, the msKL tool will use the result of the most recent call to tiling_func.
Data Type: int.
Returns
An executable Kernel object.
Table 1 Kernel Input Argument Introduction
Parameter Name
Input/Output
Description
device_id
Input
NPU device ID, which sets the ID of the Ascend AI Processor for running ST test cases.
Data Type: int.
If this parameter is not set, it defaults to 0.
timeout
Input
Timeout period. In camodel simulation scenarios, a longer timeout needs to be set. Setting it to -1 means no limit.
Data Type: int.
Unit: ms. Default value: 300000.
repeat
Input
Number of repeated runs, default value is 1.
Data Type: int.
stream
Input
Reserved Parameter.
kernel_name
Input
Reserved Parameter.
Note
The Kernel object is of type CompiledKernel and is invoked as kernel[blockdim](arg1, arg2, ..., timeout=-1, device_id=0, repeat=1). When invoking it, make sure the arguments passed to the CompiledKernel object match the input arguments of the Kernel function.
Sample
Sample 1:
def run_kernel(input_a, input_b, input_bias, output, workspace, tiling_data):
    kernel_binary_file = "MatmulLeakyreluCustom.o"  # The names of .o files may vary slightly across different hardware and operating systems.
    kernel = get_kernel_from_binary(kernel_binary_file)
    return kernel(input_a, input_b, input_bias, output, workspace, tiling_data)
Sample 2:
def run_kernel(input_a, input_b, input_bias, output, workspace, tiling_data, tiling_key, blockdim):
    kernel_binary_file = "MatmulLeakyreluCustom.o"  # The names of .o files may vary slightly across different hardware and operating systems.
    kernel = get_kernel_from_binary(kernel_binary_file, kernel_type='mix', tiling_key=tiling_key)
    # When running simulation, you need to manually set the timeout parameter to -1.
    return kernel[blockdim](input_a, input_b, input_bias, output, workspace, tiling_data, device_id=1, timeout=-1)
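For readers unfamiliar with the kernel[blockdim](...) form used in Sample 2, the toy class below shows how subscript-then-call syntax works in Python. It is only a stand-in that records its arguments. MockKernel is a hypothetical name, nothing here is msKL code, and nothing is launched on a device. The keyword defaults are the ones listed in Table 1.

```python
class MockKernel:
    """Toy stand-in that mimics only the call syntax of a CompiledKernel."""

    def __init__(self, name, blockdim=1):
        self.name = name
        self.blockdim = blockdim

    def __getitem__(self, blockdim):
        # kernel[blockdim] returns a copy bound to the requested core count.
        if not isinstance(blockdim, int) or blockdim < 1:
            raise ValueError("blockdim must be a positive integer")
        return MockKernel(self.name, blockdim)

    def __call__(self, *args, device_id=0, timeout=300000, repeat=1):
        # A real kernel would run on the device; this mock only reports
        # what it was asked to do.
        return {
            "kernel": self.name,
            "blockdim": self.blockdim,
            "num_args": len(args),
            "device_id": device_id,
            "timeout": timeout,
            "repeat": repeat,
        }


launch = MockKernel("MatmulLeakyreluCustom")[8]("a", "b", "bias", "out", timeout=-1)
```

Positional arguments map to the Kernel function's own arguments, while device_id, timeout, and repeat are passed by keyword so they cannot be confused with them.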
autotune
Function
Traverses the search space, tries different parameter combinations, and displays the runtime duration of each combination and the optimal combination.
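Conceptually, the traversal described here amounts to measuring every configuration and keeping the fastest one. The sketch below illustrates that idea in plain Python. It is not the msKL implementation: make_configs, toy_autotune, and the measure callback are hypothetical, and the tile_m/tile_n keys are placeholders for whatever tuning points a real search space defines.

```python
import itertools


def make_configs(space):
    """Expand {name: [candidates]} into a list[dict] search space."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(space[key] for key in keys))
    ]


def toy_autotune(configs, measure, repeat=1):
    """Measure each config `repeat` times and return (results, best).

    `measure(config)` is a stand-in for compile + run + timing and must
    return one duration. The average over the repeats is used as the
    execution time of that config.
    """
    results = []
    for config in configs:
        durations = [measure(config) for _ in range(repeat)]
        results.append((config, sum(durations) / repeat))
    best = min(results, key=lambda item: item[1])
    return results, best


configs = make_configs({"tile_m": [64, 128], "tile_n": [128, 256]})
```

Passing the measurement in as a callback keeps the loop independent of how a candidate is actually built and timed.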
Prototype
def autotune(configs: List[Dict], warmup: int = 300, repeat: int = 1, device_ids = [0]):
Parameters
Parameter Name
Input/Output
Required
Description
configs
Input
Required Parameter.
Search space definition.
Data Type: list[dict].
warmup
Input
Optional Parameter.
Device warm-up time before performance collection. Generally, a longer warm-up time results in more stable operator performance.
Unit: microseconds.
Default Value: 1000, with a value range of integers from 1 to 100000.
repeat
Input
Optional Parameter.
Number of repetitions. The average running time over multiple repetitions is taken as the operator's execution time.
Default Value: 1, with a value range of integers from 1 to 10000.
device_ids
Input
Optional Parameter.
List of Device IDs. Currently, only single-Device mode is supported. If multiple Device IDs are provided, only the first one takes effect.
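The value ranges listed above can be checked up front. This is a minimal sketch under stated assumptions: check_autotune_args is a hypothetical helper, and raising ValueError for out-of-range values is a choice made here for illustration, since how the tool itself reacts to them is not described.

```python
def check_autotune_args(warmup, repeat, device_ids):
    """Validate autotune arguments and return the Device ID that takes effect.

    Hypothetical helper, not an msKL API. The ranges come from the
    parameter table: warmup in [1, 100000], repeat in [1, 10000].
    """
    if not isinstance(warmup, int) or not 1 <= warmup <= 100000:
        raise ValueError("warmup must be an integer from 1 to 100000")
    if not isinstance(repeat, int) or not 1 <= repeat <= 10000:
        raise ValueError("repeat must be an integer from 1 to 10000")
    if not device_ids:
        raise ValueError("device_ids must contain at least one Device ID")
    # Only single-Device mode is supported, so only the first ID is used.
    return device_ids[0]
```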
class KernelInvokeConfig:
    """
    A configuration descriptor for a possible kernel developed based on an Act example
    """
    def __init__(self, kernel_src_file: str, kernel_name: str):
        pass

# The user can only pass a parameter of type KernelInvokeConfig.
class Launcher:
    def __init__(self, config: KernelInvokeConfig):
        """
        a class that generates launch source code for a kernel
        Args:
            config (KernelInvokeConfig): A configuration descriptor for a kernel
        """
compile
Function
Compiles the Kernel launch code and returns an executable Kernel object.
Prototype
kernel = compile(build_script, gen_file)
Parameters
Parameter Name
Input/Output
Required
Description
build_script
Input
Required parameter.
Script used for template library Kernel compilation.
Data Type: str.
gen_file
Input
Required parameter.
File path of the Kernel launch code generated by the code_gen interface. Generally, the return value of the code_gen interface is used directly.
Data Type: str.
output_bin_path
Input
Optional parameter.
Specifies the path of the executable file generated by compilation.
Data Type: str.
Default Value: _gen_module.so.
use_cache
Input
Optional parameter.
When enabled, compilation is not executed, and the file specified by output_bin_path is loaded instead.
Data Type: bool.
Default Value: False.
Returns
A runnable Kernel object, type: CompiledKernel, supports invoking the kernel as follows: kernel[blockdim](arg1, arg2, ..., timeout=-1, device_id=0, repeat=1), where arg1, arg2, ... are the input arguments of the Kernel.
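The effect of use_cache on the compile step can be pictured as follows. This is an illustrative sketch only: compile_or_load and the build callback are hypothetical, and raising FileNotFoundError when the cached file is absent is an assumption, since the behavior of the tool in that case is not described.

```python
import os


def compile_or_load(output_bin_path, build, use_cache=False):
    """Mimic the use_cache switch: skip the build and reuse the existing file.

    Hypothetical helper, not an msKL API. `build(path)` stands in for
    running the build script and must create the file at `path`.
    """
    if use_cache:
        # Compilation is not executed; the existing binary is loaded instead.
        if not os.path.exists(output_bin_path):
            raise FileNotFoundError(output_bin_path)
        return "loaded"
    build(output_bin_path)
    return "compiled"
```

With use_cache=True the build callback is never called, which is why the binary must already exist from an earlier run.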
Script file path used to compile the application to be tuned.
Data Type: str.
src_file
Input
Required parameter.
Code file path.
Data Type: str.
output_bin_path
Input
Optional parameter.
Specifies the path of the executable file generated by compilation.
Data Type: str.
Default Value: _gen_executable.
use_cache
Input
Optional parameter.
When enabled, compilation is not executed, and the file specified by output_bin_path is loaded instead.
Data Type: bool.
Default Value: False.
NOTE: When using the msDebug tool to invoke the compile interface, use_cache=True must be configured.
profiling_cmd
Input
-
Reserved parameter.
Returns
An executable program object, executable, of type CompiledExecutable. It supports invocation in the following manner: executable(arg1, arg2, ...), where arg1, arg2, ... are custom input arguments for the program.
Sample
executable = compile_executable(build_script, src_file)
executable(a, b, c)