MindStudio Kernel Launcher API Reference

API List

The APIs provided by the msKL tool can call the tiling function in the msOpGen project and user-defined Kernel functions. It also provides a series of autotune APIs to help developers efficiently perform code replacement, compilation, execution, and performance comparison for multiple tuning points.

Table 1 msKL API List

Category | Interface | Description
--- | --- | ---
Calling msOpGen Project | tiling_func | Calls the user's tiling function.
Calling msOpGen Project | get_kernel_from_binary | Generates an instance that can call the user's Kernel function.
Auto Tuning | autotune | Traverses the search space, tries different parameter combinations, and displays the runtime of each combination along with the optimal combination.
Auto Tuning | code_gen | Generates Kernel delivery code based on the input template library Kernel information.
Auto Tuning | compile | Compiles the Kernel delivery code and returns an executable Kernel object.
Auto Tuning | autotune_v2 | Traverses the search space, tries different parameter combinations, and displays the runtime of each combination along with the optimal combination.
Auto Tuning | compile_executable | Compiles the code and returns an executable object.

API Details

tiling_func

Function

Calls the user's tiling function.

Note

tiling_func does not support calling the GetCompileInfo interface in Basic Data Structure and Interface Reference.

Prototype

def tiling_func(op_type: str, inputs: list, outputs: list, lib_path: str,
                inputs_info: list = None, outputs_info: list = None, attr=None, soc_version: str = None) -> TilingOutput

Parameters

Parameter Name

Input/Output

Required

Description

op_type

Input

Required parameter.

Set this according to the implementation of the tiling function, for example AddCustom or MatmulLeakyreluCustom. It is the only key the msKL tool uses to locate the tiling function. See the lib_path parameter for the lookup order.

Data Type: str.

NOTE:

If an operator of the same type (op_type) has been previously deployed in CANN, and the user modifies the tiling function and recompiles it, the operator must be redeployed in the CANN environment.

inputs

Input

Optional parameter.

Input tensor information, filled in the order of the Kernel function input arguments. For an unused parameter, pass None as a placeholder in its position.

Data Type: list. Each element must be a tensor or list[tensor]. If format or ori_format is not explicitly specified in inputs_info, all tensors default to ND format.

outputs

Input

Optional parameter.

Output tensor information, filled in the order of the Kernel function arguments. For an unused parameter, pass None as a placeholder in its position.

Data Type: list. Each element must be a tensor or list[tensor]. If format or ori_format is not explicitly specified in outputs_info, all tensors default to ND format.

inputs_info

Input

Optional parameter.

Descriptive information for each input, filled in the order of the Kernel function input arguments. For an unused parameter, pass an empty dict or None as a placeholder in its position.

Data Type: list. Each element is a dict or list[dict]. Each dict supports the following keys:

  • ori_shape: Original dimension information of the input tensor.
  • shape: Dimension information of the input tensor at runtime.
  • dtype: Data type of the input tensor. For details, refer to "AI CPU API > Data Type Description > DataType" in the TBE&AI CPU Operator Development API.
  • ori_format: Original data layout format of the input tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
  • format: Data layout format of the input tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
  • data_path: File path of the input tensor's bin file in value dependency scenarios.

Example:

[{"ori_shape": [8, 2048], "shape": [8, 2048], "dtype": "float16", "ori_format": "ND", "format": "ND"},
 {"ori_shape": [8, 2048], "shape": [8, 2048], "dtype": "float16", "ori_format": "ND", "format": "ND"}]
NOTE:

This input parameter has a constraint relationship with inputs:

  • When inputs is a tensor, inputs_info must be a dict.
  • When inputs is list[tensor], inputs_info must be list[dict].
  • When inputs is None, each element of inputs_info must contain at least [shape, dtype].
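The constraints above can be checked before tiling_func is called. The following is a minimal sketch and not part of msKL: make_tensor_info and check_inputs_info are hypothetical helper names, and the sketch assumes that inputs and inputs_info have the same length when both are given.

```python
from typing import Sequence


def make_tensor_info(shape: Sequence[int], dtype: str, fmt: str = "ND") -> dict:
    """Build one info dict with the keys listed above (hypothetical helper)."""
    dims = list(shape)
    return {"ori_shape": dims, "shape": dims, "dtype": dtype,
            "ori_format": fmt, "format": fmt}


def _require_shape_dtype(info: dict) -> None:
    missing = {"shape", "dtype"} - set(info)
    if missing:
        raise ValueError(f"info dict is missing keys: {sorted(missing)}")


def check_inputs_info(inputs, inputs_info) -> None:
    """Raise ValueError if inputs_info violates the constraints listed above."""
    if inputs is None:
        # Without tensors, every dict must carry at least shape and dtype.
        for info in inputs_info:
            for item in (info if isinstance(info, list) else [info]):
                _require_shape_dtype(item)
        return
    if len(inputs) != len(inputs_info):  # assumption: positions correspond 1:1
        raise ValueError("inputs and inputs_info differ in length")
    for tensor, info in zip(inputs, inputs_info):
        if tensor is None or not info:
            continue  # placeholder for an unused parameter
        # tensor <-> dict, list[tensor] <-> list[dict]
        if isinstance(tensor, list) != isinstance(info, list):
            raise ValueError("a tensor needs a dict, list[tensor] needs list[dict]")


# Usage: no tensors are passed, so shape and dtype are mandatory in each dict.
inputs_info = [make_tensor_info([8, 2048], "float16") for _ in range(2)]
check_inputs_info(None, inputs_info)
```

The same check applies to outputs and outputs_info, since the constraints are stated symmetrically.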

outputs_info

Input

Optional parameter.

Descriptive information for each output. For an unused parameter, pass an empty dict as a placeholder in its position.

Data Type: list. Each element is a dict or list[dict]. Each dict supports the following keys:

  • ori_shape: Original dimension information of the output tensor.
  • shape: Dimension information of the output tensor.
  • dtype: Data type of the output tensor. For details, refer to "AI CPU API > Data Type Description > DataType" in the TBE&AI CPU Operator Development API.
  • ori_format: Original data layout format of the output tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
  • format: Data layout format of the output tensor. Defaults to ND. For details, refer to "AI CPU API > Data Type Description > Format" in the TBE&AI CPU Operator Development API.
  • data_path: Reserved parameter, does not take effect.

Example:

[{"shape": [8, 2048], "dtype": "float16", "format": "ND"},
 {"shape": [8, 2048], "dtype": "float16", "format": "ND"}]
NOTE:

This input parameter has a constraint relationship with outputs:

  • When outputs is a tensor, outputs_info must be a dict.
  • When outputs is list[tensor], outputs_info must be list[dict].
  • When outputs is None, each element of outputs_info must contain at least [shape, dtype].

attr

Input

Optional parameter.

Operator attributes used by the tiling function.

Data Type: dict or list.

NOTE:
  • For the dict format, keys and values can only contain uppercase and lowercase English letters, digits, and underscores.
    {
      "a1": 1,
      "a2": False,
      "a3": "ssss",
      "a4": 1.2,
      "a5": [111, 222, 333],
      "a6": [111.111, 111.222, 111.333],
      "a7": [True, False],
      "a8": ["asdf", "zxcv"],
      "a9": [[1, 2, 3, 4], [5, 6, 7, 8], [5646, 2345]],
    }
  • The list format is recommended. If an attr needs to pass an empty list, this format must be used (e.g., "a10" below).
    • The values of "name" and "value" can only contain uppercase and lowercase English letters, digits, and underscores.
    • "dtype": Data type of the attribute value.
    [
      {"name": "a1", "dtype": "int", "value": 1},
      {"name": "a2", "dtype": "bool", "value": False},
      {"name": "a3", "dtype": "str", "value": "ssss"},
      {"name": "a4", "dtype": "float", "value": 1.2},
      {"name": "a5", "dtype": "list_float", "value": [111.111, 111.222, 111.333]},
      {"name": "a6", "dtype": "list_bool", "value": [True, False]},
      {"name": "a7", "dtype": "list_str", "value": ["asdf", "zxcv"]},
      {"name": "a8", "dtype": "list_list_int", "value": [[1, 2, 3, 4], [5, 6, 7, 8], [5646, 2345]]},
      {"name": "a9", "dtype": "list_int", "value": [111, 222, 333]},
      {"name": "a10", "dtype": "list_int", "value": []},
      {"name": "a11", "dtype": "int64", "value": 2},
      {"name": "a12", "dtype": "float32", "value": 1.3},
      {"name": "a13", "dtype": "string", "value": "ssss"},
      {"name": "a14", "dtype": "list_string", "value": ["asdf", "zxcv"]},
    ]
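Converting the dict format to the recommended list format is mechanical except for empty lists, which is why they need the list format. The following is a sketch only: infer_attr_dtype and attrs_to_list are hypothetical helpers, not msKL functions, and the dtype names are taken from the example above.

```python
from typing import Optional


def infer_attr_dtype(value) -> str:
    """Map a Python value to a dtype name used in the list format above."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        if not value:
            # An empty list has no element to inspect, so its dtype must be given.
            raise ValueError("empty list: dtype must be given explicitly")
        return "list_" + infer_attr_dtype(value[0])
    raise TypeError(f"unsupported attr value: {value!r}")


def attrs_to_list(attrs: dict, explicit_dtypes: Optional[dict] = None) -> list:
    """Convert {'name': value} attrs into [{'name', 'dtype', 'value'}, ...]."""
    explicit_dtypes = explicit_dtypes or {}
    return [
        {"name": name,
         "dtype": explicit_dtypes.get(name) or infer_attr_dtype(value),
         "value": value}
        for name, value in attrs.items()
    ]


attr = attrs_to_list(
    {"a1": 1, "a2": False, "a8": [[1, 2, 3, 4], [5, 6, 7, 8]], "a10": []},
    explicit_dtypes={"a10": "list_int"},  # cannot be inferred from []
)
```

The helper only inspects the first element of a list, so it assumes homogeneous lists, as in the example above.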

lib_path

Input

Optional parameter.

Path to the liboptiling.so file generated by compiling the msOpGen project. It can be found in the project directory using find . -name 'liboptiling.so'. The msKL tool looks for the user's tiling function first among the deployed operators and then in the .so file.

Data Type: str.

soc_version

Input

Optional parameter.

The type of the Ascend AI Processor.

NOTE:
  • Non-Atlas A3 Training Series Products/Atlas A3 Inference Series Products: Execute the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Chip Name. The value to configure is Ascend followed by the Chip Name. For example, if the Chip Name is xxxyy, the value to configure is Ascendxxxyy. When Ascendxxxyy is used as a path in a code sample, use the lowercase form ascendxxxyy.
  • Atlas A3 Training Series Products/Atlas A3 Inference Series Products: Execute the npu-smi info -t board -i id -c

    Returns

    Parameter Name | Description
    --- | ---
    blockdim | The number of cores configured by the user's tiling function. Data Type: int.
    workspace_size | The workspace size requested by the user plus 78,643,200 bytes reserved by the msKL tool. Data Type: int.
    workspace | The workspace allocated by the msKL tool for the user, with a size of workspace_size. Data Type: numpy.array.
    tiling_data | Stores tiling_data for calling the Kernel function. Data Type: numpy.array.
    tiling_key | The tiling_key configured by the user's tiling function. If not set by the user, the msKL tool defaults it to 0. Data Type: int.
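Because workspace_size includes the reservation made by msKL, the size requested by the user's tiling function has to be recovered by subtraction. A small sketch of that arithmetic follows. The constant comes from the description above, and user_workspace_bytes is a hypothetical helper name.

```python
# Bytes reserved by the msKL tool, as stated in the description above.
MSKL_RESERVED_BYTES = 78_643_200

# The reservation equals exactly 75 MiB.
assert MSKL_RESERVED_BYTES == 75 * 1024 * 1024


def user_workspace_bytes(workspace_size: int) -> int:
    """Return the part of workspace_size requested by the user's tiling function."""
    if workspace_size < MSKL_RESERVED_BYTES:
        raise ValueError("workspace_size is smaller than the msKL reservation")
    return workspace_size - MSKL_RESERVED_BYTES


# Example: a tiling function that requested 1 MiB of workspace.
requested = user_workspace_bytes(MSKL_RESERVED_BYTES + 1024 * 1024)
```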

    Sample

    import numpy as np
    import mskl

    # Problem size for the matmul: (M, K) x (K, N) -> (M, N)
    M = 1024
    N = 640
    K = 256
    input_a = np.random.randint(1, 10, [M, K]).astype(np.float16)
    input_b = np.random.randint(1, 10, [K, N]).astype(np.float16)
    input_bias = np.random.randint(1, 10, [N]).astype(np.float32)
    output = np.zeros([M, N]).astype(np.float32)
    # Call the user's tiling function to obtain the tiling data
    tiling_output = mskl.tiling_func(
        op_type="MatmulLeakyreluCustom",
        inputs=[input_a, input_b, input_bias], outputs=[output],
        lib_path="liboptiling.so",  # Tiling code compilation artifact.
    )
    

    get_kernel_from_binary

    Function

    Generates an instance that can call the user's Kernel function.

    Prototype

    def get_kernel_from_binary(kernel_binary_file: str, kernel_type: str = None, tiling_key: int = None) -> CompiledKernel
    

    Parameters

    Parameter Name

    Input/Output

    Required

    Description

    kernel_binary_file

    Input

    Required parameter.

    Path to the operator's kernel.o file. It can be found by executing the find . -name '*.o' command in the project directory.

    Data Type: str.

    kernel_type

    Input

    Optional parameter.

    Operator type. Can be set to vec, cube, or mix.

    If this parameter is not configured, the msKL tool tries to determine the type itself and may fail. Setting it manually is therefore recommended.

    Data Type: str.

    tiling_key

    Input

    Optional parameter.

    The tiling_key used when calling the user's Kernel function. If this parameter is not configured, the msKL tool uses the tiling_key from the most recent call to tiling_func.

    Data Type: int.

    Returns

    An executable Kernel object.

    Table 1 Kernel Input Argument Introduction

    Parameter Name

    Input/Output

    Description

    device_id

    Input

    NPU device ID, which sets the ID of the Ascend AI Processor for running ST test cases.

    Data Type: int.

    If this parameter is not set, it defaults to 0.

    timeout

    Input

    Timeout period for the Kernel run. camodel simulation scenarios need a longer timeout than the default. Setting it to -1 means no limit.

    Data Type: int.

    Unit: ms, default value is 300000.

    repeat

    Input

    Number of repeated runs, default value is 1.

    Data Type: int.

    stream

    Input

    Reserved Parameter.

    kernel_name

    Input

    Reserved Parameter.

    Note

    The Kernel object is of type CompiledKernel and is invoked as follows: kernel[blockdim](arg1, arg2, ..., timeout=-1, device_id=0, repeat=1). The positional arguments passed to the CompiledKernel must match the input arguments of the Kernel function.
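The kernel[blockdim](...) form combines a subscript with a call. The sketch below shows how that syntax can be expressed in plain Python. FakeKernel is a hypothetical stand-in that only records its arguments; it is not the msKL CompiledKernel and runs nothing on an NPU. The keyword defaults mirror the table above.

```python
class FakeKernel:
    """Stand-in that records how it was invoked (illustration only)."""

    def __init__(self, blockdim=None):
        self.blockdim = blockdim

    def __getitem__(self, blockdim: int) -> "FakeKernel":
        # kernel[blockdim] returns a launcher bound to that number of cores.
        return FakeKernel(blockdim)

    def __call__(self, *args, timeout: int = 300000, device_id: int = 0,
                 repeat: int = 1) -> dict:
        # Positional arguments are the Kernel inputs; keywords are launch options.
        return {"blockdim": self.blockdim, "args": args, "timeout": timeout,
                "device_id": device_id, "repeat": repeat}


kernel = FakeKernel()
launch = kernel[20]("a", "b", "c", timeout=-1, device_id=1)
```

Keeping the launch options keyword-only in practice (timeout, device_id, repeat) avoids confusing them with Kernel arguments, which are passed positionally.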

    Sample

    • Sample 1:

      def run_kernel(input_a, input_b, input_bias, output, workspace, tiling_data):
          kernel_binary_file = "MatmulLeakyreluCustom.o"   # The names of .o files may vary slightly across different hardware and operating systems.
          kernel = get_kernel_from_binary(kernel_binary_file)
          return kernel(input_a, input_b, input_bias, output, workspace, tiling_data)
      
    • Sample 2:

      def run_kernel(input_a, input_b, input_bias, output, workspace, tiling_data, tiling_key, blockdim):
          kernel_binary_file = "MatmulLeakyreluCustom.o"    # The names of .o files may vary slightly across different hardware and operating systems.
          kernel = get_kernel_from_binary(kernel_binary_file, kernel_type='mix', tiling_key=tiling_key)
          return kernel[blockdim](input_a, input_b, input_bias, output, workspace, tiling_data, device_id=1, timeout=-1) # When running simulation, you need to manually set the timeout parameter to -1.
      

    autotune

    Function

    Traverses the search space, tries different parameter combinations, and displays the runtime duration of each combination and the optimal combination.

    Prototype

    def autotune(configs: List[Dict], warmup: int = 300, repeat: int = 1, device_ids = [0]):
    

    Parameters

    Parameter Name

    Input/Output

    Required

    Description

    configs

    Input

    Required Parameter.

    Search space definition.

    Data Type: list[dict].

    warmup

    Input

    Optional Parameter.

    Device warm-up time before performance collection. Generally, a longer warm-up time results in more stable operator performance.

    Unit: microseconds.

    Default Value: 1000, with a value range of integers from 1 to 100000.

    repeat

    Input

    Optional Parameter.

    Number of repetitions. The average running time over multiple repetitions is taken as the operator's execution time.

    Default Value: 1, with a value range of integers from 1 to 10000.

    device_ids

    Input

    Optional Parameter.

    List of Device IDs. Currently, only single-Device mode is supported. If multiple Device IDs are provided, only the first one takes effect.

    Default Value: [0].

    Returns

    None.

    Sample

    @mskl.autotune(configs=[
        {'L1TileShape': 'MatmulShape<64, 64, 64>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
        {'L1TileShape': 'MatmulShape<64, 64, 128>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
        {'L1TileShape': 'MatmulShape<64, 128, 128>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
        {'L1TileShape': 'MatmulShape<64, 128, 128>', 'L0TileShape': 'MatmulShape<64, 256, 64>'},
        {'L1TileShape': 'MatmulShape<128, 128, 128>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
    ], warmup=500, repeat=10, device_ids=[0])
    def basic_matmul(problem_shape, a, layout_a, b, layout_b, c, layout_c):
        kernel = get_kernel()
        blockdim = 20
        return kernel[blockdim](problem_shape, a, layout_a, b, layout_b, c, layout_c)
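Conceptually, the decorator evaluates each entry in configs, averages the runtime over repeat runs, and reports the fastest entry. The following sketch illustrates that selection logic only. It is not the msKL implementation: sweep is a hypothetical function, code replacement and compilation are omitted, and the runtimes are made-up numbers.

```python
from typing import Callable, Dict, List, Tuple


def sweep(configs: List[Dict], measure: Callable[[Dict], float], repeat: int = 1):
    """Return (per-config average runtimes, best config)."""
    timings: List[Tuple[Dict, float]] = []
    for config in configs:
        # Average over `repeat` runs, as described for the repeat parameter.
        samples = [measure(config) for _ in range(repeat)]
        timings.append((config, sum(samples) / repeat))
    best_config, _ = min(timings, key=lambda item: item[1])
    return timings, best_config


# Made-up runtimes in microseconds, keyed by L1TileShape, purely for illustration.
FAKE_RUNTIME_US = {
    "MatmulShape<64, 64, 64>": 120.0,
    "MatmulShape<64, 64, 128>": 95.0,
    "MatmulShape<64, 128, 128>": 101.5,
}

configs = [{"L1TileShape": shape} for shape in FAKE_RUNTIME_US]
timings, best = sweep(configs, lambda cfg: FAKE_RUNTIME_US[cfg["L1TileShape"]], repeat=10)
for config, average in timings:
    print(config, average)
print("best:", best)
```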
    

    code_gen

    Function

    Generates Kernel delivery code based on the input template library Kernel information.

    Prototype

    gen_file = mskl.Launcher(config).code_gen()
    

    Parameters

    Parameter Name

    Input/Output

    Required

    Description

    gen_file

    Input

    Optional Parameter.

    Specifies the file path of the generated Kernel delivery code.

    Data Type: str.

    Defaults to _gen_launch.cpp.

    Returns

    The File Path of the generated code.

    Sample

    config = mskl.KernelInvokeConfig(kernel_file, kernel_name) 
    gen_file = mskl.Launcher(config).code_gen() 
    

    Related Class/Structure Definitions

    class KernelInvokeConfig:
        """
        A configuration descriptor for a possible kernel developed based on an Act example
        """
        def __init__(self, kernel_src_file: str, kernel_name: str):
            pass

    # The user can only pass a parameter of type KernelInvokeConfig.
    class Launcher:
        def __init__(self, config: KernelInvokeConfig):
            """
            A class that generates launch source code for a kernel

            Args:
                config (KernelInvokeConfig): A configuration descriptor for a kernel
            """
    

    compile

    Function

    Compiles the Kernel delivery code and returns an executable Kernel object.

    Prototype

    kernel = compile(build_script, gen_file)
    

    Parameters

    Parameter Name

    Input/Output

    Required

    Description

    build_script

    Input

    Required parameter.

    Script used for template library Kernel compilation.

    Data Type: str.

    gen_file

    Input

    Required parameter.

    File path of the Kernel delivery code generated by the code_gen interface. Generally, the return value of the code_gen interface is used directly.

    Data Type: str.

    output_bin_path

    Input

    Optional parameter.

    Specifies the path of the executable file generated by compilation.

    Data Type: str.

    Default Value: _gen_module.so.

    use_cache

    Input

    Optional parameter.

    When enabled, compilation is not executed, and the file specified by output_bin_path is loaded instead.

    Data Type: bool.

    Default Value: False.
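The use_cache behavior described above amounts to choosing between building and reusing the file at output_bin_path. The sketch below illustrates that decision in plain Python. build_or_load and fake_build are hypothetical names, not the msKL implementation, and raising an error when the cached file is missing is an assumption.

```python
import os
import tempfile


def build_or_load(output_bin_path: str, build, use_cache: bool = False) -> str:
    """Build the artifact, or reuse the existing file when use_cache is True."""
    if use_cache:
        # Assumption: a missing cached artifact is an error rather than a rebuild.
        if not os.path.exists(output_bin_path):
            raise FileNotFoundError(output_bin_path)
        return output_bin_path
    build(output_bin_path)
    return output_bin_path


build_calls = []


def fake_build(path: str) -> None:
    """Stand-in for the build script: records the call and writes a dummy file."""
    build_calls.append(path)
    with open(path, "wb") as artifact:
        artifact.write(b"placeholder")


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "_gen_module.so")
    build_or_load(target, fake_build)                  # compiles
    build_or_load(target, fake_build, use_cache=True)  # reuses the file, no build
```

In a tuning loop this matters because a stale file at output_bin_path would be loaded as-is, so use_cache=True is only safe when the source has not changed since the last build.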

    Returns

    A runnable Kernel object of type CompiledKernel. It supports invocation as follows: kernel[blockdim](arg1, arg2, ..., timeout=-1, device_id=0, repeat=1), where arg1, arg2, ... are the input arguments of the Kernel.

    Sample

    kernel = compile(build_script, gen_file)
    kernel[blockdim](arg1, arg2, ..., device_id=0)
    

    Table 1 Optional input arguments of CompiledKernel

    Parameter Name

    Input/Output

    Description

    device_id

    Input

    NPU device ID, which sets the ID of the Ascend AI Processor for running ST test cases.

    Data Type: int.

    Defaults to 0 if this parameter is not set.

    timeout

    Input

    Timeout period for the Kernel run. camodel simulation scenarios need a longer timeout than the default. Setting it to -1 means no limit.

    Data Type: int.

    Unit: ms. The default value is 300000.

    repeat

    Input

    Number of repeated runs. The default value is 1.

    Data Type: int.

    stream

    Input

    Reserved Parameter.

    kernel_name

    Input

    Reserved Parameter.

    autotune_v2

    Function

    Traverses the search space, tries different parameter combinations, and displays the runtime of each combination and the optimal combination.

    Prototype

    def autotune_v2(configs: List[Dict], warmup_times = 5)
    

    Parameters

    Parameter Name

    Input/Output

    Required

    Description

    configs

    Input

    Required parameter.

    Search space definition.

    Data Type: list[dict].

    warmup_times

    Input

    Optional parameter.

    Number of device warm-up iterations before performance collection.

    Default Value: 5, with a value range of integers from 1 to 500.

    Returns

    None.

    Sample

    @mskl.autotune_v2(configs=[
        {'L1TileShape': 'GemmShape<128, 256, 256>', 'L0TileShape': 'GemmShape<128, 256, 64>'},
        {'L1TileShape': 'GemmShape<256, 128, 256>', 'L0TileShape': 'GemmShape<256, 128, 64>'},
        {'L1TileShape': 'GemmShape<128, 128, 256>', 'L0TileShape': 'GemmShape<128, 128, 64>'},
        {'L1TileShape': 'GemmShape<128, 128, 512>', 'L0TileShape': 'GemmShape<128, 128, 64>'},
        {'L1TileShape': 'GemmShape<64, 256, 128>', 'L0TileShape': 'GemmShape<64, 256, 64>'},
    ], warmup_times=10)
    def run_executable(m, n, k, device_id):
        src_file = "./basic_matmul.cpp"
        build_script = "./jit_build_executable.sh" # executable compile script
        executable = mskl.compile_executable(build_script=build_script, src_file=src_file, use_cache=False)
        return executable(m, n, k, device_id)
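Since configs is a plain list[dict], the search space can be generated instead of written out by hand. The following sketch builds every combination of candidate values with itertools.product. build_configs is a hypothetical helper, and the candidate shapes are copied from the sample above.

```python
from itertools import product
from typing import Dict, List


def build_configs(**candidates: List[str]) -> List[Dict[str, str]]:
    """Return one dict per combination of the candidate values."""
    keys = list(candidates)
    return [dict(zip(keys, combo))
            for combo in product(*(candidates[key] for key in keys))]


configs = build_configs(
    L1TileShape=["GemmShape<128, 256, 256>", "GemmShape<128, 128, 256>"],
    L0TileShape=["GemmShape<128, 256, 64>", "GemmShape<128, 128, 64>"],
)
```

A full cross product may contain combinations that are not valid for a given Kernel (the sample above pairs each L1 shape with a matching L0 shape), so the generated list would usually be filtered before it is passed as configs.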
    

    compile_executable

    Function

    Compiles code and returns an executable object.

    Prototype

    executable = compile_executable(build_script, src_file)
    

    Parameters

    Parameter Name

    Input/Output

    Required

    Description

    build_script

    Input

    Required parameter.

    Script file path used to compile the application to be tuned.

    Data Type: str.

    src_file

    Input

    Required parameter.

    Code file path.

    Data Type: str.

    output_bin_path

    Input

    Optional parameter.

    Specifies the path of the executable file generated by compilation.

    Data Type: str.

    Default Value: _gen_executable.

    use_cache

    Input

    Optional parameter.

    When enabled, compilation is not executed, and the file specified by output_bin_path is loaded instead.

    Data Type: bool.

    Default Value: False.

    NOTE: When using the msDebug tool to invoke the compile interface, use_cache=True must be configured.

    profiling_cmd

    Input

    -

    Reserved parameter.

    Returns

    An executable program object, executable, of type CompiledExecutable. It supports invocation in the following manner: executable(arg1, arg2, ...), where arg1, arg2, ... are custom input arguments for the program.

    Sample

    executable = compile_executable(build_script, src_file)
    executable(a, b, c)