Extended Functions

JSON Configuration File Description

Write the JSON file for the operator definition. For details about the parameters, see Table 1 Parameters in the JSON file and Table 2 test_cases parameters.

The following example shows a JSON configuration file named add_test.json. Developers can use it as a template and modify the test data and other configuration parameters as needed.

{
  "kernel_name": "add_custom",
  "kernel_path": "./add_custom.o",
  "blockdim": 8,
  "mode": "ca",
  "device_id": 0,
  "magic": "RT_DEV_BINARY_MAGIC_ELF_AIVEC",
  "test_cases": [
    {
      "case_name": "Test_AddCustom_001",
      "param_desc": [
        {
          "param_type": "input",
          "type": "float16",
          "shape": [
            8,
            2048
          ],
          "data_path": "./input_x.bin",
          "name": "x"
        },
        {
          "param_type": "input",
          "type": "float16",
          "shape": [
            8,
            2048
          ],
          "data_path": "./input_y.bin",
          "name": "y"
        },
        {
          "param_type": "output",
          "type": "float16",
          "shape": [
            8,
            2048
          ],
          "name": "z"
        },
        {
          "param_type": "workspace",
          "user_workspace_size": 4096
        },
        {
          "param_type": "tiling",
          "tiling_data_size": 8,
          "tiling_data_path": "./tiling.bin"
        }
      ]
    }
  ]
}
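The input files referenced by data_path in the template can be produced with a short script. The following is a minimal sketch, not part of the tool: it assumes that each .bin file holds the raw tensor elements with no header, in little-endian byte order, and it fills them with arbitrary random values. The helper names write_float16_bin and write_add_test_inputs are hypothetical. The tiling.bin file is not generated here because its layout is operator-specific.

```python
import os
import random
import struct


def write_float16_bin(path, shape, low=-1.0, high=1.0, seed=0):
    """Write random float16 values for a tensor of `shape`; return the element count."""
    count = 1
    for dim in shape:
        count *= dim
    rng = random.Random(seed)
    values = [rng.uniform(low, high) for _ in range(count)]
    # "e" is the struct format code for IEEE 754 half precision (float16).
    # "<" selects little-endian byte order (an assumption, see the lead-in).
    data = struct.pack(f"<{count}e", *values)
    with open(path, "wb") as f:
        f.write(data)
    return count


def write_add_test_inputs(out_dir="."):
    """Create input_x.bin and input_y.bin with the shape used in add_test.json."""
    shape = [8, 2048]
    paths = []
    for seed, file_name in enumerate(("input_x.bin", "input_y.bin"), start=1):
        path = os.path.join(out_dir, file_name)
        write_float16_bin(path, shape, seed=seed)
        paths.append(path)
    return paths
```

Calling write_add_test_inputs(".") in the directory that holds add_test.json writes both files next to it. Each file is 8 × 2048 × 2 bytes, because a float16 element occupies 2 bytes.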

Table 1 Parameters in the JSON file

| Parameter | Description | Type | Mandatory |
| --- | --- | --- | --- |
| kernel_name | Kernel function name. | string | Yes |
| kernel_path | Path of the binary .o file of the kernel function. The path can be either absolute or relative. | string | Yes |
| blockdim | Number of cores required for running the kernel function. The default value is 1. | int | No |
| mode | Test mode. Valid values: onboard (onboard) and ca (performance simulation). | string | Yes |
| device_id | ID of the AI processor used for running. The default value is 0. | int | No |
| tiling_key | Tiling key of the current dynamic operator. | uint64 | No |
| magic | Operator type. Valid values: RT_DEV_BINARY_MAGIC_ELF_AICUBE (cube operator); RT_DEV_BINARY_MAGIC_ELF_AIVEC (vector operator); RT_DEV_BINARY_MAGIC_ELF (mixed fusion operator; only for Atlas A3 training products, Atlas A3 inference products, Atlas A2 training products, and Atlas A2 inference products). | string | Yes |
| test_cases | Test data. This can be a list, with each element containing a test case. For details, see Table 2 test_cases parameters. | list | Yes |

NOTE

  • The tiling_key parameter applies only to dynamic operators.
  • For Atlas inference products, the magic parameter must be set to RT_DEV_BINARY_MAGIC_ELF.
  • For onboard or simulation tuning of an operator, only one case can be configured in the test_cases parameter.
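Because tuning accepts only one case in test_cases, a configuration that holds several cases has to be split before use. The following is a minimal sketch of one way to do that. The helper name split_cases is hypothetical and is not part of msOpProf.

```python
import copy


def split_cases(config):
    """Return one configuration per test case, each keeping the top-level settings."""
    # Copy everything except test_cases (kernel_name, mode, magic, and so on).
    shared = {key: value for key, value in config.items() if key != "test_cases"}
    singles = []
    for case in config.get("test_cases", []):
        single = copy.deepcopy(shared)
        # Each resulting configuration carries exactly one case.
        single["test_cases"] = [copy.deepcopy(case)]
        singles.append(single)
    return singles
```

Each returned dictionary can then be written to its own file with json.dump and passed to the tool separately.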

Table 2 test_cases parameters

| Parameter | Sub-parameter | Value | Note | Type | Mandatory |
| --- | --- | --- | --- | --- | --- |
| case_name | - | - | Test case name, which must be unique. | string | Yes |
| param_desc | - | - | Test case description. This can be a list, with each element representing a kernel function parameter. | list | Yes |
| - | param_type | input/output/workspace/tiling/fftsAddr | Parameter type. | string | Yes |
| - | type | - | Supported input and output data types, such as **uint8**, **int16**, **int32**, **float16**, **float32** and **float**. This parameter is mandatory when **param_type** is set to **input** or **output**. | string | No |
| - | shape | - | Shapes supported by the input and output tensors. All input and output tensors must support the same number of shapes. For example, **[8, 3, 256, 256]**. If an invalid shape is entered, for example, **[0]**, an error is reported. This parameter is mandatory when **param_type** is set to **input** or **output**. | list | No |
| - | data_path | - | Path of the input data .bin file. When **param_type** is set to **input**, **data_path** or **value_range** must be set, and **data_path** has a higher priority. To set an empty **data_path** in the JSON file, set **"data_path":"null"**. For details about the JSON file, see JSON Configuration Description. | string | No |
| - | name | - | Parameter name, which must be unique. This parameter is mandatory when **param_type** is set to **input** or **output**. | string | No |
| - | user_workspace_size | - | Size of **workspace** set by the user. This parameter is mandatory when **param_type** is set to **workspace**. | int | No |
| - | tiling_data_size | - | Size of **tiling** data. This parameter is mandatory when **param_type** is set to **tiling**. | int | No |
| - | tiling_data_path | - | Path of the tiling data .bin file. This parameter is mandatory when **param_type** is set to **tiling**. | string | No |
| - | data_size | - | Size of **fftsAddr** data. This parameter is mandatory when **param_type** is set to **fftsAddr**. | int | No |

NOTICE

  • The number of parameter values in output must be the same as that in input. Otherwise, test case generation fails. For example, if input supports two types, output must also support two types. Similarly, the number of values of type, shape, or value_range in each input or output must be the same.
  • The number of parameter values in each input of an operator must be the same. Otherwise, test case generation fails. The number of values of type, shape, and value_range in each input must be the same.
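The mandatory and conditionally mandatory rules in Table 1 and Table 2 can be checked before a configuration is handed to the tool. The following is a minimal sketch that encodes only the rules stated in those tables. The function name validate_config and the message texts are hypothetical, and the check does not reproduce whatever validation the tool itself performs.

```python
VALID_MODES = {"onboard", "ca"}
VALID_MAGIC = {
    "RT_DEV_BINARY_MAGIC_ELF_AICUBE",
    "RT_DEV_BINARY_MAGIC_ELF_AIVEC",
    "RT_DEV_BINARY_MAGIC_ELF",
}
# Fields that become mandatory for a given param_type (Table 2).
REQUIRED_BY_PARAM_TYPE = {
    "input": ("type", "shape", "name"),
    "output": ("type", "shape", "name"),
    "workspace": ("user_workspace_size",),
    "tiling": ("tiling_data_size", "tiling_data_path"),
    "fftsAddr": ("data_size",),
}


def validate_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Top-level parameters marked "Yes" in Table 1.
    for key in ("kernel_name", "kernel_path", "mode", "magic", "test_cases"):
        if key not in config:
            problems.append(f"missing mandatory parameter: {key}")
    if "mode" in config and config["mode"] not in VALID_MODES:
        problems.append(f"invalid mode: {config['mode']!r}")
    if "magic" in config and config["magic"] not in VALID_MAGIC:
        problems.append(f"invalid magic: {config['magic']!r}")

    seen_case_names = set()
    for case in config.get("test_cases", []):
        case_name = case.get("case_name")
        if case_name is None:
            problems.append("test case without case_name")
        elif case_name in seen_case_names:
            problems.append(f"duplicate case_name: {case_name}")
        seen_case_names.add(case_name)

        if "param_desc" not in case:
            problems.append(f"{case_name}: missing param_desc")
            continue
        for index, param in enumerate(case["param_desc"]):
            where = f"{case_name}.param_desc[{index}]"
            param_type = param.get("param_type")
            if param_type not in REQUIRED_BY_PARAM_TYPE:
                problems.append(f"{where}: invalid param_type {param_type!r}")
                continue
            for field in REQUIRED_BY_PARAM_TYPE[param_type]:
                if field not in param:
                    problems.append(f"{where}: {field} is mandatory for {param_type}")
            # An input needs a data source: data_path or value_range.
            if (param_type == "input" and "data_path" not in param
                    and "value_range" not in param):
                problems.append(f"{where}: input needs data_path or value_range")
    return problems
```

A typical use is to load add_test.json with json.load, pass the resulting dictionary to validate_config, and print any returned messages before starting a run.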

mstx Extended Functions

mstx API Overview

MindStudio provides the mstx profiling API, which enables users to embed custom markers within their applications. These markers allow for the precise identification of critical code segments during performance analysis. For details, see Table 1 C/C++ mstx API List and Table 2 Python mstx API List. For further details about the API usage, see MindStudio mstx API Reference.

Table 1 C/C++ mstx API List

| API | Description | msOpProf Support |
| --- | --- | --- |
| mstxRangeStartA | Marks the beginning of a specific mstx range. | Supported |
| mstxRangeEnd | Marks the end of a specific mstx range. | Supported |

Table 2 Python mstx API List

| API | Description | msOpProf Support |
| --- | --- | --- |
| mstx.range_start | Marks the beginning of a specific mstx range. | Supported |
| mstx.range_end | Marks the end of a specific mstx range. | Supported |

mstx API Usage

  • With the mstx API, msOpProf users can tune specific operators: mark the start and end of a code segment or key function, identify key functions or computing APIs, and quickly locate performance issues.

  • The mstx API is disabled by default. If the application calls the mstx API, enable mstx instrumentation as required by your scenario. For example, the --mstx=on flag enables the mstx APIs within the user program, while --mstx-include can be used to target specific mstx APIs. For detailed usage, see the --mstx and --mstx-include parameters in the "Command Reference" sections of the msopprof User Guide and the msopprof Simulator Mode User Guide.

  • The mstx API can be used via library files or header files. An implementation example can be found at this link:

    NOTE

    • This sample project does not support Atlas A3 training products.
    • Replace ${INSTALL_DIR} with the file storage path after CANN is installed. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
    • Add the libms_tools_ext.so library file located at ${INSTALL_DIR}/lib64/libms_tools_ext.so to the CMakeLists.txt file at ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation/src/CMakeLists.txt.

      # Header path
      include_directories(
          ...
          ${CUST_PKG_PATH}/include
      )
      ...
      target_link_libraries(
          ...
          dl
      )
      
    • In the main.cpp file at ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation/src/main.cpp, include the header file ms_tools_ext.h, which is located at ${INSTALL_DIR}/include/mstx. The user program must be compiled and linked with the dl library.

      ...
      #include "mstx/ms_tools_ext.h"
      ...
      

Example

Save the following script as cal.py, then start msOpProf by running the msprof op --mstx=on --mstx-include=range1 --launch-count=2 python cal.py command. Because --mstx-include=range1 is set, only the operators inside the range1 scope are profiled, that is, the sub and mul operators.

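import mstx
import torch
import torch_npu  # enables the .npu() device for PyTorch tensors

x = torch.Tensor([1, 2, 3, 4]).npu()
y = torch.Tensor([1, 2, 3, 4]).npu()

# add runs outside range1, so it is not selected by --mstx-include=range1
a = x + y
# range1: contains the sub and mul operators
range1_id = mstx.range_start("range1", None)
b = a - x
c = a * x
mstx.range_end(range1_id)
# range2 encloses range3; neither is selected by --mstx-include=range1
range2_id = mstx.range_start("range2", None)
d = x / y
range3_id = mstx.range_start("range3", None)
e = torch.abs(y)
mstx.range_end(range3_id)
f = x + e
mstx.range_end(range2_id)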
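The paired mstx.range_start and mstx.range_end calls in the example can be wrapped in a context manager so that a range is always closed, even if the enclosed code raises an exception. The following is a minimal sketch, not an API provided by mstx. The name mstx_range and its backend parameter are hypothetical (backend exists only so the wrapper can be exercised without an NPU environment), and the call signatures are taken from the example above.

```python
from contextlib import contextmanager


@contextmanager
def mstx_range(name, stream=None, backend=None):
    """Open an mstx range on entry and close it on exit, even on exceptions."""
    if backend is None:
        # Assumes the mstx Python module is importable in the profiling environment.
        import mstx
        backend = mstx
    # Same call shape as in the example: range_start(message, stream) returns an ID.
    range_id = backend.range_start(name, stream)
    try:
        yield range_id
    finally:
        backend.range_end(range_id)
```

With this wrapper, the range1 block from the example could be written as `with mstx_range("range1"):` followed by the indented sub and mul statements. Nested ranges such as range2 and range3 become nested with blocks, which makes the enclosure visible in the indentation.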