Extended Functions
JSON Configuration File Description
Write the JSON file for the operator definition. For details about the parameters, see Table 1 Parameters in the JSON file and Table 2 test_cases parameters.
The following example is a JSON configuration file named add_test.json. Developers can use it as a template and modify the test data and other configuration parameters.
{
    "kernel_name": "add_custom",
    "kernel_path": "./add_custom.o",
    "blockdim": 8,
    "mode": "ca",
    "device_id": 0,
    "magic": "RT_DEV_BINARY_MAGIC_ELF_AIVEC",
    "test_cases": [
        {
            "case_name": "Test_AddCustom_001",
            "param_desc": [
                {
                    "param_type": "input",
                    "type": "float16",
                    "shape": [
                        8,
                        2048
                    ],
                    "data_path": "./input_x.bin",
                    "name": "x"
                },
                {
                    "param_type": "input",
                    "type": "float16",
                    "shape": [
                        8,
                        2048
                    ],
                    "data_path": "./input_y.bin",
                    "name": "y"
                },
                {
                    "param_type": "output",
                    "type": "float16",
                    "shape": [
                        8,
                        2048
                    ],
                    "name": "z"
                },
                {
                    "param_type": "workspace",
                    "user_workspace_size": 4096
                },
                {
                    "param_type": "tiling",
                    "tiling_data_size": 8,
                    "tiling_data_path": "./tiling.bin"
                }
            ]
        }
    ]
}
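The example references input files such as ./input_x.bin. As a minimal sketch, the script below writes such files with the Python standard library. It assumes that the .bin files are raw, little-endian dumps of the tensor values in row-major order, which this document does not state; the helper name write_float16_bin and the fill values are hypothetical.

```python
import struct
from pathlib import Path


def write_float16_bin(path, shape, fill):
    """Write prod(shape) float16 values, all equal to `fill`, as raw bytes.

    Assumption: the test tool reads a headerless little-endian dump.
    """
    count = 1
    for dim in shape:
        count *= dim
    # The struct format code "e" is IEEE 754 binary16 (2 bytes per value).
    payload = struct.pack(f"<{count}e", *([fill] * count))
    Path(path).write_bytes(payload)
    return len(payload)


if __name__ == "__main__":
    # Shape and data type match the "x" and "y" inputs of add_test.json.
    for name, fill in (("input_x.bin", 1.0), ("input_y.bin", 2.0)):
        size = write_float16_bin(name, [8, 2048], fill)
        print(f"{name}: {size} bytes")
```

Constant fill values keep the sketch short; real test data would normally vary per element.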
Table 1 Parameters in the JSON file
| Parameter | Description | Type | Mandatory |
|---|---|---|---|
| kernel_name | Kernel function name. | string | Yes |
| kernel_path | Path of the binary .o file of the kernel function. The path can be either absolute or relative. | string | Yes |
| blockdim | Number of cores required for running the kernel function. The default value is 1. | int | No |
| mode | Test mode. Set to onboard for on-board execution or ca for performance simulation. | string | Yes |
| device_id | ID of the AI processor used for running. The default value is 0. | int | No |
| tiling_key | Tiling key of the current dynamic operator. | uint64 | No |
| magic | Operator type. Cube operator: RT_DEV_BINARY_MAGIC_ELF_AICUBE; vector operator: RT_DEV_BINARY_MAGIC_ELF_AIVEC; mixed fusion operator: RT_DEV_BINARY_MAGIC_ELF (only for Atlas A3 training products, Atlas A3 inference products, Atlas A2 training products, and Atlas A2 inference products). | string | Yes |
| test_cases | Test data. This can be a list, with each element containing a test case. For details, see Table 2 test_cases parameters. | list | Yes |
NOTE
- The tiling_key parameter applies only to dynamic operators.
- For Atlas inference products, the magic parameter must be set to RT_DEV_BINARY_MAGIC_ELF.
- For operator on-board or simulation tuning, only one case can be configured for the test_cases parameter.
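The rules in Table 1 and the notes above can be checked before a run. The following is a minimal sketch, not part of the tool: the function check_top_level and its messages are hypothetical, and it only encodes the mandatory parameters, default values, and allowed values listed above.

```python
import json

VALID_MODES = {"onboard", "ca"}
VALID_MAGIC = {
    "RT_DEV_BINARY_MAGIC_ELF_AICUBE",
    "RT_DEV_BINARY_MAGIC_ELF_AIVEC",
    "RT_DEV_BINARY_MAGIC_ELF",
}
MANDATORY = ("kernel_name", "kernel_path", "mode", "magic", "test_cases")
DEFAULTS = {"blockdim": 1, "device_id": 0}


def check_top_level(config):
    """Return (config with defaults applied, list of problems found)."""
    problems = [f"missing mandatory parameter: {key}"
                for key in MANDATORY if key not in config]
    if "mode" in config and config["mode"] not in VALID_MODES:
        problems.append(f"unsupported mode: {config['mode']}")
    if "magic" in config and config["magic"] not in VALID_MAGIC:
        problems.append(f"unsupported magic: {config['magic']}")
    cases = config.get("test_cases")
    if cases is not None:
        if not isinstance(cases, list):
            problems.append("test_cases must be a list")
        elif len(cases) > 1:
            # Per the note above: only one case for on-board or simulation tuning.
            problems.append("only one test case can be configured")
    # Explicit values in the file override the documented defaults.
    return {**DEFAULTS, **config}, problems


config = json.loads("""{
    "kernel_name": "add_custom",
    "kernel_path": "./add_custom.o",
    "mode": "ca",
    "magic": "RT_DEV_BINARY_MAGIC_ELF_AIVEC",
    "test_cases": [{"case_name": "Test_AddCustom_001", "param_desc": []}]
}""")
merged, problems = check_top_level(config)
print(merged["blockdim"], merged["device_id"], problems)
```

Returning a list of problems instead of raising on the first one lets a user fix all issues in a single pass.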
Table 2 test_cases parameters
| Parameter | Sub-parameter | Value | Note | Type | Mandatory |
|---|---|---|---|---|---|
| case_name | - | - | Test case name, which must be unique. | string | Yes |
| param_desc | - | - | Test case description. This can be a list, with each element representing a kernel function parameter. | list | Yes |
| - | param_type | input/output/workspace/tiling/fftsAddr | Parameter type. | string | Yes |
| - | type | - | Supported input and output data types, such as **uint8**, **int16**, **int32**, **float16**, **float32** and **float**. This parameter is mandatory when **param_type** is set to **input** or **output**. | string | No |
| - | shape | - | Shapes supported by the input and output tensors. All input and output tensors must support the same number of shapes. For example, **[8, 3, 256, 256]**. If an invalid shape is entered, for example, **[0]**, an error is reported. This parameter is mandatory when **param_type** is set to **input** or **output**. | list | No |
| - | data_path | - | Path of the input data .bin file. | string | No |
| - | name | - | Parameter name, which must be unique. This parameter is mandatory when **param_type** is set to **input** or **output**. | string | No |
| - | user_workspace_size | - | Size of **workspace** set by the user. This parameter is mandatory when **param_type** is set to **workspace**. | int | No |
| - | tiling_data_size | - | Size of **tiling** data. This parameter is mandatory when **param_type** is set to **tiling**. | int | No |
| - | tiling_data_path | - | Path of the tiling data .bin file. This parameter is mandatory when **param_type** is set to **tiling**. | string | No |
| - | data_size | - | Size of **fftsAddr** data. This parameter is mandatory when **param_type** is set to **fftsAddr**. | int | No |
NOTICE
- The number of parameter values in output must be the same as that in input. Otherwise, test case generation fails. For example, if input supports two types, output must also support two types. Similarly, the number of values of type, shape, or value_range in each input or output must be the same.
- The number of parameter values in each input of an operator must be the same. Otherwise, test case generation fails. The number of values of type, shape, and value_range in each input must be the same.
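The conditional rules for param_desc ("mandatory when param_type is set to ...") can be expressed as a lookup table. The sketch below is illustrative only: check_test_case is a hypothetical helper, and treating an empty shape or a non-positive dimension as invalid is an assumption generalized from the **[0]** example.

```python
# Fields that become mandatory for each param_type, per Table 2.
REQUIRED_BY_PARAM_TYPE = {
    "input": ("type", "shape", "name"),
    "output": ("type", "shape", "name"),
    "workspace": ("user_workspace_size",),
    "tiling": ("tiling_data_size", "tiling_data_path"),
    "fftsAddr": ("data_size",),
}


def check_test_case(case):
    """Return a list of problems found in one element of test_cases."""
    problems = []
    for key in ("case_name", "param_desc"):
        if key not in case:
            problems.append(f"missing {key}")
    seen_names = set()
    for index, param in enumerate(case.get("param_desc", [])):
        where = f"param_desc[{index}]"
        param_type = param.get("param_type")
        if param_type not in REQUIRED_BY_PARAM_TYPE:
            problems.append(f"{where}: unsupported param_type {param_type!r}")
            continue
        for key in REQUIRED_BY_PARAM_TYPE[param_type]:
            if key not in param:
                problems.append(f"{where}: {key} is mandatory for {param_type}")
        shape = param.get("shape")
        # Assumption: empty shapes and non-positive dimensions are invalid.
        if shape is not None and (
            not isinstance(shape, list)
            or not shape
            or any(not isinstance(dim, int) or dim <= 0 for dim in shape)
        ):
            problems.append(f"{where}: invalid shape {shape}")
        name = param.get("name")
        if name is not None:
            if name in seen_names:
                problems.append(f"{where}: duplicate name {name!r}")
            seen_names.add(name)
    return problems


case = {
    "case_name": "Test_AddCustom_001",
    "param_desc": [
        {"param_type": "input", "type": "float16", "shape": [8, 2048], "name": "x"},
        {"param_type": "output", "type": "float16", "shape": [0], "name": "z"},
        {"param_type": "workspace"},
    ],
}
for problem in check_test_case(case):
    print(problem)
```

Keeping the rules in a dictionary keyed by param_type means a new parameter type only needs one new entry.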
mstx Extended Functions
mstx API Overview
MindStudio provides the mstx profiling API, which enables users to embed custom markers within their applications. These markers allow for the precise identification of critical code segments during performance analysis. For details, see Table 1 C/C++ mstx API List and Table 2 Python mstx API List. For further details about the API usage, see MindStudio mstx API Reference.
Table 1 C/C++ mstx API List
| API | Description | msOpProf Support |
|---|---|---|
| mstxRangeStartA | Marks the beginning of a specific mstx range. | Supported |
| mstxRangeEnd | Marks the end of a specific mstx range. | Supported |
Table 2 Python mstx API List
| API | Description | msOpProf Support |
|---|---|---|
| mstx.range_start | Marks the beginning of a specific mstx range. | Supported |
| mstx.range_end | Marks the end of a specific mstx range. | Supported |
mstx API Usage
- msOpProf allows users to tune specific operators with the mstx API: users can mark the start and end of a code segment or key function, identify key functions or computing APIs, and quickly narrow down performance issues.
- The mstx API is disabled by default. If the application calls the mstx API, enable mstx instrumentation as required by the actual scenario. For example, the --mstx=on flag enables the mstx APIs called in the user program, while --mstx-include can be used to target specific mstx APIs. For detailed usage, refer to the --mstx and --mstx-include parameters in the "Command Reference" sections of the msopprof User Guide and the msopprof Simulator Mode User Guide.
- The mstx API can be used via library files or header files. An implementation example can be found at this link:
NOTE
- This sample project does not support Atlas A3 training products.
- Replace ${INSTALL_DIR} with the file storage path after CANN is installed. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.
- Add the libms_tools_ext.so library file located at ${INSTALL_DIR}/lib64/libms_tools_ext.so to the CMakeLists.txt file at ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation/src/CMakeLists.txt.
    # Header path
    include_directories(
        ...
        ${CUST_PKG_PATH}/include
    )
    ...
    target_link_libraries(
        ...
        dl
    )
- In the main.cpp file at ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation/src/main.cpp, compile and link the user program with the dl library. The corresponding header file ms_tools_ext.h is located at ${INSTALL_DIR}/include/mstx.
    ...
    #include "mstx/ms_tools_ext.h"
    ...
Example
After msOpProf is started, run the msprof op --mstx=on --mstx-include=range1 --launch-count=2 python cal.py command, where cal.py contains the following script. The command profiles only the operators enclosed by range1, namely the sub and mul operators.
import mstx
import torch
import torch_npu  # registers the NPU backend so that .npu() is available

x = torch.Tensor([1,2,3,4]).npu()
y = torch.Tensor([1,2,3,4]).npu()
a = x + y  # outside every range

# range1 encloses sub and mul
range1_id = mstx.range_start("range1", None)
b = a - x
c = a * x
mstx.range_end(range1_id)

# range2 encloses div, the nested range3, and add
range2_id = mstx.range_start("range2", None)
d = x / y
range3_id = mstx.range_start("range3", None)  # nested inside range2
e = torch.abs(y)
mstx.range_end(range3_id)
f = x + e
mstx.range_end(range2_id)
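Each mstx.range_start call in the example must be paired with a mstx.range_end call. One way to make the pairing hard to forget is a context manager, sketched below. The names mstx_range and RecordingStub are hypothetical and not part of mstx; the stub only records calls so that the sketch runs without an NPU. In a real script, pass mstx.range_start and mstx.range_end instead of the stub methods.

```python
from contextlib import contextmanager


@contextmanager
def mstx_range(name, range_start, range_end, stream=None):
    """Open a range on entry and always close it on exit, even on errors."""
    range_id = range_start(name, stream)
    try:
        yield range_id
    finally:
        range_end(range_id)


class RecordingStub:
    """Hypothetical stand-in for the mstx module; it only records calls."""

    def __init__(self):
        self.events = []
        self._next_id = 0

    def range_start(self, name, stream):
        self._next_id += 1
        self.events.append(("start", name, self._next_id))
        return self._next_id

    def range_end(self, range_id):
        self.events.append(("end", range_id))


stub = RecordingStub()
# Mirrors the nesting of range2 and range3 in the example above.
with mstx_range("range2", stub.range_start, stub.range_end):
    with mstx_range("range3", stub.range_start, stub.range_end):
        pass  # e = torch.abs(y) would run here
print(stub.events)
```

Because the inner with block exits first, nested ranges are closed in reverse order of opening, matching the manual range3 and range2 calls in the example.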