msOpProf Usage Scenarios
Collecting Performance Data of Kernel Launch Ascend C Operators
Overview
This section demonstrates how to use the msOpProf tool to collect performance data of Ascend C operators in kernel launch mode, using invocation through the kernel launch operator <<<>>> as an example.
For details about the kernel launch scenario, see the Kernel Launch Operator Development chapter in the Ascend C Operator Development Guide.
Preparations
- Click the Add sample to obtain the sample project. The Add vector addition operator is used as an example.
    git clone https://gitcode.com/cann/asc-devkit.git -b 9.0.0
- Refer to the Preparations section of the msOpProf Mode User Guide and the Preparations section of the msOpProf Simulator Mode User Guide to configure the required environment variables for on-board and simulator tuning data collection.
Procedure
- Based on the instructions in the sample project, build an operator executable file that can run on Ascend devices. After the compilation is complete, the executable file add is generated in the project directory.
    mkdir -p build && cd build;   # Create and enter the build directory.
    cmake ..; make -j;            # Build the project.
  NOTE
  The executable file name (add) in this example is for reference only. Use the actual compiled file name in the current project.
- Run the following command to collect msOpProf on-board performance data and refined tuning data. You can also specify other command parameters by referring to the msOpProf Mode Command Reference.
    msprof op add
- Modify the build file CMakeLists.txt in the sample project to build an operator executable file that can run on the simulator. After the compilation is complete, the executable file add_sim is generated in the project directory.
    target_compile_options(add_sim PRIVATE
        $<$<COMPILE_LANGUAGE:ASC>:--npu-arch=dav-XXXX>  # Select the corresponding 'npu-arch' parameter based on the actual deployed NPU hardware architecture.
        -g   # This compilation option is required for features such as the code hot spot map.
        -O2
    )
    target_link_directories(add_sim PRIVATE
        $ENV{ASCEND_HOME_PATH}/tools/simulator/Ascendxxxyy/lib
    )
    target_link_libraries(add_sim PRIVATE
        runtime_camodel
        npu_drv
    )
- Run the following command to collect msOpProf simulator performance data, pipeline chart data, and hot spot map data. You can also specify other command parameters by referring to the msOpProf Simulator Mode Command Reference.
  NOTE
  The value of the --soc-version parameter can be obtained by running the following command: python3 -c "import acl; print(acl.get_soc_name())".
    msprof op simulator --soc-version=Ascendxxxyy add_sim
- If the following message is displayed, the operator performance data collection is successful.
    [INFO] Profiling running finished. All task success.
- View the on-board and simulation performance data of the operator. You can import the generated visualize_data.bin file into MindStudio Insight. For details about the import operation, see the Importing Profile Data chapter in the MindStudio Insight User Guide.
  NOTE
  The performance data files obtained in other operator invocation scenarios can be viewed in the same way.
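The success check in the procedure above can also be scripted. The following is a minimal sketch, not part of the msOpProf tooling: it runs a collection command, echoes its output, and looks for the success message quoted above. The helper name run_and_check is hypothetical, and because this guide does not state whether the message is written to stdout or stderr, the sketch merges both streams as an assumption.

```python
import subprocess
import sys

# Message text taken from the procedure above (without the "[INFO]" prefix).
SUCCESS_MARKER = "Profiling running finished. All task success."


def run_and_check(cmd):
    """Run cmd, echo its combined output, and report whether the success marker appeared.

    Hypothetical helper: stdout and stderr are merged because the stream that
    carries the message is an assumption, not something this guide specifies.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    print(proc.stdout, end="")
    return SUCCESS_MARKER in proc.stdout
```

For example, run_and_check(["msprof", "op", "add"]) would wrap the on-board collection command from this section; the same call works for the simulator command.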
Collecting Performance Data of Single-Operator API Calls
Overview
This section demonstrates how to use the msOpProf tool to collect performance data of single-operator API calls, using a custom operator project and the aclnn single-operator API invocation as an example.
For details about the single-operator API invocation scenario, see the Project-based Operator Development > Single-Operator API Invocation chapter in the Ascend C Operator Development Guide.
Preparations
- Click the custom operator project sample to obtain the custom operator project.
- Refer to the Preparations section of the msOpProf Mode User Guide and the Preparations section of the msOpProf Simulator Mode User Guide to configure the required environment variables for on-board and simulator tuning data collection.
Procedure
- Based on the sample project description, compile, package, and deploy the custom operator.
    mkdir -p build && cd build
    cmake .. && make -j binary package
    ./custom_opp_*.run
- Based on the aclnn single-operator API invocation sample, build the operator executable file. After the compilation is complete, the executable file execute_add_op is generated in the project directory. This file can run on both Ascend devices and the simulator.
    mkdir -p build; cd build
    cmake .. && make -j
- Run the following command to collect msOpProf on-board performance data and refined tuning data.
    msprof op execute_add_op
- Run the following command to collect msOpProf simulator performance data, pipeline chart data, and hot spot map data.
  NOTE
  The value of the --soc-version parameter can be obtained by running the following command: python3 -c "import acl; print(acl.get_soc_name())".
    msprof op simulator --soc-version=Ascendxxxyy execute_add_op
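The on-board and simulator commands above differ only in the simulator subcommand and the --soc-version option, so they can be assembled from one helper. The sketch below is illustrative only: build_msprof_cmd and query_soc_version are hypothetical names, the only options used are those shown in this guide, and the Ascendxxxyy placeholder is kept when pyACL is not importable.

```python
import shlex


def query_soc_version():
    """Return acl.get_soc_name() when pyACL can be imported, otherwise None."""
    try:
        import acl  # pyACL; only present in a configured CANN environment
    except ImportError:
        return None
    return acl.get_soc_name()


def build_msprof_cmd(app, simulator=False, soc_version=None):
    """Build the argument list for 'msprof op' or 'msprof op simulator'.

    app is the application command as a list, for example ["execute_add_op"]
    or ["python3", "test_add.py"].
    """
    cmd = ["msprof", "op"]
    if simulator:
        cmd.append("simulator")
        if soc_version:
            cmd.append(f"--soc-version={soc_version}")
    return cmd + list(app)


# Keep the guide's placeholder when no SoC name can be queried.
soc = query_soc_version() or "Ascendxxxyy"
print(shlex.join(build_msprof_cmd(["execute_add_op"])))
print(shlex.join(build_msprof_cmd(["execute_add_op"], simulator=True, soc_version=soc)))
```

Returning a list rather than a string lets the result be passed directly to subprocess.run without shell quoting concerns.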
Collecting Performance Data of PyTorch Framework Operators
Overview
For details about single-operator invocation through the PyTorch framework, see the OpPlugin section in the Ascend-developed Plugins chapter of the Ascend Extension for PyTorch Suite and Third-party Library Support List.
The procedure for collecting performance data of PyTorch framework operators is essentially the same as the one described in Collecting Performance Data of Triton Operators.
Collecting Performance Data of Triton Operators
Overview
This section demonstrates how to use the msOpProf tool to collect performance data of Triton operators.
Preparations
- Click Triton-Ascend Quick Start to complete the installation and configuration of Triton and the Triton-Ascend plugin.
- Prepare the Triton operator implementation file. If you have not prepared a Triton operator, refer to the example in the procedure.
- Refer to the Preparations section of the msOpProf Mode User Guide and the Preparations section of the msOpProf Simulator Mode User Guide to configure the required environment variables for on-board and simulator tuning data collection.
Procedure
- Prepare a basic Triton operator sample file test_add.py.
    import torch
    import torch_npu
    import triton
    import triton.language as tl

    # Tensor shape, visible to the kernel as compile-time constants.
    M: tl.constexpr = 128
    N: tl.constexpr = 32

    @triton.jit
    def add_kernel(output_ptr, x_ptr, y_ptr):
        # Each program instance covers all M * N elements.
        offsets = tl.arange(0, M * N)
        x = tl.load(x_ptr + offsets)
        y = tl.load(y_ptr + offsets)
        output = x + y
        tl.store(output_ptr + offsets, output)

    z = torch.randn((M, N), dtype=torch.float32).npu()
    res = torch.empty_like(z)
    add_kernel[8, 1, 1](res, z, z)
- Run the following command to collect msOpProf on-board performance data and refined tuning data.
    msprof op python3 test_add.py
- Run the following command to collect msOpProf simulator performance data, pipeline chart data, and hot spot map data.
  NOTE
  The value of the --soc-version parameter can be obtained by running the following command: python3 -c "import acl; print(acl.get_soc_name())".
    msprof op simulator --soc-version=Ascendxxxyy python3 test_add.py
  NOTE
  This sample contains no non-Triton operator computation. It retains only the Triton operator add_kernel, whose simulation performance is to be collected, which greatly reduces the overall simulation runtime. Even when --kernel-name is specified, the simulator still runs operators sequentially. Therefore, you are advised to remove unnecessary operators before simulation.
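To make clear what the sample kernel computes, the following CPU-only sketch mirrors add_kernel in plain Python. It is an illustration under stated assumptions, not Triton code: add_kernel_reference is a hypothetical stand-in, Python floats replace float32 tensors, and the launch grid is modeled as a simple loop. Because the kernel in test_add.py never reads a program ID, each of the 8 program instances launched by add_kernel[8, 1, 1] processes the same M * N elements, and since both inputs are z, the result is 2 * z.

```python
import random

M, N = 128, 32   # same constants as test_add.py
GRID = 8         # add_kernel[8, 1, 1] launches 8 program instances


def add_kernel_reference(output, x, y):
    """CPU stand-in for add_kernel: one program instance over all M * N elements."""
    for off in range(M * N):            # offsets = tl.arange(0, M * N)
        output[off] = x[off] + y[off]   # tl.store(output_ptr + offsets, x + y)


z = [random.gauss(0.0, 1.0) for _ in range(M * N)]
res = [0.0] * (M * N)

# Every instance performs identical work because no program ID is used.
for _ in range(GRID):
    add_kernel_reference(res, z, z)

# Both inputs are z, so each output element equals 2 * z.
assert all(r == 2.0 * v for r, v in zip(res, z))
print("elements per instance:", M * N, "| instances:", GRID)
```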
Collecting Performance Data of Catlass Operators
Overview
This section demonstrates how to use the msOpProf tool to collect performance data of Catlass operators.
Preparations
- Click the Catlass Community to obtain the sample project.
    git clone https://gitcode.com/cann/catlass.git -b v1.5.0
- Refer to the Preparations section of the msOpProf Mode User Guide and the Preparations section of the msOpProf Simulator Mode User Guide to configure the required environment variables for on-board and simulator tuning data collection.
Procedure
- Follow the example in the Catlass Quick Start guide to prepare the environment and compile the on-board operator executable file. The basic_matmul sample is used as an example.
    bash scripts/build.sh 00_basic_matmul
- Run the following command to collect msOpProf on-board performance data and refined tuning data.
    # Switch to the build output directory.
    cd output/bin
    # ./00_basic_matmul m n k [deviceId]
    msprof op ./00_basic_matmul 256 512 1024 0
- Add the --simulator option to the build script to compile the operator simulation executable file, and load the simulator binary path as prompted.
    bash scripts/build.sh --simulator 00_basic_matmul
    # Execute the following commands based on the actual output after compilation:
    export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH
    export LD_PRELOAD=/usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascendxxxyy/lib/libruntime_camodel.so:/usr/local/Ascend/ascend-toolkit/latest/tools/simulator/Ascendxxxyy/lib/libnpu_drv_camodel.so
- Run the following command to collect msOpProf simulator performance data, pipeline chart data, and hot spot map data.
  NOTE
  The value of the --soc-version parameter can be obtained by running the following command: python3 -c "import acl; print(acl.get_soc_name())".
    # Switch to the build output directory.
    cd output/bin
    # Executable file name | m axis | n axis | k axis | Device ID (optional)
    msprof op simulator --soc-version=Ascendxxxyy ./00_basic_matmul 256 512 1024 0
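When the simulator run is launched from a script, the two export commands above can be reproduced as an environment dictionary. This is a minimal sketch under stated assumptions: simulator_env is a hypothetical helper, the library file names are the ones shown in the export commands, and the directory passed in should be taken from the actual build output rather than from this example.

```python
import os

# Library names copied from the LD_PRELOAD export shown above.
PRELOAD_LIBS = ("libruntime_camodel.so", "libnpu_drv_camodel.so")


def simulator_env(lib_dir, base_env=None):
    """Return a copy of the environment with the simulator libraries prepended.

    lib_dir is the simulator lib directory reported by the build script.
    """
    env = dict(os.environ if base_env is None else base_env)
    previous = env.get("LD_LIBRARY_PATH", "")
    # Prepend lib_dir, mirroring 'export LD_LIBRARY_PATH=<lib_dir>:$LD_LIBRARY_PATH'.
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{previous}" if previous else lib_dir
    # Overwrite LD_PRELOAD, mirroring the second export command.
    env["LD_PRELOAD"] = ":".join(f"{lib_dir}/{name}" for name in PRELOAD_LIBS)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run, which keeps the preload setting limited to the profiled process instead of the whole shell session.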
Collecting Performance Data of MC2 Operators
Overview
This section demonstrates how to use the msOpProf tool to tune an MC2 operator on the board and generate a communication and computing pipeline chart.
This section uses the AscendCL single-operator call as an example. For other invocation scenarios, see the Ascend C Operator Development Guide.
Preparations
- Complete the development of the MC2 operator.
- Refer to the Preparations section of the msOpProf Mode User Guide to configure the required environment variables.
Procedure
- Refer to Operator Compilation and Deployment to compile and deploy the operator.
- Add the following compilation options to the CMakeLists.txt file in the op_kernel directory of the operator project to enable the AIC instrumentation and code line mapping functions of the MC2 operator.
    add_ops_compile_options(ALL OPTIONS -DASCENDC_TIME_STAMP_ON -g)
- Go to the custom operator project directory and compile and deploy the operator.
    ./build_out/custom_opp_<target_os>_<target_architecture>.run
- Use msOpProf to collect the performance data of the MC2 operator.
    # --output is an optional parameter. $HOME/projects/MyApp is the application. blockdim 1 is an optional parameter of the user application.
    msprof op --output=$HOME/projects/output $HOME/projects/MyApp blockdim 1
- Performance data files are generated in the output directory. For details about the directory structure and the files, see the msOpProf Mode User Guide.
- Import the trace.json or visualize_data.bin file into MindStudio Insight for visual presentation. For details, see the Computing Memory Heatmap, Communication and Computing Pipeline Chart, and Roofline Bottleneck Analysis Chart sections in the msOpProf Mode User Guide.
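Before importing into MindStudio Insight, it can help to locate the result files under the directory passed to --output. The sketch below searches recursively for the two file names mentioned above. The helper name find_result_files is hypothetical, and the search is recursive because the exact directory layout is an assumption here; the msOpProf Mode User Guide remains the reference for it.

```python
from pathlib import Path

# File names taken from the import step above.
RESULT_NAMES = ("trace.json", "visualize_data.bin")


def find_result_files(output_dir, names=RESULT_NAMES):
    """Return the sorted paths of all matching result files below output_dir."""
    root = Path(output_dir)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.name in names)
```

For example, find_result_files(Path.home() / "projects" / "output") would scan the output directory used in the command above and return an empty list if collection produced no such files.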