MindStudio Ops Profiler Quick Start
1. Overview
The msOpProf performance analysis tool is used to collect and analyze key performance metrics of operators running on Ascend AI Processors. You can efficiently locate software and hardware performance bottlenecks of operators based on the output performance data, thereby enhancing the overall efficiency of operator performance analysis. This document demonstrates the core functions of msOpProf based on the simple addition operator developed in the introductory tutorial. It helps beginners intuitively experience the efficiency and convenience the tool brings to the operator development process.
1.1 Recommendations
This document assumes that you have completed all operations in Ascend Operator Development Toolchain Quick Start. If you have not done so, complete that guide first for a better learning experience.
1.2 Environment Preparation
Strictly follow the Ascend AI Operator Development Toolchain Learning Environment Installation Guide to complete the environment installation and workspace configuration. Even if you have a similar environment, perform the steps in the guide again to ensure that all dependent components and environment variables are complete and consistent.
2. Procedure
2.1 [Environment] Pre-checking the Runtime Environment
2.1.1 Verifying Installation of Python Dependencies
Run the following command. If "All is OK" is displayed, the required Python packages and their versions meet the specifications:
python3 -c "import numpy, sympy, scipy, attrs, psutil, decorator; from packaging import version; assert version.parse(numpy.__version__) <= version.parse('1.26.4'); print('All is OK')"
If an error occurs, refer to Section 1.2 for correct installation.
2.2 [Prerequisite] Completing Operator Project Preparation
Follow the instructions in Ascend Operator Development Toolchain Quick Start to complete sections 2.1 and 2.3.
2.3 [Tuning] Analyzing Operator Performance (msOpProf)
If operator performance does not meet expectations, you can use msOpProf to collect runtime performance data for in-depth analysis and optimization, ensuring efficient execution across different Ascend hardware platforms. Follow the operations first to experience the effect. You can read the principles later.
2.3.1 Modifying Compilation Options and Re-deploying
1. Modify the compilation options.
In the first line of the CMakeLists.txt file on the kernel side, insert a configuration line to enable debugging information:
cd ~/ot_demo/workspace/src/AddCustom
\cp -f op_kernel/CMakeLists.txt op_kernel/CMakeLists.txt.orig.bak
sed -i "1i\\add_ops_compile_options(ALL OPTIONS -g)" op_kernel/CMakeLists.txt
2. Re-compile and deploy the operator.
bash ./build.sh
MY_OP_PKG=$(find ./build_out -maxdepth 1 -name "custom_opp_*.run" | head -1) && bash $MY_OP_PKG
2.3.2 Starting Collection on On-board Hardware and Simulator
NOTE
Knowledge point: Difference between on-board hardware and simulator collection
On-board hardware: precisely captures real hardware characteristics such as operator execution time, pipe usage, memory bandwidth, and cache behavior, which are often difficult to reproduce with high fidelity on a simulator.
Simulator: provides more complete and stable analysis capabilities in instruction stream tracing and code hot spot location, but has limited simulation accuracy for hardware-related behaviors like memory access latency and bandwidth bottlenecks.
Therefore, combine both methods to leverage their complementary advantages for comprehensive performance diagnosis. If you do not have on-board hardware (NPU) in some scenarios, use the simulator mode for preliminary performance estimation and hotspot analysis.
2.3.2.1 Performance Collection on On-board Hardware
Run the following commands:
cd ~/ot_demo/workspace/src/caller/build
msprof op --output=./msprof_output_npu ./execute_add_op
2.3.2.2 Performance Collection on Simulator
Refer to Chip SoC Type Acquisition Method to obtain the chip type, and use it as the value of the --soc-version parameter.
msprof op simulator --soc-version=Ascendxxxyy --output=./msprof_output_sim ./execute_add_op
2.3.3 Viewing Performance Data Results
The tool generates results in .csv and .bin files in the directory specified by --output. If no error is reported, the execution is successful.
.csv files
For example, you can see the following information after opening the MemoryUB.csv file:
The data shows that the task is equally divided into eight blocks, all of which are scheduled to the Vector Core for execution. For example, the bandwidth of Block 0 (1.02 GB/s) is significantly higher than that of Block 1 (0.77 GB/s). If the difference is too large, it may indicate room for optimization.
| block_id | sub_block_id | aiv_time(us) | aiv_total_cycles | aiv_ub_read_bw_vector(GB/s) | aiv_ub_write_bw_vector(GB/s) |
|---|---|---|---|---|---|
| 0 | vector0 | 7.456666 | 13422 | 1.023164 | 0.511582 |
| 1 | vector0 | 9.914444 | 17846 | 0.769523 | 0.384762 |
| 2 | vector0 | 10.001111 | 18002 | 0.762855 | 0.381427 |
| 3 | vector0 | 9.684444 | 17432 | 0.787799 | 0.393899 |
| 4 | vector0 | 9.747222 | 17545 | 0.782725 | 0.391363 |
| 5 | vector0 | 9.062222 | 16312 | 0.84189 | 0.420945 |
| 6 | vector0 | 9.293889 | 16729 | 0.820904 | 0.410452 |
| 7 | vector0 | 8.658889 | 15586 | 0.881105 | 0.440553 |
.bin files
See the following section.
2.3.4 Visualizing Performance Data Results via MindStudio Insight
The aforementioned .bin file can be opened using the MindStudio Insight tool to visualize various performance views, such as computing memory heatmaps, cache heatmaps, and operator code hot spot maps.
2.3.4.1 Installing MindStudio Insight
Refer to the MindStudio Insight Tool Documentation to install the tool.
2.3.4.2 Viewing with MindStudio Insight
MindStudio Insight is a standalone application after installation. Perform the following operations: click Import Data in the upper left corner, import visualize_data.bin, and then open the Details page to see many detailed charts.
For detailed operations and the specific meanings of the charts, refer to the MindStudio Insight Tool Documentation.
2.3.5 Restoring Modified Files
Run the following commands:
cd ~/ot_demo/workspace/src/AddCustom
\cp -f op_kernel/CMakeLists.txt.orig.bak op_kernel/CMakeLists.txt