msOpProf Simulator Mode User Guide
Overview
MindStudio Ops Profiler (msOpProf, an operator tuning tool) is used to collect and analyze the key performance metrics of operators running on AI Processors. Based on the output profile data, you can quickly locate the hardware and software performance bottlenecks of operators, improving the efficiency of operator performance analysis.
Currently, profile data for different file formats (executable files or operator binary .o files) can be collected and automatically parsed in on-board (msopprof) and simulator (msopprof simulator) modes. This document describes how to use the msopprof simulator mode.
Features
msOpProf presents single-operator tuning data, such as instruction pipeline charts, operator code hot spot maps, memory channel throughput waveform charts, and profile data files, through MindStudio Insight. For details, see Table 1 msopprof simulator mode features.
Table 1 msopprof simulator mode features
| Feature | Link |
|---|---|
| Instruction pipeline chart | Instruction Pipeline Chart |
| Operator code hot spot map | Operator Code Hot Spot Map |
| Memory channel throughput waveform chart | Memory Channel Throughput Waveform Chart |
| Profile data files | msopprof Simulator Profile Data |
Scenarios
The following scenarios are supported. For details, see Collecting Profile Data of Ascend C Operators and Collecting Profile Data of MC2 Operators.
NOTE
Refer to Chip SoC Type Acquisition Method to obtain the chip type, and use it as the value of the --soc-version parameter.
- Kernel launch operator development: kernel launch
  - For details about this scenario, see Kernel Launch Operator Development in the Ascend C Operator Development Guide.
  - Configure the prerequisites and then run the following command. Here, main is the name of the user program that contains the operator to be tuned, and xxxyy is the type of the processor used by the user.
        msprof op simulator --soc-version=Ascendxxxyy ./main
  - To perform simulation-based tuning on an operator that runs on the board without recompilation, perform the following steps:
    - Create a soft link named libruntime.so pointing to libruntime_camodel.so in any directory. For example, if the CANN package is installed in the default path of the root user, simulator_path is /usr/local/Ascend/cann/tools/simulator/Ascendxxxyy.
          ln -s /{simulator_path}/lib/libruntime_camodel.so /{so_path}/libruntime.so
    - Add the parent directory of the created soft link to the LD_LIBRARY_PATH environment variable.
          export LD_LIBRARY_PATH={so_path}:$LD_LIBRARY_PATH
- Project-based operator development: single-operator API calling
  - For details about this scenario, see Project-based Operator Development > Single-Operator API Execution in the Ascend C Operator Development Guide.
  - Configure the prerequisites and then run the following command. Here, main is the name of the user program that contains the operator to be tuned, and xxxyy is the type of the processor used by the user.
        msprof op simulator --soc-version=Ascendxxxyy ./main
- AI framework operator adaptation: PyTorch framework
  - When msOpProf is used for simulation-based tuning of the operators in a PyTorch script on Atlas inference products, only the Kernels-based operator package calling mode is supported. Install the binary Kernels operator package by referring to the Kernels operator package installation content in Installing CANN of the CANN Software Installation Guide. Then modify the script entry file (such as main.py) by adding the torch_npu.npu.set_compile_mode(jit_compile=False) line below import torch_npu to ensure that the operators in the Kernels operator package are used.
        import torch
        import torch_npu
        torch_npu.npu.set_compile_mode(jit_compile=False)
        ......
  - For details about calling a single operator through the PyTorch framework, see the OpPlugin in Ascend-developed Plugins of the Ascend Extension for PyTorch Suite and Third-party Library Support List.
  - When the PyTorch framework is used to call a single operator, configure the prerequisites and then run the following command. Here, a.py is the name of the user script that contains the operator to be tuned, and xxxyy is the type of the processor used by the user.
        msprof op simulator --soc-version=Ascendxxxyy python a.py
- Triton operator development: Triton operator calling
  - Install and configure Triton and the Triton-Ascend plug-in. For details, see Triton Ascend.
  - The Triton operator calling scenario does not apply to Atlas inference products.
Preparations
Preparing the environment
Configure related environment variables by referring to the MindStudio Ops Profiler Installation Guide.
- To use MindStudio Insight for viewing, install the MindStudio Insight software package separately. For download links, see the MindStudio Insight Installation Guide.
- For Atlas A2 training products/Atlas A2 inference products, if you want to use the template library for simulation, add the --simulator option to the compilation script to compile the operator in simulator mode. For details, see this link.
      bash scripts/build.sh --simulator 00_basic_matmul
Constraints
- You are advised to collect profile data within 5 minutes and ensure that the set memory size is greater than 20 GB (for example, container configuration docker run --memory=20g container_name).
- Ensure that the profile data is stored in the current user directory that does not contain soft links. Otherwise, security issues may occur.
Precautions
- msOpProf depends on the msopprof executable file in the CANN package. The API usage in this file is the same as that in msopprof. This file is provided by the CANN package and does not need to be installed separately.
- After you press Ctrl+C, the operator execution stops, and the tool generates a profile data file based on existing information. If you do not need to generate the file, press Ctrl+C again.
- If the --output option is not specified, ensure that other users do not have the write permission on the upper-level directory of the current path.
- Before using msopprof simulator, ensure that the application functions properly.
- Do not initiate more than one profile data collection task on the same device.
- The simulation result of msopprof simulator in the document is for reference only. The actual running status of the operator is subject to the actual simulation data.
- You need to ensure the execution security of executable files or applications.
- You are advised to restrict the operation permission on executable files or applications to avoid privilege escalation risks.
- Avoid high-risk operations (such as deleting files, deleting directories, changing passwords, and running privilege escalation commands) to prevent security risks.
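The directory-related constraints and precautions above (no soft links in the storage path, no write permission for other users on the parent directory) can be checked before a collection run. The following is a minimal Python sketch under stated assumptions: has_symlink_component and parent_is_safe are hypothetical helper names, not part of msOpProf, and the checks performed by the tool itself may differ.

```python
import os
import stat


def has_symlink_component(path: str) -> bool:
    """Return True if any existing component of the absolute path is a soft link."""
    current = os.path.abspath(path)
    while True:
        if os.path.islink(current):
            return True
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return False
        current = parent


def parent_is_safe(path: str) -> bool:
    """Return True if group and other users cannot write to the parent directory."""
    parent = os.path.dirname(os.path.abspath(path))
    mode = os.stat(parent).st_mode
    return not (mode & (stat.S_IWGRP | stat.S_IWOTH))
```

A wrapper script could call both helpers on the intended output directory and stop before launching the collection if either check fails.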
Command Reference
Log in to the operating environment and use msopprof simulator to enable the operator simulation and tuning function, passing the optional simulation parameters followed by the program to be tuned and its arguments. For details about the optional simulation parameters, see Table 1 Optional msopprof simulator parameters. Refer to Chip SoC Type Acquisition Method to obtain the chip type, and use it as the value of the --soc-version parameter. In the following example command, --output is an optional parameter, /home/projects/MyApp/out/main is the application to be tuned, blockdim 1 is an optional argument of the user application, and xxxyy is the type of the processor used by the user.
    msprof op simulator --soc-version=Ascendxxxyy --output=/home/projects/output /home/projects/MyApp/out/main blockdim 1
Table 1 Optional msopprof simulator parameters
| Optional Parameter | Description | Mandatory |
|---|---|---|
| --application | Specifies the executable file to profile. You are advised to use When using Currently, this command is compatible with | Yes. Choose one of |
| --config | Specifies the absolute or relative path of the binary file Before operator tuning, you can obtain the operator binary Ensure that users in the group and other groups do not have the write permission on the JSON file specified by You need to use the LD_LIBRARY_PATH environment variable to set the simulator type. export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH // xxxyy indicates the type of the processor used by the user. | |
| --export | Specifies a folder containing single-operator simulation results, which will be directly parsed for MindStudio Insight to display the single-core or multi-core instruction pipeline chart of a single operator. Note: | |
| --kernel-name | Specifies the operator name to collect. Fuzzy matching using operator name prefixes is supported. If this option is not specified, only data of the first operator scheduled during program running is collected. Note: | No |
| --launch-count | Sets the maximum number of operators that can be collected. The default value is 1, and the value is an integer ranging from 1 to 5000. | No |
| --aic-metrics | Enables operator performance metric collection. The following performance metrics can be collected. | No |
| --core-id | This parameter is used when the operators are evenly distributed. You can use The core ID range is [0,49]. If the simulation data of multiple cores needs to be parsed, use vertical bars (\|) to combine them. For example, | No |
| --timeout | This parameter is applicable to operators with a large amount of data and repetitive computation. Running such operators to completion takes significant time, but partial pipeline data provides sufficient information. Set The value is an integer ranging from 1 to 2880, in minutes. An example is as follows: msprof op simulator --soc-version=Ascendxxxyy --timeout=1 ./add_custom // xxxyy indicates the type of the processor used by the user. | No |
| --mstx | Determines whether the operator tuning tool enables the mstx APIs used in the user code program. The default value is When For example: msprof op simulator --soc-version=Ascendxxxyy --mstx=on ./add_custom // xxxyy indicates the type of the processor used by the user. The mstxRangeStartA and mstxRangeEnd interfaces in the mstx API are supported, allowing for the enabling of operator tuning in specified ranges. For details about parameters, see the mstxRangeStartA and mstxRangeEnd interfaces in the MindStudio mstx API Reference. | No |
| --mstx-include | Enables the specified mstx APIs in msOpProf. If this parameter is not set, all mstx APIs used in user code are enabled by default. If this parameter is set, only the specified mstx APIs are enabled. The input of For example: --mstx=on --mstx-include="hello\|hi" // This enables only mstx APIs where the message parameter is "hello" or "hi". This parameter must be used with The message can only contain letters, digits, and underscores (_). Use vertical bars (\|) to combine multiple messages. | No |
| --soc-version | Use this parameter or the | No |
| --output | Specifies the path for storing the collected performance data, which defaults to the current directory. Ensure that users in the group and other groups do not have the write permission on the parent directory of the path specified by | No |
| --dump | Specifies whether to generate the dump file of the simulator. The value can be Note: | No |
| -h, --help | Outputs help information. | No |
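The value ranges stated in the table can be validated before a command is launched. The following is a minimal Python sketch, not part of msOpProf: build_simulator_command is a hypothetical helper that assembles an msprof op simulator argument list from a subset of the options above and rejects out-of-range values. It assumes that the program to be tuned is placed last and that multiple names are joined with vertical bars, as in the examples in this guide.

```python
import re
import shlex


def build_simulator_command(app, soc_version, output=None, kernel_names=None,
                            launch_count=None, timeout=None, mstx_include=None):
    """Assemble an `msprof op simulator` command as an argument list (hypothetical helper)."""
    cmd = ["msprof", "op", "simulator", f"--soc-version={soc_version}"]
    if output is not None:
        cmd.append(f"--output={output}")
    if kernel_names:
        # Multiple operator name prefixes are combined with vertical bars.
        cmd.append("--kernel-name=" + "|".join(kernel_names))
    if launch_count is not None:
        if not 1 <= launch_count <= 5000:
            raise ValueError("--launch-count must be an integer in [1, 5000]")
        cmd.append(f"--launch-count={launch_count}")
    if timeout is not None:
        if not 1 <= timeout <= 2880:
            raise ValueError("--timeout must be an integer in [1, 2880] (minutes)")
        cmd.append(f"--timeout={timeout}")
    if mstx_include:
        # Messages may contain only letters, digits, and underscores.
        if any(not re.fullmatch(r"\w+", m, flags=re.ASCII) for m in mstx_include):
            raise ValueError("mstx messages may contain only letters, digits, and underscores")
        cmd += ["--mstx=on", "--mstx-include=" + "|".join(mstx_include)]
    cmd += list(app)  # the program to be tuned and its arguments go last
    return cmd


if __name__ == "__main__":
    command = build_simulator_command(["./test"], "Ascendxxxyy",
                                      output="./output_data",
                                      kernel_names=["Add", "Sub"],
                                      launch_count=10)
    print(shlex.join(command))  # shell-quoted form for copy and paste
```

Returning a list instead of a string keeps the vertical bars from being interpreted by a shell when the command is passed to subprocess.run.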
Usage
msOpProf assists in identifying exceptions in the operator memory, code, and instructions, enabling comprehensive operator tuning. For details about the usage, see Table 1 msopprof simulator functions.
Table 1 msopprof simulator functions
| Scenario | Usage | Displayed Graphs |
|---|---|---|
| It is applicable to the development and debugging phases for detailed simulation tuning, allowing you to analyze operator instructions and code hot spots. | Configure environment variables (such as LD_LIBRARY_PATH) and compilation options (such as -g to generate debugging information) as detailed in msopprof simulator configuration. This enables detailed analysis of operator behavior in a simulated environment. | Instruction Pipeline Chart, Operator Code Hot Spot Map, Memory Channel Throughput Waveform Chart |
msopprof simulator configuration
NOTE
The simulation function of the msOpProf tool only supports single-device scenarios and cannot simulate multi-device environments.
Refer to Chip SoC Type Acquisition Method to obtain the chip type, and use it as the value of the --soc-version parameter.
- Before using msOpProf to perform operator simulation-based tuning in --config mode, run the following command to configure environment variables:
      export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH
  Modify the preceding environment variables based on the actual installation path of the CANN package and the AI processor type.
- Add the -g compilation option to enable the operator code hot spot map and code call stack features.
  NOTE
  - If the -g compilation option is added, the generated binary file contains debugging information. You are advised to restrict access to user programs with debugging information to authorized personnel only.
  - If the functions provided by the llvm-symbolizer component are not used, do not include -g when compiling the program that is input to msOpProf. In this case, msOpProf does not call the functions of the llvm-symbolizer component.

  - For an operator project created by referring to the msOpGen tool, edit the CMakeLists.txt file in the op_kernel directory of the operator project. For details, see Creating an Operator Project.
        add_ops_compile_options(ALL OPTIONS -g)
  - For a project created by referring to the complete example, for example, the sample here, add the following code to the cmake/npu_lib.cmake file in the sample project directory.
        ascendc_compile_options(ascendc_kernels_${RUN_MODE} PRIVATE -g -O2)
    NOTE
    - This sample project does not support Atlas A3 training products.
    - When downloading the code sample, run the following command to specify the branch version:
          git clone https://gitee.com/ascend/samples.git -b v1.9-8.3.RC1
  - For Triton operators, add -g by configuring the following environment variable.
        export TRITON_DISABLE_LINE_INFO=0
- When msOpProf is used to perform simulation-based tuning on the operator of the PyTorch script, the built-in print function of Python cannot print the variables and values on the device.
- For the simulators of the Atlas A3 training products, Atlas A3 inference products, Atlas A2 training products, and Atlas A2 inference products, if the simulated blockdim exceeds the number of physical cores during running, the simulator may report an error. You can resolve this issue by configuring the core_ostd_num parameter in the pem_config_cloud.toml file. The path to the pem_config_cloud.toml file is ${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib/pem_config_cloud.toml.
      [ARCH]
      cube_core_num = 1
      vec_core_num = 2
      core_ostd_num = 2 # 2: early end; 1: normal mode
- When using the msProf tool for operator simulation and tuning on Ascend 950 products, you need to change the flush_level parameter in the config.json file to the info level. That is, change "flush_level": 3 to "flush_level": 2 in the file. The path of the config.json file is ${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib/config.json.
Startup
Configure msopprof simulator, and then perform the following steps to enable the simulation-based tuning function of the msOpProf tool. The operator tuning tool supports profile data collection and automatic parsing in a simulation environment.
NOTE
- Currently, msOpProf does not support the -O0 compilation option.
- The collection of MC2 and HCCL operators is not supported in the simulation environment.
- The number of simulation cores set by the user cannot exceed the number of physical cores.
- If you only need to focus on the performance of specific operators, invoke the TRACE_START and TRACE_STOP APIs within a single core on Atlas A3 training products, Atlas A3 inference products, Atlas inference products, Atlas A2 training products, and Atlas A2 inference products. These interfaces are described in the "Operator Debugging APIs" section of the Ascend C Operator Development API. Additionally, add -DASCENDC_TRACE_ON to the compilation configuration file. For details, see the method for adding -DASCENDC_TRACE_ON. Only after this can pipeline chart information for the specified range be generated. For details on the pipeline chart content, see Instruction Pipeline Chart.
  - Add -DASCENDC_TRACE_ON to the compilation configuration file. For details, see the following sample project. For the AddKernelInvocationNeo Operator Project, add the following code to the ${git_clone_path}/samples/operator/ascendc/0_introduction/3_add_kernellaunch/AddKernelInvocationNeo/cmake/npu_lib.cmake file.
        ascendc_compile_definitions (
            ...
            -DASCENDC_TRACE_ON
        )
- Log in to the operating environment. Use msopprof simulator to start operator simulation and tuning, combined with the optional simulation parameters and the program to be tuned (app [arguments]). For details about the optional simulation parameters, see Command Reference. You can use either of the following methods for operator simulation-based tuning:
  - Based on an executable file
    - Single-operator scenario (using test as an example)
      NOTE
      The executable file name test in the example is for demonstration only. Use the actual name of the executable file generated by compilation in the current project.
          msprof op simulator --soc-version=Ascendxxxyy --output=./output_data ./test
      Here, xxxyy indicates the type of the processor used by the user.
    - Multi-operator scenario
      If the test executable contains Add, MatMul, and Sub operators, you can use --launch-count and --kernel-name to collect data for the Add and Sub operators only. Here, xxxyy indicates the type of the processor used by the user, and ./test must be placed at the end of the command.
          msprof op simulator --soc-version=Ascendxxxyy --launch-count=10 --kernel-name="Add|Sub" --output=./output_data ./test
  - Based on a JSON configuration file of the input operator binary file *.o
    NOTE
    When using --config, you can set the simulator type only through the LD_LIBRARY_PATH environment variable. The --soc-version parameter is not supported. In the following commands, xxxyy indicates the type of the processor used by the user.
        export LD_LIBRARY_PATH=${INSTALL_DIR}/tools/simulator/Ascendxxxyy/lib:$LD_LIBRARY_PATH
        msprof op simulator --config=./add_test.json --output=./output_data
- After the command is executed, a folder named OPPROF_{timestamp}_XXX is generated in the specified --output directory. An example of the folder structure is as follows:
  - Collecting data of a single operator
        OPPROF_{timestamp}_XXX
        ├── dump
        └── simulator
            ├── core0.veccore0        // Stores data files for each core in directories named "core*.veccore*" or "core*.cubecore*".
            │   ├── core0.veccore0_code_exe.csv
            │   ├── core0.veccore0_instr_exe.csv
            │   └── trace.json        // Simulation instruction pipeline chart file of this core.
            ├── core0.veccore1
            │   ├── core0.veccore1_code_exe.csv
            │   ├── core0.veccore1_instr_exe.csv
            │   └── trace.json
            ├── core1.veccore0
            │   ├── core1.veccore0_code_exe.csv
            │   ├── core1.veccore0_instr_exe.csv
            │   └── trace.json
            ├── ...
            ├── visualize_data.bin
            └── trace.json            // Simulation instruction pipeline chart file for all cores.
  - Collecting data of multiple operators
        OPPROF_{timestamp}_XXX
        ├── OpName1                   // "OpName1" is the name of the operator to be collected.
        │   ├── 0                     // Sequence in which the operator is scheduled.
        │   │   ├── dump              // Folder storing intermediate files, which functions in the same way as in single-operator collection.
        │   │   └── simulator         // The content is the same as that in the single-operator simulator scenario, but the .csv files in the simulator folder have timestamp suffixes added, for example, core*_code_exe_20240429111143146.csv.
        │   ├── 1
        │   │   ├── dump
        │   │   └── simulator
        │   └── dump                  // Folder storing intermediate files.
        └── OpName2
            ├── 0
            │   ├── dump
            │   └── simulator
            └── dump

  Table 2 msopprof simulator files

  | Name | Description |
  |---|---|
  | dump folder | Folder for storing dump data generated by the simulation. |
  | simulator folder | Folder for storing analysis results of dump data files. It contains the files in the following rows. |
  | core*_code_exe.csv | Code line time consumption. The asterisk (*) represents cores 0 to n, allowing for quick identification of the most time-consuming sections of the code. For details, see Code Line Time Consumption Data Files. |
  | core*_instr_exe.csv | Records detailed code instruction information. The asterisk (*) represents cores 0 to n, allowing for quick identification of the most time-consuming instructions. For details, see Code Instruction Information Files. |
  | visualize_data.bin | Visualization file for information such as the simulation pipeline chart and simulation hot spot functions. |
  | trace.json | Simulation instruction pipeline chart file, including sub-files for each core and a summary file for all cores. |
- After the visualize_data.bin file is imported to MindStudio Insight, the instruction pipeline chart, operator code hot spot map, and memory channel throughput waveform chart are displayed.
- After the trace.json file is imported to the Chrome browser or MindStudio Insight, the instruction pipeline chart and memory channel throughput waveform chart are displayed.
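To locate the per-core result files programmatically, the single-operator folder layout described above can be walked with a short script. The following is a minimal Python sketch; list_core_files is a hypothetical helper name, and the sketch assumes the single-operator layout in which per-core directories named core* sit directly under the simulator folder.

```python
from pathlib import Path


def list_core_files(opprof_dir):
    """Map each per-core directory name to the sorted names of the files it contains."""
    simulator = Path(opprof_dir) / "simulator"
    result = {}
    for core_dir in sorted(simulator.glob("core*")):
        if core_dir.is_dir():  # skip files such as a summary trace.json
            result[core_dir.name] = sorted(
                p.name for p in core_dir.iterdir() if p.is_file()
            )
    return result
```

For the multi-operator layout, the same function could be applied to each OpName/index subfolder, keeping in mind that the .csv file names there carry timestamp suffixes.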
Instruction Pipeline Chart
Description
Visualizes the visualize_data.bin or trace.json files generated by msopprof simulator. The instruction pipeline chart shows the timing relationships between instructions and associates each instruction with its call stack, helping you quickly locate bottlenecks.
Precautions
- For detailed MindStudio Insight operations and field explanations, see Timeline in MindStudio Insight Operator Tuning.
- If the -g compilation option is added, the generated binary file contains debugging information. You are advised to restrict access to user programs with debugging information to authorized personnel only.
- If the functions provided by the llvm-symbolizer component are not used, do not include -g when compiling the program that is input to msOpProf. In this case, msOpProf does not call the functions of the llvm-symbolizer component.
- If you only need to focus on the performance of specific operators, invoke the TRACE_START and TRACE_STOP APIs within a single core on Atlas A3 training products, Atlas A3 inference products, Atlas inference products, Atlas A2 training products, and Atlas A2 inference products. These interfaces are described in the "Operator Debugging APIs" section of the Ascend C Operator Development API. Additionally, add -DASCENDC_TRACE_ON to the compilation configuration file. For details, see the method for adding -DASCENDC_TRACE_ON. Only after this can pipeline chart information for the specified range be generated.
Usage Instructions
The trace.json file can be visualized using either the Chrome browser or MindStudio Insight, while the visualize_data.bin file can be visualized only using MindStudio Insight.
- Chrome
  Enter chrome://tracing in the address box, and drag and drop the instruction pipeline chart file (trace.json) generated by msopprof simulator into the blank area to view the file. Use the keyboard shortcuts to navigate: W (zoom in), S (zoom out), A (pan left), and D (pan right). For details about the key fields, see Table 1 Key fields.

  Table 1 Key fields

  | Field | Description |
  |---|---|
  | VECTOR | Vector unit. |
  | SCALAR | Scalar unit. |
  | Cube | Cube unit. |
  | MTE1 | Pipeline of data transfer from L1 to {L0A/L0B, UBUF}. |
  | MTE2 | Pipeline of data transfer from {DDR/GM, L2} to {L1, L0A/B, UBUF}. |
  | MTE3 | Pipeline of data transfer from UBUF to {DDR/GM, L2, L1} or L1 to {DDR/L2}. |
  | FIXP | Pipeline of data transfer from FIXPIPE L0C to OUT/L1 (displayed only for Atlas A2 training products and Atlas A2 inference products). |
  | FLOWCTRL | Control flow instruction. |
  | CACHEMISS | iCache miss. |
  | USEMASK | Custom instrumentation range. If there are nested ranges within the same USEMASK, or if there is only TRACE_START but no TRACE_STOP, the instruction pipeline chart cannot be drawn correctly. |
  | ALL | Indicates that instructions in this channel are executed in all channels. |
  | PUSHQ | VF/SMIT_VF instructions. |
  | RVECLP | Vector register LOOP instructions. |
  | RVECSU | Vector register ASU instructions, including jumps and scalar data processing. |
  | RVECLD | Vector register LOAD instructions. |
  | RVECEX | Vector register EXECUTE instructions. |
  | RVECST | Vector register SET instructions. |
- MindStudio Insight
  Visualizes the generated trace.json or visualize_data.bin files.
  MindStudio Insight provides a timeline view of instruction execution on Ascend AI Processors. You can identify the timing optimization opportunities of micro instructions by analyzing the instruction details, execution times, call stacks of the code associated with the instruction, and synchronization lines between instructions and pipelines. By observing pipeline arrangements on the timeline, you can identify potential performance issues during operator execution, such as ineffective parallelization between instructions.
Figure 1 Timeline page

- Shows the execution duration of each instruction within each pipeline and the instruction dependencies across different pipelines, helping you to identify potential performance optimization opportunities of pipelines.
- Associates pipeline instruction information with code to guide you through code-based pipeline layout optimization.
- Displays the data transfer volume for instructions related to GM in the selected details.
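Because trace.json can be opened in chrome://tracing, it can presumably also be post-processed with a script. The following is a minimal Python sketch under explicit assumptions that this guide does not confirm: the file follows the Chrome Trace Event Format (either a JSON array of events or an object with a traceEvents array), complete events use "ph": "X" with a "dur" field, and each (pid, tid) pair corresponds to one row of the chart. The function name total_duration_by_track is hypothetical.

```python
import json
from collections import defaultdict


def total_duration_by_track(trace_path):
    """Sum the `dur` of complete ("X") events for each (pid, tid) track."""
    with open(trace_path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file is either a bare event list or {"traceEvents": [...]}.
    events = data.get("traceEvents", []) if isinstance(data, dict) else data
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X":  # ignore metadata and other event types
            totals[(ev.get("pid"), ev.get("tid"))] += ev.get("dur", 0)
    return dict(totals)
```

Comparing the totals of different tracks gives a rough view of which pipelines are busiest. Check the actual field names in a generated file before relying on this.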
Operator Code Hot Spot Map
Description
Visualizes the visualize_data.bin files generated by msopprof simulator. On the page, you can view the mapping between operator source code and instructions, as well as the time consumption. This helps developers identify hot spot code distribution and analyze the feasibility of hot spot function optimization.
Precautions
- For detailed MindStudio Insight operations and field explanations, see Source in MindStudio Insight Operator Tuning.
- If the -g compilation option is added, the generated binary file contains debugging information. You are advised to restrict access to user programs with debugging information to authorized personnel only.
- The operator program must be compiled with the -g option. Otherwise, msOpProf will not display the hot spot map and will not call the relevant functions of the llvm-symbolizer component to implement code-to-PC mapping.
- Operator code hot spot maps cannot be generated for MC2 or LCCL operators.
Usage Instructions
The following figure shows the operator code hot spot map.
Figure 1 msopprof simulator source code page

- On the top of the page, you can switch between compute units and kernel function files.
- The left pane displays the time consumed by each line of code of the operator kernel, register usage, read and write conflicts of vector instructions on the UB Bank, Vector unit usage, and GM-related data transfer along with the number of corresponding instructions, helping developers quickly locate bottlenecks.
- The right pane displays the time consumed by each instruction, register usage, GM-related data transfer, read and write conflicts of vector instructions on the UB Bank, Vector unit usage, execution counts, and code associations, helping developers further analyze the cause of long code execution times.
NOTE
- The maximum number of general-purpose registers is 32. When the number of used registers reaches 32, the simulation can be performed only after the registers in use are released.
- Register usage for certain operators using the TRACE_START and TRACE_STOP APIs cannot be displayed.
- "NA" is displayed if no GM-related unit is involved when Process Bytes is checked.

For details about the features supported by msopprof simulator, see Table 1 msopprof simulator hot spot map features.

Table 1 msopprof simulator hot spot map features

| Column | Atlas A2 training products/Atlas A2 inference products | Atlas A3 training products/Atlas A3 inference products | Atlas inference products | Ascend 950 products | Description |
|---|---|---|---|---|---|
| Source Code | Supported | Supported | Supported | Supported | - |
| Instruction PC Address | Supported | Supported | Supported | Supported | - |
| Pipeline | Supported | Supported | Supported | Supported | - |
| Execution Cycles | Supported | Supported | Supported | Supported | Execution time (cycles) of operator source code and instructions. |
| Execution Count | Supported | Supported | Supported | Supported | Execution count of operator source code and instructions. |
| GPR Count | Supported | Supported | Supported | Not supported | Register usage. Register usage for certain operators using the TRACE_START and TRACE_STOP APIs cannot be displayed. |
| UB Bank Conflict | Supported | Supported | Supported | Not supported | - |
| Vector Unit Utilization | Supported | Supported | Supported | Not supported | - |
| Process Bytes | Supported | Supported | Not supported | Not supported | GM-related data transfer volume. |
| Stall_Cycles (NOP Stall) | Not supported | Not supported | Not supported | Supported | Ratio chart comparing expected stalls with actual stalls. A stall refers to the waiting time incurred during instruction execution due to resource conflicts, data dependencies, or other reasons. |
Memory Channel Throughput Waveform Chart
Description
Visualizes the visualize_data.bin files generated by msopprof simulator. On the page, you can view the statistical analysis of the memory bandwidth of the operator MTE log channel over time, helping you identify the bandwidth usage of the operator during different operator stages and evaluate the feasibility of bandwidth optimization.
Precautions
- For detailed MindStudio Insight operations and field explanations, see Timeline in MindStudio Insight Operator Tuning.
- Memory channel throughput waveform charts can only be displayed for Atlas A2 training products, Atlas A2 inference products, Atlas A3 training products, and Atlas A3 inference products.
- This feature is disabled by default. The --core-id setting has no effect on this feature.
Usage Instructions
The following figure shows the memory channel throughput waveform chart.
Figure 1 Memory channel throughput waveform chart

- Displays the data throughput (in MB/s) for various types of memory channels (currently limited to GM_TO_L1, GM_TO_TOTAL, GM_TO_UB, L1_TO_GM, TOTAL_TO_GM, and UB_TO_GM). For example, GM_TO_UB represents the throughput from GM to UB, while GM_TO_TOTAL represents the throughput from GM to each memory unit.
- By combining this with MTE-related instructions, you can observe the throughput during execution of related commands to help identify operator performance issues.
NOTE
- The data used for throughput calculation corresponds to the completion of multiple requests for a specific instruction.
- The throughput waveform may appear within the time range between the start and end of an instruction (inclusive). For example, for an instruction with a duration of 1 to 3 µs, the throughput data might be distributed across three bar charts covering the 1 to 2 µs, 2 to 3 µs, and 3 to 4 µs intervals.
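The inclusive-interval behavior in the last note can be illustrated with a small calculation. The following is a minimal Python sketch; touched_buckets is a hypothetical function, and the 1 µs bar width is an assumption taken from the example above rather than a documented property of the tool.

```python
import math


def touched_buckets(start_us, end_us, width_us=1.0):
    """Return the [lo, hi) bars touched by an instruction spanning [start_us, end_us], end inclusive."""
    first = math.floor(start_us / width_us)
    last = math.floor(end_us / width_us)  # the end point itself falls into the bar that starts there
    return [(i * width_us, (i + 1) * width_us) for i in range(first, last + 1)]


print(touched_buckets(1, 3))  # [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
```

Because the end time is included, an instruction that ends exactly on a bar boundary also contributes to the bar that starts at that boundary, which is why three bars appear for a 1 to 3 µs instruction.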