Extended Functions
Obtaining Device Information
| Product | Supported |
|---|---|
| Atlas 350 Accelerator Card | √ |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | √ |
| Atlas inference products | √ |
| Atlas training products | √ |
After collecting the profile data, use get_msprof_info.py to obtain device information from the device_{id} or host directory within the PROF_XXX directory. The following table describes the function and installation directory of get_msprof_info.py.
Table 1 Script description
| Script | Function | Directory |
|---|---|---|
| get_msprof_info.py | Obtains device information. | ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/interface. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann. |
python3 get_msprof_info.py -dir <dir> [--help]
Table 2 Command-line options
| Option | Mandatory (Yes/No) | Description |
|---|---|---|
| -dir or --collection-dir | Yes | Specifies the directory of collected profile data. In non-cluster scenarios, set this option to the host or device_{id} directory within the PROF_XXX directory. In cluster scenarios, set it to the parent directory of the PROF_XXX directory. |
| -h or --help | No | Displays the help information. This option is used only to obtain usage instructions. |
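When device information has to be gathered for several directories, the syntax above can be assembled programmatically. The following is a minimal sketch, not part of the tool: the helper name `build_info_command` and the example paths are hypothetical, and only the script name, its installation subdirectory, and the -dir option are taken from this document.

```python
import os

# Subdirectory of the CANN installation that holds get_msprof_info.py (see Table 1).
SCRIPT_SUBDIR = os.path.join("tools", "profiler", "profiler_tool", "analysis", "interface")


def build_info_command(install_dir, collection_dir):
    """Return the argument list for get_msprof_info.py (hypothetical helper).

    install_dir    -- CANN installation directory, for example /usr/local/Ascend/cann
    collection_dir -- host or device_{id} directory (non-cluster scenarios), or the
                      parent directory of PROF_XXX (cluster scenarios)
    """
    script = os.path.join(install_dir, SCRIPT_SUBDIR, "get_msprof_info.py")
    return ["python3", script, "-dir", collection_dir]


if __name__ == "__main__":
    # Example path is illustrative only.
    cmd = build_info_command("/usr/local/Ascend/cann", "/home/1/PROF_XXX/device_0")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the Toolkit is installed.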
Examples
- Log in to the environment where the tool is located as the running user.
- Go to the directory of get_msprof_info.py.
- Run get_msprof_info.py. Example commands are as follows:
  - Non-cluster scenarios
    python3 get_msprof_info.py -dir /home/1/PROF_000001_20220129014731273_KEDKPORHMAGPGD/device_0
  - Cluster scenarios
    python3 get_msprof_info.py -dir /home/1/
- In non-cluster scenarios, results are displayed as shown in Figure 1. Field descriptions are provided in Table 3. In cluster scenarios, a /query/cluster_info.json file is generated in the directory specified by the -dir option to store node information, as shown in Figure 2. Field descriptions are provided in Table 4.
Figure 1 Device information (non-cluster scenarios)
Table 3 Fields (non-cluster scenarios)
| Field | Description |
|---|---|
| collection_info | Collection information |
| Collection end time | End time of information collection |
| Collection start time | Start time of information collection |
| Result Size | Result data size (MB) |
| device_info | Device information |
| AI Core Number | Number of AI Cores |
| AI CPU Number | Number of AICPUs |
| Control CPU Number | Number of Control CPUs |
| Control CPU Type | Control CPU type |
| Device Id | Device ID |
| TS CPU Number | Number of TS CPUs |
| host_info | Host information |
| cpu_info | Host CPU information |
| CPU ID | Host CPU ID |
| Name | Host CPU name |
| Type | Host CPU type |
| Frequency | Host CPU frequency |
| Logical_CPU_Count | Number of logical CPUs on the host |
| cpu_num | Number of host CPUs |
| Host Computer Name | Host device name |
| Host Operating System | Host operating system |
| model_info | Model information |
| Device Id | Device ID |
| iterations | Iteration statistics |
| Iteration Number | Number of iterations |
| Model Id | Model ID, which is displayed based on the number of models |
| version_info | Version information |
| analysis_version | Parsing version information |
| collection_version | Collection version information |
| drv_version | Driver version information |
Figure 2 Device information (cluster scenarios)
Table 4 Fields (cluster scenarios)
| Field | Description |
|---|---|
| Rank Id | Node ID that uniquely identifies a device in cluster scenarios |
| Device Id | Device ID, which is not used as a unique device identifier in cluster scenarios |
| Prof Dir | The PROF_XXX directory on the device with the current Rank ID in cluster scenarios |
| Device Dir | The device_{id} directory in the PROF_XXX directory in cluster scenarios |
| Models | Model information, including all model IDs on the device with the current Rank ID and the number of iterations for each model |
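In cluster scenarios the node information ends up in a JSON file, so it can be read back with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `load_cluster_info` is hypothetical, and because the exact key names and layout of cluster_info.json are not reproduced in this document, the code only loads and pretty-prints the file without assuming any particular field.

```python
import json
import os
import sys


def load_cluster_info(collection_dir):
    """Load <collection_dir>/query/cluster_info.json and return the parsed JSON.

    collection_dir is the directory that was passed to -dir in cluster scenarios.
    """
    path = os.path.join(collection_dir, "query", "cluster_info.json")
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Print the node information without assuming specific keys.
        print(json.dumps(load_cluster_info(sys.argv[1]), indent=2))
```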
Profile Data File Slicing
| Product | Supported |
|---|---|
| Atlas 350 Accelerator Card | √ |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | √ |
| Atlas inference products | √ |
| Atlas training products | √ |
For a parsed timeline data file in JSON format, the system estimates how long the file would take to open in the Google Chrome browser (chrome://tracing) and splits it into an appropriate number of slice files so that each file opens quickly. Slicing is triggered during profile data export.
Data file slicing attributes are configured using the msprof_slice.json file. The following example shows the content format of the msprof_slice.json file:
{
"slice_switch": "off",
"slice_file_size(MB)": 0,
"strategy": 0
}
The directory for storing the msprof_slice.json file is as follows:
${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msconfig. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.
Table 1 Parameters
| Parameter | Mandatory (Yes/No) | Description |
|---|---|---|
| slice_switch | No | Specifies whether to enable slicing. Valid values: • on: enables slicing. • off: disables slicing. Default value: off. The current maximum slice size is 20 GB. If slicing is enabled and a file exceeds 20 GB, the export will fail. In addition, slicing will not be triggered for files smaller than 200 MB, even if slicing is enabled. Data slicing is disabled by default. To enable it, set this parameter to on in the msprof_slice.json file. Any other value results in the use of the default setting. By default, the system determines whether to perform data slicing based on the time required to open a timeline data file in the Chrome browser. Slicing is triggered if the file opening time exceeds the upper limit. The size of the slice files is controlled by the slice_file_size parameter, and the number of files is controlled by the strategy parameter. The format of the sliced file name is {Module name}_{slice_n}_{timestamp}.json, where slice_n represents the sequence number of the slice. |
| slice_file_size(MB) | No | Specifies the maximum slice file size. The unit is MB. The value is a positive integer greater than or equal to 200. By default, the size of a slice file is not limited. When this parameter is set to a positive integer greater than or equal to 200, the size of each slice file is capped at the specified value. For any other value, file size is unlimited, and only the number of slice files is restricted by the strategy parameter. |
| strategy | No | Specifies the slicing policy. Valid values: • 0: slices files to minimize the number of slices while keeping the opening time of each file within an acceptable range. • 1: slices files to ensure a fast opening time, resulting in more slices. Default value: 0. Because file opening time depends on computer performance, exact durations cannot be provided. Typical reference values are as follows: • Excessive opening time: 30 seconds or longer. • Acceptable opening time: at least 10 seconds and less than 30 seconds. • Fast opening time: less than 10 seconds. |
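The parameter rules in the table can be checked before the configuration file is written. The sketch below is illustrative only: `make_slice_config` is a hypothetical helper, and rejecting sizes between 1 and 199 is a design choice of this example (the tool itself is described as treating such values as "unlimited"), made so that a likely typo is reported instead of silently ignored.

```python
import json


def make_slice_config(enable, max_size_mb=0, strategy=0):
    """Build the content of msprof_slice.json (hypothetical helper).

    enable      -- True writes "on", False writes "off"
    max_size_mb -- 0 for no size limit, otherwise an integer >= 200
    strategy    -- 0 (fewer slices) or 1 (faster opening, more slices)
    """
    if strategy not in (0, 1):
        raise ValueError("strategy must be 0 or 1")
    if max_size_mb != 0 and max_size_mb < 200:
        # Assumption of this sketch: treat 1..199 and negatives as mistakes.
        raise ValueError("slice_file_size(MB) must be 0 (unlimited) or >= 200")
    return {
        "slice_switch": "on" if enable else "off",
        "slice_file_size(MB)": max_size_mb,
        "strategy": strategy,
    }


if __name__ == "__main__":
    # Enable slicing, cap each slice at 500 MB, prefer fast opening.
    print(json.dumps(make_slice_config(True, 500, 1), indent=4))
```

The printed JSON has the same three keys as the sample file shown earlier and can be saved as msprof_slice.json in the msconfig directory.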
Parsing, Querying, and Exporting Profile Data Using msprof.py
Overview
The msProf parsing tool is wrapped by msprof.py. You can run msprof.py directly to parse and export profile data.
Tool Usage Process
To export profile data using msprof.py, perform the following steps:
- Parse profile data by referring to Parsing Profile Data.
- (Optional) Query profile data file information by referring to Querying Profile Data File Information.
  Perform this step if you need to specify the iteration ID or model ID for parsing. Otherwise, skip it.
- Export profile data by referring to Exporting Profile Data.
Note
- Direct parsing, querying, and exporting are not supported on the device for the Atlas 200I/500 A2 inference products in the Ascend RC scenario. The generated PROF_XXX directory must be copied to an environment with the Toolkit package installed.
- msprof.py must be executed by the user created during installation.
Parsing Profile Data
| Product | Supported |
|---|---|
| Atlas 350 Accelerator Card | √ |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | √ |
| Atlas inference products | √ |
| Atlas training products | √ |
Parses profile data.
None
python3 msprof.py import -dir <dir>
Parameters and Command-line Options
Table 1 Parsing command parameters and options
| Parameter/Option | Mandatory (Yes/No) | Description |
|---|---|---|
| import | Yes | Parses profile data in import mode. When the profile data is parsed using the import method, a new .db file will be regenerated even if one already exists in the raw profile data directory. |
| --cluster | Yes (for cluster scenarios) | Parses and aggregates profile data in cluster scenarios. This option is supported only when the import parameter is specified. The -dir option specifies the parent directory of PROF_XXX. The parsing result is stored in the sqlite directory generated under PROF_XXX. |
| -dir or --collection-dir | Yes | Specifies the directory of the collected profile data. It must be PROF_XXX or its parent directory, such as /home/profiler_data/PROF_XXX. |
| -h or --help | No | Displays the help information. This option is used only to obtain usage instructions. |
- Log in to the environment where the Toolkit package is installed.
- Go to the directory of msprof.py: ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msprof. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.
- Parse the profile data.
  python3 msprof.py import -dir /home/profiler_data/PROF_XXX
After the command is executed and parsing is complete, a sqlite directory containing a .db file is generated in the device_{id} and host subdirectories of PROF_XXX. This .db file stores intermediate results and can be ignored.
To export final timeline data or .db files, proceed to Exporting Profile Data.
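A quick way to confirm that parsing produced its intermediate results is to look for the sqlite directories described above. The following is a minimal sketch: `find_parsed_dbs` is a hypothetical helper, and the directory layout it searches (host/sqlite and device_*/sqlite under PROF_XXX) is inferred from the description in this section.

```python
import glob
import os


def find_parsed_dbs(prof_dir):
    """Return the .db files found in the sqlite directories under prof_dir.

    Searches <prof_dir>/host/sqlite and <prof_dir>/device_*/sqlite only.
    """
    patterns = [
        os.path.join(prof_dir, "host", "sqlite", "*.db"),
        os.path.join(prof_dir, "device_*", "sqlite", "*.db"),
    ]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        dbs = find_parsed_dbs(sys.argv[1])
        print("parsed" if dbs else "no sqlite results found")
        for path in dbs:
            print(path)
```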
Querying Profile Data File Information
| Product | Supported |
|---|---|
| Atlas 350 Accelerator Card | √ |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | √ |
| Atlas inference products | √ |
| Atlas training products | √ |
Queries profile data file information, including the iteration ID and model ID.
Before you run the query command, run the import command to parse the profile data. Otherwise, the query result is meaningless.
python3 msprof.py query -dir <dir>
Table 1 Command-line options for querying profile data
| Option | Mandatory (Yes/No) | Description |
|---|---|---|
| -dir or --collection-dir | Yes | Specifies the directory of the collected profile data. It must be PROF_XXX or its parent directory, such as /home/profiler_data/PROF_XXX. |
| --data-type | No | Specifies the data type. This option is used for interconnection with MindStudio and does not need to be specified. Valid values: • 0: cluster data. You can query whether the current data is collected in a cluster scenario. • 1: iteration trace data, which is detailed data of each iteration, including the FP/BP elapsed time, iteration refresh lag, and iteration interval. • 2: compute volume, which is the number of floating-point operations on AI Core. • 3: data preparation, including sending training data to the device and reading it on the device. • 4: parallelism tuning suggestions. • 5: parallelism data, including the pure communication duration and computation duration. • 6: slow communication rank and link data and tuning suggestions. • 7: communication matrix data and tuning suggestions. • 8: CPU and memory performance metrics of the host-side system and processes. • 9: communication duration with critical path analysis enabled. • 10: communication matrix with critical path analysis enabled. |
| --id | No | Specifies the rank ID of a cluster node in cluster scenarios, and device ID in non-cluster scenarios. This option is used for interconnection with MindStudio and does not need to be specified. |
| --model-id | No | Specifies the model ID. This option is used for interconnection with MindStudio and does not need to be specified. |
| --iteration-id | No | Specifies the iteration ID for graph-based statistics collection. The iteration ID is incremented by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID is different from the step ID at the script layer. Default value: 1. This option is used for interconnection with MindStudio and does not need to be specified. |
| -h or --help | No | Displays the help information. This option is used only to obtain usage instructions. |
- Log in to the environment where the Toolkit package is installed.
- Go to the directory of msprof.py: ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msprof. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.
- To query the profile data information, run the following command:
  python3 msprof.py query -dir /home/profiler_data/PROF_XXX
After the query command is executed, the results are printed to the screen.
Table 2 describes the information obtained through the query function of the msProf tool.
Table 2 Profile data file information
| Field | Description |
|---|---|
| Job Info | Job name |
| Device ID | Device ID |
| Dir Name | Directory name |
| Collection Time | Data collection time |
| Model ID | Model ID |
| Iteration Number | Total number of iterations |
| Top Time Iteration | Top five iterations with the longest durations |
| Rank ID | Node ID in the cluster scenario |
Exporting Profile Data
| Product | Supported |
|---|---|
| Atlas 350 Accelerator Card | √ |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | √ |
| Atlas inference products | √ |
| Atlas training products | √ |
Exports profile data.
Before exporting profile data, you need to parse profile data.
- Export timeline data and DB files
  python3 msprof.py export timeline -dir <dir> [-reports <reports_sample_config.json>] [--model-id <model-id>] [--iteration-id <iteration_id>] [--iteration-count <iteration_count>] [--clear]
- Export summary data and DB files
  python3 msprof.py export summary -dir <dir> [--model-id <model-id>] [--iteration-id <iteration_id>] [--iteration-count <iteration_count>] [--format <export_format>] [--clear]
- Export DB files
  python3 msprof.py export db -dir <dir>
Table 1 Command-line options for exporting profile data
| Option | Mandatory (Yes/No) | Description |
|---|---|---|
| -dir or --collection-dir | Yes | Specifies the directory of the collected profile data. It must be PROF_XXX or its parent directory, such as /home/HwHiAiUser/profiler_data/PROF_XXX. |
| -reports | No | Specifies a custom reports_sample_config.json configuration file to export the corresponding profile data based on the scope specified in the file. The parameter implementation is the same as that of msprof --reports. For details, see Example (--reports Option). |
| --model-id | No | Specifies the model ID. The value must be a positive integer. This option must be specified together with --iteration-id to export the profile data of a specified compute iteration in the model. If neither --model-id nor --iteration-id is specified, all profile data is exported by default. • For Atlas A2 training products/Atlas A2 inference products and Atlas A3 training products/Atlas A3 inference products, --model-id can be set to 4294967295, which selects step mode. In step mode, the value of --iteration-id is interpreted as a step, and only profile data of the MindSpore framework (version 2.3 or later) can be parsed. • If --model-id is set to any other value, --iteration-id is interpreted as the iteration ID for graph-based statistics collection. (The iteration ID is incremented by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID is different from the step ID at the script layer.) |
| --iteration-id | No | Specifies the iteration ID. The value must be a positive integer. This option must be specified together with --model-id to export the profile data of a specified compute iteration in the model. If neither --model-id nor --iteration-id is specified, all profile data is exported by default. • For Atlas A2 training products/Atlas A2 inference products and Atlas A3 training products/Atlas A3 inference products, if --model-id is set to 4294967295, this option specifies the iteration ID for step-based statistics collection. The iteration ID is incremented by 1 each time a step is executed. Only profile data of the MindSpore framework (version 2.3 or later) can be parsed. • If --model-id is set to any other value, this option specifies the iteration ID for graph-based statistics collection. (The iteration ID is incremented by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID is different from the step ID at the script layer.) |
| --iteration-count | No | Specifies the number of consecutive iterations for which data will be exported. The value ranges from 1 to 5. The value of --iteration-id is used as the starting step. For example, if --iteration-count is 3 and --iteration-id is 1, data for steps 1, 2, and 3 will be exported. |
| --format | No | Specifies the format of the exported summary data file. The value can be csv (default) or json. This option is supported only when the summary parameter is configured. This document uses CSV files as examples for all summary file descriptions. |
| --clear | No | Enables data clearance mode. When this option is specified, the sqlite directory in PROF_XXX/device_{id} is deleted after the profile data is exported, to save storage space. This option is not specified by default. |
| -h or --help | No | Displays the help information. This option is used only to obtain usage instructions. |
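Several of the options above constrain each other, which is easy to get wrong in scripts. The following sketch assembles an export command and checks the rules stated in the table: --model-id and --iteration-id go together, --iteration-count is between 1 and 5, and --format applies to summary only. The function `build_export_command` is a hypothetical helper, not part of msprof.py, and it assumes the command is run from the directory that contains msprof.py.

```python
STEP_MODE_MODEL_ID = 4294967295  # --model-id value that selects step mode (see table)


def build_export_command(kind, collection_dir, model_id=None, iteration_id=None,
                         iteration_count=None, export_format=None, clear=False):
    """Return the argument list for 'msprof.py export' (hypothetical helper)."""
    if kind not in ("timeline", "summary", "db"):
        raise ValueError("kind must be timeline, summary, or db")
    cmd = ["python3", "msprof.py", "export", kind, "-dir", collection_dir]
    extras = (model_id, iteration_id, iteration_count, export_format)
    if kind == "db":
        # The documented syntax for 'export db' lists only -dir.
        if clear or any(value is not None for value in extras):
            raise ValueError("export db takes only -dir")
        return cmd
    if (model_id is None) != (iteration_id is None):
        raise ValueError("--model-id and --iteration-id must be specified together")
    if model_id is not None:
        if model_id <= 0 or iteration_id <= 0:
            raise ValueError("--model-id and --iteration-id must be positive integers")
        cmd += ["--model-id", str(model_id), "--iteration-id", str(iteration_id)]
    if iteration_count is not None:
        if not 1 <= iteration_count <= 5:
            raise ValueError("--iteration-count must be between 1 and 5")
        cmd += ["--iteration-count", str(iteration_count)]
    if export_format is not None:
        if kind != "summary" or export_format not in ("csv", "json"):
            raise ValueError("--format csv|json is supported for summary only")
        cmd += ["--format", export_format]
    if clear:
        cmd.append("--clear")
    return cmd


if __name__ == "__main__":
    # Step mode: export steps 1 to 3 as JSON summary files.
    print(" ".join(build_export_command(
        "summary", "/home/HwHiAiUser/profiler_data/PROF_XXX",
        model_id=STEP_MODE_MODEL_ID, iteration_id=1, iteration_count=3,
        export_format="json")))
```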
- Log in to the environment where the Toolkit package is installed.
- Go to the directory of msprof.py: ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msprof. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.
- Export profile data. The timeline, summary, and DB files can be exported. The command formats are as follows:
  - Export timeline data and DB files
    python3 msprof.py export timeline -dir /home/HwHiAiUser/profiler_data/PROF_XXX
  - Export summary data and DB files
    python3 msprof.py export summary -dir /home/HwHiAiUser/profiler_data/PROF_XXX
  - Export the DB file to generate a .db file (msprof_timestamp.db) that aggregates all profile data.
    python3 msprof.py export db -dir /home/HwHiAiUser/profiler_data/PROF_XXX
Note
- By default, all profile data is exported.
- In single-operator scenarios or scenarios where only Ascend AI Processor system data is collected (that is, the --application option is not specified in the msprof data collection command), the --iteration-id and --model-id options are not supported.
- After the preceding command is executed, the mindstudio_profiler_output directory and msprof_*.db file are generated in the PROF_XXX directory under the --collection-dir directory.
  The following examples show the directory structure of the generated profile data:
  - Single-process collection
    └── PROF_XXX
        ├── device_0
        │   └── data
        ├── device_1
        │   └── data
        ├── host
        │   └── data
        ├── msprof_*.db
        └── mindstudio_profiler_output
            ├── msprof_{timestamp}.json
            ├── step_trace_{timestamp}.json
            ├── xx_*.csv
            ...
            └── README.txt
  - Multi-process collection
    └── PROF_XXX1
        ├── device_0
        │   └── data
        ├── host
        │   └── data
        ├── msprof_*.db
        └── mindstudio_profiler_output
            ├── msprof_{timestamp}.json
            ├── step_trace_{timestamp}.json
            ├── xx_*.csv
            ...
            └── README.txt
    └── PROF_XXX2
        ├── device_1
        │   └── data
        ├── host
        │   └── data
        ├── msprof_*.db
        └── mindstudio_profiler_output
            ├── msprof_{timestamp}.json
            ├── step_trace_{timestamp}.json
            ├── xx_*.csv
            ...
            └── README.txt
Note
- In multi-device scenarios, if a single collection process is started, only one PROF_XXX directory is generated. If multiple processes are started, multiple PROF_XXX directories are generated. The device directories are created within these PROF_XXX directories. The specific number of device directories per PROF_XXX depends on the actual user operations and does not affect profile data analysis.
- For details about profile data, see Profile Data File References.
- The files in the mindstudio_profiler_output directory are generated based on the actual profile data. If specific data files are not collected, the corresponding timeline and summary data will not be exported.
- You can run the export command to directly export summary reports from the profile data parsing result. Even if the profile data has not been parsed, running the export command separately will parse the data and export the result files.
- If the msProf collection process is forcibly interrupted, the tool saves the raw profile data already collected, which can still be parsed and exported using the export command.
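Because the set of exported files depends on what was collected, a script that consumes them should discover the files rather than assume they exist. The sketch below is one way to do that; `list_exported_files` is a hypothetical helper, and it relies only on the directory and file name patterns shown in the directory structure of this section.

```python
import glob
import os


def list_exported_files(prof_dir):
    """Group the export results found under one PROF_XXX directory by file type."""
    out_dir = os.path.join(prof_dir, "mindstudio_profiler_output")
    return {
        # Timeline files, plus summary files if they were exported with --format json.
        "json": sorted(glob.glob(os.path.join(out_dir, "*.json"))),
        # Summary files in the default CSV format.
        "csv": sorted(glob.glob(os.path.join(out_dir, "*.csv"))),
        # Aggregated database written next to mindstudio_profiler_output.
        "db": sorted(glob.glob(os.path.join(prof_dir, "msprof_*.db"))),
    }


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        for kind, paths in list_exported_files(sys.argv[1]).items():
            print(f"{kind}: {len(paths)} file(s)")
```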
Performance Tuning Suggestions
Note
This function provides tuning suggestions after msProf parses the profile data and is no longer being updated. For more advanced profile data analysis and tuning suggestions, see msprof-analyze.
Note
For details about Ascend product models, see Ascend Product Models.
| Product | Supported |
|---|---|
| Atlas 350 Accelerator Card | √ |
| Atlas A3 training products/Atlas A3 inference products | √ |
| Atlas A2 training products/Atlas A2 inference products | √ |
| Atlas 200I/500 A2 inference products | √ |
| Atlas inference products | √ |
| Atlas training products | √ |
In cluster or multi-rank communication scenarios, performance tuning suggestions will be output to the screen after the profile data export command is executed. The details are as follows:
1. Obtain optimization suggestions based on communication duration analysis.
   Since collective communication operators are executed synchronously, any slow nodes in the cluster will drag down the performance of the entire cluster due to the bottleneck effect.
   Optimization principles:
   1. Check whether there is a rank in an iteration with a Wait Time Ratio greater than the threshold (0.2):
      - If yes, a communication bottleneck exists in this iteration. For more information, see 1.2.
      - If no, it can be preliminarily determined that no communication bottleneck exists in this iteration. Proceed to check the overall bandwidth usage.
   2. Identify the rank with the maximum Wait Time Ratio and check whether its Synchronization Time Ratio Before Transit exceeds the threshold (0.2):
      - If yes, a slow rank exists (the rank with the smallest Wait Time Ratio). Check its forward and backward calculation time. If this time is significantly longer than that of other cards, check for load imbalances or processor faults. If the calculation time is consistent with other ranks, check the data preprocessing time.
      - If no, the links are abnormal. In this case, check for link failures or cases where the communication volume is too low.
   Note
   - Wait Time Ratio = Wait Time/(Wait Time + Transit Time). A higher Wait Time Ratio indicates that the wait duration of the rank accounts for a larger portion of the total communication duration, resulting in lower communication efficiency.
   - Synchronization Time Ratio Before Transit = Synchronization Time/(Synchronization Time + Transit Time). Synchronization Time refers to the synchronization duration before the first data transmission. A higher Synchronization Time Ratio Before Transit indicates lower communication efficiency and the possible existence of slow ranks.
2. Obtain optimization suggestions based on communication matrix analysis.
   Slow links in cluster scenarios generally involve the following two cases:
   - Some slow links cause increased communication time between a few ranks. Other ranks must wait for the communication to complete, thereby dragging down the performance of the entire cluster.
   - Abnormalities in bandwidth or communication operators prevent network-wide links from reaching normal bandwidth rates. This increases the communication time for all ranks, and in this case, no typical slow ranks or slow links exist.
   Analysis of HCCS, PCIe, and RDMA is performed using the communication matrix. Bottleneck analysis and tuning suggestions are provided based on the average status of each link type. For scenarios involving slow links, full details of the slow links and corresponding tuning suggestions are provided.
   The analysis suggestions are as follows:
   - Time consumption ratios of the three link types.
   - Specific status of each link type:
     - Average link information: Includes total transmission duration, average bandwidth, and average large packet transmission rate. Tuning suggestions are provided based on this information.
     - Slowest link information: If the link bandwidth is less than 20% of the average bandwidth, the tool outputs information regarding the slowest link, including transmission duration, transmission size, transmission bandwidth, bandwidth usage, and large packet ratio. Tuning suggestions are provided based on this information.
   Optimization principles:
   1. If the bandwidth usage is greater than 0.8, bandwidth usage is normal and no bottleneck exists in the network-wide links. For more information, see 2.2.
   2. If the communication packet ratio is greater than 0.8, the size of communication packets is normal. However, the link configuration may be incorrect or link degradation may exist. For more information, see 2.3.
   3. If the communication packet size is too small, the packets transmitted during each communication are undersized, leading to low bandwidth usage and a bandwidth bottleneck.
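The two ratios and the decision steps of the communication duration analysis, together with the 20% rule for the slowest link, can be written down compactly. The following is a sketch of the rules as stated in this section, not the tool's implementation: the function names, the input dictionaries, and the returned labels are all assumptions made for illustration.

```python
THRESHOLD = 0.2  # threshold for both ratios, as stated in this section


def wait_time_ratio(wait_time, transit_time):
    """Wait Time / (Wait Time + Transit Time); 0.0 when both are zero."""
    total = wait_time + transit_time
    return wait_time / total if total > 0 else 0.0


def sync_time_ratio(sync_time, transit_time):
    """Synchronization Time / (Synchronization Time + Transit Time)."""
    total = sync_time + transit_time
    return sync_time / total if total > 0 else 0.0


def triage(ranks):
    """Apply the communication duration analysis steps to one iteration.

    ranks maps a rank ID to {"wait": .., "sync": .., "transit": ..} (same time unit).
    Returns (label, rank_id), where rank_id is set only for "slow_rank".
    """
    ratios = {r: wait_time_ratio(t["wait"], t["transit"]) for r, t in ranks.items()}
    worst = max(ratios, key=ratios.get)
    if ratios[worst] <= THRESHOLD:
        return ("no_bottleneck", None)  # next: check overall bandwidth usage
    worst_sync = sync_time_ratio(ranks[worst]["sync"], ranks[worst]["transit"])
    if worst_sync > THRESHOLD:
        # The slow rank is the one that waits least: the others wait for it.
        return ("slow_rank", min(ratios, key=ratios.get))
    return ("abnormal_link", None)  # next: check link failures or low volume


def slowest_link(bandwidths):
    """Return the slowest link if its bandwidth is below 20% of the average."""
    average = sum(bandwidths.values()) / len(bandwidths)
    link = min(bandwidths, key=bandwidths.get)
    return link if bandwidths[link] < 0.2 * average else None


if __name__ == "__main__":
    sample = {
        0: {"wait": 8.0, "sync": 6.0, "transit": 2.0},
        1: {"wait": 1.0, "sync": 0.5, "transit": 9.0},
    }
    print(triage(sample))
    print(slowest_link({"0-1": 100.0, "1-2": 100.0, "2-3": 10.0}))
```

In the sample, rank 0 spends most of its communication time waiting and synchronizing, so the rank that waits least (rank 1) is reported as the suspected slow rank.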

