Extended Functions

Obtaining Device Information

Supported Products

Product Supported
Atlas 350 Accelerator Card
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
Atlas inference products
Atlas training products

Function

After collecting the profile data, use get_msprof_info.py to obtain device information from the device_{id} or host directory within the PROF_XXX directory. The following table describes the function and installation directory of get_msprof_info.py.

Table 1 Script description

Script Function Directory
get_msprof_info.py Obtains device information. ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/interface. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.

Syntax

python3 get_msprof_info.py -dir <dir> [--help]

Command-line Options

Table 2 Command-line options

Option Mandatory (Yes/No) Description
-dir or --collection-dir Yes Specifies the directory of collected profile data. In non-cluster scenarios, set this option to the host or device_{id} directory within the PROF_XXX directory. In cluster scenarios, set it to the parent directory of the PROF_XXX directory.
-h or --help No Displays the help information. This option is used only to obtain usage instructions.

Examples

  1. Log in to the environment where the tool is located as the running user.

  2. Go to the directory of get_msprof_info.py.

  3. Run get_msprof_info.py. Example commands are as follows:

    • Non-cluster scenarios

      python3 get_msprof_info.py -dir /home/1/PROF_000001_20220129014731273_KEDKPORHMAGPGD/device_0
      
    • Cluster scenarios

      python3 get_msprof_info.py -dir /home/1/
      

Output Description

In non-cluster scenarios, results are displayed as shown in Figure 1. Field descriptions are provided in Table 3. In cluster scenarios, a /query/cluster_info.json file is generated in the directory specified by the -dir option to store node information, as shown in Figure 2. Field descriptions are provided in Table 4.

Figure 1 Device information (non-cluster scenarios)

Table 3 Fields (non-cluster scenarios)

Field Description
collection_info Collection information
Collection end time End time of information collection
Collection start time Start time of information collection
Result Size Result data size (MB)
device_info Device information
AI Core Number Number of AI Cores
AI CPU Number Number of AICPUs
Control CPU Number Number of Control CPUs
Control CPU Type Control CPU type
Device Id Device ID
TS CPU Number Number of TS CPUs
host_info Host information
cpu_info Host CPU information
CPU ID Host CPU ID
Name Host CPU name
Type Host CPU type
Frequency Host CPU frequency
Logical_CPU_Count Number of logical CPUs on the host
cpu_num Number of host CPUs
Host Computer Name Host device name
Host Operating System Host operating system
model_info Model information
Device Id Device ID
iterations Iteration statistics
Iteration Number Number of iterations
Model Id Model ID, which is displayed based on the number of models
version_info Version information
analysis_version Parsing version information
collection_version Collection version information
drv_version Driver version information

Figure 2 Device information (cluster scenarios)

Table 4 Fields (cluster scenarios)

Field Description
Rank Id Node ID that uniquely identifies a device in cluster scenarios
Device Id Device ID, which is not used as a unique device identifier in cluster scenarios
Prof Dir The PROF_XXX directory on the device with the current Rank ID in cluster scenarios
Device Dir The device_{id} directory in the PROF_XXX directory in cluster scenarios
Models Model information, including all model IDs on the device with the current Rank ID and the number of iterations for each model

Profile Data File Slicing

Supported Products

Product Supported
Atlas 350 Accelerator Card
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
Atlas inference products
Atlas training products

Function

For a parsed timeline data file in JSON format, the system identifies the duration for opening it on the Google Chrome browser (chrome://tracing) and slices it into a proper number of slice files, so that you can quickly open it. The slicing operation is triggered during profile data export.

File Format

Data file slicing attributes are configured using the msprof_slice.json file. The following example shows the content format of the msprof_slice.json file:

{
  "slice_switch": "off",
  "slice_file_size(MB)": 0,
  "strategy": 0
}

The directory for storing the msprof_slice.json file is as follows:

${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msconfig. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.

Parameters

Table 1 Parameters

Parameter Mandatory (Yes/No) Description
slice_switch No Specifies whether to enable slicing. Valid values:
on: enables slicing.
off: disables slicing.
Default value: off.
The current maximum slice size is 20 GB. If slicing is enabled and a file exceeds 20 GB, the export will fail. In addition, slicing will not be triggered for files smaller than 200 MB, even if slicing is enabled.
Data slicing is disabled by default. To enable it, set this parameter to on in the msprof_slice.json file. Any other value results in the use of the default setting.
By default, the system determines whether to perform data slicing based on the time required to open a timeline data file in the Chrome browser. Slicing is triggered if the file opening time exceeds the upper limit. The size of the slice files is controlled by the slice_file_size parameter, and the number of files is controlled by the strategy parameter.
The format of the sliced file name is {Module name}_{slice_n}_{timestamp}.json, where slice_n represents the sequence number of the slice.
slice_file_size(MB) No Specifies the maximum slice file size. The unit is MB. The value is a positive integer greater than or equal to 200. By default, the size of a slice file is not limited.
When this parameter is set to a positive integer greater than or equal to 200, the size of each slice file is capped at the specified value. For any other value, file size is unlimited, and only the number of slice files is restricted by the strategy parameter.
strategy No Specifies the slicing policy. Valid values:
0: slices files to minimize the number of slices while keeping the file opening time for each file within an acceptable range.
1: slices files to ensure fast opening time, resulting in more slices.
Default value: 0.
As file opening time depends on computer performance, exact durations cannot be provided. Typical reference values for file opening time are as follows:
• Excessive opening time: ≥ 30s
• Acceptable opening time: [10,30) (seconds)
• Fast opening time: (0,10) (seconds)
The actual opening time varies depending on the device performance.

Parsing, Querying, and Exporting Profile Data Using msprof.py

Overview

The msProf parsing tool is encapsulated using msprof.py. You can directly use the msProf parsing tool to parse and export the profile data.

Tool Usage Process

To export profile data using msprof.py, perform the following steps:

  1. Parse profile data by referring to Parsing Profile Data.

  2. (Optional) Query profile data file information by referring to Querying Profile Data File Information.

    Perform this step if you need to specify the iteration ID or model ID for parsing. Otherwise, skip it.

  3. Export profile data by referring to Exporting Profile Data.

Note

  • Direct parsing, querying, and exporting are not supported on the device for the Atlas 200I/500 A2 inference products in the Ascend RC scenario. The generated PROF_XXX directory must be copied to an environment with the Toolkit package installed.

  • msprof.py must be executed by the user created during installation.

Parsing Profile Data

Supported Products

Product Supported
Atlas 350 Accelerator Card
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
Atlas inference products
Atlas training products

Function

Parses profile data.

Precautions

None

Syntax

python3 msprof.py import -dir <dir>

Parameters and Command-line Options

Table 1 Parsing command parameters and options

Parameter/Option Mandatory (Yes/No) Description
import Yes Parses profile data in import mode. When the profile data is parsed using the import method, a new .db file will be regenerated even if one already exists in the raw profile data directory.
--cluster Yes (for cluster scenarios) Parses and aggregates profile data in cluster scenarios. This option is supported only when the import parameter is specified.
The -dir option specifies the parent directory of PROF_XXX. The parsing result is stored in the sqlite directory generated under PROF_XXX.
-dir or --collection-dir Yes Specifies the directory of the collected profile data. It must be PROF_XXX or its parent directory, such as /home/profiler_data/PROF_XXX.
-h or --help No Displays the help information. This option is used only to obtain usage instructions.

Example

  1. Log in to the environment where the Toolkit package is installed.

  2. Go to the directory of msprof.py.

    ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msprof. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.

  3. Parse the profile data.

    python3 msprof.py import -dir /home/profiler_data/PROF_XXX
    

Output Description

After the commands are executed and parsing is complete, a sqlite directory containing a .db file is generated in the device_{id} and host subdirectories of PROF_XXX. This .db file stores intermediate results and can be ignored.

To export final timeline data or .db files, proceed to Exporting Profile Data.

Querying Profile Data File Information

Supported Products

Product Supported
Atlas 350 Accelerator Card
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
Atlas inference products
Atlas training products

Function

Queries profile data file information, including the iteration ID and model ID.

Precautions

Before you run the query command, run the import command to parse the profile data. Otherwise, the query result is meaningless.

Syntax

python3 msprof.py query -dir <dir> 

Command-line Options

Table 1 Command-line options for querying profile data

Option Mandatory (Yes/No) Description
-dir or --collection-dir Yes Specifies the directory of the collected profile data. It must be PROF_XXX or its parent directory, such as /home/profiler_data/PROF_XXX.
--data-type No Specifies the data type. This option is used for interconnection with MindStudio and does not need to be specified. Valid values:
0: cluster data. You can query whether the current data is collected in a cluster scenario.
1: iteration trace data, which is detailed data of each iteration, including the FP/BP elapsed time, iteration refresh lag, and iteration interval.
2: compute volume, which is the number of floating-point operations on AI Core.
3: data preparation, including sending training data to the device and reading it on the device.
4: parallelism tuning suggestions.
5: parallelism data, including the pure communication duration and computation duration.
6: slow communication rank and link data and tuning suggestions.
7: communication matrix data and tuning suggestions.
8: CPU and memory performance metrics of the host-side system and processes.
9: communication duration with critical path analysis enabled.
10: communication matrix with critical path analysis enabled.
--id No Specifies the rank ID of a cluster node in cluster scenarios, and device ID in non-cluster scenarios. This option is used for interconnection with MindStudio and does not need to be specified.
--model-id No Specifies the model ID.
This option is used for interconnection with MindStudio and does not need to be specified.
--iteration-id No Specifies the iteration ID for graph-based statistics collection. The iteration ID is incremented by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID is different from the step ID at the script layer. Default value: 1.
This option is used for interconnection with MindStudio and does not need to be specified.
-h or --help No Displays the help information. This option is used only to obtain usage instructions.

Example

  1. Log in to the environment where the Toolkit package is installed.

  2. Go to the directory of msprof.py.

    ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msprof. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.

  3. To query the profile data information, run the following command:

    python3 msprof.py query -dir /home/profiler_data/PROF_XXX
    

Output Description

After the command for querying profile data is executed, the results will be printed and displayed.

Table 2 describes the information obtained through the query function of the msProf tool.

Table 2 Profile data file information

Field Description
Job Info Job name
Device ID Device ID
Dir Name Directory name
Collection Time Data collection time
Model ID Model ID
Iteration Number Total number of iterations
Top Time Iteration Top five iterations with the longest durations
Rank ID Node ID in the cluster scenario

Exporting Profile Data

Supported Products

Product Supported
Atlas 350 Accelerator Card
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
Atlas inference products
Atlas training products

Function

Exports profile data.

Precautions

Before exporting profile data, you need to parse profile data.

Syntax

  • Export timeline data and DB files

    python3 msprof.py export timeline -dir <dir> [-reports <reports_sample_config.json>] [--model-id <model-id>] [--iteration-id <iteration_id>] [--iteration-count <iteration_count>] [--clear]
    
  • Export summary data and DB files

    python3 msprof.py export summary -dir <dir> [--model-id <model-id>] [--iteration-id <iteration_id>] [--iteration-count <iteration_count>] [--format <export_format>] [--clear]
    
  • Export DB files

    python3 msprof.py export db -dir <dir>
    

Command-line Options

Table 1 Command-line options for exporting profile data

Option Mandatory (Yes/No) Description
-dir or --collection-dir Yes Specifies the directory of the collected profile data. It must be PROF_XXX or its parent directory, such as /home/HwHiAiUser/profiler_data/PROF_XXX.
-reports No Specifies a custom reports_sample_config.json configuration file to export the corresponding profile data based on the scope specified in the file. The parameter implementation is the same as that of msprof --reports. For details, see Example (--reports Option).
--model-id No Specifies the model ID. The value must be a positive integer. This option must be specified in combination with --iteration-id to export the profile data of a specified compute iteration in the model. If neither --model-id nor --iteration-id is specified, all profile data is exported by default.
• For Atlas A2 training products/Atlas A2 inference products as well as Atlas A3 training products/Atlas A3 inference products, --model-id can be set to 4294967295, which specifies the step mode. That is, the value of --iteration-id specifies parsing by step. Only profile data of the MindSpore framework (version 2.3 or later) can be parsed.
• If --model-id is set to other values, this option specifies the iteration ID for graph-based statistics collection. (The iteration ID is incremented by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID is different from the step ID at the script layer.)
--iteration-id No Specifies the iteration ID. The value must be a positive integer. This option must be specified in combination with --model-id to export the profile data of a specified compute iteration in the model. If neither --model-id nor --iteration-id is specified, all profile data is exported by default.
• For Atlas A2 training products/Atlas A2 inference products, as well as Atlas A3 training products/Atlas A3 inference products, --model-id can be set to 4294967295, which specifies the iteration ID for step-based statistics collection. The iteration ID is incremented by 1 each time a step is executed. Only profile data of the MindSpore framework (version 2.3 or later) can be parsed.
• If --model-id is set to other values, this option specifies the iteration ID for graph-based statistics collection. (The iteration ID is incremented by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID is different from the step ID at the script layer.)
--iteration-count No Specifies the number of consecutive iterations for which data will be exported. The value ranges from 1 to 5. The value of --iteration-id is used as the starting step. For example, if --iteration-count is 3 and --iteration-id is 1, data for steps 1, 2, and 3 will be exported.
--format No Specifies the format of the exported summary data file. The value can be csv (default) or json. This option is supported only when the summary parameter is configured.
This document uses CSV files as examples for all summary file descriptions.
--clear No Sets the data clearance mode. After this option is enabled, the sqlite directory in PROF_XXX/device_{id} is deleted (after profile data is exported) to save storage space. When this option is specified, the data clearance mode is enabled. This option is not specified by default.
-h or --help No Displays the help information. This option is used only to obtain usage instructions.

Examples

  1. Log in to the environment where the Toolkit package is installed.

  2. Go to the directory of msprof.py.

    ${INSTALL_DIR}/tools/profiler/profiler_tool/analysis/msprof. Replace ${INSTALL_DIR} with the CANN installation directory. If the Toolkit is installed by the root user, the default installation directory is /usr/local/Ascend/cann.

  3. Export profile data. The timeline, summary, and DB files can be exported. The command formats are as follows:

    • Export timeline data and DB files

      python3 msprof.py export timeline -dir /home/HwHiAiUser/profiler_data/PROF_XXX
      
    • Export summary data and DB files

      python3 msprof.py export summary -dir /home/HwHiAiUser/profiler_data/PROF_XXX
      
    • Export the DB file to generate a .db file (msprof_timestamp.db) that aggregates all profile data.

      python3 msprof.py export db -dir /home/HwHiAiUser/profiler_data/PROF_XXX
      

    Note

    • By default, all profile data is exported.
    • In single-operator scenarios or scenarios where only Ascend AI Processor system data is collected (that is, the --application option is not specified in the msprof data collection command), the --iteration-id and --model-id options are not supported.

Output Description

After the preceding command is executed, the mindstudio_profiler_output directory and msprof_*.db file are generated in the PROF_XXX directory under the --collection-dir directory.

The following examples show the directory structure of the generated profile data:

  • Single-process collection

    └── PROF_XXX
          ├── device_0
          │    └── data
          ├── device_1
          │    └── data
          ├── host
          │    └── data
          ├── msprof_*.db
          └── mindstudio_profiler_output
                ├── msprof_{timestamp}.json
                ├── step_trace_{timestamp}.json
                ├── xx_*.csv
                 ...
                └── README.txt
    
  • Multi-process collection

    └── PROF_XXX1
          ├── device_0
          │    └── data
          ├── host
          │    └── data
          ├── msprof_*.db
          └── mindstudio_profiler_output
                ├── msprof_{timestamp}.json
                ├── step_trace_{timestamp}.json
                ├── xx_*.csv
                 ...
                └── README.txt
    └── PROF_XXX2
          ├── device_1
          │    └── data
          ├── host
          │    └── data
          ├── msprof_*.db
          └── mindstudio_profiler_output
                ├── msprof_{timestamp}.json
                ├── step_trace_{timestamp}.json
                ├── xx_*.csv
                 ...
                └── README.txt
    

Note

  • In multi-device scenarios, if a single collection process is started, only one PROF_XXX directory is generated. If multiple processes are started, multiple PROF_XXX directories are generated. The device directories are created within these PROF_XXX directories. The specific number of device directories per PROF_XXX depends on the actual user operations and does not affect profile data analysis.
  • For details about profile data, see Profile Data File References.
  • The files in the mindstudio_profiler_output directory are generated based on the actual profile data. If specific data files are not collected, the corresponding timeline and summary data will not be exported.
  • You can run the export command to directly export summary reports from the profile data parsing result. Even if the profile data has not been parsed, running the export command separately will parse the data and export the result files.
  • If the msProf collection process is forcibly interrupted, the tool saves the raw profile data already collected, which can still be parsed and exported using the export command.

Performance Tuning Suggestions

Note

This function provides tuning suggestions after msProf parses the profile data and is no longer being updated. For more advanced profile data analysis and tuning suggestions, see msprof-analyze.

Supported Products

Note

For details about Ascend product models, see Ascend Product Models.

Product Supported
Atlas 350 Accelerator Card
Atlas A3 training products/Atlas A3 inference products
Atlas A2 training products/Atlas A2 inference products
Atlas 200I/500 A2 inference products
Atlas inference products
Atlas training products

In cluster or multi-rank communication scenarios, performance tuning suggestions will be output to the screen after the profile data export command is executed. The details are as follows:

  1. Obtain optimization suggestions based on communication duration analysis.

    Since collective communication operators are executed synchronously, any slow nodes in the cluster will drag down the performance of the entire cluster due to the bottleneck effect.

    Optimization principles:

    1. Check whether there is a rank in an iteration with a Wait Time Ratio greater than the threshold (0.2):

      1. If yes, a communication bottleneck exists in this iteration. For more information, see 1.2.
      2. If no, it can be preliminarily determined that no communication bottleneck exists in this iteration. Proceed to check the overall bandwidth usage.
    2. Identify the rank with the maximum Wait Time Ratio and check whether its Synchronization Time Ratio Before Transit exceeds the threshold (0.2):

      1. If yes, a slow rank exists (the rank with the smallest Wait Time Ratio). Check its forward and backward calculation time. If this time is significantly longer than that of other cards, check for load imbalances or processor faults. If the calculation time is consistent with other ranks, check the data preprocessing time.
      2. If no, the links are abnormal. In this case, check for link failures or cases where the communication volume is too low.

    Note

    • Wait Time Ratio = Wait Time/(Wait Time + Transit Time). A higher Wait Time Ratio indicates that the wait duration of the rank accounts for a larger portion of the total communication duration, resulting in lower communication efficiency.
    • Synchronization Time Ratio Before Transit = Synchronization Time/(Synchronization Time + Transit Time). Synchronization Time refers to the synchronization duration before the first data transmission. A higher Synchronization Time Ratio Before Transit indicates lower communication efficiency and the possible existence of slow ranks.

    Figure 1 Communication duration-based suggestion

  2. Obtain optimization suggestions based on communication matrix analysis.

    Slow links in cluster scenarios generally involve the following two cases:

    • Some slow links cause increased communication time between a few ranks. Other ranks must wait for the communication to complete, thereby dragging down the performance of the entire cluster.
    • Abnormalities in bandwidth or communication operators prevent network-wide links from reaching normal bandwidth rates. This increases the communication time for all ranks, and in this case, no typical slow ranks or slow links exist.

    Analysis of HCCS, PCIe, and RDMA is performed using the communication matrix. Bottleneck analysis and tuning suggestions are provided based on the average status of each link type. For scenarios involving slow links, full details of the slow links and corresponding tuning suggestions are provided.

    The analysis suggestions are as follows:

    1. Time consumption ratios of the three link types.
    2. Specific status of each link type:
      1. Average link information: Includes total transmission duration, average bandwidth, and average large packet transmission rate. Tuning suggestions are provided based on this information.
      2. Slowest link information: If the link bandwidth is less than 20% of the average bandwidth, the tool outputs information regarding the slowest link, including transmission duration, transmission size, transmission bandwidth, bandwidth usage, and large packet ratio. Tuning suggestions are provided based on this information.

    Optimization principles:

    1. If the bandwidth usage is greater than 0.8, bandwidth usage is normal and no bottleneck exists in the network-wide links. For more information, see 2.2.
    2. If the communication packet ratio is greater than 0.8, the size of communication packets is normal. However, the link configuration may be incorrect or link degradation may exist. For more information, see 2.3.
    3. If the communication packet size is too small, the packets transmitted during each communication are undersized, leading to low bandwidth usage and a bandwidth bottleneck.

    Figure 2 Communication matrix-based suggestion