cluster_time_summary
Overview
Large-scale cluster scenarios involve multiple compute nodes and massive amounts of data. Statistics and analysis of single-rank profile data cannot evaluate the overall performance of a cluster.
The existing cluster_step_trace_time.csv output has no dedicated command for generating it, which makes it inconvenient to use, and it does not provide metrics such as memory copy duration. cluster_time_summary is provided to address these limitations.
Fine-grained cluster profile data breakdown (cluster_time_summary) breaks down the iteration duration of cluster training into computation, communication, and memory copy durations, helping users identify performance bottlenecks.
Preparations
Environment Setup
Install msprof-analyze. For details, see MindStudio Profiler Analyze Installation Guide.
Data Preparation
msprof-analyze requires an input directory containing the collected profile data. For instructions on how to collect such data, see Data Preparation.
Fine-grained Cluster Profile Data Breakdown
Function
Analyzes the collected cluster data by using the cluster_time_summary feature of msprof-analyze.
Syntax
msprof-analyze -m cluster_time_summary -d <cluster_data> [-o <output_path>]
Command-line Options
| Option | Mandatory (Yes/No) | Description |
|---|---|---|
| -m | Yes | Specifies the analysis mode to execute. Set it to cluster_time_summary to enable fine-grained breakdown of cluster profile data. |
| -d | Yes | Specifies the cluster profile data directory. |
| -o | No | Specifies the output directory. The default value is the directory specified by the -d option. |
For more options, see Command-line Options and Parameters of msprof-analyze.
Example
Perform fine-grained breakdown of cluster profile data.
msprof-analyze -m cluster_time_summary -d ./xxx/cluster_data -o ./xxx/output_path
Output Description
- If the export type is set to db, cluster_analysis_output/cluster_analysis.db is generated in the output directory. If the export type is set to text, cluster_analysis_output/ClusterTimeSummary/cluster_time_summary_{timestamp}.csv is generated in the output directory.
- Data table name: ClusterTimeSummary
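The result files can also be located from a script. The following Python sketch relies only on the two relative paths listed above; the function name `find_cluster_time_summary_outputs` is hypothetical and not part of msprof-analyze. Because the `{timestamp}` format is not documented, the sketch picks the CSV with the newest modification time instead of parsing the file name.

```python
import glob
import os
from typing import Optional, Tuple


def find_cluster_time_summary_outputs(output_dir: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (db_path, latest_csv_path) under output_dir; either may be None.

    Hypothetical helper: it only checks the documented relative paths and
    does not verify that the .db file actually contains ClusterTimeSummary.
    """
    base = os.path.join(output_dir, "cluster_analysis_output")

    # db export: cluster_analysis_output/cluster_analysis.db
    db_path = os.path.join(base, "cluster_analysis.db")
    db = db_path if os.path.isfile(db_path) else None

    # text export: one CSV per run, named cluster_time_summary_{timestamp}.csv.
    # Assumption: the newest file by modification time is the latest run.
    pattern = os.path.join(base, "ClusterTimeSummary", "cluster_time_summary_*.csv")
    csv_files = glob.glob(pattern)
    csv = max(csv_files, key=os.path.getmtime) if csv_files else None

    return db, csv
```

For example, `find_cluster_time_summary_outputs("./xxx/output_path")` returns the paths produced by the example command above, or `None` for an export type that was not used.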
Output File Description
The following table describes the fields in the ClusterTimeSummary table.
| Field | Type | Description |
|---|---|---|
| rank | INTEGER | Rank ID |
| step | INTEGER | Iteration number |
| stepTime | REAL | Total iteration duration |
| computation | REAL | Total computation duration of operators on the NPU |
| communicationNotOverlapComputation | REAL | Communication duration not overlapped by computation |
| communicationOverlapComputation | REAL | Overlap duration of computation and communication |
| communication | REAL | Total communication duration of operators on the NPU |
| free | REAL | Idle duration (total iteration duration minus computation, communication, and copy durations) |
| communicationWaitStageTime | REAL | Total wait duration during communication |
| communicationTransmitStageTime | REAL | Total transmission duration during communication |
| memory | REAL | Memory copy duration |
| memoryNotOverlapComputationCommunication | REAL | Memory copy duration not overlapped by computation or communication |
Time-related fields in the preceding table are in microseconds (μs).
Except for the header format, the data in cluster_time_summary_{timestamp}.csv is consistent with that in the .db file.
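If the db export is used, the table can be read with Python's built-in `sqlite3` module. This is a minimal sketch that assumes cluster_analysis.db is an SQLite database whose ClusterTimeSummary columns are named exactly as in the field table above; `load_cluster_time_summary` is a hypothetical helper, not an msprof-analyze API. The CSV header format differs from the .db column names and is not assumed here.

```python
import sqlite3
from typing import Dict, List

# Column names taken from the documented ClusterTimeSummary fields.
# Assumption: the .db table uses these exact names.
FIELDS = (
    "rank", "step", "stepTime", "computation",
    "communicationNotOverlapComputation", "communicationOverlapComputation",
    "communication", "free", "communicationWaitStageTime",
    "communicationTransmitStageTime", "memory",
    "memoryNotOverlapComputationCommunication",
)


def load_cluster_time_summary(db_path: str) -> List[Dict[str, float]]:
    """Read all ClusterTimeSummary rows as dicts, ordered by step, then rank.

    Time values are in microseconds, as stated for the table.
    """
    # Quote identifiers so names such as "rank" are never parsed as keywords.
    columns = ", ".join(f'"{name}"' for name in FIELDS)
    query = f"SELECT {columns} FROM ClusterTimeSummary ORDER BY step, rank"
    conn = sqlite3.connect(db_path)
    try:
        return [dict(zip(FIELDS, row)) for row in conn.execute(query)]
    finally:
        conn.close()
```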
Output Analysis
- Identify performance bottlenecks by analyzing the proportions of computation, communication, memory copy, and idle durations.
- Compare duration metrics across ranks in the cluster to locate performance issues. For example, large differences in computation duration across ranks usually indicate that ranks are out of sync or that compute performance is uneven across ranks. Large variance in communication duration suggests that parameter-plane network congestion or configuration anomalies should be investigated first.
- Use this feature together with the cluster_time_compare_summary feature to locate the root cause of cluster performance degradation.
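The first two analyses above can be sketched in Python over a list of row dictionaries keyed by the documented field names (for example, rows read from the ClusterTimeSummary table). The helpers below are hypothetical. Using the non-overlapped communication and memory copy fields for the shares, and the 10% relative-spread threshold, are illustrative assumptions, not rules defined by msprof-analyze.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence

# Assumption: non-overlapped components are used so that overlapped time
# is not counted twice when expressing each part as a share of stepTime.
SHARE_FIELDS = (
    "computation",
    "communicationNotOverlapComputation",
    "memoryNotOverlapComputationCommunication",
    "free",
)


def step_time_shares(row: Dict[str, float]) -> Dict[str, float]:
    """Return each component's fraction of stepTime for one (rank, step) row."""
    step_time = row["stepTime"]
    if step_time <= 0:
        raise ValueError("stepTime must be positive")
    return {name: row[name] / step_time for name in SHARE_FIELDS}


def cross_rank_spread(rows: Sequence[Dict[str, float]], field: str) -> Dict[int, Dict[str, float]]:
    """For each step, summarize how `field` varies across ranks."""
    by_step: Dict[int, Dict[int, float]] = defaultdict(dict)
    for row in rows:
        by_step[row["step"]][row["rank"]] = row[field]
    summary = {}
    for step, per_rank in sorted(by_step.items()):
        values = list(per_rank.values())
        low, high = min(values), max(values)
        summary[step] = {
            "min": low,
            "max": high,
            "spread": high - low,
            "mean": mean(values),
            "max_rank": max(per_rank, key=per_rank.get),  # rank with the largest value
        }
    return summary


def flag_uneven_steps(rows: Sequence[Dict[str, float]], field: str,
                      rel_threshold: float = 0.1) -> List[int]:
    """Return steps whose cross-rank spread of `field` exceeds rel_threshold * mean.

    The default 10% threshold is an arbitrary illustration value.
    """
    return [
        step
        for step, s in cross_rank_spread(rows, field).items()
        if s["mean"] > 0 and s["spread"] > rel_threshold * s["mean"]
    ]
```

For instance, `flag_uneven_steps(rows, "computation")` lists steps in which some ranks compute noticeably longer than others, and `cross_rank_spread(rows, "communication")[step]["max_rank"]` names the rank with the longest communication in that step.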