computational_op_masking
Overview
Large-scale cluster scenarios involve many compute nodes and massive amounts of data. Statistics and analysis based on the profile data of a single rank cannot show how well operator execution overlaps across the whole cluster.
The fine-grained operator overlap breakdown feature (computational_op_masking) calculates in detail how long communication operators are overlapped by compute operators under the various parallelism scenarios of cluster training.
Analyzing the overlap between computation and communication helps users identify performance bottlenecks.
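To make the notion of overlap concrete, the following is a minimal sketch of how the time during which communication is covered by computation could be derived from two lists of operator time ranges. It is not the implementation used by msprof-analyze; the function names and the sample intervals are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_us, end_us), hypothetical representation


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlapped_duration(comm: List[Interval], compute: List[Interval]) -> int:
    """Return the total communication time that coincides with compute time."""
    comm_m = merge_intervals(comm)
    comp_m = merge_intervals(compute)
    total = 0
    i = j = 0
    while i < len(comm_m) and j < len(comp_m):
        lo = max(comm_m[i][0], comp_m[j][0])
        hi = min(comm_m[i][1], comp_m[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if comm_m[i][1] < comp_m[j][1]:
            i += 1
        else:
            j += 1
    return total


# Illustrative values only, not real profile data.
comm_ops = [(0, 100), (150, 250)]
compute_ops = [(50, 200)]
overlap = overlapped_duration(comm_ops, compute_ops)
comm_total = sum(end - start for start, end in merge_intervals(comm_ops))
print(overlap, comm_total)
```

Merging each list first keeps the sweep linear after sorting and avoids double counting when operators of the same kind overlap each other.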
Preparations
Environment Setup
Install msprof-analyze. For details, see MindStudio Profiler Analyze Installation Guide.
Data Preparation
msprof-analyze requires an input directory containing the collected profile data. For instructions on how to collect such data, see Data Preparation.
Fine-grained Cluster Profile Data Breakdown
Function
Analyzes the collected cluster data by using the computational_op_masking feature of msprof-analyze.
Syntax
msprof-analyze -m computational_op_masking [--export_type <export_type>] [--step_id <step_id>] [--parallel_types <parallel_types>] -d <cluster_data> [-o <output_path>]
Command-line Options
| Option | Mandatory (Yes/No) | Description |
|---|---|---|
| -m | Yes | Specifies the analysis mode to execute. Set it to computational_op_masking to enable fine-grained breakdown of cluster profile data. |
| --export_type | No | Specifies the export file format. Only db is supported, and it is the default value. |
| --step_id | No | Specifies the step ID for which results will be saved. If not specified, results for all steps are output by default. |
| --parallel_types | No | Specifies the parallelism modes for which the degree of overlap of communication operators by compute operators is calculated. Separate groups with semicolons and the modes within a group with commas. For example, "edp,dp;dp;edp" represents [('edp','dp'), ('dp',), ('edp',)]. |
| -d | Yes | Specifies the cluster profile data directory. |
| -o | No | Specifies the output directory. The default value is the directory specified by the -d option. |
For details about more options, see Command-line Options and Parameters of msprof-analyze.
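The mapping from a --parallel_types string to groups of parallelism modes can be sketched as follows. This only illustrates the format shown in the table (semicolons between groups, commas within a group); the function name parse_parallel_types is hypothetical and is not part of msprof-analyze.

```python
from typing import List, Tuple


def parse_parallel_types(value: str) -> List[Tuple[str, ...]]:
    """Split a string such as "edp,dp;dp;edp" into groups of mode names."""
    groups: List[Tuple[str, ...]] = []
    for group in value.split(";"):
        names = tuple(name.strip() for name in group.split(",") if name.strip())
        if names:  # skip empty groups, e.g. from a trailing semicolon
            groups.append(names)
    return groups


parsed = parse_parallel_types("edp,dp;dp;edp")
print(parsed)  # [('edp', 'dp'), ('dp',), ('edp',)]
```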
Example
Perform fine-grained breakdown of cluster profile data.
msprof-analyze -m computational_op_masking --export_type db --step_id 11 --parallel_types "edp,dp;dp;edp" -d ./xxx/cluster_data -o ./xxx/output_path
Output Description
- Storage location: cluster_analysis_output/cluster_analysis.db in the output directory.
- Data table name: ComputationalOperatorMaskingLinearity
Output File Description
The following table describes fields in the ComputationalOperatorMaskingLinearity table.
| Field | Type | Description |
|---|---|---|
| stepId | INTEGER | Iteration ID |
| parallelType | TEXT | Operator parallel mode |
| stepStartTime | INTEGER | Step start time |
| stepEndTime | INTEGER | Step end time |
| totalCommunicationOperatorTime | INTEGER | Total communication operator duration within a step |
| timeRatioOfStepCommunicationOperator | REAL | Ratio of total communication operator duration to total step duration |
| totalTimeWithoutCommunicationBlackout | INTEGER | Total duration within a step during which communication operators are overlapped by compute operators |
| ratioOfUnmaskedCommunication | REAL | Ratio of the total duration during which communication operators are overlapped by compute operators to the total step duration |
Time-related fields in the preceding table are in microseconds (μs).
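The table can be inspected with any database client. The sketch below assumes that cluster_analysis.db is an SQLite file and that the total step duration equals stepEndTime minus stepStartTime; neither assumption is stated by this document. To stay self-contained, it builds an in-memory table with the documented columns and one made-up row instead of opening a real output file.

```python
import sqlite3

TABLE = "ComputationalOperatorMaskingLinearity"


def fetch_masking_rows(conn, step_id=None):
    """Return table rows as dictionaries, optionally filtered by stepId."""
    conn.row_factory = sqlite3.Row
    sql = f"SELECT * FROM {TABLE}"
    params = ()
    if step_id is not None:
        sql += " WHERE stepId = ?"
        params = (step_id,)
    return [dict(row) for row in conn.execute(sql, params)]


# Stand-in for the real file; the row values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {TABLE} ("
    "stepId INTEGER, parallelType TEXT, "
    "stepStartTime INTEGER, stepEndTime INTEGER, "
    "totalCommunicationOperatorTime INTEGER, "
    "timeRatioOfStepCommunicationOperator REAL, "
    "totalTimeWithoutCommunicationBlackout INTEGER, "
    "ratioOfUnmaskedCommunication REAL)"
)
conn.execute(f"INSERT INTO {TABLE} VALUES (11, 'dp', 0, 1000, 400, 0.4, 300, 0.3)")

rows = fetch_masking_rows(conn, step_id=11)
for row in rows:
    # Assumed definition of the step duration, in microseconds.
    step_us = row["stepEndTime"] - row["stepStartTime"]
    recomputed = row["totalCommunicationOperatorTime"] / step_us
    print(row["stepId"], row["parallelType"], recomputed)
```

To read real results, replace the in-memory connection with sqlite3.connect() pointed at cluster_analysis_output/cluster_analysis.db in the output directory.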
Output Analysis
- Analyze computation and communication durations to identify performance bottlenecks.
- Compare duration metrics across ranks within the cluster to locate performance issues. For example, significant fluctuations in computation duration typically indicate inter-rank desynchronization or uneven compute performance across ranks. Excessive variance in communication duration suggests that troubleshooting should first target network congestion or configuration anomalies on the parameter plane.
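The cross-rank comparison described above can be sketched as follows. The per-rank durations, the helper names, and the 10% tolerance are all hypothetical; how per-rank durations are obtained is outside the scope of this sketch, and a suitable threshold depends on the workload.

```python
from statistics import mean, pstdev
from typing import Dict, List


def coefficient_of_variation(durations_us: Dict[int, int]) -> float:
    """Relative spread of durations across ranks (standard deviation / mean)."""
    values = list(durations_us.values())
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def slow_ranks(durations_us: Dict[int, int], tolerance: float = 0.10) -> List[int]:
    """Ranks whose duration exceeds the mean by more than the given tolerance."""
    avg = mean(durations_us.values())
    return sorted(r for r, d in durations_us.items() if d > avg * (1 + tolerance))


# Made-up computation durations per rank, in microseconds.
compute_us = {0: 1000, 1: 1010, 2: 990, 3: 1400}
cv = coefficient_of_variation(compute_us)
outliers = slow_ranks(compute_us)
print(round(cv, 3), outliers)
```

A large coefficient of variation signals fluctuation worth investigating, and the list of slow ranks narrows down where to look first.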