Table Structures of Recipe Results and `cluster_analysis.db` Deliverables

NOTE

When msprof-analyze is configured with the --mode option, the profile data is analyzed and the cluster_analysis.db deliverables are generated. This topic describes the table structures and fields of these deliverables.
Some analysis features do not generate the cluster_analysis.db file.

`cluster_step_trace_time.csv`

Generated when the data parsing mode is communication_matrix, communication_time, or all.

Column A: Steps. This column is set during profile data collection. Generally, profile data for a single step is sufficient for cluster performance analysis. If multiple steps are collected, filter them first.

Column B: Type. Valid values are rank and stage, which are closely related to the index. rank represents a single rank, while stage represents a rank group (PP parallel stage). If the type is stage, the information in columns D through K represents the maximum values within the rank group.

Column C: Index. This column is related to the type and indicates the device ID.

Column D: Computing. This column displays the computation duration.

Column E: Communication (Not Overlapped). This column displays the communication duration not overlapped by computation.

Column F: Overlapped. This column displays the duration where computation and communication overlap.

Column G: Communication. This column displays the total communication duration.

Column H: Free. This column displays the idle duration, which indicates the duration where the device is neither communicating nor computing. This may include the SDMA copy and idle wait durations.

Column I: Stage. This column and the following two columns are valid only for PP parallelism. Stage duration represents the total time excluding the duration of receive operators.

Column J: Bubble. This column displays the bubble time, which is the sum of the duration of all receive operators.

Column K: Communication (Not Overlapped and Exclude Receive). This column indicates the communication duration that is not overlapped and excludes the duration of receive operators.

Column L: Preparing. This column displays the duration from the start of an iteration to the execution of the first computation or communication operator.

Column M: DP Index. This column displays the index of the DP group to which the cluster data belongs after being partitioned based on the parallel strategy. If the data is not collected, this column is not displayed.

Column N: PP Index. This column displays the index of the PP group to which the cluster data belongs after being partitioned based on the parallel strategy. If not collected, this column is not displayed.

Column O: TP Index. This column displays the index of the TP group to which the cluster data belongs after being partitioned based on the parallel strategy. If not collected, this column is not displayed.

Tips: Filter Column B by the stage type to check for issues between stages. Then, filter Column B by the rank type to check for issues between ranks. Perform the following troubleshooting checks:

Check for slow ranks or load imbalance based on the computation duration difference.
Check for host-bound issues or uneven distribution based on the idle duration statistics.
Check for excessive communication duration based on the duration displayed in the Communication (Not Overlapped and Exclude Receive) column.
Check whether the bubble configuration is appropriate and whether imbalance exists between stages based on the proportion of bubble time and the theoretical calculation formula.

Theoretically, these durations should be roughly the same across ranks (or across stages). If the difference between the maximum and minimum values exceeds 5%, a slow rank may exist.
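The 5% rule of thumb can also be checked with a short script. The following is a minimal sketch, not part of msprof-analyze. It assumes that the CSV header names match the column titles listed above (`Type`, `Computing`, `Free`) and that the relative difference is measured as (max - min) / max; both are assumptions, so adjust them to your actual file. The sample rows are made up and cover a single step.

```python
import csv
import io


def spread_ratio(rows, column, row_type="rank"):
    """Return (max - min) / max of `column` over rows whose Type equals row_type."""
    values = [float(r[column]) for r in rows if r["Type"] == row_type]
    if not values or max(values) == 0:
        return 0.0
    return (max(values) - min(values)) / max(values)


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


# Tiny in-memory sample; real data would come from cluster_step_trace_time.csv.
SAMPLE = """Step,Type,Index,Computing,Free
1,rank,0,100.0,10.0
1,rank,1,108.0,10.2
1,stage,0,108.0,10.2
"""

rows = load_rows(SAMPLE)
for column in ("Computing", "Free"):
    ratio = spread_ratio(rows, column)
    flag = "check for slow ranks" if ratio > 0.05 else "ok"
    print(f"{column}: spread {ratio:.1%} -> {flag}")
```

If the file contains several steps, filter the rows to one step before calling `spread_ratio`, as recommended for Column A.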

`cluster_communication_matrix.json`

Generated when the data parsing mode is communication_matrix or all.

Open the JSON file using VS Code or a JSON viewer and search for Total. There will be multiple results. Generally, the structure of the link bandwidth information is as follows:

{src_rank}-{dst_rank}: {
    "Transport Type": "LOCAL",
    "Transit Time(ms)": 0.02462,
    "Transit Size(MB)": 16.777216,
    "Bandwidth(GB/s)": 681.4466
}

Tips: You can identify slow link issues based on the rank interconnection bandwidth and the link type.

LOCAL: represents on-chip copy, which provides the highest speed.
HCCS or PCIE: represents intra-node inter-chip copy, which provides medium speed.
RDMA: represents inter-node copy, which provides the lowest speed.
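Instead of searching the file by hand, the link entries can be collected with a script. The sketch below walks a parsed JSON object, picks up every dictionary that contains the `Transport Type` and `Bandwidth(GB/s)` keys shown above, and reports links whose bandwidth is well below the median of their transport type. The nesting around the link entries, the sample values, and the 0.5 threshold factor are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median


def collect_links(node, path=()):
    """Yield (path, entry) for every dict that looks like a link entry."""
    if isinstance(node, dict):
        if "Transport Type" in node and "Bandwidth(GB/s)" in node:
            yield path, node
        else:
            for key, child in node.items():
                yield from collect_links(child, path + (key,))


def slow_links(matrix, factor=0.5):
    """Return links whose bandwidth is below factor * median of their transport type."""
    by_type = defaultdict(list)
    for path, entry in collect_links(matrix):
        by_type[entry["Transport Type"]].append((path, entry["Bandwidth(GB/s)"]))
    result = []
    for transport, links in by_type.items():
        mid = median(bw for _, bw in links)
        result.extend((transport, path, bw) for path, bw in links if bw < factor * mid)
    return result


# Hypothetical nesting; only the innermost link entries follow the documented structure.
sample = {
    "step1": {
        "allreduce-total": {
            "0-1": {"Transport Type": "HCCS", "Bandwidth(GB/s)": 18.0},
            "1-2": {"Transport Type": "HCCS", "Bandwidth(GB/s)": 17.5},
            "2-3": {"Transport Type": "HCCS", "Bandwidth(GB/s)": 4.0},
            "0-4": {"Transport Type": "RDMA", "Bandwidth(GB/s)": 11.0},
        }
    }
}
print(slow_links(sample))
```

With real data, load the file with `json.load` and pass the result to `slow_links`. Comparing links only within the same transport type avoids flagging RDMA links simply because they are slower than LOCAL or HCCS links.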

`cluster_communication.json`

Generated when the data parsing mode is set to communication_time or all. It mainly provides the communication duration data.

`compute_op_sum`

When -m compute_op_sum is set, the following tables are generated.

`ComputeOpAllRankStats`

Description:

Provides statistical analysis of computation duration for all ranks, grouped by OpType and TaskType. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
OpType	TEXT	Computation operator type
TaskType	TEXT	Accelerator type for operator execution
Count	INTEGER	Number of operators grouped by `OpType` and `TaskType`
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
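As an illustration of how this table can be queried, the sketch below lists the operator types with the largest total duration. It assumes that cluster_analysis.db is a SQLite database (the TEXT/INTEGER/REAL column types suggest this) and uses an in-memory stand-in table that contains only some of the documented columns; the operator and task type values are made up.

```python
import sqlite3


def top_op_types(conn, limit=5):
    """Return (OpType, TaskType, Count, SumNs) rows with the largest total duration."""
    query = (
        'SELECT OpType, TaskType, "Count", SumNs '
        "FROM ComputeOpAllRankStats "
        "ORDER BY SumNs DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


# Stand-in database so the sketch runs anywhere; with real data, connect to
# the generated cluster_analysis.db file instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ComputeOpAllRankStats ("
    'OpType TEXT, TaskType TEXT, "Count" INTEGER, MeanNs REAL, SumNs REAL)'
)
conn.executemany(
    "INSERT INTO ComputeOpAllRankStats VALUES (?, ?, ?, ?, ?)",
    [
        ("MatMul", "AI_CORE", 4, 250.0, 1000.0),
        ("Add", "AI_VECTOR_CORE", 10, 20.0, 200.0),
        ("Cast", "AI_VECTOR_CORE", 8, 50.0, 400.0),
    ],
)
for row in top_op_types(conn, limit=2):
    print(row)
```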

`ComputeOpPerRankStatsByOpType`

Description:

Provides statistical analysis of computation duration for each rank, grouped by OpType and TaskType. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
OpType	TEXT	Computation operator type
TaskType	TEXT	Accelerator type for operator execution
Count	INTEGER	Number of operators grouped by `OpType` and `TaskType`
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
Rank	INTEGER	Rank ID

`ComputeOpPerRankStatsByOpName`

Description:

Not generated when the --exclude_op_name option is specified. It provides a statistical analysis of computation duration for each rank, grouped by OpName, OpType, TaskType, and InputShapes. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
OpName	TEXT	Computation operator name
OpType	TEXT	Computation operator type
TaskType	TEXT	Accelerator type for operator execution
InputShapes	TEXT	Input shape of the operator
Count	INTEGER	Number of operators in this group
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
Rank	INTEGER	Rank ID

`cann_api_sum`

When -m cann_api_sum is set, the following tables are generated:

`CannApiSum`

Description:

Provides statistical analysis of the duration of each unique API across all ranks. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
name	TEXT	API name
timeRatio	REAL	Percentage of the duration of the API relative to the total duration of all APIs
totalTimeNs	INTEGER	Total duration of the API
totalCount	INTEGER	Number of APIs
averageNs	REAL	Average duration
Q1Ns	REAL	25th percentile of duration
medNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
minNs	REAL	Minimum duration
maxNs	REAL	Maximum duration
stdev	REAL	Standard deviation of the duration
minRank	TEXT	A set of ranks corresponding to `minNs`
maxRank	TEXT	A set of ranks corresponding to `maxNs`

`CannApiSumRank`

Description:

Provides statistical analysis of the duration of each unique API on each rank. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
name	TEXT	API name
durationRatio	REAL	Percentage of the duration of the API relative to the total duration of all APIs on the rank
totalTimeNs	INTEGER	Total duration of the API
totalCount	INTEGER	Number of APIs
averageNs	REAL	Average duration
minNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
medNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
maxNs	REAL	Maximum duration
stdev	REAL	Standard deviation of the duration
rank	INTEGER	Rank ID

`hccl_sum`

When -m hccl_sum is set, the following tables are generated:

`HcclAllRankStats`

Description:

Provides statistical analysis of the duration of each communication operator type (such as hcom_broadcast_) across all ranks. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
OpType	TEXT	Communication operator type
Count	INTEGER	Count
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration

`HcclPerRankStats`

Description:

Provides statistical analysis of the duration of each communication operator type (such as hcom_broadcast_) on each rank. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
OpType	TEXT	Communication operator type
Count	INTEGER	Count
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
Rank	INTEGER	Rank ID

`HcclGroupNameMap`

Description:

Provides a mapping of ranks contained within each communication group.

Table fields

Field	Type	Description
GroupName	TEXT	Communication group, such as `{ip_address}%enp67s0f5_60000_0_1708156014257149`
GroupId	TEXT	Last three digits of the hash value of the communication group
Ranks	TEXT	All ranks within the communication group

`HcclTopOpStats`

Description:

Provides an analysis of the duration of all communication operators across all ranks. It displays data for the top N (default: 15) communication operators with the largest average durations. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
OpName	TEXT	Communication operator name, such as `hcom_allReduce__606_0_1`
Count	INTEGER	Count
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
MinRank	INTEGER	Rank with the minimum duration for the communication operator
MaxRank	INTEGER	Rank with the maximum duration for the communication operator
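One way to read this table is to count how often each rank appears in the `MaxRank` column. The sketch below does this with a single SQL query. It assumes that cluster_analysis.db is a SQLite database and uses an in-memory stand-in table with a subset of the documented columns; apart from the first operator name, which is taken from the table above, the sample rows are hypothetical.

```python
import sqlite3


def max_rank_counts(conn):
    """Return (MaxRank, number of top operators) pairs, most frequent first."""
    query = (
        "SELECT MaxRank, COUNT(*) AS n FROM HcclTopOpStats "
        "GROUP BY MaxRank ORDER BY n DESC, MaxRank ASC"
    )
    return conn.execute(query).fetchall()


# Stand-in table; with real data, connect to cluster_analysis.db instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HcclTopOpStats ("
    "OpName TEXT, MeanNs REAL, MinRank INTEGER, MaxRank INTEGER)"
)
conn.executemany(
    "INSERT INTO HcclTopOpStats VALUES (?, ?, ?, ?)",
    [
        ("hcom_allReduce__606_0_1", 900.0, 0, 3),
        ("hcom_allReduce__606_1_1", 850.0, 1, 3),
        ("hcom_broadcast__606_2_1", 400.0, 2, 0),
    ],
)
print(max_rank_counts(conn))
```

A rank that recurs in `MaxRank` is only a starting point for investigation: a long communication duration on a rank can also mean that the rank is waiting for a slower peer.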

`mstx_sum`

When -m mstx_sum is set, the following tables are generated:

`MSTXAllFrameworkStats`

Description:

Provides statistical analysis of the framework-side duration of MSTX instrumentation. This analysis is based on cluster profile data in db format and does not distinguish between ranks.

Table fields

Field	Type	Description
Name	TEXT	Information carried by the MSTX instrumentation data
Count	INTEGER	Number of instrumentation events grouped by `Name` within the iteration
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
StepId	INTEGER	Iteration ID

`MSTXAllCannStats`

Description:

Provides statistical analysis of the CANN-layer duration of MSTX instrumentation. This analysis is based on cluster profile data in db format and does not distinguish between ranks.

Table fields

Field	Type	Description
Name	TEXT	Information carried by the MSTX instrumentation data
Count	INTEGER	Number of instrumentation events grouped by `Name` within the iteration
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
StepId	INTEGER	Iteration ID

`MSTXAllDeviceStats`

Description:

Provides statistical analysis of the device-side duration of MSTX instrumentation. This analysis is based on cluster profile data in db format and does not distinguish between ranks.

Table fields

Field	Type	Description
Name	TEXT	Information carried by the MSTX instrumentation data
Count	INTEGER	Number of instrumentation events grouped by `Name` within the iteration
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
StepId	INTEGER	Iteration ID

`MSTXMarkStats`

Description:

Provides statistical analysis of the duration of MSTX instrumentation for each rank, grouped by Rank and StepId. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
Name	TEXT	Information carried by the MSTX instrumentation data
FrameworkDurationNs	REAL	Framework-side duration
CannDurationNs	REAL	CANN layer duration
DeviceDurationNs	REAL	Device-side duration
Rank	INTEGER	Global rank
StepId	INTEGER	Iteration ID

`communication_group_map`

When -m communication_group_map is set, the following tables are generated:

`CommunicationGroupMapping`

Description:

Provides the mapping between communication groups and parallel strategies based on the cluster profile data in db format.

Table fields

Field	Type	Description
type	TEXT	Operator type (`collective` or `p2p`). Operators with names containing `send`, `recv`, or `receive` are classified as `p2p`.
rank_set	TEXT	A set of ranks (global ranks) within the communication group.
group_name	TEXT	Hash value of the communication group, which maps to `group_id`.
group_id	TEXT	Communication group name defined within HCCL, such as `{ip_address}%enp67s0f5_60000_0_1708156014257149`
pg_name	TEXT	Service-defined communication group name (such as `dp`, `dp_cp`, and `mp`).
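The classification rule described for the `type` field can be sketched as a small function. This is an illustration of the rule as stated above, not the tool's actual implementation; in particular, matching the name case-insensitively is an assumption, and the second operator name in the usage lines is hypothetical.

```python
def classify_comm_op(op_name: str) -> str:
    """Return 'p2p' if the name contains send, recv, or receive; else 'collective'."""
    lowered = op_name.lower()
    if any(token in lowered for token in ("send", "recv", "receive")):
        return "p2p"
    return "collective"


print(classify_comm_op("hcom_allReduce__606_0_1"))
print(classify_comm_op("hcom_send__606_3_1"))
```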

`cluster_time_summary`

When -m cluster_time_summary is set, the following tables are generated:

Note: This table is similar to cluster_step_trace_time.csv and is planned to replace that file in a later version.

`ClusterTimeSummary`

Description:

Provides statistical analysis of cluster duration for all ranks to facilitate performance issue identification. This analysis is based on cluster profile data in db format.

Table fields (time unit: μs)

Field	Type	Description
rank	INTEGER	Global rank
step	INTEGER	Iteration ID
stepTime	REAL	Total iteration duration
computation	REAL	Total computation duration
communicationNotOverlapComputation	REAL	Communication duration not overlapped by computation
communicationOverlapComputation	REAL	Duration of the overlap between computation and communication
communication	REAL	Total communication duration
free	REAL	Idle time (total duration when the device is neither communicating nor computing, excluding asynchronous memory copy)
communicationWaitStageTime	REAL	Total communication wait duration
communicationTransmitStageTime	REAL	Total communication transmission duration
memory	REAL	Total asynchronous memory copy duration
memoryNotOverlapComputationCommunication	REAL	Total duration of asynchronous memory copy not overlapped by computation or communication
taskLaunchDelayAvgTime	REAL	Delivery duration (average duration from the start of the host-side API to the start of the device-side task)
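A typical use of this table is to rank devices by the share of each iteration that is spent idle. The sketch below computes `free / stepTime` per rank for one step. It assumes that cluster_analysis.db is a SQLite database and runs against an in-memory stand-in table that contains only the columns used here; the sample values are made up.

```python
import sqlite3


def free_ratio_by_rank(conn, step):
    """Return (rank, free / stepTime) pairs for one step, largest ratio first."""
    query = (
        'SELECT "rank", free / stepTime AS freeRatio '
        "FROM ClusterTimeSummary "
        'WHERE "step" = ? AND stepTime > 0 '
        "ORDER BY freeRatio DESC"
    )
    return conn.execute(query, (step,)).fetchall()


# Stand-in table; with real data, connect to cluster_analysis.db instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE ClusterTimeSummary ("rank" INTEGER, "step" INTEGER, '
    "stepTime REAL, computation REAL, free REAL)"
)
conn.executemany(
    "INSERT INTO ClusterTimeSummary VALUES (?, ?, ?, ?, ?)",
    [
        (0, 1, 1000.0, 800.0, 50.0),
        (1, 1, 1000.0, 650.0, 200.0),
    ],
)
for rank, ratio in free_ratio_by_rank(conn, step=1):
    print(f"rank {rank}: free ratio {ratio:.1%}")
```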

`cluster_time_compare_summary`

When -m cluster_time_compare_summary is set, the following tables are generated.

Note: This analysis feature requires the cluster_time_summary results. Both cluster data and benchmark cluster data must contain a cluster_analysis.db file including the ClusterTimeSummary table.

`ClusterTimeCompareSummary`

Description: Provides a comparison between the current cluster and the benchmark cluster. For example, computationDiff indicates the difference in computation time between the current cluster and the benchmark cluster. A positive computationDiff value indicates the current cluster computation time exceeds that of the benchmark cluster, while a negative value indicates the opposite.

Table fields (time unit: μs)

Field	Type	Description
rank	INTEGER	Global rank
step	INTEGER	Iteration ID
stepTime	REAL	Iteration duration for current cluster data
stepTimeBase	REAL	Iteration duration for benchmark cluster data
stepTimeDiff	REAL	Difference in iteration duration
......	-	Some fields omitted (for the `ClusterTimeSummary` table, current cluster data, benchmark cluster data, and the difference between the two are displayed)
taskLaunchDelayAvgTime	REAL	Delivery duration for current cluster data
taskLaunchDelayAvgTimeBase	REAL	Delivery duration for benchmark cluster data
taskLaunchDelayAvgTimeDiff	REAL	Difference in delivery duration
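The naming pattern of this table (`<field>`, `<field>Base`, `<field>Diff`) and the sign convention of the difference can be illustrated as follows. This is a sketch of the relationship described above, with the difference taken as current minus benchmark; the helper name and the sample values are hypothetical.

```python
def compare_summaries(current: dict, base: dict) -> dict:
    """Build <field>, <field>Base, and <field>Diff entries for fields present in both."""
    out = {}
    for field, value in current.items():
        if field in base:
            out[field] = value
            out[field + "Base"] = base[field]
            # Positive: the current cluster takes longer than the benchmark.
            out[field + "Diff"] = value - base[field]
    return out


current = {"stepTime": 1200.0, "computation": 800.0}
benchmark = {"stepTime": 1000.0, "computation": 850.0}
print(compare_summaries(current, benchmark))
```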

`freq_analysis`

Description:

Provides AI Core frequency analysis to enable one-click NPU frequency reduction detection. This analysis is based on cluster profile data in db format. There are three frequency scenarios:

Normal: The frequency remains stable at 1800 MHz.
Idle state: When the NPU is idle for an extended period, the device automatically reduces the frequency to 800 MHz.
Abnormal reduction: When NPU frequency reduction occurs due to other factors, abnormal frequencies apart from 1800 MHz and 800 MHz are detected.

When -m freq_analysis is set, the following tables are generated if frequency reduction occurs.

`FreeFrequencyRanks`

Description:

Idle state: When the NPU is idle for an extended period, the device automatically reduces the frequency to 800 MHz.

Table fields

Field	Type	Description
rankId	INTEGER	Global rank
aicoreFrequency	TEXT	List of frequencies in the idle scenario, that is, [800, 1800]

`AbnormalFrequencyRanks`

Description:

Abnormal reduction: When NPU frequency reduction occurs due to other factors, abnormal frequencies apart from 1800 MHz and 800 MHz are detected.

Table fields

Field	Type	Description
rankId	INTEGER	Global rank
aicoreFrequency	TEXT	List of frequencies in abnormal reduction scenarios, such as [800, 1150, 1450, 1800]
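The three frequency scenarios can be expressed as a small classifier over the `aicoreFrequency` value. The sketch below is only an illustration of the rules described above. It assumes that the TEXT value is a JSON-compatible list such as `[800, 1800]`, which may not hold for every version of the tool.

```python
import json

NORMAL_MHZ = 1800  # stable working frequency described above
IDLE_MHZ = 800     # frequency after a long idle period


def classify_frequencies(aicore_frequency_text: str) -> str:
    """Classify a frequency list as 'normal', 'idle', or 'abnormal'."""
    freqs = set(json.loads(aicore_frequency_text))
    if freqs <= {NORMAL_MHZ}:
        return "normal"
    if freqs <= {NORMAL_MHZ, IDLE_MHZ}:
        return "idle"
    return "abnormal"


for text in ("[1800]", "[800, 1800]", "[800, 1150, 1450, 1800]"):
    print(text, "->", classify_frequencies(text))
```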

`ep_load_balance`

Description:

In cluster training scenarios, MoE load imbalance refers to the uneven distribution of tasks across different expert models in a distributed environment, causing some expert models to overload while others remain idle. This imbalance reduces overall system efficiency and creates potential performance bottlenecks.

When -m ep_load_balance is set, the following tables are generated.

`EPTokensSummary`

Description:

Provides GroupedMatmul operator shape analysis. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
rank	INTEGER	Global rank
epRanks	TEXT	A set of ranks within the same Expert Parallelism (EP) group, such as [rank0,rank1]
inputShapesSummary	INTEGER	Sum of the first dimension of all `input_shapes` for the `GroupedMatmul` operator on this rank

`TopEPTokensInfo`

Description:

Provides information about EP groups with load imbalance.

Table fields

Field	Type	Description
epRanks	TEXT	A set of ranks within the EP group with load imbalance, such as [rank0, rank1]
tokensDiff	INTEGER	Difference between the maximum and minimum values within the same EP group
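The relationship between the two tables can be sketched as a group-by over `EPTokensSummary` rows. The example assumes that `tokensDiff` is the difference between the largest and smallest `inputShapesSummary` within an EP group; the source states only that it is a maximum-minimum difference, so treat this as an interpretation. The sample rows are made up.

```python
from collections import defaultdict


def tokens_diff_by_group(rows):
    """Map each epRanks value to max - min of inputShapesSummary in that group.

    rows: iterable of (rank, epRanks, inputShapesSummary) tuples.
    """
    groups = defaultdict(list)
    for _rank, ep_ranks, tokens in rows:
        groups[ep_ranks].append(tokens)
    return {group: max(values) - min(values) for group, values in groups.items()}


sample_rows = [
    (0, "[0,1]", 4096),
    (1, "[0,1]", 6144),
    (2, "[2,3]", 5000),
    (3, "[2,3]", 5100),
]
diffs = tokens_diff_by_group(sample_rows)
# List the groups with the largest imbalance first.
for group, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(group, diff)
```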

`mstx2commop`

When -m mstx2commop is set, cluster_analysis.db is not generated, and the built-in communication instrumentation data is converted into communication operators.

Note: This setting generates a new COMMUNICATION_OP table. You are advised to use it in combination with Level_none. Otherwise, the original table structure will be damaged.

Output:

When Level_none is set, the unified database does not contain a COMMUNICATION_OP table. This analysis feature converts built-in communication instrumentation data into communication operators for display in MindStudio Insight.

`slow_rank`

When -m slow_rank is set, the following tables are generated.

`SlowRank`

Description:

Provides slow rank analysis based on cluster profile data in db format.

Table fields

Field	Type	Description
rankId	INTEGER	Slow rank
slowAffectCount	INTEGER	Number of communications affected by this rank

`SlowOpStats`

Description:

Provides communication operator statistics corresponding to slow rank bottleneck locations. This analysis is based on cluster profile data in db format.

Table fields

Field	Type	Description
SlowRank	TEXT	Slow rank ID
OpName	TEXT	Communication operator name
GroupName	TEXT	Communication group name
Timestamp	TEXT	Communication operator timestamp
Count	INTEGER	Count
MeanNs	REAL	Average duration
StdNs	REAL	Standard deviation of the duration
MinNs	REAL	Minimum duration
Q1Ns	REAL	25th percentile of duration
MedianNs	REAL	50th percentile of duration
Q3Ns	REAL	75th percentile of duration
MaxNs	REAL	Maximum duration
SumNs	REAL	Total duration
MinRank	INTEGER	Rank with the minimum duration for the communication operator
MaxRank	INTEGER	Rank with the maximum duration for the communication operator

`p2p_pairing`

When -m p2p_pairing is set, cluster_analysis.db is not generated.

This analysis feature displays P2P operator connection lines, allowing users to identify the source rank (src_rank) and destination rank (dst_rank) for send and receive operations. Currently, MindStudio Insight does not support this feature.

Output:

An opConnectionId column is added to the COMMUNICATION_OP table in the ascend_pytorch_profiler_{rank_id}.db file of the cluster data. P2P operators across different ranks can be linked based on this operator connection ID (opConnectionId).
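Once the `opConnectionId` column is available, send and receive operators from different ranks can be joined on it. The sketch below shows the idea on in-memory records. It assumes that a matching send and receive share the same `opConnectionId` value and that the rank of each record is known from the database file it was read from; the record layout, operator names, and ID values are hypothetical.

```python
from collections import defaultdict


def pair_p2p_ops(ops):
    """Return (opConnectionId, src_rank, dst_rank) for each send/receive match.

    ops: iterable of dicts with 'rank', 'opName', and 'opConnectionId' keys.
    """
    by_conn = defaultdict(list)
    for op in ops:
        by_conn[op["opConnectionId"]].append(op)

    pairs = []
    for conn_id, group in by_conn.items():
        sends = [o for o in group if "send" in o["opName"].lower()]
        recvs = [
            o for o in group
            if "recv" in o["opName"].lower() or "receive" in o["opName"].lower()
        ]
        for send in sends:
            for recv in recvs:
                pairs.append((conn_id, send["rank"], recv["rank"]))
    return pairs


ops = [
    {"rank": 0, "opName": "hcom_send__1", "opConnectionId": "c1"},
    {"rank": 1, "opName": "hcom_receive__1", "opConnectionId": "c1"},
    {"rank": 1, "opName": "hcom_send__2", "opConnectionId": "c2"},
    {"rank": 2, "opName": "hcom_receive__2", "opConnectionId": "c2"},
    {"rank": 3, "opName": "hcom_send__3", "opConnectionId": "c3"},  # no peer found
]
print(pair_p2p_ops(ops))
```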

`pp_chart`

Note: This capability requires lightweight instrumentation before and after forward and backward passes. Use msprof-analyze for processing and MindStudio Insight for result visualization.

Instrumentation

Taking DualpipeV2 as an example, locate the forward and backward pass code and add the following code to dualpipev_schedules.py (for reference only; ensure the code is added at the correct location):

import torch_npu

# WeightGradStore and the step functions wrapped below are expected to be in scope in dualpipev_schedules.py.
def step_wrapper(func, msg: str):
    """Wrap func so that each call is enclosed in an MSTX range labeled with msg."""
    def wrapper(*args, **kwargs):
        new_msg = {"name": msg}
        # Forward steps called with extra_block_kwargs are reported as forward/backward overlap.
        if msg == "forward_step_with_model_graph" and kwargs.get("extra_block_kwargs") is not None:
            new_msg["name"] = "forward_backward_overlaping"
        if "current_microbatch" in kwargs:
            new_msg["current_microbatch"] = kwargs["current_microbatch"]
        # Do not open a range when there is no cached weight gradient to pop.
        # The label must match the one passed to step_wrapper for WeightGradStore.pop below.
        if msg == "WeightGradStore.pop" and len(WeightGradStore.cache) == 0:
            mstx_state_step_range_id = None
        else:
            mstx_state_step_range_id = torch_npu.npu.mstx.range_start(str(new_msg), torch_npu.npu.current_stream())
        out = func(*args, **kwargs)
        if mstx_state_step_range_id is not None:
            torch_npu.npu.mstx.range_end(mstx_state_step_range_id)
        return out
    return wrapper

# Replace the original functions with their instrumented versions.
forward_step_with_model_graph = step_wrapper(forward_step_with_model_graph, "forward_step_with_model_graph")
forward_step_no_model_graph = step_wrapper(forward_step_no_model_graph, "forward_step_no_model_graph")
backward_step_with_model_graph = step_wrapper(backward_step_with_model_graph, "backward_step_with_model_graph")
backward_step = step_wrapper(backward_step, "backward_step")
WeightGradStore.pop = step_wrapper(WeightGradStore.pop, "WeightGradStore.pop")

Add metadata when collecting profile data:

import json

# prof is the profiler object used for data collection.
prof.add_metadata('pp_info', json.dumps(
    {
        'pp_type': 'dualpipev',
        'microbatch_num': 10,  # Replace with the actual number of microbatches.
    }
))

`StepTaskInfo`

Description:

Provides a table for visualized display. This table is generated by processing the db format cluster profile data instrumented in the previous section.

Table fields

Field	Type	Description
name	TEXT	Forward and backward propagation information
startNs	INTEGER	Start time on the device
endNs	INTEGER	End time on the device
type	INTEGER	Type (different types are displayed in different colors)

Communication

When profiler_level is set to Level_none, the COMMUNICATION_OP table is not generated. Use the mstx2commop analysis feature to convert built-in communication instrumentation data into communication operators to generate this table. The PP chart can also display send and recv operators.

With the COMMUNICATION_OP table, use the p2p_pairing analysis feature to display send and recv connection lines in the PP chart. However, this feature requires level 1 or higher.

`communication_group.json`

Records communication group information and is generated by parsing analysis.db. collective indicates a collective communication group, and P2P indicates a point-to-point communication group. You can ignore this file.

`stats.ipynb`

Generated when the analysis feature is set to cann_api_sum and stored in the cluster_analysis_output/CannApiSum directory. Open this file using Jupyter Notebook or MindStudio Insight to view cluster API duration information.

Generated when the analysis feature is set to compute_op_sum and stored in the cluster_analysis_output/ComputeOpSum directory. Open this file using Jupyter Notebook or MindStudio Insight to view cluster computation operator duration analysis results (summarizing all cluster computation operators in charts) and cluster rank computation operator duration analysis results (summarizing computation operators for each rank).

Generated when the analysis feature is set to hccl_sum and stored in the cluster_analysis_output/HcclSum directory. Open this file using Jupyter Notebook or MindStudio Insight to view cluster communication operator duration analysis results (summarizing all cluster communication operators in charts), cluster rank communication operator duration analysis results (summarizing communication operators for each rank), and top communication operator information.

Generated when the analysis feature is set to mstx_sum and stored in the cluster_analysis_output/MstxSum directory. Open this file using Jupyter Notebook or MindStudio Insight to view MSTX instrumentation information for cluster scenarios across framework, CANN, and device sides.

Generated when the analysis feature is set to slow_link and stored in the cluster_analysis_output/SlowLink directory. Open this file using Jupyter Notebook or MindStudio Insight to view abnormal slow link data analysis results for cluster scenarios (summarizing all cluster links in charts) and cluster slow link total duration analysis results (displaying data for detected potential slow links).

`export_summary`

When -m export_summary is set, the following files are generated in the ASCEND_PROFILER_OUTPUT directory of each rank.

`api_statistic.csv`

Description:

Provides the API statistics of each rank based on the cluster profile data in db format.

Table fields

Field	Type	Description
API Name	TEXT	API name
Count	INTEGER	Call count
Total Time(us)	REAL	Total duration (μs)
Avg Time(us)	REAL	Average duration (μs)
Min Time(us)	REAL	Minimum duration (μs)
Max Time(us)	REAL	Maximum duration (μs)

`kernel_details.csv`

Description:

Provides the kernel details of each rank based on the cluster profile data in db format.

Table fields

Field	Type	Description
op_name	TEXT	Operator name
op_type	TEXT	Operator type
task_type	TEXT	Task type
task_duration	REAL	Task duration (μs)
input_shapes	TEXT	Input shape
output_shapes	TEXT	Output shape
block_dim	TEXT	Block dimension
input_data_types	TEXT	Input data type
output_data_types	TEXT	Output data type
