Configuring Layer Mapping for Hierarchical Model Visualization

Usage Scenario

In cross-suite comparison within the same framework (e.g., PyTorch DeepSpeed vs. Megatron) or cross-framework comparison (e.g., PyTorch vs. MindSpore), some model layers and layer names may differ due to code implementation differences, making it impossible to match them directly. Therefore, layer name mapping is required for comparison.

Module Naming Rules

Some node names are excessively long—for example, Module.module.module.language_model.embedding.Embedding.forward.0—making them difficult to display fully on graph nodes. As a result, forward or backward information may be omitted. To address this, the prefix Module is removed from the displayed node name, and the forward or backward information is moved to the second position in the name string.

Naming Format

{Module}.{module_name}.{class_name}.{forward/backward}.{number_of_calls}

Layer mapping is mainly used to map module names.

Naming Example

Module.module.Float16Module.forward.0 -----> Module{Module}.module{module_name}.Float16Module{class_name}.forward.0{number_of_calls}
Module.module.module.GPTModel.forward.0 -----> Module{Module}.module.module{module_name}.GPTModel{class_name}.forward.0{number_of_calls}
Module.module.module.language_model.TransformerLanguageModel.forward.0 -----> Module{Module}.module.module.language_model{module_name}.TransformerLanguageModel{class_name}.forward.0{number_of_calls}
Module.module.module.language_model.embedding.Embedding.forward.0 -----> Module{Module}.module.module.language_model.embedding{module_name}.Embedding{class_name}.forward.0{number_of_calls}

As shown in the preceding examples, module_name grows longer as the model level deepens. For the embedding layer, module_name combines thelanguage_model layer with its upper and top-level modules.

Example

As shown in the figure, the left part represents the NPU model and the right part the GPU model. Due to differences in code implementation, the model levels and level names differ, preventing node matching. Gray nodes in the figure indicate unmatched nodes.

Analyzing the Figure

Even when the same model is implemented with different suites or frameworks, the level relationships and level names may differ. Nevertheless, matching relationships can still be identified from the node names in the figure. For instance, the embedding layer is named xxx_embedding rather than xxx_norm in the code. The node names retain embedding-related information, and the level relationships remain similar.

According to the analysis, the node matching relationships are as follows.

Note that you only need to pay attention to the differences in module_name.

NPU Node Name	GPU Node Name	module_name Difference
Module.module.Float16Module.forward.0	Module.model.FloatModule.forward.0	`module` for the NPU; `model` for the GPU.
Module.module.module.GPTModel.forward.0	Module.model.module.GPT2Model.forward.0	`module` for both NPU and GPU.
Module.module.module.language_model.TransformerLanguageModel.forward.0	None	The NPU has one more layer.
Module.module.module.language_model.embedding.Embedding.forward.0	Module.module.module.embedding.LanguageModelEmbedding.forward.0	`language_model.embedding` for the NPU; `embedding` for the GPU.
Module.module.module.language_model.rotary_pos_emb.RotaryEmbedding.forward.0	Module.module.module.rotary_pos_emb.RotaryEmbedding.forward.0	`language_model.rotary_pos_emb` for the NPU; `rotary_pos_emb` for the GPU.
Module.module.module.language_model.encoder.ParallelTransformer.forward.0	Module.module.module.decoder.TransformerBlock.forward.0	`language_model.encoder` for the NPU; `decoder` for the GPU.
Module.module.module.language_model.encoder.layers.0.ParallelTransformerLayer.forward.0	Module.module.module.decoder.layers.0.TransformerLayer.forward.0	`layers` for both NPU and GPU, with difference only between parent layers.

Creating the `layer_mapping` Configuration File

Prepare a file named mapping.yaml and establish the mapping relationship of module_name.

Top-Level Module Mapping

Module.module.Float16Module.forward.0 (for the NPU) and Module.model.FloatModule.forward.0 (for the GPU) are at the top layer of a graph. You need to perform the following configurations.

TopLayer:
  module: model

Other Module Mapping

Configure the submodules under module. Although the class names on the two sides are different (GPTModel for the NPU and GPT2Model for the GPU), you only need to use the NPU-side class name (i.e., the class name on the left side of the graph) for configuration—no need to consider the class name on the right side.

Cross-layer configuration is involved here. The NPU has an additional language_model layer, which is used as the prefix of the embedding layer, rotary_pos_emb layer, and encoder layer.

GPTModel:
    language_model.embedding: embedding
    language_model.rotary_pos_emb: rotary_pos_emb
    language_model.encoder: decoder

Then, check the submodules under theModule.module.module.language_model.encoder.ParallelTransformer.forward.0 layer.

The layers beneath this layer are named layers on both the NPU and GPU. Since the layer names match, no configuration is required.

Effect Viewing

Run the following command to specify -lm:

msprobe graph_visualize -i ./compare.json -o ./output -lm ./mapping.yaml

It can be seen that all the layers configured in the mapping.yaml file are matched, except the language_model layer (which is only on the NPU and has no matching layer on the GPU).

Further Configuration

If unmatched nodes appear during node expansion, continue by configuring the mapping.yaml file.

According to the preceding analysis, the node mapping is as follows:

NPU Node Name	GPU Node Name	Difference
Module.module.module.language_model.encoder.layers.0.mlp.dense_h_to_4h.ColumnParallelLinear.forward.0	Module.module.module.decoder.layers.0.mlp.linear_fc1.TELayerNormColumnParallelLinear.forward.0	`dense_h_to_4h` for the NPU; `linear_fc1` for the GPU.
Module.module.module.language_model.encoder.layers.0.mlp.dense_4h_to_h.RowParallelLinear.forward.0	Module.module.module.decoder.layers.0.mlp.linear_fc2.TERowParallelLinear.forward.0	`dense_4h_to_h` for the NPU; `linear_fc2` for the GPU.

Add the following configuration to the mapping.yaml file:

TopLayer:
  module: model

GPTModel:
    language_model.embedding: embedding
    language_model.rotary_pos_emb: rotary_pos_emb
    language_model.encoder: decoder

ParallelMLP:
    dense_h_to_4h: linear_fc1
    dense_4h_to_h: linear_fc2

Check the result, and it can be seen that nodes are successfully matched.