MindSpeed-MM/tests/ut/vtp · Ascend/MindSpeed-MM - AtomGit

ascend-robot: [Feature] model fine-tuning supports time division multiplexing of DP on PP0

File	Last commit	Last updated
__init__.py	[Feature] model fine-tuning supports time division multiplexing of DP on PP0
Co-authored-by: f00620112<fangminghao@huawei.com>
# message auto-generated for no-merge-commit merge:
!2527 merge 20260430_iter6 into master

[Feature] model fine-tuning supports time division multiplexing of DP on PP0

Created-by: fangminghao
Commit-by: f00620112
Merged-by: ascend-robot
Description: https://gitcode.com/Ascend/MindSpeed-MM/issues/176

## What this PR does / why we need it?

[Feature] model fine-tuning supports time division multiplexing of DP on PP0

- Feature description: when the edge side does not have enough nodes, the edge-side DP size is allowed to be smaller than the cloud-side DP size.
- Asymmetric DP implementation: in the symmetric DP scenario, cards are multiplexed by division, so the nodes or cards of each DP domain process only that domain's data. In the asymmetric DP scenario, by contrast, the edge side uses time-division multiplexing to process the data of several DP domains and communicates with the cloud side separately for each of them.
- Overview of changes:

| Area | Change | Modified path | Original path |
| --- | --- | --- | --- |
| ViT model | Intrusive change: ViT layer-count summation, adapted for VPP | mindspeed_mm/models/vision/vision_encoders/vision_transformer_block.py | / |
| Patch registration | / | mindspeed_mm/patchs/patch_manager.py | / |
| Training pipeline | Modified pipeline scheduling to support time-division multiplexing across multiple DP domains | mindspeed_mm/patchs/layerwise_disaggregated_training/schedules_patch.py | megatron/core/pipeline_parallel/schedules.py |
| Communication operators | Modified communication operators (recv_forward, recv_backward, etc.) to support time-division multiplexing across multiple DP domains | mindspeed_mm/patchs/layerwise_disaggregated_training/p2p_communication_patch.py | megatron/core/pipeline_parallel/p2p_communication.py |
| Training initialization | Modified communication-group and parallel-state initialization to support time-division multiplexing across multiple DP domains | mindspeed_mm/patchs/layerwise_disaggregated_training/parallel_state_patch.py | megatron/core/parallel_state.py |
| Model initialization | Modified model initialization and ckpt loading to support co-deployment of the first and last stages | mindspeed_mm/patchs/layerwise_disaggregated_training/vlm_model_patch.py | mindspeed_mm/models/vlm_model.py |
| Training post-processing | Post-processing communication optimization | mindspeed_mm/patchs/layerwise_disaggregated_training/utils_patch.py mindspeed_mm/patchs/layerwise_disaggregated_training/distributed_data_parallel_patch.py | megatron/core/utils.py |
| Model splitting | ckpt splitting method hf_to_mm_ldt, adapted for co-deployment of the first and last stages | checkpoint/vlm_model/hf_to_mm_ldt.py | / |
| Validation pre-/post-processing | Pre-process and post-process args so that argument validation passes | mindspeed_mm/patchs/validate_args_patch.py | / |
| Deleted files | / | mindspeed_mm/patchs/layerwise_disaggregated_training/training_patch.py mindspeed_mm/patchs/layerwise_disaggregated_training/utils.py | / |

- Note on the intrusive change. Path: mindspeed_mm/models/vision/vision_encoders/vision_transformer_block.py:298-301

Original code: sums the number of ViT layers on all pp_ranks that precede the current PP stage.

```python
previous_layer = sum(self.config.pipeline_num_layers[:pp_rank])
```

New code: the edge-cloud feature enables VPP, in which case `self.config.pipeline_num_layers` is a two-dimensional array and cannot be summed directly with `sum`. The change adds a check for whether `self.config.pipeline_num_layers` is two-dimensional. With the edge-cloud feature enabled it is two-dimensional, and the sum is taken over `self.config.pipeline_num_layers[0]` instead. With the feature disabled it is one-dimensional, and the code takes the else branch and follows the original logic. The change therefore does not affect the existing behavior.

```python
if isinstance(self.config.pipeline_num_layers[0], list):
    previous_layer = sum(self.config.pipeline_num_layers[0][:pp_rank])
else:
    previous_layer = sum(self.config.pipeline_num_layers[:pp_rank])
```

## Does this PR introduce any user-facing change?

Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.

## How was this patch tested?

Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.

See merge request: Ascend/MindSpeed-MM!2527	1 day ago
test_vtp_p2p_communication.py	[Feature] model fine-tuning supports time division multiplexing of DP on PP0 (merge request Ascend/MindSpeed-MM!2527)	1 day ago
test_vtp_parallel_state_patch.py	[Feature] model fine-tuning supports time division multiplexing of DP on PP0 (merge request Ascend/MindSpeed-MM!2527)	1 day ago
test_vtp_patch_manager.py	[Feature] model fine-tuning supports time division multiplexing of DP on PP0 (merge request Ascend/MindSpeed-MM!2527)	1 day ago
test_vtp_schedules.py	[Feature] model fine-tuning supports time division multiplexing of DP on PP0 (merge request Ascend/MindSpeed-MM!2527)	1 day ago
test_vtp_utils.py	[Feature] model fine-tuning supports time division multiplexing of DP on PP0 (merge request Ascend/MindSpeed-MM!2527)	1 day ago
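
The intrusive change to `vision_transformer_block.py` described in the commit message can be tried out in isolation. The sketch below is a minimal stand-alone version, not the repository's code: the function name `count_previous_layers` and the sample layer layouts are hypothetical, and it assumes that in the two-dimensional (VPP) case the first row holds one layer count per pipeline stage, which is what the indexing `pipeline_num_layers[0][:pp_rank]` in the commit suggests.

```python
from typing import List, Union

# Either a flat per-stage list, or (with VPP) a list of such lists.
LayerLayout = Union[List[int], List[List[int]]]


def count_previous_layers(pipeline_num_layers: LayerLayout, pp_rank: int) -> int:
    """Sum the layer counts of all pipeline stages before `pp_rank`.

    Hypothetical helper that mirrors the branch quoted in the commit message.
    """
    if isinstance(pipeline_num_layers[0], list):
        # Two-dimensional layout (VPP enabled): sum over the first row only,
        # as the commit does. Summing the outer list would add int and list.
        return sum(pipeline_num_layers[0][:pp_rank])
    # One-dimensional layout: the original behavior.
    return sum(pipeline_num_layers[:pp_rank])


if __name__ == "__main__":
    flat = [8, 8, 8, 8]                   # illustrative values only
    nested = [[4, 4, 4, 4], [4, 4, 4, 4]]  # illustrative values only
    print(count_previous_layers(flat, 2))    # 8 + 8
    print(count_previous_layers(nested, 3))  # 4 + 4 + 4
```

Like the commit, the check uses `isinstance(..., list)`, so a layout stored as nested tuples would fall through to the one-dimensional branch; whether that case can occur in MindSpeed-MM is not stated in the commit message.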