File / Last commit / Last updated
[Feature] model fine-tuning supports time division multiplexing of DP on PP0

Co-authored-by: f00620112<fangminghao@huawei.com>

# message auto-generated for no-merge-commit merge:
!2527 merge 20260430_iter6 into master

[Feature] model fine-tuning supports time division multiplexing of DP on PP0

Created-by: fangminghao
Commit-by: f00620112
Merged-by: ascend-robot
Description: https://gitcode.com/Ascend/MindSpeed-MM/issues/176

## What this PR does / why we need it?

[Feature] model fine-tuning supports time division multiplexing of DP on PP0

- Feature description: when the edge side does not have enough nodes, the edge-side DP size may now be smaller than the cloud-side DP size.
- Asymmetric DP logic: in the symmetric DP scenario, devices are multiplexed by card, so the nodes or cards of each DP domain process only that domain's data. In the asymmetric DP scenario, the edge side instead uses time-division multiplexing to process the data of several DP domains in turn, and communicates with the cloud side separately for each of them.
- Overview of changes:

| Area | Change | Modified path | Original path |
| --- | --- | --- | --- |
| ViT model | Intrusive change: summation over ViT model layers, adapted for VPP | mindspeed_mm/models/vision/vision_encoders/vision_transformer_block.py | / |
| Patch registration | / | mindspeed_mm/patchs/patch_manager.py | / |
| Training pipeline | Modify pipeline scheduling to support time-division multiplexing of multiple DP domains | mindspeed_mm/patchs/layerwise_disaggregated_training/schedules_patch.py | megatron/core/pipeline_parallel/schedules.py |
| Communication operators | Modify communication operators (recv_forward, recv_backward, etc.) to support time-division multiplexing of multiple DP domains | mindspeed_mm/patchs/layerwise_disaggregated_training/p2p_communication_patch.py | megatron/core/pipeline_parallel/p2p_communication.py |
| Training initialization | Modify communication-group initialization and parallel initialization to support time-division multiplexing of multiple DP domains | mindspeed_mm/patchs/layerwise_disaggregated_training/parallel_state_patch.py | megatron/core/parallel_state.py |
| Model initialization | Modify model initialization and ckpt loading logic to support head-and-tail co-deployment | mindspeed_mm/patchs/layerwise_disaggregated_training/vlm_model_patch.py | mindspeed_mm/models/vlm_model.py |
| Training post-processing | Post-processing communication optimization | mindspeed_mm/patchs/layerwise_disaggregated_training/utils_patch.py, mindspeed_mm/patchs/layerwise_disaggregated_training/distributed_data_parallel_patch.py | megatron/core/utils.py |
| Model splitting | ckpt splitting method hf_to_mm_ldt, adapted for head-and-tail co-deployment | checkpoint/vlm_model/hf_to_mm_ldt.py | / |
| Validation pre-/post-processing | Pre-process and post-process args so that they pass argument validation | mindspeed_mm/patchs/validate_args_patch.py | / |
| Deleted files | / | mindspeed_mm/patchs/layerwise_disaggregated_training/training_patch.py, mindspeed_mm/patchs/layerwise_disaggregated_training/utils.py | / |

- Note on the intrusive change:

Path of the intrusive change: mindspeed_mm/models/vision/vision_encoders/vision_transformer_block.py:298-301

Original code: sums the number of ViT model layers on all pp_ranks that precede the current PP stage.

```python
previous_layer = sum(self.config.pipeline_num_layers[:pp_rank])
```

New code: the edge-cloud feature enables VPP, in which case self.config.pipeline_num_layers is a two-dimensional array and can no longer be summed with a plain sum.

Change: a check is added for whether self.config.pipeline_num_layers is two-dimensional. With the edge-cloud feature enabled it is two-dimensional, and the sum is taken over self.config.pipeline_num_layers[0] instead. With the feature disabled it is one-dimensional, and the code takes the else branch and runs the original logic. The change therefore does not affect existing behavior.

```python
if isinstance(self.config.pipeline_num_layers[0], list):
    previous_layer = sum(self.config.pipeline_num_layers[0][:pp_rank])
else:
    previous_layer = sum(self.config.pipeline_num_layers[:pp_rank])
```

## Does this PR introduce any user-facing change?

Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.

## How was this patch tested?

Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.

See merge request: Ascend/MindSpeed-MM!2527

Last updated: 1 day ago
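The dimensionality check described in the note on the intrusive change can be tried outside the model. The following is a minimal sketch, not the repository's code: the helper name `previous_layer_count` and the sample layer counts are hypothetical, and it assumes that in the two-dimensional case the first row holds one layer count per pp_rank, which is how the PR's `pipeline_num_layers[0][:pp_rank]` slice reads.

```python
from typing import List, Union

# A layer spec is either flat (one count per pp_rank) or nested
# (a list of such rows, as described for the VPP case).
LayerSpec = Union[List[int], List[List[int]]]


def previous_layer_count(pipeline_num_layers: LayerSpec, pp_rank: int) -> int:
    """Sum the layer counts of all pp_ranks before `pp_rank`.

    Hypothetical stand-alone version of the branch shown in the PR.
    """
    if isinstance(pipeline_num_layers[0], list):
        # Nested spec: only the first row is summed, as in the PR.
        return sum(pipeline_num_layers[0][:pp_rank])
    # Flat spec: original behavior.
    return sum(pipeline_num_layers[:pp_rank])


# Hypothetical sample values, chosen only for illustration.
flat = [8, 8, 8, 8]
nested = [[4, 4, 4, 4], [4, 4, 4, 4]]

print(previous_layer_count(flat, 2))    # 8 + 8
print(previous_layer_count(nested, 2))  # 4 + 4

# Without the check, slicing a nested spec yields a list of lists, and
# sum() fails because it starts from the integer 0.
try:
    sum(nested[:2])
except TypeError as exc:
    print("plain sum fails on a nested spec:", exc)
```

Note that the unguarded `sum` only fails when the slice is non-empty: for `pp_rank == 0` the slice is empty and `sum` returns 0 for either layout.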