| [feature] Support merging expert weights for Mixtral-style MoE and splitting them back.
Co-authored-by: pjgao<gaopengju3@huawei.com>
# message auto-generated for no-merge-commit merge:
!1725 merge master into master
[feature] Support merging expert weights for Mixtral-style MoE and splitting them back.
Created-by: PIPIXIU
Commit-by: pjgao
Merged-by: ascend-robot
Description:
## Motivation
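Models such as InternVL3.5 and Qwen3Omni use the Qwen3MoE structure directly. Qwen3MoE is implemented in the Mixtral style, with a separate weight for each expert, which causes two problems:
1. Under fsdp2 or zero2/3 parallelism, unbalanced expert activation makes training hang in the gradient reduce_scatter.
2. It is hard to adapt to fused-operator optimizations such as MoE group GEMM.
This PR therefore adds the ability to merge the weights of multiple experts, and to split them back after training on the merged weights. InternVL3.5 and Qwen3OmniMoe are supported.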
## Modification
The following command-line interface is provided:
```bash
mm-convert moe_expert --style merge --hf_dir "OpenGVLab/InternVL3_5-30B-A3B-Instruct" --save_dir "merged_weight"
mm-convert moe_expert --style split --hf_dir "merge" --save_dir "splited_weight"
```
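As a rough illustration of what merging and splitting mean at the checkpoint level, the sketch below regroups state-dict keys. It is not the converter's actual implementation: the key pattern `<prefix>.experts.<id>.<proj>.weight`, the merged key name, and the helper names `merge_experts` and `split_experts` are all assumptions made for this example. The `stack` and `unbind` arguments default to plain lists so the sketch runs without dependencies; with real tensors one would pass `torch.stack` and `torch.unbind`.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) key layouts, modelled on Mixtral-style checkpoints:
#   per-expert: <prefix>.experts.<expert_id>.<proj>.weight
#   merged:     <prefix>.experts.<proj>.weight
EXPERT_KEY = re.compile(r"^(?P<prefix>.+\.experts)\.(?P<idx>\d+)\.(?P<rest>.+)$")
MERGED_KEY = re.compile(r"^(?P<prefix>.+\.experts)\.(?P<rest>\D.*)$")


def merge_experts(state_dict, stack=list):
    """Collect per-expert entries into one stacked entry per projection."""
    merged = {}
    groups = defaultdict(dict)  # merged key -> {expert id: weight}
    for key, value in state_dict.items():
        m = EXPERT_KEY.match(key)
        if m is None:
            merged[key] = value  # non-expert weights pass through unchanged
            continue
        groups[f"{m['prefix']}.{m['rest']}"][int(m["idx"])] = value
    for key, by_idx in groups.items():
        # Expert ids must run 0..n-1 so that splitting is unambiguous.
        if sorted(by_idx) != list(range(len(by_idx))):
            raise ValueError(f"non-contiguous expert ids for {key}")
        merged[key] = stack([by_idx[i] for i in range(len(by_idx))])
    return merged


def split_experts(state_dict, unbind=list):
    """Inverse of merge_experts: expand each stacked entry per expert."""
    split = {}
    for key, value in state_dict.items():
        m = MERGED_KEY.match(key)
        if m is None:
            split[key] = value
            continue
        for idx, weight in enumerate(unbind(value)):
            split[f"{m['prefix']}.{idx}.{m['rest']}"] = weight
    return split


# Toy checkpoint: nested lists stand in for tensors.
ckpt = {
    "model.layers.0.mlp.gate.weight": [[0.1, 0.2]],
    "model.layers.0.mlp.experts.0.up_proj.weight": [[1.0, 2.0]],
    "model.layers.0.mlp.experts.1.up_proj.weight": [[3.0, 4.0]],
}
merged = merge_experts(ckpt)
restored = split_experts(merged)
```

In this toy run the two `up_proj` entries collapse into a single stacked entry and `split_experts` restores the original keys. A single stacked parameter per projection is what lets every rank hold the same parameter set regardless of which experts were activated.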
## Self-test (Optional)
If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content is self-checked; the wording is clear and the writing is standardized.
**After PR**:
- [x] CLA has been signed and all committers have signed the CLA in this PR.
- [x] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!1725 | 6 months ago |