MindSpeed-MM/mindspeed_mm/fsdp/params · Ascend/MindSpeed-MM - AtomGit

ascend-robot: [Bugfix] fix data packing for qwen dataset

File	Last commit	Last updated
argument.py	feat(torch): Squash merge fsdp2_dev into master. Co-authored-by: zs-derrick1<1434012475@qq.com> # message auto-generated for no-merge-commit merge: !2223 merge master into master feat(torch): Squash merge fsdp2_dev into master. Created-by: zs-derrick1 Commit-by: zs-derrick1 Merged-by: ascend-robot Description: ## Motivation Squash merge fsdp2_dev into master. ## Modification Merge the fsdp2_dev branch into master. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist Before PR: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression is clear and the writing standardized. After PR: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!2223	3 months ago
data_args.py	[Bugfix] fix data packing for qwen dataset Co-authored-by: htwang<wanghaitao60@huawei.com> # message auto-generated for no-merge-commit merge: !2288 merge master into master [Bugfix] fix data packing for qwen dataset Created-by: htwang Commit-by: htwang Merged-by: ascend-robot Description: ## What this PR does / why we need it? Enabling packing on image-text data, or on mixed image-text and text-only data, raised errors related to position ids. ## Does this PR introduce any user-facing change? 1. During packing, the packing processor generates position ids as consecutive indices; when image or other modality data is present, the correct position ids must be regenerated in the collator. 2. The llamafactory dataset now supports a custom pad_to_multiple_of. ## How was this patch tested? Position ids produced by the original code (with multiple samples): ``` tensor([[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]) ``` After the fix: ``` tensor([[[ 0, 1, 2, 3, 4, 4, 4, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[ 0, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[ 0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]) ``` In the original output the position encoding for image data was wrong: the encodings on the three t/h/w axes were identical, which does not match mrope's logic. An internal business team verified that the loss matches that of training without packing. See merge request: Ascend/MindSpeed-MM!2288	2 months ago
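lora_args.py	[feature] pure fsdp2 backend add lora finetune ability. Co-authored-by: pjgao<gaopengju3@huawei.com> # message auto-generated for no-merge-commit merge: !2281 merge master into master [feature] pure fsdp2 backend add lora finetune ability. Created-by: PIPIXIU Commit-by: pjgao Merged-by: ascend-robot Description: ## What this PR does / why we need it? I. Background: in the current repository, MM's new FSDP2 training framework does not support LoRA fine-tuning, so users cannot do parameter-efficient fine-tuning on the new pure FSDP2 backend. (The megatron backend and the megatron+fsdp2 backend support it; the pure fsdp2 backend does not.) II. This change: add full LoRA fine-tuning support to the pure FSDP2 backend via the `peft` library, including: 1. New config module `mindspeed_mm/fsdp/params/lora_args.py` - Defines the `LoraArguments` config class with parameters such as rank, alpha, target_modules, dropout, and init_lora_weights - Supports the `save_mode` setting ("lora_only" or "full_model") 2. New utility functions in `mindspeed_mm/fsdp/utils/lora_utils.py` - `match_target_modules()`: matches target modules using wildcard patterns - `add_lora_to_model()`: injects LoRA adapters into the model - `freeze_parameters()`: freezes the base model parameters - `validate_lora_config()`: validates the LoRA config - `get_lora_trainable_params()`: reports statistics on trainable LoRA parameters 3. New weight manager `mindspeed_mm/fsdp/utils/lora_weight_manager.py` - The `LoraWeightManager` class manages saving and loading of LoRA weights - `_gather_dtensor()`: handles FSDP2's DTensor distributed tensors - `save_lora_only()`: saves only the LoRA adapter weights (safetensors format) - `load_lora_weights()`: loads pretrained LoRA weights - `verify_lora_weights()`: verifies that the LoRA weights are valid 4. Integration into the training flow - `trainer.py`: adds an `enable_lora()` method that injects LoRA before FSDP2 sharding - `train_engine.py`: supports the two save modes `lora_only` and `full_model` ## Does this PR introduce any user-facing change? I. Yes, the user interface changes. New LoRA config options: ```yaml training: lora: enable: true # enable LoRA fine-tuning rank: 8 # LoRA rank alpha: 16 # LoRA scaling factor target_modules: # target modules (wildcards supported) - "q_proj" - "k_proj" - "v_proj" dropout: 0.0 # dropout for LoRA layers init_lora_weights: "kaiming" # weight initialization method pretrained_lora_path: null # path to pretrained LoRA weights (optional) save_mode: "lora_only" # save mode: "lora_only" or "full_model" ``` II. Dependencies: - The `peft` library must be installed: `pip install peft` - Saving LoRA weights requires the `safetensors` library: `pip install safetensors` ## How was this patch tested? (Submitted as a separate PR because of the code volume.) I. Test scenarios: - FSDP2 distributed training + LoRA fine-tuning - Saving and loading LoRA weights - Loading pretrained LoRA weights II. Verification points: 1. LoRA adapters are injected into the target modules correctly 2. Base model parameters are frozen and only LoRA parameters are trained 3. The `save_lora_only` mode saves LoRA weights correctly (safetensors format) 4. DTensor distributed tensors are gathered into full weights correctly 5. LoRA weights contain no NaN/Inf values III. Usage constraints: - LoRA injection must finish before FSDP2 sharding - When saving weights in multi-device training, dtensor.full_tensor gathers the weights back --- Training loss on the COCO dataset: ![image.png](https://raw.gitcode.com/user-images/assets/7404510/d0b4c39b-8ae9-4942-9f5c-ba68193c8773/image.png 'image.png') See merge request: Ascend/MindSpeed-MM!2281	2 months ago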
model_args.py	feat(torch): Squash merge fsdp2_dev into master. Co-authored-by: zs-derrick1<1434012475@qq.com> # message auto-generated for no-merge-commit merge: !2223 merge master into master feat(torch): Squash merge fsdp2_dev into master. Created-by: zs-derrick1 Commit-by: zs-derrick1 Merged-by: ascend-robot Description: ## Motivation Squash merge fsdp2_dev into master. ## Modification Merge the fsdp2_dev branch into master. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist Before PR: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression is clear and the writing standardized. After PR: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!2223	3 months ago
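parallel_args.py	feat: torch fully_shard patch, optimize interaction between fully_shard and checkpoint_wrapper Co-authored-by: liyingxuan<liyingxuan3@huawei.com> # message auto-generated for no-merge-commit merge: !2290 merge master into master feat: torch fully_shard patch, optimize interaction between fully_shard and checkpoint_wrapper Created-by: liyx616 Commit-by: liyingxuan Merged-by: ascend-robot Description: Previously closed PR (too many conflicts): https://gitcode.com/Ascend/MindSpeed-MM/pull/2289 ; revised according to the review comments on the old PR. ## What this PR does / why we need it? When torch's native fully_shard is wrapped inside activation recomputation, the recomputation during the backward pass triggers both pre_forward and pre_backward, so parameters are unsharded repeatedly and not released promptly, which degrades performance and memory usage. This PR fixes the defect by patching torch's native fully_shard. The design is as follows: 1. fully_shard gains a hook_module argument: the pre_forward, post_forward, pre_backward, and post_backward hooks that fsdp2 registers are all attached to this hook_module. Normally hook_module is set to whichever module recomputation is applied to, so that the fully_shard hooks sit outside the recomputation. 2. Previously there was a single global comm_ctx, so every allgather and reduce scatter had to wait for the previous submodule's copyout and chunkcat to finish; because copyout and chunkcat run on the compute stream, communication was easily blocked. There are now multiple global comm_ctx instances (ids are derived automatically and need not be set by hand), and the allgather and reduce scatter of parameters belonging to the same hook_module do not wait for each other. When a change of hook_module is observed, it waits for the events in all comm_ctx instances of the previous hook_module to finish. ## Does this PR introduce any user-facing change? It changes how torch's native fully_shard is used. Scenario 1: no fine-grained sharding inside a layer is needed, or recomputation is disabled; the original usage still applies ```python for layer in model.layers: fully_shard(layer, **fsdp_kwargs) fully_shard(model, **fsdp_kwargs) ``` Scenario 2: fine-grained sharding inside a layer is needed together with recomputation ```python for i, layer in enumerate(model.layers): model.layers[i] = checkpoint_wrapper(layer) # the two submodules can use different device meshes or other fsdp kwargs, allowing flexible configuration fully_shard(layer.attn, hook_module=layer, **fsdp_kwargs1) fully_shard(layer.mlp, hook_module=layer, **fsdp_kwargs2) fully_shard(layer, hook_module=layer, **fsdp_kwargs) fully_shard(model, **fsdp_kwargs) # if hook_module is not set, it defaults to the first nn.Module passed in ``` ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2290	2 months ago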
tools_args.py	feat(torch): Squash merge fsdp2_dev into master. Co-authored-by: zs-derrick1<1434012475@qq.com> # message auto-generated for no-merge-commit merge: !2223 merge master into master feat(torch): Squash merge fsdp2_dev into master. Created-by: zs-derrick1 Commit-by: zs-derrick1 Merged-by: ascend-robot Description: ## Motivation Squash merge fsdp2_dev into master. ## Modification Merge the fsdp2_dev branch into master. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist Before PR: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression is clear and the writing standardized. After PR: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!2223	3 months ago
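training_args.py	[feature] pure fsdp2 backend add lora finetune ability. Co-authored-by: pjgao<gaopengju3@huawei.com> # message auto-generated for no-merge-commit merge: !2281 merge master into master [feature] pure fsdp2 backend add lora finetune ability. Created-by: PIPIXIU Commit-by: pjgao Merged-by: ascend-robot Description: ## What this PR does / why we need it? I. Background: in the current repository, MM's new FSDP2 training framework does not support LoRA fine-tuning, so users cannot do parameter-efficient fine-tuning on the new pure FSDP2 backend. (The megatron backend and the megatron+fsdp2 backend support it; the pure fsdp2 backend does not.) II. This change: add full LoRA fine-tuning support to the pure FSDP2 backend via the `peft` library, including: 1. New config module `mindspeed_mm/fsdp/params/lora_args.py` - Defines the `LoraArguments` config class with parameters such as rank, alpha, target_modules, dropout, and init_lora_weights - Supports the `save_mode` setting ("lora_only" or "full_model") 2. New utility functions in `mindspeed_mm/fsdp/utils/lora_utils.py` - `match_target_modules()`: matches target modules using wildcard patterns - `add_lora_to_model()`: injects LoRA adapters into the model - `freeze_parameters()`: freezes the base model parameters - `validate_lora_config()`: validates the LoRA config - `get_lora_trainable_params()`: reports statistics on trainable LoRA parameters 3. New weight manager `mindspeed_mm/fsdp/utils/lora_weight_manager.py` - The `LoraWeightManager` class manages saving and loading of LoRA weights - `_gather_dtensor()`: handles FSDP2's DTensor distributed tensors - `save_lora_only()`: saves only the LoRA adapter weights (safetensors format) - `load_lora_weights()`: loads pretrained LoRA weights - `verify_lora_weights()`: verifies that the LoRA weights are valid 4. Integration into the training flow - `trainer.py`: adds an `enable_lora()` method that injects LoRA before FSDP2 sharding - `train_engine.py`: supports the two save modes `lora_only` and `full_model` ## Does this PR introduce any user-facing change? I. Yes, the user interface changes. New LoRA config options: ```yaml training: lora: enable: true # enable LoRA fine-tuning rank: 8 # LoRA rank alpha: 16 # LoRA scaling factor target_modules: # target modules (wildcards supported) - "q_proj" - "k_proj" - "v_proj" dropout: 0.0 # dropout for LoRA layers init_lora_weights: "kaiming" # weight initialization method pretrained_lora_path: null # path to pretrained LoRA weights (optional) save_mode: "lora_only" # save mode: "lora_only" or "full_model" ``` II. Dependencies: - The `peft` library must be installed: `pip install peft` - Saving LoRA weights requires the `safetensors` library: `pip install safetensors` ## How was this patch tested? (Submitted as a separate PR because of the code volume.) I. Test scenarios: - FSDP2 distributed training + LoRA fine-tuning - Saving and loading LoRA weights - Loading pretrained LoRA weights II. Verification points: 1. LoRA adapters are injected into the target modules correctly 2. Base model parameters are frozen and only LoRA parameters are trained 3. The `save_lora_only` mode saves LoRA weights correctly (safetensors format) 4. DTensor distributed tensors are gathered into full weights correctly 5. LoRA weights contain no NaN/Inf values III. Usage constraints: - LoRA injection must finish before FSDP2 sharding - When saving weights in multi-device training, dtensor.full_tensor gathers the weights back --- Training loss on the COCO dataset: ![image.png](https://raw.gitcode.com/user-images/assets/7404510/d0b4c39b-8ae9-4942-9f5c-ba68193c8773/image.png 'image.png') See merge request: Ascend/MindSpeed-MM!2281	2 months ago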
utils.py	feat(torch): Squash merge fsdp2_dev into master. Co-authored-by: zs-derrick1<1434012475@qq.com> # message auto-generated for no-merge-commit merge: !2223 merge master into master feat(torch): Squash merge fsdp2_dev into master. Created-by: zs-derrick1 Commit-by: zs-derrick1 Merged-by: ascend-robot Description: ## Motivation Squash merge fsdp2_dev into master. ## Modification Merge the fsdp2_dev branch into master. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist Before PR: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression is clear and the writing standardized. After PR: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!2223	3 months ago
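
The data_args.py entry in the table describes a packing processor that assigns consecutive position ids restarting at each packed sample, with padding at the end. A minimal plain-Python sketch of that text-only behaviour is given here. `packed_position_ids` and its parameters are hypothetical names chosen for illustration, not the repository's API, and the mrope-aware regeneration for image tokens that the fix performs in the collator is not modeled.

```python
from typing import List


def packed_position_ids(sample_lengths: List[int], pad_to_multiple_of: int = 1) -> List[int]:
    """Return position ids that restart at 0 for every packed sample.

    Hypothetical helper for illustration only. Padding slots are given id 0,
    matching the trailing zeros in the tensors quoted in the commit message.
    """
    if pad_to_multiple_of < 1:
        raise ValueError("pad_to_multiple_of must be >= 1")

    position_ids: List[int] = []
    for length in sample_lengths:
        # Each sample counts from 0 again, independent of its neighbours.
        position_ids.extend(range(length))

    # Pad the packed sequence up to the next multiple of pad_to_multiple_of.
    remainder = len(position_ids) % pad_to_multiple_of
    if remainder:
        position_ids.extend([0] * (pad_to_multiple_of - remainder))
    return position_ids


# Two samples of 31 and 24 tokens packed into one 64-slot sequence.
ids = packed_position_ids([31, 24], pad_to_multiple_of=64)
```

With sample lengths 31 and 24 and padding to a multiple of 64, this reproduces one axis of the pre-fix tensor quoted in the commit message: 0 to 30, then 0 to 23, then nine zeros.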
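
The lora_args.py entry in the table lists the fields of the `LoraArguments` config class and notes that target modules can be matched with wildcards. The sketch below shows one way such a config could be modeled and validated. The names `LoraConfigSketch` and `match_targets`, the validation rules, the defaults (copied from the example YAML, with `enable` defaulting to off), and the choice to match patterns against the last dotted component of a module name are all assumptions for illustration; this is not the code in `lora_args.py` or `lora_utils.py`.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List, Optional


@dataclass
class LoraConfigSketch:
    """Hypothetical stand-in for a LoRA config; fields follow the example YAML."""

    enable: bool = False
    rank: int = 8
    alpha: int = 16
    target_modules: List[str] = field(
        default_factory=lambda: ["q_proj", "k_proj", "v_proj"]
    )
    dropout: float = 0.0
    init_lora_weights: str = "kaiming"
    pretrained_lora_path: Optional[str] = None
    save_mode: str = "lora_only"

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real validation rules are not shown in the source.
        if self.rank <= 0:
            raise ValueError("rank must be positive")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")
        if self.save_mode not in ("lora_only", "full_model"):
            raise ValueError("save_mode must be 'lora_only' or 'full_model'")


def match_targets(module_names: List[str], patterns: List[str]) -> List[str]:
    """Keep names whose last dotted component matches any shell-style pattern."""
    return [
        name
        for name in module_names
        if any(fnmatch(name.rsplit(".", 1)[-1], pattern) for pattern in patterns)
    ]


cfg = LoraConfigSketch(enable=True)
names = ["layers.0.attn.q_proj", "layers.0.attn.o_proj", "layers.0.mlp.up_proj"]
matched = match_targets(names, cfg.target_modules)  # exact names from the config
wildcard = match_targets(names, ["*_proj"])         # wildcard pattern
```

Validating in `__post_init__` surfaces a bad `save_mode` or rank when the config is built rather than at checkpoint time, which is one reasonable design among several.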