File · Last commit · Last updated
feat(torch): Squash merge fsdp2_dev into master.

Co-authored-by: zs-derrick1<1434012475@qq.com>
# message auto-generated for no-merge-commit merge:
!2223 merge master into master
feat(torch): Squash merge fsdp2_dev into master.
Created-by: zs-derrick1
Commit-by: zs-derrick1
Merged-by: ascend-robot
Description:

## Motivation
Squash merge fsdp2_dev into master.

## Modification
Merge the fsdp2_dev branch into master.

## Self-test (Optional)
If the modifications in this PR may cause or fix function, accuracy, or performance DTSs/issues, a self-inspection record must be attached.

## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on CANN/torch_npu versions, they must be explained in the PR.

## Checklist
**Before PR**:
- [x] The new code complies with the Clean Code specification.
- [x] The PR content has been self-checked; the wording is clear and the writing is standardized.

**After PR**:
- [x] CLA has been signed and all committers have signed the CLA in this PR.
- [x] The ci-pipeline has passed, and Code Check has passed.

See merge request: Ascend/MindSpeed-MM!2223 · 3 months ago
[bugfix] fix bug when use asyncoffload+recompute+ep

Co-authored-by: weixin_44031810<gaojie75@huawei.com>
# message auto-generated for no-merge-commit merge:
!2259 merge master into master
[bugfix] fix bug when use asyncoffload+recompute+ep
Created-by: gaojie_
Commit-by: weixin_44031810
Merged-by: ascend-robot
Description:

## What this PR does / why we need it?
When asyncoffload, recompute, and EP are all enabled for qwen35, recomputation fails with a shape mismatch against the forward pass. Debugging traced the cause to an incorrect stream switch.

## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.

## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.

See merge request: Ascend/MindSpeed-MM!2259 · 2 months ago
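[feature] pure fsdp2 backend add lora finetune ability.

Co-authored-by: pjgao<gaopengju3@huawei.com>
# message auto-generated for no-merge-commit merge:
!2281 merge master into master
[feature] pure fsdp2 backend add lora finetune ability.
Created-by: PIPIXIU
Commit-by: pjgao
Merged-by: ascend-robot
Description:

## What this PR does / why we need it?

**I. Background:** The new FSDP2 training framework in MM does not currently support LoRA fine-tuning, so users cannot run parameter-efficient fine-tuning on the new pure FSDP2 backend. (The Megatron and Megatron+FSDP2 backends support it; the pure FSDP2 backend does not.)

**II. Changes in this PR:** Full LoRA fine-tuning support is added to the pure FSDP2 backend via the peft library:

1. **New configuration module** `mindspeed_mm/fsdp/params/lora_args.py`
   - Defines the `LoraArguments` config class, with parameters including rank, alpha, target_modules, dropout, and init_lora_weights
   - Supports the `save_mode` option (`"lora_only"` or `"full_model"`)
2. **New utility functions** `mindspeed_mm/fsdp/utils/lora_utils.py`
   - `match_target_modules()`: matches target modules using wildcard patterns
   - `add_lora_to_model()`: injects LoRA adapters into the model
   - `freeze_parameters()`: freezes the base model parameters
   - `validate_lora_config()`: validates the LoRA configuration
   - `get_lora_trainable_params()`: reports statistics on the trainable LoRA parameters
3. **New weight manager** `mindspeed_mm/fsdp/utils/lora_weight_manager.py`
   - The `LoraWeightManager` class manages saving and loading of LoRA weights
   - `_gather_dtensor()`: handles FSDP2's DTensor distributed tensors
   - `save_lora_only()`: saves only the LoRA adapter weights (safetensors format)
   - `load_lora_weights()`: loads pretrained LoRA weights
   - `verify_lora_weights()`: checks that the LoRA weights are valid
4. **Integration into the training flow**
   - `trainer.py`: adds an `enable_lora()` method that injects LoRA before FSDP2 sharding
   - `train_engine.py`: supports both the `lora_only` and `full_model` save modes

## Does this PR introduce any user-facing change?

**I. Yes, the user interface changes.** New LoRA configuration options:

```yaml
training:
  lora:
    enable: true                   # enable LoRA fine-tuning
    rank: 8                        # LoRA rank
    alpha: 16                      # LoRA scaling factor
    target_modules:                # target modules (wildcards supported)
      - "q_proj"
      - "k_proj"
      - "v_proj"
    dropout: 0.0                   # dropout of the LoRA layers
    init_lora_weights: "kaiming"   # weight initialization method
    pretrained_lora_path: null     # path to pretrained LoRA weights (optional)
    save_mode: "lora_only"         # save mode: "lora_only" or "full_model"
```

**II. Dependencies:**
- The peft library must be installed: `pip install peft`
- Saving LoRA weights requires the safetensors library: `pip install safetensors`

## How was this patch tested?
(Submitted in a separate PR because of the amount of code.)

**I. Test scenarios:**
- FSDP2 distributed training + LoRA fine-tuning
- Saving and loading LoRA weights
- Loading pretrained LoRA weights

**II. Verification points:**
1. LoRA adapters are correctly injected into the target modules
2. Base model parameters are frozen and only LoRA parameters are trained
3. The save_lora_only mode correctly saves LoRA weights (safetensors format)
4. DTensor distributed tensors are correctly gathered into full weights
5. LoRA weights contain no NaN/Inf values

**III. Usage constraints:**
- LoRA must be injected before FSDP2 sharding
- When saving weights in multi-card training, `dtensor.full_tensor` gathers the sharded weights back into full tensors

---

Training loss on the COCO dataset:

![image.png](https://raw.gitcode.com/user-images/assets/7404510/d0b4c39b-8ae9-4942-9f5c-ba68193c8773/image.png 'image.png')

See merge request: Ascend/MindSpeed-MM!2281 · 2 months ago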
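The PR names its LoRA helpers without showing them. The following is a self-contained Python sketch of what wildcard target-module matching, basic config validation, and the alpha/rank arithmetic could look like. Every name in it (`LoraConfigSketch`, `validate_config`, `match_modules`, `lora_param_count`) is hypothetical and is not the actual code in `lora_utils.py`. The shell-style wildcard semantics (via `fnmatch`) are an assumption. The `alpha / rank` scaling matches peft's default `LoraConfig` behavior when rsLoRA is off.

```python
# Hypothetical sketch of LoRA config/target-matching helpers.
# Names and matching rules are illustrative assumptions, not the code in
# mindspeed_mm/fsdp/utils/lora_utils.py.
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List

VALID_SAVE_MODES = ("lora_only", "full_model")


@dataclass
class LoraConfigSketch:
    """Mirrors a subset of the training.lora YAML keys from the PR description."""
    rank: int = 8
    alpha: int = 16
    target_modules: List[str] = field(
        default_factory=lambda: ["q_proj", "k_proj", "v_proj"])
    dropout: float = 0.0
    save_mode: str = "lora_only"

    @property
    def scaling(self) -> float:
        # Classic LoRA: the low-rank update B @ A is scaled by alpha / rank.
        return self.alpha / self.rank


def validate_config(cfg: LoraConfigSketch) -> None:
    """Raise ValueError for obviously invalid settings."""
    if cfg.rank <= 0:
        raise ValueError(f"rank must be positive, got {cfg.rank}")
    if not 0.0 <= cfg.dropout < 1.0:
        raise ValueError(f"dropout must be in [0, 1), got {cfg.dropout}")
    if cfg.save_mode not in VALID_SAVE_MODES:
        raise ValueError(
            f"save_mode must be one of {VALID_SAVE_MODES}, got {cfg.save_mode!r}")
    if not cfg.target_modules:
        raise ValueError("target_modules must not be empty")


def match_modules(module_names: List[str], patterns: List[str]) -> List[str]:
    """Keep names whose leaf or full dotted path matches a shell-style pattern."""
    matched = []
    for name in module_names:
        leaf = name.rsplit(".", 1)[-1]
        # fnmatchcase is case-sensitive on every OS, unlike fnmatch.
        if any(fnmatchcase(leaf, p) or fnmatchcase(name, p) for p in patterns):
            matched.append(name)
    return matched


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Params LoRA adds to one Linear: A is (rank, in), B is (out, rank)."""
    return rank * (in_features + out_features)


if __name__ == "__main__":
    names = [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.self_attn.k_proj",
        "model.layers.0.self_attn.o_proj",
        "model.layers.0.mlp.up_proj",
    ]
    cfg = LoraConfigSketch()
    validate_config(cfg)
    print(match_modules(names, cfg.target_modules))  # q_proj and k_proj only
    print(cfg.scaling)                               # 2.0
    print(lora_param_count(4096, 4096, cfg.rank))    # 65536
```

Matching against the leaf name roughly mirrors how peft treats plain strings in `target_modules` (a module matches when its name equals the entry or ends with `.` plus the entry). Expanding wildcards to concrete module names first would let the result be passed to peft as an explicit list. The PR does not state whether it works this way.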
cleancode

Co-authored-by: liyingxuan<liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge:
!2323 merge master into 26.0.0
cleancode
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot
Description:

## What this PR does / why we need it?
Clean-code remediation.

## Does this PR introduce any user-facing change?
Clean-code remediation.

## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.

See merge request: Ascend/MindSpeed-MM!2323 · 2 months ago