File | Last commit | Last updated

[Feature] support broadcast loading
Co-authored-by: htwang<wanghaitao60@huawei.com>
# message auto-generated for no-merge-commit merge: !2267 merge ckpt_optim into master
[Feature] support broadcast loading
Created-by: htwang
Commit-by: htwang
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
When the weights are large and many devices are used, disk I/O becomes the bottleneck during weight loading. Reading the weights on rank 0 and then broadcasting them to the other devices effectively reduces this I/O.
## Does this PR introduce any user-facing change?
Adds a weight-loading mode in which rank 0 loads the weights and broadcasts them to the other devices.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2267
Last updated: 2 months ago

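The I/O saving described above can be illustrated with a single-process simulation: only rank 0 calls the checkpoint reader, and every other rank receives a copy instead of reading from disk, so the number of disk reads drops from `world_size` to 1. This is a sketch under stated assumptions, not the MindSpeed-MM implementation: `load_with_broadcast`, `naive_load` and `read_checkpoint` are hypothetical names, and the deep copies merely stand in for a real collective such as `torch.distributed.broadcast` (tensors) or `torch.distributed.broadcast_object_list` (Python objects).

```python
import copy
from typing import Any, Callable, Dict, List

StateDict = Dict[str, Any]


def load_with_broadcast(
    world_size: int, read_checkpoint: Callable[[], StateDict]
) -> List[StateDict]:
    """Simulate 'rank 0 reads, then broadcasts' in one process.

    Returns one state dict per rank. Only rank 0 touches the disk; the deep
    copies stand in for the data every other rank would receive from a
    broadcast collective (hypothetical helper, for illustration only).
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    rank0_state = read_checkpoint()  # the single disk read
    received = [copy.deepcopy(rank0_state) for _ in range(world_size - 1)]
    return [rank0_state] + received


def naive_load(world_size: int, read_checkpoint: Callable[[], StateDict]) -> List[StateDict]:
    """Baseline for comparison: every rank reads the checkpoint from disk itself."""
    return [read_checkpoint() for _ in range(world_size)]
```

With 8 ranks, `naive_load` calls the reader 8 times while `load_with_broadcast` calls it once; in a real job the saved disk reads are traded for interconnect traffic, which is why the approach pays off mainly when disk I/O is the bottleneck.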
[Bugfix] fix data packing for qwen dataset
Co-authored-by: htwang<wanghaitao60@huawei.com>
# message auto-generated for no-merge-commit merge: !2288 merge master into master
[Bugfix] fix data packing for qwen dataset
Created-by: htwang
Commit-by: htwang
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
With packing enabled, image-text data and mixed image-text + text-only data raised errors related to position ids.
## Does this PR introduce any user-facing change?
1. The packing processor generates position ids as one continuous sequence. When images or other non-text modalities are present, the correct position ids must be regenerated in the collator.
2. The llamafactory dataset now supports a custom pad_to_multiple_of.
## How was this patch tested?
Position ids produced by the original code (multiple samples):
```
tensor([[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
        [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
        [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])
```
After the fix:
```
tensor([[[ 0, 1, 2, 3, 4, 4, 4, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
        [[ 0, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
        [[ 0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])
```
For image data, the original position encoding is wrong: all three t/h/w axes carry identical ids, which does not follow the mrope logic. The internal business team verified that the loss matches training without packing.
See merge request: Ascend/MindSpeed-MM!2288
Last updated: 2 months ago

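To illustrate what the collator has to regenerate, the pure-Python sketch below builds per-sample t/h/w position ids and concatenates them for a packed batch, restarting at 0 for every sample and zero-padding the tail to a multiple of `pad_to_multiple_of`. It assumes a Qwen2-VL-style mrope scheme: text tokens share one id on all three axes, image tokens take their (t, h, w) grid coordinates offset by the current start position, and the next token resumes at the largest id used so far plus one. The segment format and the names `sample_position_ids` / `packed_position_ids` are hypothetical and are not the MindSpeed-MM collator API; video temporal scaling is ignored.

```python
from typing import List, Sequence, Tuple, Union

# A segment is ("text", n_tokens) or ("image", (t, h, w)), where (t, h, w) is the
# image token grid after any spatial merging. Hypothetical format for illustration.
Segment = Tuple[str, Union[int, Tuple[int, int, int]]]


def sample_position_ids(segments: Sequence[Segment]) -> List[List[int]]:
    """Return [t_ids, h_ids, w_ids] for one sample, starting at position 0."""
    t_ids: List[int] = []
    h_ids: List[int] = []
    w_ids: List[int] = []
    next_pos = 0
    for kind, size in segments:
        if kind == "text":
            # Text tokens use the same id on all three axes.
            for offset in range(size):
                pos = next_pos + offset
                t_ids.append(pos)
                h_ids.append(pos)
                w_ids.append(pos)
            next_pos += size
        elif kind == "image":
            t, h, w = size
            # Image tokens are laid out t-major, then h, then w, and carry their
            # grid coordinates offset by the current start position.
            for ti in range(t):
                for hi in range(h):
                    for wi in range(w):
                        t_ids.append(next_pos + ti)
                        h_ids.append(next_pos + hi)
                        w_ids.append(next_pos + wi)
            # The following token resumes after the largest id used by the image.
            next_pos += max(t, h, w)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return [t_ids, h_ids, w_ids]


def packed_position_ids(
    samples: Sequence[Sequence[Segment]], pad_to_multiple_of: int = 1
) -> List[List[int]]:
    """Concatenate per-sample ids (each sample restarts at 0) and zero-pad the tail."""
    axes: List[List[int]] = [[], [], []]
    for segments in samples:
        for axis, ids in zip(axes, sample_position_ids(segments)):
            axis.extend(ids)
    pad = (-len(axes[0])) % pad_to_multiple_of
    for axis in axes:
        axis.extend([0] * pad)
    return axes
```

For a sample of 4 text tokens, one 1x2x2 image grid and 3 more text tokens, the three axes begin `0, 1, 2, 3, 4, 4, 4, 4, 6`, `0, 1, 2, 3, 4, 4, 5, 5, 6` and `0, 1, 2, 3, 4, 5, 4, 5, 6`, the same prefix as the post-fix tensor in the commit message.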
[Bugfix] bugfix for clip grad & empty ep
Co-authored-by: htwang<wanghaitao60@huawei.com>
# message auto-generated for no-merge-commit merge: !2382 merge 26.0.0 into 26.0.0
[Bugfix] bugfix for clip grad & empty ep
Created-by: htwang
Commit-by: htwang
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
1. With EP enabled, EP ranks that receive no tokens still run an empty computation, so that the expert parameters do not lose their gradients.
2. Fixes an incorrect clip-grad computation when EP is disabled and the clip grad norm is greater than 0.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2382
Last updated: 1 month ago

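For reference, global-norm gradient clipping computes one L2 norm over all gradients and rescales every gradient by the same factor when that norm exceeds the threshold (the same rule as `torch.nn.utils.clip_grad_norm_`, with an epsilon of 1e-6). When parameters are sharded across ranks, each rank contributes the sum of its squared gradient entries, and these partial sums, not per-rank norms, are added before the square root. The pure-Python sketch below simulates that reduction with a plain sum; the function name and the nested-list layout are hypothetical, and the commit does not say which step was wrong in MindSpeed-MM.

```python
import math
from typing import List

# grads_per_rank[r][p] is the flattened gradient of parameter shard p on rank r.
Grads = List[List[List[float]]]


def clip_grad_norm_sharded(grads_per_rank: Grads, max_norm: float, eps: float = 1e-6) -> float:
    """Clip gradients in place by their global L2 norm; return the pre-clip norm."""
    # Each rank computes the sum of squares of its local gradient entries.
    local_sq = [
        sum(g * g for shard in rank_grads for g in shard) for rank_grads in grads_per_rank
    ]
    # Stand-in for an all-reduce(SUM) across ranks, followed by the square root.
    total_norm = math.sqrt(sum(local_sq))
    clip_coef = min(max_norm / (total_norm + eps), 1.0)
    if clip_coef < 1.0:
        # Every shard on every rank is scaled by the same global factor.
        for rank_grads in grads_per_rank:
            for shard in rank_grads:
                for i, g in enumerate(shard):
                    shard[i] = g * clip_coef
    return total_norm
```

With two one-element shards holding 3.0 and 4.0, the global norm is 5; summing per-rank norms instead (3 + 4 = 7) would overstate it and over-clip.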
[bugfix] fix bug when use asyncoffload+recompute+ep
Co-authored-by: weixin_44031810<gaojie75@huawei.com>
# message auto-generated for no-merge-commit merge: !2259 merge master into master
[bugfix] fix bug when use asyncoffload+recompute+ep
Created-by: gaojie_
Commit-by: weixin_44031810
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
When qwen35 enables asyncoffload, recompute and EP together, recomputation reports a shape that does not match the forward pass. Debugging showed that the stream switch was implemented incorrectly.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2259
Last updated: 2 months ago

feat(torch): Squash merge fsdp2_dev into master.
Co-authored-by: zs-derrick1<1434012475@qq.com>
# message auto-generated for no-merge-commit merge: !2223 merge master into master
feat(torch): Squash merge fsdp2_dev into master.
Created-by: zs-derrick1
Commit-by: zs-derrick1
Merged-by: ascend-robot
Description:
## Motivation
Squash merge fsdp2_dev into master.
## Modification
Merge the fsdp2_dev branch into master.
## Self-test (Optional)
If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content has been self-checked, and its wording is clear and standardized.
**After PR**:
- [x] CLA has been signed and all committers have signed the CLA in this PR.
- [x] The CI pipeline has passed, and Code Check has passed.
See merge request: Ascend/MindSpeed-MM!2223
Last updated: 3 months ago

cleancode
Co-authored-by: liyingxuan<liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge: !2323 merge master into 26.0.0
cleancode
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Clean-code remediation.
## Does this PR introduce any user-facing change?
Clean-code remediation.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2323
Last updated: 2 months ago

[Modify] Enhance performance for TTS model
Co-authored-by: AZe_404<wangze62@h-partners.com>
# message auto-generated for no-merge-commit merge: !2296 merge tts_perf into master
[Modify] Enhance performance for TTS model
Created-by: AZe_404
Commit-by: AZe_404
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Enhance performance for TTS model.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2296
Last updated: 2 months ago

[bugfix] adapter verl training
Co-authored-by: Miss_min<qiaoxiaomin@huawei.com>
# message auto-generated for no-merge-commit merge: !2313 merge master into master
[bugfix] adapter verl training
Created-by: Miss_min
Commit-by: Miss_min
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
In RL scenarios, parse_args picks up arguments that do not come from the YAML config, so the required arguments are now passed through directly.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2313
Last updated: 2 months ago

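One generic way to let a trainer accept arguments it does not define itself is `argparse.ArgumentParser.parse_known_args`, which returns the recognised options together with a list of leftover arguments that can be forwarded unchanged, whereas a strict `parse_args` call exits with "unrecognized arguments". The sketch below only illustrates that pattern; the `config` positional, the `split_cli_args` name and the example flags are hypothetical, and the commit does not state how MindSpeed-MM implements the passthrough.

```python
import argparse
from typing import List, Tuple


def split_cli_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Parse the options this trainer owns and keep everything else for passthrough."""
    parser = argparse.ArgumentParser(add_help=False)
    # Hypothetical: the trainer itself only needs the path to its YAML config.
    parser.add_argument("config", help="path to the training YAML file")
    # parse_known_args does not fail on unrecognised options (for example ones
    # injected by an RL framework); they are returned in their original order.
    known, passthrough = parser.parse_known_args(argv)
    return known, passthrough
```

For example, `split_cli_args(["train.yaml", "--actor.lr=1e-6"])` yields a namespace with `config == "train.yaml"` and the passthrough list `["--actor.lr=1e-6"]`, which can then be handed to whichever component expects it.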