File / Last commit / Last updated
!1321 [Bugfix] import dependency isolation. Merge pull request !1321 from zs-Derrick/master. 10 months ago
[Bugfix] fix ep grad caluation and clip grad. Co-authored-by: htwang <wanghaitao60@huawei.com>. Message auto-generated for no-merge-commit merge !2070 (merge master into master). Created-by: htwang; Commit-by: htwang; Merged-by: ascend-robot. Description: 1. Fix a bug in the grad norm calculation for the EP scenario under FSDP2 (MoE parameters were not accumulated across the EP group). 2. Fix an error raised during clip grad that was caused by dtype. See merge request: Ascend/MindSpeed-MM!2070. 4 months ago
[Feature] Upgrade the functionality of the bridge to make it compatible with loading all types of weights. Co-authored-by: ningmengliu <liuhao438@huawei.com>. Message auto-generated for no-merge-commit merge !2154 (merge master into master). Created-by: ningmenglh; Commit-by: ningmengliu; Merged-by: ascend-robot. Description: Motivation: upgrade the bridge so that it is compatible with loading all types of weights. Modification: 1. Set the bridge-based weight loading option to True so that it is enabled by default. 2. Change how the bridge is patched so that the megatron backend can load pt and huggingface weights, and the fsdp2 backend can load pt, huggingface, and dcp weights. 3. Remove some previously redundant checks. See merge request: Ascend/MindSpeed-MM!2154. 3 months ago
[Feature] qwen2.5VL support canonical lora. Co-authored-by: chenpeizhe <chenpeizhe1@huawei.com>. Message auto-generated for no-merge-commit merge !1637 (merge master into master). Created-by: chenpeizhe; Commit-by: chenpeizhe; Merged-by: ascend-robot. Description: no motivation or modification text provided. Checklist: all items checked (Clean Code compliance, PR content self-checked, CLA signed by all committers, ci-pipeline and Code Check passed). See merge request: Ascend/MindSpeed-MM!1637. 7 months ago
!1360 [Bugfix] remove redundancy patch for dummy optimizer in 012. Merge pull request !1360 from chenhaihui/dummy_optimizer_bugfix. 9 months ago
[Feature] support local experts unshard. Co-authored-by: htwang <wanghaitao60@huawei.com>. Message auto-generated for no-merge-commit merge !2022 (merge master into master). Created-by: htwang; Commit-by: htwang; Merged-by: ascend-robot. Description: Motivation: support local experts unshard. Modification: 1. When local experts are not fully sharded, all-reduce the gradients of the MoE part. 2. Unify the interface in the Qwen3VLMoeTextSparseMoeBlock forward so that it is the same with and without fused MoE enabled. See merge request: Ascend/MindSpeed-MM!2022. 4 months ago
[Docs] Annotation Standardization Rectification. Co-authored-by: LKONE <wanglikai4@huawei.com>. Message auto-generated for no-merge-commit merge !2010 (merge master into master). Created-by: wanglikai1019; Commit-by: LKONE; Merged-by: ascend-robot. Description: Motivation: bring the code comments (annotations) in the repository in line with the annotation specification. Modification: 1. Replace Chinese comments with English ones. 2. Organize the environment variable documentation and add explanations to the startup scripts (currently done for Qwen3VL and Wan2.2). 3. Add comments to the public dataset functions and public model classes of Wan2.2 and Qwen3VL. 4. Adjust the comment format (for example, indentation). See merge request: Ascend/MindSpeed-MM!2010. 4 months ago
[Feature] The transformers_model supports two loss calculation methods: token level and sample level. Co-authored-by: zhangxubin <1656631289@qq.com>. Message auto-generated for no-merge-commit merge !1602 (merge master into master). Created-by: MoCuishle-M; Commit-by: zhangxubin; Merged-by: ascend-robot. Description: Motivation: the transformers_model supports two loss calculation methods, token level and sample level. 1. Enable token-level loss calculation with '--calculate-per-token-loss' and sample-level loss calculation with '--calculate-per-sample-loss'. Argument validation prevents both flags from being enabled at the same time. When neither flag is set, the loss computation behaves as in the current implementation. 2. Modify the description of the default loss calculation behavior for vlm_model. 3. Move the compute_token_level_loss function into the utility file. Modification: the transformers_model supports two loss calculation methods, token level and sample level. See merge request: Ascend/MindSpeed-MM!1602. 7 months ago
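The flag handling described in this entry can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the merge request: the helper names `build_parser`, `select_loss_mode`, `token_level_loss`, and `sample_level_loss` are hypothetical, and the two loss definitions (one mean over all tokens versus a mean of per-sample means) are the common interpretation rather than something the entry spells out. Only the two flag names come from the entry.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing only the two flags named in the entry."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--calculate-per-token-loss", action="store_true")
    parser.add_argument("--calculate-per-sample-loss", action="store_true")
    return parser


def select_loss_mode(args: argparse.Namespace) -> Optional[str]:
    """Return 'token', 'sample', or None (keep the default behavior).

    Raises ValueError when both flags are set, mirroring the validation
    the entry describes.
    """
    if args.calculate_per_token_loss and args.calculate_per_sample_loss:
        raise ValueError(
            "--calculate-per-token-loss and --calculate-per-sample-loss "
            "cannot be enabled at the same time"
        )
    if args.calculate_per_token_loss:
        return "token"
    if args.calculate_per_sample_loss:
        return "sample"
    return None


def token_level_loss(per_token_losses: List[List[float]]) -> float:
    """Assumed definition: one mean over every token in the batch."""
    total = sum(sum(sample) for sample in per_token_losses)
    count = sum(len(sample) for sample in per_token_losses)
    return total / count


def sample_level_loss(per_token_losses: List[List[float]]) -> float:
    """Assumed definition: mean of each sample's own mean token loss."""
    per_sample = [sum(sample) / len(sample) for sample in per_token_losses]
    return sum(per_sample) / len(per_sample)
```

The two definitions differ whenever samples have different lengths: token-level averaging weights long samples more heavily, while sample-level averaging gives every sample the same weight. The sketch ignores padding and masked tokens.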
[Modify] Refactor vlm_model forward. Co-authored-by: xiaoyue994 <xiaoyuanhang@huawei.com>. Message auto-generated for no-merge-commit merge !2067 (merge master into master). Created-by: xiaoyue994; Commit-by: xiaoyue994; Merged-by: ascend-robot. Description: Motivation: 1. Refactor vlm_model forward: the original forward function of vlm_model had multiple else-if branches, and the audio_features computation lived in the function process_multimodal_embeddings, which made the code hard to follow. The computation order of the ViT, audio, and LLM models was refactored. 2. Rename the parameter hetero_pp: its meaning was unclear, so it was changed to the clearer _IS_HETERO_PP_MOUDLE. Modification: 1. Refactor vlm_model forward. 2. Rename the parameter hetero_pp. See merge request: Ascend/MindSpeed-MM!2067. 4 months ago
!1312 [Refactor] Add inference fa patch. Merge pull request !1312 from 王泽/infer_fa_patch. 10 months ago
[Modify] Performance Optimization for Qwen3-Omni Thinker MoE Expert Weight Conversion. Co-authored-by: yaoyaoxu <xuyaoyao.824404@huawei.com>. Message auto-generated for no-merge-commit merge !1857 (merge better_perf_qwen3omni into master). Created-by: yaoyaoxu; Commit-by: yaoyaoxu; Merged-by: ascend-robot. Description: performance optimization for Qwen3-Omni Thinker MoE expert weight conversion. See merge request: Ascend/MindSpeed-MM!1857. 5 months ago
[Feature] Add the muon optimizer and adapt it for FSDP2. Co-authored-by: hanyyy <hanyue42@huawei.com>. Message auto-generated for no-merge-commit merge !1964 (merge master into master). Created-by: vasileone; Commit-by: hanyyy; Merged-by: ascend-robot. Description: Motivation: add the Muon optimizer and make it compatible with FSDP2. Modification: add a Muon optimizer implementation with FSDP2 compatibility; add patches for Megatron-core to enable zero-code-change injection; fix the Muon optimizer failing under FSDP2 (sharded parameter handling, gradient sync, and step behavior). See merge request: Ascend/MindSpeed-MM!1964. 5 months ago
[Feature] Upgrade the functionality of the bridge to make it compatible with loading all types of weights. Co-authored-by: ningmengliu <liuhao438@huawei.com>. Message auto-generated for no-merge-commit merge !2154 (merge master into master). Created-by: ningmenglh; Commit-by: ningmengliu; Merged-by: ascend-robot. Description: Motivation: upgrade the bridge so that it is compatible with loading all types of weights. Modification: 1. Set the bridge-based weight loading option to True so that it is enabled by default. 2. Change how the bridge is patched so that the megatron backend can load pt and huggingface weights, and the fsdp2 backend can load pt, huggingface, and dcp weights. 3. Remove some previously redundant checks. See merge request: Ascend/MindSpeed-MM!2154. 3 months ago
[Bugfix] Support the PreMul_Sum Op. Co-authored-by: feng0w0 <houyufeng4@huawei.com>. Message auto-generated for no-merge-commit merge !1988 (merge master into master). Created-by: feng0w0; Commit-by: feng0w0; Merged-by: ascend-robot. Description: Motivation: the NPU does not support the ReduceOp.PreMul_Sum op. Modification: add premul_sum_patch.py, which splits PreMul_Sum into a Sum operator supported by the NPU plus a multiplication; modify patch_manager.py to add the premul_sum patch in PatchesManager.config. See merge request: Ascend/MindSpeed-MM!1988. 5 months ago
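The split described in this entry can be illustrated without any distributed backend. The snippet below is a minimal sketch, not the contents of premul_sum_patch.py: the function names are hypothetical, plain lists of floats stand in for the tensor held by each rank, and applying the multiplication before the sum is an assumption about the ordering. It shows that scaling every rank's contribution and then applying an ordinary sum reduction matches a fused pre-multiplied sum.

```python
from typing import List


def premul_sum_reference(rank_values: List[List[float]], factor: float) -> List[float]:
    """Fused semantics: each rank's element is multiplied by `factor` while summing."""
    length = len(rank_values[0])
    return [sum(factor * values[i] for values in rank_values) for i in range(length)]


def plain_sum(rank_values: List[List[float]]) -> List[float]:
    """Ordinary element-wise sum reduction across ranks."""
    length = len(rank_values[0])
    return [sum(values[i] for values in rank_values) for i in range(length)]


def premul_sum_decomposed(rank_values: List[List[float]], factor: float) -> List[float]:
    """Two-step version: scale locally on every rank, then run a plain sum."""
    scaled = [[factor * v for v in values] for values in rank_values]
    return plain_sum(scaled)


if __name__ == "__main__":
    # Two simulated ranks, each holding a two-element gradient.
    ranks = [[1.0, 2.0], [3.0, 4.0]]
    print(premul_sum_decomposed(ranks, 0.5))
```

Because multiplication distributes over addition, the factor could also be applied once after the sum. With floating-point tensors the two orderings can differ in rounding, so the choice matters when bit-exact reproducibility is required.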
[feature] qwen2.5vl support usp. Co-authored-by: cxiaolong <2845907121@qq.com>. Message auto-generated for no-merge-commit merge !1553 (merge master into master). Created-by: cxiaolong; Commit-by: cxiaolong; Merged-by: ascend-robot. Description: no motivation or modification text provided. See merge request: Ascend/MindSpeed-MM!1553. 7 months ago
feat(torch): fsdp2 wan2.2 14b t2v support lora finetune. Co-authored-by: LKONE <wanglikai4@huawei.com>. Message auto-generated for no-merge-commit merge !2218 (merge master into master). Created-by: wanglikai1019; Commit-by: LKONE; Merged-by: ascend-robot. Description: no motivation or modification text provided. See merge request: Ascend/MindSpeed-MM!2218. 3 months ago
[docs] update wan2.2 14b t2v lora finetune readme. Co-authored-by: LKONE <wanglikai4@huawei.com>. Message auto-generated for no-merge-commit merge !2230 (merge master into master). Created-by: wanglikai1019; Commit-by: LKONE; Merged-by: ascend-robot. Description: no motivation or modification text provided. See merge request: Ascend/MindSpeed-MM!2230. 2 months ago
[feature] qwen2.5vl support usp. Co-authored-by: cxiaolong <2845907121@qq.com>. Message auto-generated for no-merge-commit merge !1553 (merge master into master). Created-by: cxiaolong; Commit-by: cxiaolong; Merged-by: ascend-robot. Description: no motivation or modification text provided. See merge request: Ascend/MindSpeed-MM!1553. 7 months ago
[bugfix] fix bugs for wan2.2&qwen3vl fsdp checkpointing. Co-authored-by: peng-hengduo <penghengduo@huawei.com>. Message auto-generated for no-merge-commit merge !2180 (merge wan_checkpointing_bugfix into master). Created-by: peng-hengduo; Commit-by: peng-hengduo; Merged-by: ascend-robot. Description: fix the checkpointing bugs for wan2.2 and qwen3vl. See merge request: Ascend/MindSpeed-MM!2180. 3 months ago