[Docs] Annotation Standardization Rectification
Co-authored-by: LKONE<wanglikai4@huawei.com>
# message auto-generated for no-merge-commit merge:
!2010 merge master into master
[Docs] Annotation Standardization Rectification
Created-by: wanglikai1019
Commit-by: LKONE
Merged-by: ascend-robot
Description: ## Motivation
Rectify the annotations for the code repository in accordance with the annotation specification requirements.
## Modification
1. Replace Chinese annotations with English ones;
2. Organize the environment variable documentation and add explanations in the startup scripts (currently implemented on Qwen3VL and Wan2.2);
3. Add annotations to the public dataset functions and public model classes of Wan2.2 and Qwen3VL;
4. Adjust the annotation format (e.g., number of indentations).
## Self-test (Optional)
If this PR may cause or fix functional, accuracy, or performance DTSs/issues, attach a self-inspection record.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [ ] The new code needs to comply with the Clean Code specification.
- [ ] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [ ] The CLA has been signed by all committers in this PR.
- [ ] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!2010
[Feature]CP-reconstruct for hetero-parallel
Co-authored-by: mazhuang<mazhuang21@huawei.com>
# message auto-generated for no-merge-commit merge:
!2001 merge CP-re-construct into master
[Feature]CP-reconstruct for hetero-parallel
Created-by: mazhuang1234
Commit-by: mazhuang
Merged-by: ascend-robot
Description: ## Motivation
[Feature]CP-reconstruct for hetero-parallel
## Modification
Please briefly describe what modification is made in this PR.
## Self-test (Optional)
If this PR may cause or fix functional, accuracy, or performance DTSs/issues, attach a self-inspection record.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [ ] The new code needs to comply with the Clean Code specification.
- [ ] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [ ] The CLA has been signed by all committers in this PR.
- [ ] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!2001
[Feature]VACE Model and Data Process
Co-authored-by: feng0w0<houyufeng4@huawei.com>
# message auto-generated for no-merge-commit merge:
!1669 merge master into master
[Feature]VACE Model and Data Process
Created-by: feng0w0
Commit-by: feng0w0
Merged-by: ascend-robot
Description: ## Motivation
Support VACE Model
## Modification
Added VACE data processing pipeline and model components.
## Self-test (Optional)
If this PR may cause or fix functional, accuracy, or performance DTSs/issues, attach a self-inspection record.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [ ] The new code needs to comply with the Clean Code specification.
- [ ] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [ ] The CLA has been signed by all committers in this PR.
- [ ] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!1669
[Test] chunkloss UT
Co-authored-by: liyingxuan<liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge:
!1751 merge chunkloss_ut into master
[Test] chunkloss UT
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot
Description: ## Motivation
Add unit tests (UT) for chunkloss.
## Modification
Added chunkloss UT and fixed two typos.
## Self-test (Optional)
If this PR may cause or fix functional, accuracy, or performance DTSs/issues, attach a self-inspection record.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [x] The CLA has been signed by all committers in this PR.
- [x] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!1751
[Bugfix]fix fpdt nonblocking data dependency bug
Co-authored-by: Bian Zheng<bianzheng8@huawei.com>
# message auto-generated for no-merge-commit merge:
!1951 merge fix_fpdt_nonblocking into master
[Bugfix]fix fpdt nonblocking data dependency bug
Created-by: rewindz
Commit-by: Bian Zheng
Merged-by: ascend-robot
Description: ## Motivation
Fix the non-blocking data dependency bug in the FPDT (Fully Pipelined Distributed Transformer, aka ulysses offload) mechanism. In the current implementation, data transfer between CPU and NPU uses non-blocking mode, but data dependencies are not properly handled, which may lead to data inconsistency or calculation errors.
## Modification
1. Optimize data transfer mode: change data transfer between CPU and NPU from non_blocking=False to non_blocking=True to improve transfer efficiency.
2. Add a stream synchronization mechanism:
- When loading data to the NPU, use offload_stream for asynchronous loading and wait for completion in compute_stream.
- Add compute_stream.synchronize() and general_offload_stream.synchronize() to ensure data transfer is complete before computation.
- Add torch_npu.npu.synchronize() before backpropagation to ensure all operations are complete.
3. Refactor data loading logic: encapsulate data loading operations in a stream context manager to make the asynchronous nature of data transfer explicit.
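The ordering rule above (issue the copy asynchronously on a side stream, then make the consumer wait for it before reading) can be illustrated with a framework-free sketch. This is an analogy only, assuming a single worker thread in place of `offload_stream` and a `threading.Event` in place of a stream dependency; `SideStream`, `load_chunk_async`, and `compute` are hypothetical names, not the MindSpeed-MM or torch_npu API.

```python
import queue
import threading
from typing import Callable, List


class SideStream:
    """Toy stand-in for an offload stream: runs submitted jobs in FIFO order on one worker thread."""

    def __init__(self) -> None:
        self._jobs = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self) -> None:
        while True:
            job, done = self._jobs.get()
            job()
            done.set()  # analogous to an event recorded after the copy completes

    def submit(self, job: Callable[[], None]) -> threading.Event:
        """Enqueue a job without blocking the caller; return an Event set when the job finishes."""
        done = threading.Event()
        self._jobs.put((job, done))
        return done


def load_chunk_async(stream: SideStream, host_chunk: List[float],
                     device_buffer: List[float]) -> threading.Event:
    """Hypothetical analogue of a non_blocking host-to-device copy issued on the offload stream."""
    def copy() -> None:
        device_buffer[:] = host_chunk
    return stream.submit(copy)


def compute(device_buffer: List[float], ready: threading.Event) -> float:
    """Consumer side: wait for the copy before reading, as the compute stream must."""
    ready.wait()
    return sum(device_buffer)


offload_stream = SideStream()
device_buffer = [0.0, 0.0, 0.0, 0.0]
ready = load_chunk_async(offload_stream, [1.0, 2.0, 3.0, 4.0], device_buffer)
total = compute(device_buffer, ready)  # 10.0, because compute waited for the copy
```

In PyTorch terms, `ready.wait()` plays the role of making the compute stream wait on the offload stream (for example `torch.cuda.Stream.wait_stream` on CUDA); without it, the consumer could read a partially copied buffer.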
## Self-test (Optional)
The correctness of the modification has been verified through the following tests:
- Functional test: Ensure the FPDT mechanism works normally in non-blocking mode
- Performance test: Verified a 10~15% step_time speedup (baseline: blocking FPDT)
- Consistency test: Ensure the calculation results are consistent with those before modification
- Precision alignment: Precision has been aligned with PR #1886
## BC-breaking (Optional)
No compatibility issues. The modification only involves internal implementation details and does not affect external interfaces. The required cann/torch_npu version remains consistent with before modification.
## Checklist
**Before PR**:
- [ ] The new code needs to comply with the Clean Code specification.
- [ ] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [ ] The CLA has been signed by all committers in this PR.
- [ ] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!1951
[Modify] EP support fused expert-hidden_states dim & performance improve
Co-authored-by: liyingxuan<liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge:
!2032 merge ep into master
[Modify] EP support fused expert-hidden_states dim & performance improve
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot
Description: ## Motivation
EP supports fused (merged) expert axes, and EP performance is improved.
## Modification
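1. Fused expert axis compatibility: support expert parameters whose shape is either [num_experts, input_dim, output_dim] or [num_experts * input_dim, output_dim].
2. Remove the sort_chunks_by_idxs operation from the EP dispatch and combine stages. It runs inefficiently and involves many synchronizations, which made training host-bound when EP was enabled.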
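A minimal sketch of item 1, assuming the fused layout stacks each expert's `input_dim` rows contiguously along dimension 0 and that experts are split evenly across EP ranks. The helpers `canonical_expert_shape` and `local_expert_slice` are hypothetical and operate on shape tuples only, to show how both layouts map to the same per-rank expert range.

```python
from typing import Tuple


def canonical_expert_shape(shape: Tuple[int, ...], num_experts: int) -> Tuple[int, int, int]:
    """Map either expert-weight layout to (num_experts, input_dim, output_dim)."""
    if len(shape) == 3:
        experts, input_dim, output_dim = shape
        if experts != num_experts:
            raise ValueError(f"expected {num_experts} experts, got {experts}")
        return experts, input_dim, output_dim
    if len(shape) == 2:
        fused_rows, output_dim = shape
        if fused_rows % num_experts != 0:
            raise ValueError(f"{fused_rows} rows cannot be split across {num_experts} experts")
        return num_experts, fused_rows // num_experts, output_dim
    raise ValueError(f"unsupported expert weight rank: {len(shape)}")


def local_expert_slice(shape: Tuple[int, ...], num_experts: int,
                       ep_size: int, ep_rank: int) -> Tuple[int, int]:
    """Return [start, end) along dim 0 holding this EP rank's experts, for either layout."""
    experts, input_dim, _ = canonical_expert_shape(shape, num_experts)
    if experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    per_rank = experts // ep_size
    first = ep_rank * per_rank
    if len(shape) == 3:
        return first, first + per_rank                        # slice whole experts
    return first * input_dim, (first + per_rank) * input_dim  # slice fused rows


print(local_expert_slice((8, 16, 32), num_experts=8, ep_size=4, ep_rank=1))  # (2, 4)
print(local_expert_slice((128, 32), num_experts=8, ep_size=4, ep_rank=1))    # (32, 64)
```

Normalizing to one canonical 3D view first keeps the EP partitioning logic in a single place, so only the final index arithmetic differs between the two layouts.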
## Self-test (Optional)
If this PR may cause or fix functional, accuracy, or performance DTSs/issues, attach a self-inspection record.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [ ] The new code needs to comply with the Clean Code specification.
- [ ] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [ ] The CLA has been signed by all committers in this PR.
- [ ] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!2032
[Feature] qwen3vl improve performance
Co-authored-by: liyingxuan<liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge:
!1717 merge master into master
[Feature] qwen3vl improve performance
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot
Description: ## Motivation
qwen3vl improve performance
## Modification
1. Fused operator for the MoE block (disabled by default)
2. Fused operators for RMSNorm and RoPE
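For reference, the following unfused pure-Python sketch shows what RMSNorm and a standard rotate-half RoPE compute for a single vector, i.e. the arithmetic a fused operator performs in one kernel. It is an illustrative assumption: the exact Qwen3VL formulation (for example its multimodal RoPE position layout, dtype handling, and eps value) may differ, and `rms_norm` / `apply_rope` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Unfused reference: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def apply_rope(x: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Standard 1D rotate-half RoPE for one head vector of even length d."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)  # rotation angle of pair (i, i + half)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


# Reference outputs that fused RMSNorm / RoPE kernels should match within tolerance.
y = rms_norm([3.0, 4.0, 0.0, 0.0], [1.0] * 4, eps=0.0)  # approximately [1.2, 1.6, 0.0, 0.0]
q = apply_rope(y, position=3)  # same L2 norm as y, since RoPE only rotates pairs
```

A reference like this is mainly useful for precision alignment: comparing fused-kernel outputs against it on small inputs catches layout or frequency mistakes.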
## Self-test (Optional)
If this PR may cause or fix functional, accuracy, or performance DTSs/issues, attach a self-inspection record.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [x] The CLA has been signed by all committers in this PR.
- [x] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!1717
[Bugfix]resolve multiple issues — unused code, index out of bounds, undefined vars, resource leaks
Co-authored-by: zhangxubin<1656631289@qq.com>
# message auto-generated for no-merge-commit merge:
!1662 merge master into master
[Bugfix]resolve multiple issues — unused code, index out of bounds, undefined vars, resource leaks
Created-by: MoCuishle-M
Commit-by: zhangxubin
Merged-by: ascend-robot
Description: ## Motivation
Fix some security issues.
## Modification
The issues fixed are as follows:
1. Removed unused code and fixed logic errors.
2. Fixed out-of-bounds array access.
3. Fixed usage of undefined variables.
4. Fixed resource leaks by ensuring resources are properly released.
## Self-test (Optional)
If this PR may cause or fix functional, accuracy, or performance DTSs/issues, attach a self-inspection record.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content has been self-checked, and the wording is clear and standardized.
**After PR**:
- [x] The CLA has been signed by all committers in this PR.
- [x] The CI pipeline and Code Check have passed.
See merge request: Ascend/MindSpeed-MM!1662