MindSpeed-MM/mindspeed_mm/models/transformers/qwen3vl · Ascend/MindSpeed-MM - AtomGit

ascend-robot: [Modify] modify qwen3vl sync operation

File	Last commit	Last updated
__init__.py	[Feature] support auto model register Co-authored-by: young256<liumingyang16@huawei.com> # message auto-generated for no-merge-commit merge: !2077 merge model-register into master [Feature] support auto model register Created-by: young256 Commit-by: young256 Merged-by: ascend-robot Description: A decorator-based automatic model registration system has been implemented, enabling dynamic discovery and on-demand class loading without manual import statements or configuration files. Core Functionality: 1. Lazy Initialization: modules are imported only when the corresponding model is first requested, optimizing startup performance and resource usage. 2. Fault Containment: import failures are isolated per module, so issues with one model don't disrupt the registration of others. 3. Comprehensive Diagnostics: detailed error reporting includes available model inventories and specific failure causes for troubleshooting. 4. Extensibility by Design: the system adheres to the Open/Closed Principle. Open for extension: add new models by creating modules with decorated classes. Closed for modification: no changes to existing registry code are required. Usage Workflow: 1. Create a new model module. 2. Apply the registration decorator to model classes. 3. The system automatically handles discovery and availability. See merge request: Ascend/MindSpeed-MM!2077	4 months ago
modeling_qwen3_vl.py	[Modify] modify qwen3vl sync operation Co-authored-by: LKONE<wanglikai4@huawei.com> # message auto-generated for no-merge-commit merge: !2260 merge master into master [Modify] modify qwen3vl sync operation Created-by: wanglikai1019 Commit-by: LKONE Merged-by: ascend-robot Description: ## What this PR does / why we need it? Synchronous operations degrade qwen3vl performance. ## Does this PR introduce any user-facing change? Modify the synchronization method to reduce the overhead of synchronization. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2260	2 months ago
modeling_qwen3_vl_moe.py	[Modify] modify qwen3vl sync operation Co-authored-by: LKONE<wanglikai4@huawei.com> # message auto-generated for no-merge-commit merge: !2260 merge master into master [Modify] modify qwen3vl sync operation Created-by: wanglikai1019 Commit-by: LKONE Merged-by: ascend-robot Description: ## What this PR does / why we need it? Synchronous operations degrade qwen3vl performance. ## Does this PR introduce any user-facing change? Modify the synchronization method to reduce the overhead of synchronization. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!2260	2 months ago
modules.py	feat(torch): Qwen3-Omni support ulysses cp / fix(torch): repeat_kv and activation_offload bug Co-authored-by: yaoyaoxu<xuyaoyao.824404@huawei.com> # message auto-generated for no-merge-commit merge: !2188 merge qwen3omni_ulysses_cp into master feat(torch): Qwen3-Omni support ulysses cp / fix(torch): repeat_kv and activation_offload bug Created-by: yaoyaoxu Commit-by: yaoyaoxu Merged-by: ascend-robot Description: ## Motivation The current implementation cannot handle a 128K sequence length; CP must be adapted to support 128K long sequences. ## Modification 1. Qwen3-Omni supports ulysses cp: AuT, ViT, and LLM are all adapted; if CP is enabled but no audio data is passed in, or CP size >= seq_len, CP is not applied to the audio module. 2. Fix the repeat_kv bug. 3. Fix the memory leak that occurs when the activation_offload configuration is enabled. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist Before PR: - [ ] The new code needs to comply with the Clean Code specification. - [ ] The PR content is self-checked, the wording is clear, and the writing is standardized. After PR: - [ ] CLA has been signed and all committers have signed the CLA in this PR. - [ ] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!2188	2 months ago
output.py	[Modify] Refactor the fsdp2 model Co-authored-by: htwang<wanghaitao60@huawei.com> # message auto-generated for no-merge-commit merge: !1793 merge master into master [Modify] Refactor the fsdp2 model Created-by: htwang Commit-by: htwang Merged-by: ascend-robot Description: ## Motivation 1. Change modelzoo to modelhub. 2. Change hf_src to transformers. ## Modification 1. Change modelzoo to modelhub. 2. Change hf_src to transformers. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist Before PR: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, the wording is clear, and the writing is standardized. After PR: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!1793	6 months ago
qwen3vl.py	[Refactor]Refactor qwen3omni attention and use fusion op Co-authored-by: yaoyaoxu<xuyaoyao.824404@huawei.com> # message auto-generated for no-merge-commit merge: !2167 merge qwen3omni_fuse_ops into master [Refactor]Refactor qwen3omni attention and use fusion op Created-by: yaoyaoxu Commit-by: yaoyaoxu Merged-by: ascend-robot Description: ## Motivation To improve performance, fa2, rmsnorm, and rope all use fused operators. Optimization result for image-text 16K (fa2 + TND; rmsnorm and rope both using fused operators): computation and communication are accelerated by 25% and 20%, respectively. ## Modification 1. Refactor attention to support fa2, sdpa, and eager; all three support the BNSD layout, and fa2 additionally supports TND; adapt the unit tests (ut) to safeguard code quality. 2. Fix the freeze bug. 3. Fused operators: fa2, rmsnorm, and rope all use fused operators. 4. Make attention_utils.py a common module shared by qwen3vl and qwen3omni, and fix bugs. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist Before PR: - [ ] The new code needs to comply with the Clean Code specification. - [ ] The PR content is self-checked, the wording is clear, and the writing is standardized. After PR: - [ ] CLA has been signed and all committers have signed the CLA in this PR. - [ ] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!2167	3 months ago
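
The `__init__.py` commit message describes a decorator-based registry with lazy imports, per-module fault containment, and diagnostics that list the available models. The sketch below illustrates that general pattern only; it is not the MindSpeed-MM implementation. Every name in it (`ModelRegistry`, `register`, `declare`, `get`, `toy_vl`, `broken_model`) is hypothetical and chosen for illustration.

```python
import importlib


class ModelRegistry:
    """Illustrative registry; not the actual MindSpeed-MM code."""

    def __init__(self):
        self._classes = {}       # model name -> registered class
        self._lazy_modules = {}  # model name -> module path, imported on demand
        self._failures = {}      # model name -> recorded import error

    def register(self, name):
        """Decorator that records a class under `name`."""
        def decorator(cls):
            self._classes[name] = cls
            return cls
        return decorator

    def declare(self, name, module_path):
        """Note where a model lives without importing it yet (lazy initialization)."""
        self._lazy_modules[name] = module_path

    def available(self):
        return sorted(set(self._classes) | set(self._lazy_modules))

    def get(self, name):
        # Import the module only on first request; a failure is recorded for
        # this one model and does not affect any other entry.
        if name not in self._classes and name in self._lazy_modules:
            try:
                importlib.import_module(self._lazy_modules[name])
            except Exception as exc:
                self._failures[name] = f"{type(exc).__name__}: {exc}"
        if name in self._classes:
            return self._classes[name]
        reason = self._failures.get(name, "no model registered under this name")
        raise LookupError(
            f"model {name!r} unavailable ({reason}); available: {self.available()}"
        )


registry = ModelRegistry()


@registry.register("toy_vl")  # hypothetical model name
class ToyVL:
    pass


# Hypothetical module path that does not exist, to show fault containment.
registry.declare("broken_model", "no_such_package_xyz.modeling")
```

In this sketch, `registry.get("toy_vl")` returns `ToyVL`, while `registry.get("broken_model")` raises a `LookupError` whose message carries the import failure and the list of known model names, leaving `toy_vl` usable.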