File    Last commit    Last updated
[Feature] support auto model register
Co-authored-by: young256 <liumingyang16@huawei.com>
# message auto-generated for no-merge-commit merge: !2077 merge model-register into master
Created-by: young256
Commit-by: young256
Merged-by: ascend-robot

Description: A decorator-based automatic model registration system has been implemented, enabling dynamic discovery and on-demand class loading without manual import statements or configuration files.

Core Functionality:
1. Lazy Initialization: Modules are imported only when the corresponding model is first requested, which reduces startup time and resource usage.
2. Fault Containment: Import failures are isolated per module, so a problem with one model does not disrupt the registration of others.
3. Comprehensive Diagnostics: Error reports list the available models and the specific cause of each failure for troubleshooting.
4. Extensibility by Design: The system adheres to the Open/Closed Principle.
   - Open for extension: Add new models by creating modules with decorated classes.
   - Closed for modification: No changes to existing registry code are required.

Usage Workflow:
1. Create a new model module.
2. Apply the registration decorator to the model classes.
3. The system automatically handles discovery and availability.

See merge request: Ascend/MindSpeed-MM!2077
Last updated: 4 months ago

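The registration scheme described in merge request !2077 can be illustrated with a minimal sketch. This is not the MindSpeed-MM implementation: the class `ModelRegistry`, its methods `register`, `declare`, and `get`, and the idea of passing a zero-argument loader are all assumptions made for illustration. It only shows how a decorator, deferred imports, per-model failure isolation, and a diagnostic error message could fit together.

```python
from typing import Callable, Dict, Type


class ModelRegistry:
    """Hypothetical lazy registry (illustrative only, not the project's code)."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type] = {}                  # name -> registered class
        self._loaders: Dict[str, Callable[[], object]] = {}  # name -> pending import
        self._errors: Dict[str, str] = {}                    # name -> failure cause

    def register(self, name: str) -> Callable[[Type], Type]:
        """Decorator that records a model class under `name`."""
        def decorator(cls: Type) -> Type:
            self._classes[name] = cls
            return cls
        return decorator

    def declare(self, name: str, loader: Callable[[], object]) -> None:
        """Remember how to import a model's module without importing it yet."""
        self._loaders[name] = loader

    def get(self, name: str) -> Type:
        """Return the class for `name`, importing its module on first request."""
        loader = self._loaders.pop(name, None)
        if name not in self._classes and loader is not None:
            try:
                loader()  # importing the module runs its @register decorators
            except Exception as exc:  # contain the failure to this one model
                self._errors[name] = f"{type(exc).__name__}: {exc}"
        if name not in self._classes:
            available = sorted(set(self._classes) | set(self._loaders))
            reason = self._errors.get(name, "not declared or registered")
            raise KeyError(
                f"model {name!r} is unavailable ({reason}); available: {available}"
            )
        return self._classes[name]


# Hypothetical usage; "my_pkg.my_model" is a placeholder module path:
#   import importlib
#   REGISTRY = ModelRegistry()
#   REGISTRY.declare("my_model", lambda: importlib.import_module("my_pkg.my_model"))
#   model_cls = REGISTRY.get("my_model")  # the import happens here, not at startup
```

In this sketch a new model needs only a module containing a decorated class plus one `declare` call; the real system is described as discovering modules automatically, which the sketch does not attempt.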
feat(torch): Qwen3-Omni support ulysses cp / fix(torch): repeat_kv and activation_offload bug
Co-authored-by: yaoyaoxu <xuyaoyao.824404@huawei.com>
# message auto-generated for no-merge-commit merge: !2188 merge qwen3omni_ulysses_cp into master
Created-by: yaoyaoxu
Commit-by: yaoyaoxu
Merged-by: ascend-robot

Description:
## Motivation
The current implementation cannot handle a sequence length of 128K; context parallelism (CP) has to be adapted to support 128K long sequences.
## Modification
1. Qwen3-Omni supports Ulysses CP: AuT, ViT, and the LLM are all adapted. If CP is enabled but no audio data is passed in, or CP size >= seq_len, CP is not applied to the audio module.
2. Fix the repeat_kv bug.
3. Fix the memory leak that occurs when the activation_offload option is enabled.

See merge request: Ascend/MindSpeed-MM!2188
Last updated: 2 months ago

[Refactor] compatible for transformers-5.0.0(7a833d1c)
Co-authored-by: zhangxubin <1656631289@qq.com>
# message auto-generated for no-merge-commit merge: !2079 merge master into master
Created-by: MoCuishle-M
Commit-by: MoCuishle-M; zhangxubin
Merged-by: ascend-robot

Description:
## Motivation
Compatibility with transformers 5.0.0 (7a833d1c).
## Modification
Most of the changes in this PR come from https://gitcode.com/Ascend/MindSpeed-MM/pull/2040; only the LoRA patch implementation was modified.
1. Make the RoPE configuration of Qwen2/2.5/3-VL compatible with transformers 5.0.0.
2. Work around the argument validation in the pretrain_transformer forward.
3. Filter the related arguments for compatibility with transformers 5.0.0.
4. Fix the UTF-8 encoding/decoding issue in the CI console log output.
5. Adapt LoRA to peft 0.18.1.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content is self-checked, and the expression can be clear and the writing standardized
**After PR**:
- [x] CLA has been signed and all committers have signed the CLA in this PR.
- [x] The ci-pipeline is passed, Code Check is passed.

See merge request: Ascend/MindSpeed-MM!2079
Last updated: 4 months ago

[Modify] modify qwen3vl sync operation
Co-authored-by: LKONE <wanglikai4@huawei.com>
# message auto-generated for no-merge-commit merge: !2260 merge master into master
Created-by: wanglikai1019
Commit-by: LKONE
Merged-by: ascend-robot

Description:
## What this PR does / why we need it?
Synchronous operations degrade the performance of qwen3vl.
## Does this PR introduce any user-facing change?
The synchronization method is modified to reduce the overhead of synchronization.

See merge request: Ascend/MindSpeed-MM!2260
Last updated: 2 months ago

[Refactor]Refactor qwen3omni attention and use fusion op
Co-authored-by: yaoyaoxu <xuyaoyao.824404@huawei.com>
# message auto-generated for no-merge-commit merge: !2167 merge qwen3omni_fuse_ops into master
Created-by: yaoyaoxu
Commit-by: yaoyaoxu
Merged-by: ascend-robot

Description:
## Motivation
To improve performance, fa2, rmsnorm, and rope all use fused operators. Optimization result for 16K image-text sequences (fa2 + TND; rmsnorm and rope both using fused operators): computation and communication are sped up by 25% and 20%, respectively.
## Modification
1. Refactor attention to support fa2, sdpa, and eager. All three support the BNSD layout, and fa2 additionally supports TND. Unit tests are adapted to guard code quality.
2. Fix the freeze bug.
3. Fused operators: fa2, rmsnorm, and rope all use fused operators.
4. Make attention_utils.py a common module shared by qwen3vl and qwen3omni, and fix bugs.

See merge request: Ascend/MindSpeed-MM!2167
Last updated: 3 months ago

[Bugfix] Fix cpu offload bug
Co-authored-by: LKONE <wanglikai4@huawei.com>
# message auto-generated for no-merge-commit merge: !2203 merge master into master
Created-by: wanglikai1019
Commit-by: LKONE
Merged-by: ascend-robot

Description:
## Motivation
Even when CPU offload is disabled, tensors are offloaded to the CPU and end up empty.
## Modification
A tensor is offloaded to the CPU only when tensor.device equals cpu.

See merge request: Ascend/MindSpeed-MM!2203
Last updated: 3 months ago

cleancode
Co-authored-by: liyingxuan <liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge: !2323 merge master into 26.0.0
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot

Description:
## What this PR does / why we need it?
Clean Code remediation.
## Does this PR introduce any user-facing change?
Clean Code remediation.

See merge request: Ascend/MindSpeed-MM!2323
Last updated: 2 months ago

[Bugfix] fix ep grad caluation and clip grad
Co-authored-by: htwang <wanghaitao60@huawei.com>
# message auto-generated for no-merge-commit merge: !2070 merge master into master
Created-by: htwang
Commit-by: htwang
Merged-by: ascend-robot

Description:
1. Fix a bug in the grad norm calculation for the FSDP2 + EP scenario (the MoE parameter gradients were not accumulated across the EP group).
2. Fix an error raised during gradient clipping that was caused by dtype.

See merge request: Ascend/MindSpeed-MM!2070
Last updated: 4 months ago

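To show why the accumulation across the EP group matters for the grad norm, here is a small arithmetic sketch. It is not the code from !2070: the function names are hypothetical, plain Python lists stand in for gradient tensors, and a Python `sum` stands in for an all-reduce over the EP group. It also assumes that dense (non-expert) gradients are replicated across EP ranks and therefore counted once.

```python
import math
from typing import Sequence


def squared_norm(grads: Sequence[float]) -> float:
    """Sum of squares of one shard's gradient values."""
    return sum(g * g for g in grads)


def global_grad_norm(
    dense_grads: Sequence[float],
    expert_grads_per_ep_rank: Sequence[Sequence[float]],
) -> float:
    """L2 norm over dense gradients plus expert gradients from every EP rank."""
    dense_sq = squared_norm(dense_grads)
    # Each EP rank holds different experts, so their squared norms must be
    # summed over the EP group (stand-in for an all-reduce with SUM).
    expert_sq = sum(squared_norm(g) for g in expert_grads_per_ep_rank)
    return math.sqrt(dense_sq + expert_sq)


def local_only_grad_norm(
    dense_grads: Sequence[float],
    local_expert_grads: Sequence[float],
) -> float:
    """What a single rank would compute if the EP accumulation were skipped."""
    return math.sqrt(squared_norm(dense_grads) + squared_norm(local_expert_grads))
```

Without the sum over EP ranks, each rank sees a different and smaller norm, so a clipping threshold derived from it would differ from rank to rank.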
[bugfix] fix weight loading bug for EP when experts.ndim == 2
Co-authored-by: liyingxuan <liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge: !2060 merge ep into master
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot

Description:
## Motivation
Fix the weight loading bug for EP when experts.ndim == 2.
## Modification
Fix the weight loading bug that occurs when EP shards expert weights whose shape is [num_experts * input_dim, output_dim].

See merge request: Ascend/MindSpeed-MM!2060
Last updated: 3 months ago
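The commit message does not show the fix itself. As a rough illustration of the index arithmetic involved when a 2-D expert weight of shape [num_experts * input_dim, output_dim] is split across EP ranks, the sketch below computes which rows one rank would own. The function name `expert_row_range` is hypothetical, and the sketch assumes that experts are stored contiguously along dimension 0 and that each rank holds a contiguous, equally sized block of experts.

```python
from typing import Tuple


def expert_row_range(
    num_experts: int, input_dim: int, ep_size: int, ep_rank: int
) -> Tuple[int, int]:
    """Return the [start, end) rows of the flattened expert weight for one EP rank."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    if not 0 <= ep_rank < ep_size:
        raise ValueError("ep_rank must be in [0, ep_size)")
    experts_per_rank = num_experts // ep_size
    # Each expert occupies input_dim consecutive rows, so the shard boundary
    # must be a multiple of input_dim, not merely of experts_per_rank.
    rows_per_rank = experts_per_rank * input_dim
    start = ep_rank * rows_per_rank
    return start, start + rows_per_rank
```

For example, with 8 experts, input_dim 4, and 2 EP ranks, rank 1 would own rows 16 to 31 under these assumptions.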