File | Last commit | Last updated
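test: Add FSDP2 Memory Logging & Kimi-K2.5 ST Case
Co-authored-by: LKONE<wanglikai4@huawei.com>
# message auto-generated for no-merge-commit merge: !2571 merge master into master
test: Add FSDP2 Memory Logging & Kimi-K2.5 ST Case
Created-by: wanglikai1019
Commit-by: LKONE
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
(1) Add memory log output to the fsdp2 backend, for use in later ST validation.
(2) Add an ST case for the Kimi-K2.5 model. The script runs as follows:
- Install the required third-party libraries;
- Get the root path of MindSpeed-MM;
- Check whether a mindspeed directory exists under the MindSpeed-MM path:
  - If it exists, back up the existing mindspeed directory as mindspeed-bak and set the restore flag need_restore to True;
- In the ci_resource folder on the ci machine, create a mindspeed folder named by date:
  - If the folder for the target date already exists, use it directly and copy its mindspeed into the MindSpeed-MM path;
  - If the folder for the target date does not exist, git clone the latest Mindspeed into the folder for that date:
    - If the clone succeeds, copy the freshly cloned mindspeed into the MindSpeed-MM path;
    - If the clone fails, copy the mindspeed from the MindSpeed 26.0.0 branch already cached on the ci machine into the MindSpeed-MM path;
- Copy the model files required by Kimi-K2.5 from the model folder on the ci machine to mindspeed_mm/fsdp/models/kimik2_5;
- Change the number of ViT/LLM layers in the model config file to 5 and the number of experts to 64;
- Run the training flow;
- After training finishes, delete the mindspeed directory currently in use; if the restore flag is True, rename the cached original mindspeed-bak folder back to mindspeed.
(3) Importing Singleton in parallel_state.py causes a circular import, and Singleton does not fit the contents of utils.py, so Singleton is moved to decorators.py and all references are updated accordingly.
## Does this PR introduce any user-facing change?
None.
## How was this patch tested?
(1) Check that memory information is printed correctly during training;
(2) Check that the Kimi-K2.5 case runs and passes.
See merge request: Ascend/MindSpeed-MM!2571
5 days ago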
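The backup, dated-cache, fallback, and restore steps described in item (2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script from the PR: the function names `prepare_mindspeed` and `restore_mindspeed`, the `MindSpeed/mindspeed` layout inside the dated folder, and the injected `clone` callable are all hypothetical. The sketch also checks for the cached package itself rather than only the dated folder, which is a small deviation from the description.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Callable, Optional


def prepare_mindspeed(
    mm_root: Path,
    ci_resource: Path,
    fallback_src: Path,
    clone: Callable[[Path], bool],
    today: Optional[str] = None,
) -> bool:
    """Place a mindspeed package under mm_root and return need_restore.

    clone(dest) is a caller-supplied callable (hypothetical) that clones the
    repository into dest and returns True on success.
    """
    target = mm_root / "mindspeed"
    backup = mm_root / "mindspeed-bak"

    # Back up an existing package and remember that it must be restored.
    need_restore = target.exists()
    if need_restore:
        target.rename(backup)

    # One cache folder per date; reuse it when the package is already there.
    dated_dir = ci_resource / (today or date.today().isoformat())
    source = dated_dir / "MindSpeed" / "mindspeed"
    if not source.exists():
        dated_dir.mkdir(parents=True, exist_ok=True)
        if not clone(dated_dir / "MindSpeed"):
            # Clone failed: fall back to the copy cached on the CI machine.
            source = fallback_src

    shutil.copytree(source, target)
    return need_restore


def restore_mindspeed(mm_root: Path, need_restore: bool) -> None:
    """Delete the package used for the test and restore the backup, if any."""
    target = mm_root / "mindspeed"
    shutil.rmtree(target, ignore_errors=True)
    if need_restore:
        (mm_root / "mindspeed-bak").rename(target)
```

Passing the clone step in as a callable keeps the sketch free of any repository URL and lets the fallback branch be exercised without network access; a real script would presumably wrap `git clone` there and call `restore_mindspeed` in a cleanup step so the original package comes back even if training fails.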
style: pre-commit autofix cleancode (base check)
Co-authored-by: liyingxuan<liyingxuan3@huawei.com>
# message auto-generated for no-merge-commit merge: !2616 merge master into master
style: pre-commit autofix cleancode (base check)
Created-by: liyx616
Commit-by: liyingxuan
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2616
17 hours ago
[Test] Support test for megatron-bridge ckpt online transformation
Co-authored-by: ningmengliu<liuhao438@huawei.com>
# message auto-generated for no-merge-commit merge: !2493 merge master into master
[Test] Support test for megatron-bridge ckpt online transformation
Created-by: ningmenglh
Commit-by: ningmengliu
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Maintain the online weight conversion feature and add an ST case for qwen3vl-30B.
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-MM!2493
21 days ago
[Test]Change the training configuration of wan2.1 to fsdp2
Co-authored-by: zs-derrick1<1434012475@qq.com>
# message auto-generated for no-merge-commit merge: !1807 merge wan21ST into master
[Test]Change the training configuration of wan2.1 to fsdp2
Created-by: zs-derrick1
Commit-by: zs-derrick1
Merged-by: ascend-robot
Description:
## Motivation
Change the training configuration of wan2.1 to fsdp2
## Modification
Please briefly describe what modification is made in this PR.
## Self-test (Optional)
If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached.
## BC-breaking (Optional)
If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR.
## Checklist
**Before PR**:
- [x] The new code needs to comply with the Clean Code specification.
- [x] The PR content is self-checked, and the expression can be clear and the writing standardized
**After PR**:
- [x] CLA has been signed and all committers have signed the CLA in this PR.
- [x] The ci-pipeline is passed, Code Check is passed.
See merge request: Ascend/MindSpeed-MM!1807
5 months ago
[Test] add st for wan2.2
Co-authored-by: 林明哲<linmingzhe3@huawei.com>
# message auto-generated for no-merge-commit merge: !1928 merge st1210 into master
[Test] add st for wan2.2
Created-by: LinMingZhe
Commit-by: 林明哲
Merged-by: ascend-robot
Description:
## Motivation
add st for wan2.2
## Modification
add st for wan2.2
**Before PR**:
- [ ] The new code needs to comply with the Clean Code specification.
- [x] The PR content is self-checked, and the expression can be clear and the writing standardized
**After PR**:
- [ ] CLA has been signed and all committers have signed the CLA in this PR.
- [ ] The ci-pipeline is passed, Code Check is passed.
**Execution time**
Execution Time for pretrain_wan2.2_i2v.sh: 2 m 17 s
Execution Time for ST: 24 m 41 s
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for pretrain_opensoraplan1_3.sh: 1 m 26 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for inference_internvl2_5.sh: 1 m 4 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for finetune_deepseekvl2.sh: 1 m 23 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for pretrain_cogvideox_i2v_1.5.sh: 1 m 27 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for posttrain_qwen2vl_dpo.sh: 2 m 3 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for pretrain_wan2.1_t2v.sh: 1 m 45 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for inference_qwen2vl_7b_pp4.sh: 1 m 2 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for pretrain_hunyuanvideo_t2v.sh: 1 m 37 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for inference_cogvideox_t2v_1.5.sh: 1 m 33 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for pretrain_wan2.2_i2v.sh: 2 m 17 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for pretrain_cogvideox_t2v_1_0.sh: 2 m 28 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for inference_wan2.2_t2v.sh: 1 m 20 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for inference_qwen2vl_7b_pp1.sh: 1 m 5 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for finetune_qwen2vl_7B.sh: 1 m 29 s *
[2025/12/16 18:50:30.830 GMT+08:00] * Execution Time for finetune_qwen2_5_vl_7b.sh: 1 m 19 s *
See merge request: Ascend/MindSpeed-MM!1928
5 months ago