MindSpeed-LLM/examples/fsdp2 · Ascend/MindSpeed-LLM - AtomGit

ascend-robot: chore(fsdp2): develop longcat-flash-lite model in fsdp2

File	Last commit	Last updated
gpt_oss	[pytorch][features] Add inference for FSDP2 backend. Co-authored-by: sunjunjie1587 <sunjunjie8@huawei.com>. Created-by: sunjunjie1587. Commit-by: sunjunjie1587. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4266 (merge master into master). Description: Add inference for FSDP2 backend.	2 months ago
longcat_flash_lite	chore(fsdp2): develop longcat-flash-lite model in fsdp2. Co-authored-by: guihaowen666 <guihaowen@huawei.com>. Created-by: guihaowen666. Commit-by: guihaowen666. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4344 (merge br_master_longcat_flash_lite_fsdp2 into master). What this PR does / why we need it: develops the longcat-flash-lite model in FSDP2. User-facing change: none (new model development). How it was tested: ran the inference task and checked that the model can hold a normal dialogue.	3 hours ago
mamba3	fix(pytorch): fix mamba3. Co-authored-by: qyzqyz <quyueze@h-partners.com>. Created-by: qyzqyz. Commit-by: qyzqyz. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4385 (merge master into master).	1 month ago
minimax_m27	feat(pytorch): add minimax-m2.7 model in FSDP2 backend. Co-authored-by: HanhuiChen <chenhanhui1@h-partners.com>. Created-by: HANHU1CHEN. Commit-by: HanhuiChen; HANHU1CHEN. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4410 (merge minimax27 into master). What this PR does / why we need it: adds support for the MiniMax-M27 model in the FSDP2 backend, including model architecture adaptation and configuration integration, enabling distributed training and inference under the MindSpeed-LLM framework. User-facing change: yes. Users can now launch MiniMax-M27 training and inference via the FSDP2 backend by specifying model_id=minimax_m27; refer to the newly added example scripts under the examples/fsdp2 directory for usage details. How it was tested: verified on Ascend NPU in two scenarios. Full-parameter training: multi-device distributed training runs successfully. Full-parameter inference: the model loads correctly and produces coherent responses.	1 month ago
qwen3	model: add qwen3-14B FSDP2. Co-authored-by: wangboroy <wangbo39@huawei.com>. Created-by: wangboroy. Commit-by: wangboroy. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4468 (merge dev into master).	18 days ago
qwen3_moe	[feat] Add FSDP for MXFP8. Co-authored-by: EVA1 <jingsiyu1@huawei.com>, quancs001 <quancs@qq.com>, h00638954 <huangzhiyuan8@huawei.com>. Created-by: quancs001. Commit-by: EVA1; quancs001; h00638954. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4379 (merge fsdp_comm into master). Description: adds MXFP8 FSDP functionality, supporting Dense and MoE (EP+EFSDP) models.	1 month ago
qwen3_next	feat(pytorch): add gdn ascend C kernel of qwen3_next. Co-authored-by: cxy-thinkbook <xuanyuchen@seu.edu.cn>. Created-by: 2402_84360594. Commit-by: cxy-thinkbook. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4335 (merge master into master). Description: adds the qwen3_next GDN operator, written in Ascend C, to the project as a feature. User-facing change: the parameter use_flash_gdn can be enabled (see docs/zh/pytorch/features/fsdp2/arguments.md). The way to use the Ascend C kernel: the following versions were tested and work properly: FrameworkPTAdapter 26.1.0.B030 and CANN (A2/A3) 9.1.0.B020. In theory, all CANN packages of version 8.5.0 and above are compatible. To install and test the ASC kernels, source the CANN package in advance and make sure the machine has network connectivity. Build the flash-linear-attention-npu repository locally, because .run packages built on different machines are generally not interchangeable:
```bash
git clone https://github.com/flashserve/flash-linear-attention-npu.git
# git clone creates a directory named after the repository
cd flash-linear-attention-npu
# It is recommended to use the version tagged v26.1.0
git checkout v26.1.0

# Use the --soc parameter to specify the current device chip model exactly.
# Example configuration: --soc=ascend910_93
bash build.sh --soc=ascend910_93 --pkg --ops=chunk_bwd_dv_local,chunk_bwd_dqkwg,chunk_gated_delta_rule_bwd_dhu,prepare_wy_repr_bwd_da,prepare_wy_repr_bwd_full,chunk_fwd_o,chunk_gated_delta_rule_fwd_h,recurrent_gated_delta_rule,recompute_wu_fwd,causal_conv1d
./build_out/cann-ops-transformer-custom_linux-aarch64.run

# Fix for installation hanging: clear conflicting CANN vendor files
# rm -rf cann-8.5.0/opp/vendors/
# Reinstall: ./build_out/cann-ops-transformer-custom_linux-aarch64.run

cd torch_custom/fla_npu
bash gen.sh npu_custom.yaml
# The gen.sh script generates the following contents, which can be verified with ls:
#   op_plugin/config/v2r7/: configuration files
#   torch_npu/csrc/aten/: ATen layer adaptation code
#   torch_npu/utils/*: utility functions

python setup.py bdist_wheel
pip install dist/fla_npu.whl --force-reinstall --no-deps

# Then test the kernels (torch_custom/fla_npu/test, relative to the repository root)
cd test
bash test.sh
# Some libraries are used only for testing and are not required by the model itself.
# They can be installed on demand.
```
How it was tested: the final part of "The way to use the Ascend C kernel" covers the testing process.	12 days ago
step35	test(fsdp2): add step35-flash st. Co-authored-by: yanzhixiao <yanzhixiao@h-partners.com>. Created-by: yanzhixiao23. Commit-by: yanzhixiao. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4537 (merge add-step35-st into master). What this PR does / why we need it: add step35-flash st. User-facing change: N/A. How it was tested: N/A.	5 days ago
env_config.sh	feature(pytorch): FSDP2 support hardware-adaptive execution. Co-authored-by: zhyebin01 <zhangyebin@h-partners.com>. Created-by: zhyebin01. Commit-by: zhyebin01. Merged-by: ascend-robot. Merge request: Ascend/MindSpeed-LLM!4343 (merge fsdp2_gpu into master). What this PR does / why we need it: FSDP2 support for hardware-adaptive execution. User-facing change: no. How it was tested: pipeline test passed.	2 months ago
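
The qwen3_next commit message notes that the output of gen.sh "can be verified using the ls command". A minimal sketch of that check follows. The helper name `check_gen_outputs` is hypothetical and not part of either repository, and it assumes the three listed paths are relative to the directory in which gen.sh was run, which the commit message does not state.

```bash
# Sketch only: report which of the directories that gen.sh is described as
# generating are present under a given base directory.
# ASSUMPTION: the paths are relative to the directory where gen.sh ran.
# check_gen_outputs is a hypothetical helper, not shipped by any repository.
check_gen_outputs() {
    base="${1:-.}"
    missing=0
    for path in op_plugin/config/v2r7 torch_npu/csrc/aten torch_npu/utils; do
        if [ -d "${base}/${path}" ]; then
            echo "ok       ${path}"
        else
            echo "missing  ${path}"
            missing=1
        fi
    done
    # Non-zero exit status if any expected directory is absent
    return "${missing}"
}
```

Under that assumption, running `check_gen_outputs .` right after `bash gen.sh npu_custom.yaml` would flag a generation step that failed before the wheel is built.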