MindSpeed-LLM/docs/zh/pytorch/features/fsdp2 · Ascend/MindSpeed-LLM - AtomGit

ascend-robot feat(pytorch): add gdn ascend C kernel of qwen3_next

File	Last commit	Last updated
arguments.md	feat(pytorch): add gdn ascend C kernel of qwen3_next	12 days ago

Co-authored-by: cxy-thinkbook <xuanyuchen@seu.edu.cn>
# message auto-generated for no-merge-commit
merge: !4335 merge master into master
feat(pytorch): add gdn ascend C kernel of qwen3_next
Created-by: 2402_84360594
Commit-by: cxy-thinkbook
Merged-by: ascend-robot
Description: Adds the qwen3_next GDN operator, written in Ascend C, to the project as a feature.

## What this PR does / why we need it?

Adds the Ascend C GDN kernel for qwen3_next.

## Does this PR introduce any user-facing change?

The parameter `use_flash_gdn` can be enabled. It is documented in docs/zh/pytorch/features/fsdp2/arguments.md.

### The way to use the Ascend C kernel

The following versions have been tested and work correctly. In principle, any CANN package at version 8.5.0 or later should be compatible.

| Component | Version |
| --- | --- |
| FrameworkPTAdapter | FrameworkPTAdapter 26.1.0.B030 |
| CANN (A2/A3) | CANN 9.1.0.B020 |

Follow the steps below to install and test the ASC kernels. Before you start, source the CANN environment and make sure the machine has network access. The flash-linear-attention-npu repository is compiled locally. A `.run` package built on one machine generally cannot be reused on another.

```bash
git clone https://github.com/flashserve/flash-linear-attention-npu.git
cd flash-linear-attention-npu
# It is recommended to use the version tagged v26.1.0
git checkout v26.1.0

# Use --soc to specify the exact chip model of the current device, e.g. --soc=ascend910_93
bash build.sh --soc=ascend910_93 --pkg --ops=chunk_bwd_dv_local,chunk_bwd_dqkwg,chunk_gated_delta_rule_bwd_dhu,prepare_wy_repr_bwd_da,prepare_wy_repr_bwd_full,chunk_fwd_o,chunk_gated_delta_rule_fwd_h,recurrent_gated_delta_rule,recompute_wu_fwd,causal_conv1d
./build_out/cann-ops-transformer-custom_linux-aarch64.run
# If the installation hangs, clear the conflicting CANN vendor files:
#   rm -rf cann-8.5.0/opp/vendors/
# then reinstall:
#   ./build_out/cann-ops-transformer-custom_linux-aarch64.run

cd torch_custom/fla_npu
bash gen.sh npu_custom.yaml
# gen.sh generates the following contents, which can be verified with ls:
#   op_plugin/config/v2r7/: configuration files
#   torch_npu/csrc/aten/: ATen-layer adaptation code
#   torch_npu/utils/*: utility functions
python setup.py bdist_wheel
# bdist_wheel adds version/platform tags to the wheel filename, hence the glob
pip install dist/fla_npu*.whl --force-reinstall --no-deps

# Then test the kernels (we are already inside torch_custom/fla_npu)
cd test
bash test.sh
# Some libraries are used only by the tests and are not required by the model itself; install them as needed.
```

## How was this patch tested?

The final step of "The way to use the Ascend C kernel" (running test.sh) covers the testing process.

See merge request: Ascend/MindSpeed-LLM!4335
fsdp2_basic_features.md	docs(fsdp2): fix documentation format issues and improve readability	14 days ago

Co-authored-by: wangjiangben <wangjiangben@huawei.com>
# message auto-generated for no-merge-commit
merge: !4476 merge docs/fix-fsdp2-docs-format into master
docs(fsdp2): fix documentation format issues and improve readability
Created-by: wangjiangben
Commit-by: wangjiangben
Merged-by: ascend-robot
Description:

## Summary

Fix format errors in the FSDP2 documentation and optimize the document structure to improve readability and compliance with Markdown standards.

Format Fixes:

- Fix table format issues in quantization.md (missing header separators, column alignment errors)
- Fix HTML entity syntax error in arguments.md (`&quot;ulysses&quot` missing semicolon)
- Remove extra blank lines in code blocks
- Standardize indentation in example scripts

Structure Optimization:

- Unify list markers to standard Markdown `-` syntax
- Optimize the DTensor section hierarchy for clearer structure
- Convert reference links to proper Markdown link format
- Split long paragraphs in the quantization descriptions for better readability
- Fix MD032 lint error (add blank line before list)

Files Changed:

- `arguments.md`: Fix HTML entity syntax
- `fsdp2_basic_features.md`: Optimize structure hierarchy and list format
- `quantization.md`: Fix table format, optimize description text

See merge request: Ascend/MindSpeed-LLM!4476
quantization.md	docs(fsdp2): fix documentation format issues and improve readability (same commit as fsdp2_basic_features.md, merge request Ascend/MindSpeed-LLM!4476)	14 days ago
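The arguments.md commit says that, in principle, CANN 8.5.0 and later should be compatible with the Ascend C GDN kernel. Below is a minimal sketch of checking a version string against that minimum. The function name `cann_version_ok` is hypothetical and is not part of MindSpeed-LLM or flash-linear-attention-npu. The sketch assumes GNU `sort -V` and version strings shaped like the tested `9.1.0.B020`. It takes the version as an argument instead of reading it from an installed CANN package, because the commit does not say where that value should come from.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MindSpeed-LLM or flash-linear-attention-npu):
# returns success if the given CANN version string is at least 8.5.0.
# Assumes GNU coreutils (sort -V) and versions shaped like 9.1.0.B020.
cann_version_ok() {
  local version="$1"
  local minimum="8.5.0"
  local core
  # Keep only the first three dot-separated fields, e.g. 9.1.0.B020 -> 9.1.0
  core=$(printf '%s' "$version" | cut -d. -f1-3)
  # sort -V orders version strings; if the minimum sorts first (or ties), core >= minimum
  [ "$(printf '%s\n%s\n' "$minimum" "$core" | sort -V | head -n1)" = "$minimum" ]
}

# Usage example: check the CANN version listed as tested in the commit
if cann_version_ok 9.1.0.B020; then
  echo "CANN 9.1.0.B020 meets the 8.5.0 minimum"
fi
```

Delegating the comparison to `sort -V` avoids hand-parsing each numeric field, and the build suffix (such as `.B020`) is dropped so it cannot affect the result.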