MindSpeed-Ops/docs/triton · isfrapples/MindSpeed-Ops - AtomGit

File	Last commit	Last updated
add.md	feat: version update & doc update Co-authored-by: shiyuan680<yangcheng104@huawei.com> # message auto-generated for no-merge-commit merge: !19 merge update into master feat: version update & doc update Created-by: zhizaidicengshehua Commit-by: shiyuan680 Merged-by: ascend-robot Description: ## What this PR does / why we need it? update triton version to fix bug (https://gitcode.com/Ascend/MindSpeed-Ops/issues/2) ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-Ops!19	14 days ago
causal_conv1d.md	feat: conv1d wrapper Co-authored-by: liuxi_<liuxi75@huawei.com>	24 days ago
chunk_kda_bwd.md	[Feat] Add chunk kda backward op for Kimi Linear Co-authored-by: zhuweichen<calvin_zhu0210@outlook.com> # message auto-generated for no-merge-commit merge: !26 merge kda into master [Feat] Add chunk kda backward op for Kimi Linear Created-by: zhuweichen Commit-by: zhuweichen Merged-by: ascend-robot Description: ## What this PR does / why we need it? This PR adds the `chunk_kda_bwd_wy_dqkg_fused` Triton operator for KDA chunk backward on Ascend arch32. The operator computes fused backward outputs `dq`, `dk`, `dv`, `db`, `dg`, and `dA`. It also adds the public API, arch32 implementation, UT, ATK cases, documentation, and README entry. https://gitcode.com/Ascend/MindSpeed-Ops/issues/28 ## Does this PR introduce any user-facing change? Yes. A new Triton API is added: `from mindspeed_ops.api.triton.chunk_kda_bwd import chunk_kda_bwd_wy_dqkg_fused` Documentation: docs/triton/chunk_kda_bwd.md Limitations: - Supports arch32 only; arch35 raises NotImplementedError. - Main inputs support float16 / float32. - g, h, and dh are expected to be float32. - bf16 is not declared as supported. ## How was this patch tested? UT: ```shell pytest tests/unit_tests/triton/test_chunk_kda_bwd.py -s pytest tests/unit_tests/triton/test_chunk_kda_bwd.py -m model_shape -s ``` ATK: ```shell cd tests/atk_tests/triton/chunk_kda_bwd atk case -f chunk_kda_bwd.yaml -p generate_chunk_kda_bwd.py atk node --backend triton --devices 0 node --backend npu --devices 0 task \ -c result/chunk_kda_bwd/json/all_chunk_kda_bwd.json \ --task accuracy -tup ./ -p triton_chunk_kda_bwd.py ``` ![image.png](https://raw.gitcode.com/user-images/assets/9612429/75125840-6c01-439c-bb35-f3507a829c19/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/9612429/d5d76f60-92bd-4f7a-b97c-11c3e1f44e60/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/9612429/69abe3da-d3d0-4b8c-b167-dfd8555e25ad/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!26	12 days ago
fused_cross_entropy_loss.md	feat: Add FusedCrossEntropyLoss for Qwen3.5 Co-authored-by: liu_zhi_xu<liuzhexu1@huawei.com> # message auto-generated for no-merge-commit merge: !25 merge new_func into master feat: Add FusedCrossEntropyLoss for Qwen3.5 Created-by: liu_zhi_xu Commit-by: liu_zhi_xu Merged-by: ascend-robot Description: ## What this PR does / why we need it? 1. New model adaptation operator completion [#related roadmap](https://gitcode.com/Ascend/MindSpeed-Ops/issues/1) 2. Modify ATK config related RMS/SINK ## Does this PR introduce any user-facing change? Reference Operator Markdown Description ## How was this patch tested? [UT] pytest test_fused_cross_entropy_loss.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/2afb8d87-6bf6-4113-8078-1ea9eb30aebc/image.png 'image.png') [ATK] atk case -f rmsnorm_without_weight.yaml -p generate_rmsnorm_without_weight.py atk node --backend triton --devices 0 node --backend cpu --devices 0 task -c result/rmsnorm_without_weight/json/all_rmsnorm_without_weight.json --task accuracy -p triton_rmsnorm_without_weight.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/2440a7b5-0522-4e2e-a995-26623dc16d46/image.png 'image.png') atk case -f sinkhorn.yaml -p generate_sinkhorn.py atk node --backend triton --devices 0 node --backend cpu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task accuracy -p triton_sinkhorn.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/6bba5494-02a3-427d-b7d4-30997cc8a91a/image.png 'image.png') atk case -f fused_cross_entropy_loss.yaml -p generate_fused_cross_entropy_loss.py atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/fused_cross_entropy_loss/json/all_fused_cross_entropy_loss.json --task accuracy -p triton_fused_cross_entropy_loss.py atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/fused_cross_entropy_loss/json/all_fused_cross_entropy_loss.json --task performance_device -p triton_fused_cross_entropy_loss.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/ba106c72-2104-4d55-82c4-87203544529d/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/9612429/d7951ef0-e096-430c-9030-3e690e64d807/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!25	17 days ago
mhc_post.md	add mhc ops Co-authored-by: wangxuefei10<wangxuefei10@huawei.com> # message auto-generated for no-merge-commit merge: !35 merge dev_mhc_0525 into master feat: add mhc triton ops Created-by: Ling_i Commit-by: wangxuefei10 Merged-by: ascend-robot Description: ## What this PR does / why we need it? add mhc triton ops https://gitcode.com/Ascend/MindSpeed-Ops/issues/1 ## Does this PR introduce any user-facing change? Reference Operator Markdown Description. ## How was this patch tested? UT and UTK ![image.png](https://raw.gitcode.com/user-images/assets/9612429/f75f1ab2-4d23-4dc6-aacf-3d6e95d335b6/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!35	8 days ago
mhc_pre_bmm.md	add mhc ops Co-authored-by: wangxuefei10<wangxuefei10@huawei.com> # message auto-generated for no-merge-commit merge: !35 merge dev_mhc_0525 into master feat: add mhc triton ops Created-by: Ling_i Commit-by: wangxuefei10 Merged-by: ascend-robot Description: ## What this PR does / why we need it? add mhc triton ops https://gitcode.com/Ascend/MindSpeed-Ops/issues/1 ## Does this PR introduce any user-facing change? Reference Operator Markdown Description. ## How was this patch tested? UT and UTK ![image.png](https://raw.gitcode.com/user-images/assets/9612429/f75f1ab2-4d23-4dc6-aacf-3d6e95d335b6/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!35	8 days ago
mhc_pre_only.md	add mhc ops Co-authored-by: wangxuefei10<wangxuefei10@huawei.com> # message auto-generated for no-merge-commit merge: !35 merge dev_mhc_0525 into master feat: add mhc triton ops Created-by: Ling_i Commit-by: wangxuefei10 Merged-by: ascend-robot Description: ## What this PR does / why we need it? add mhc triton ops https://gitcode.com/Ascend/MindSpeed-Ops/issues/1 ## Does this PR introduce any user-facing change? Reference Operator Markdown Description. ## How was this patch tested? UT and UTK ![image.png](https://raw.gitcode.com/user-images/assets/9612429/f75f1ab2-4d23-4dc6-aacf-3d6e95d335b6/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!35	8 days ago
rmsnorm_without_weight.md	feat: Add Sinkhorn for DS V4 Co-authored-by: liu_zhi_xu<liuzhexu1@huawei.com> # message auto-generated for no-merge-commit merge: !24 merge sinkhorn into master feat: Add Sinkhorn for DS V4 Created-by: liu_zhi_xu Commit-by: liu_zhi_xu Merged-by: ascend-robot Description: ## What this PR does / why we need it? 1. New model adaptation operator completion [#related roadmap](https://gitcode.com/Ascend/MindSpeed-Ops/issues/1) 2. Operator Description Enhancement for rmsnorm_without_weight 3. Refactor the common test func related Add/RMS/SINK ## Does this PR introduce any user-facing change? Reference Operator Markdown Description ## How was this patch tested? [UT] pytest test_sinkhorn.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/d595acb8-cc1d-4b98-b7bc-4c46d3a586a9/image.png 'image.png') [ATK] atk case -f sinkhorn.yaml -p generate_input.py atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task performance_device -p sinkhorn.py atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task accuracy -p sinkhorn.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/fd979bb5-ece5-4eac-adf9-5cca7a236279/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/9612429/b4809f5d-78a1-4673-9251-2915e6055f8b/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!24	21 days ago
rmsnormgated.md	feat: add RmsNormGated Co-authored-by: feng0w0<houyufeng4@huawei.com> # message auto-generated for no-merge-commit merge: !22 merge master into master feat: add RmsNormGated Created-by: feng0w0 Commit-by: feng0w0 Merged-by: ascend-robot Description: ## What this PR does / why we need it? add RmsNormGated Triton [#1](https://gitcode.com/Ascend/MindSpeed-Ops/issues/1) ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? ut ![image.png](https://raw.gitcode.com/user-images/assets/9612429/12418371-1f2d-4966-ab96-986077b3c558/image.png 'image.png') atk accuracy ![image.png](https://raw.gitcode.com/user-images/assets/9612429/3879ca4d-ab45-42c2-ab46-fa2920736536/image.png 'image.png') atk performance ![image.png](https://raw.gitcode.com/user-images/assets/9612429/5efab269-aefb-4d02-8be5-fe36df89b857/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!22	8 days ago
sinkhorn.md	feat: Add Sinkhorn for DS V4 Co-authored-by: liu_zhi_xu<liuzhexu1@huawei.com> # message auto-generated for no-merge-commit merge: !24 merge sinkhorn into master feat: Add Sinkhorn for DS V4 Created-by: liu_zhi_xu Commit-by: liu_zhi_xu Merged-by: ascend-robot Description: ## What this PR does / why we need it? 1. New model adaptation operator completion [#related roadmap](https://gitcode.com/Ascend/MindSpeed-Ops/issues/1) 2. Operator Description Enhancement for rmsnorm_without_weight 3. Refactor the common test func related Add/RMS/SINK ## Does this PR introduce any user-facing change? Reference Operator Markdown Description ## How was this patch tested? [UT] pytest test_sinkhorn.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/d595acb8-cc1d-4b98-b7bc-4c46d3a586a9/image.png 'image.png') [ATK] atk case -f sinkhorn.yaml -p generate_input.py atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task performance_device -p sinkhorn.py atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task accuracy -p sinkhorn.py ![image.png](https://raw.gitcode.com/user-images/assets/9612429/fd979bb5-ece5-4eac-adf9-5cca7a236279/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/9612429/b4809f5d-78a1-4673-9251-2915e6055f8b/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!24	21 days ago
wy_fast.md	[Feat] Add chunk kda backward op for Kimi Linear Co-authored-by: zhuweichen<calvin_zhu0210@outlook.com> # message auto-generated for no-merge-commit merge: !26 merge kda into master [Feat] Add chunk kda backward op for Kimi Linear Created-by: zhuweichen Commit-by: zhuweichen Merged-by: ascend-robot Description: ## What this PR does / why we need it? This PR adds the `chunk_kda_bwd_wy_dqkg_fused` Triton operator for KDA chunk backward on Ascend arch32. The operator computes fused backward outputs `dq`, `dk`, `dv`, `db`, `dg`, and `dA`. It also adds the public API, arch32 implementation, UT, ATK cases, documentation, and README entry. https://gitcode.com/Ascend/MindSpeed-Ops/issues/28 ## Does this PR introduce any user-facing change? Yes. A new Triton API is added: `from mindspeed_ops.api.triton.chunk_kda_bwd import chunk_kda_bwd_wy_dqkg_fused` Documentation: docs/triton/chunk_kda_bwd.md Limitations: - Supports arch32 only; arch35 raises NotImplementedError. - Main inputs support float16 / float32. - g, h, and dh are expected to be float32. - bf16 is not declared as supported. ## How was this patch tested? UT: ```shell pytest tests/unit_tests/triton/test_chunk_kda_bwd.py -s pytest tests/unit_tests/triton/test_chunk_kda_bwd.py -m model_shape -s ``` ATK: ```shell cd tests/atk_tests/triton/chunk_kda_bwd atk case -f chunk_kda_bwd.yaml -p generate_chunk_kda_bwd.py atk node --backend triton --devices 0 node --backend npu --devices 0 task \ -c result/chunk_kda_bwd/json/all_chunk_kda_bwd.json \ --task accuracy -tup ./ -p triton_chunk_kda_bwd.py ``` ![image.png](https://raw.gitcode.com/user-images/assets/9612429/75125840-6c01-439c-bb35-f3507a829c19/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/9612429/d5d76f60-92bd-4f7a-b97c-11c3e1f44e60/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/9612429/69abe3da-d3d0-4b8c-b167-dfd8555e25ad/image.png 'image.png') See merge request: Ascend/MindSpeed-Ops!26	12 days ago