File | Last commit | Last updated
feat: version update & doc update
Co-authored-by: shiyuan680<yangcheng104@huawei.com>
# message auto-generated for no-merge-commit merge:
!19 merge update into master
feat: version update & doc update
Created-by: zhizaidicengshehua
Commit-by: shiyuan680
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
Update the Triton version to fix a bug (https://gitcode.com/Ascend/MindSpeed-Ops/issues/2).
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations.
See merge request: Ascend/MindSpeed-Ops!19
Last updated: 14 days ago
feat: conv1d wrapper
Co-authored-by: liuxi_<liuxi75@huawei.com>
Last updated: 24 days ago
[Feat] Add chunk kda backward op for Kimi Linear
Co-authored-by: zhuweichen<calvin_zhu0210@outlook.com>
# message auto-generated for no-merge-commit merge:
!26 merge kda into master
[Feat] Add chunk kda backward op for Kimi Linear
Created-by: zhuweichen
Commit-by: zhuweichen
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
This PR adds the chunk_kda_bwd_wy_dqkg_fused Triton operator for KDA chunk backward on Ascend arch32. The operator computes fused backward outputs dq, dk, dv, db, dg, and dA. It also adds the public API, arch32 implementation, UT, ATK cases, documentation, and README entry.
https://gitcode.com/Ascend/MindSpeed-Ops/issues/28
## Does this PR introduce any user-facing change?
Yes. A new Triton API is added:
from mindspeed_ops.api.triton.chunk_kda_bwd import chunk_kda_bwd_wy_dqkg_fused
Documentation: docs/triton/chunk_kda_bwd.md
Limitations:
- Supports arch32 only; arch35 raises NotImplementedError.
- Main inputs support float16 / float32.
- g, h, and dh are expected to be float32.
- bf16 is not declared as supported.
## How was this patch tested?
UT:
```shell
pytest tests/unit_tests/triton/test_chunk_kda_bwd.py -s
pytest tests/unit_tests/triton/test_chunk_kda_bwd.py -m model_shape -s
```
ATK:
```shell
cd tests/atk_tests/triton/chunk_kda_bwd
atk case -f chunk_kda_bwd.yaml -p generate_chunk_kda_bwd.py
atk node --backend triton --devices 0 node --backend npu --devices 0 task \
  -c result/chunk_kda_bwd/json/all_chunk_kda_bwd.json \
  --task accuracy -tup ./ -p triton_chunk_kda_bwd.py
```
![image.png](https://raw.gitcode.com/user-images/assets/9612429/75125840-6c01-439c-bb35-f3507a829c19/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/9612429/d5d76f60-92bd-4f7a-b97c-11c3e1f44e60/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/9612429/69abe3da-d3d0-4b8c-b167-dfd8555e25ad/image.png 'image.png')
See merge request: Ascend/MindSpeed-Ops!26
Last updated: 12 days ago
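The limitations stated in the chunk KDA backward PR (arch32 only, float16 / float32 main inputs, float32 for g, h, and dh) can be expressed as a pre-flight check. The following is a minimal sketch under stated assumptions, not code from the repository: `check_kda_bwd_inputs` and its parameters are hypothetical, dtypes are passed as plain strings so the example runs without torch or an NPU, and rejecting every architecture other than arch32 is an assumption (the PR only says that arch35 raises NotImplementedError).

```python
from typing import Mapping

# Dtype names are plain strings so the sketch runs without torch or an NPU.
SUPPORTED_MAIN_DTYPES = frozenset({"float16", "float32"})
FLOAT32_ONLY_INPUTS = ("g", "h", "dh")


def check_kda_bwd_inputs(arch: str, main_dtype: str, aux_dtypes: Mapping[str, str]) -> None:
    """Hypothetical pre-flight check mirroring the PR's stated limitations."""
    # The PR only states that arch35 raises NotImplementedError; rejecting
    # every non-arch32 value is an assumption of this sketch.
    if arch != "arch32":
        raise NotImplementedError(f"chunk KDA backward supports arch32 only, got {arch!r}")
    # bf16 is not declared as supported, so it is rejected here.
    if main_dtype not in SUPPORTED_MAIN_DTYPES:
        raise TypeError(f"main inputs must be float16 or float32, got {main_dtype!r}")
    # g, h, and dh are expected to be float32.
    for name in FLOAT32_ONLY_INPUTS:
        if aux_dtypes.get(name) != "float32":
            raise TypeError(f"{name} is expected to be float32, got {aux_dtypes.get(name)!r}")


# A configuration that satisfies every listed limitation passes silently.
check_kda_bwd_inputs("arch32", "float16", {"g": "float32", "h": "float32", "dh": "float32"})
```

A caller could run such a check before invoking `chunk_kda_bwd_wy_dqkg_fused` to fail early with a readable message; the operator's actual signature and validation behavior are documented in docs/triton/chunk_kda_bwd.md.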
feat: Add FusedCrossEntropyLoss for Qwen3.5
Co-authored-by: liu_zhi_xu<liuzhexu1@huawei.com>
# message auto-generated for no-merge-commit merge:
!25 merge new_func into master
feat: Add FusedCrossEntropyLoss for Qwen3.5
Created-by: liu_zhi_xu
Commit-by: liu_zhi_xu
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
1. New model adaptation operator completion [#related roadmap](https://gitcode.com/Ascend/MindSpeed-Ops/issues/1)
2. Modify ATK config related RMS/SINK
## Does this PR introduce any user-facing change?
Reference Operator Markdown Description
## How was this patch tested?
[UT]
pytest test_fused_cross_entropy_loss.py
![image.png](https://raw.gitcode.com/user-images/assets/9612429/2afb8d87-6bf6-4113-8078-1ea9eb30aebc/image.png 'image.png')
[ATK]
atk case -f rmsnorm_without_weight.yaml -p generate_rmsnorm_without_weight.py
atk node --backend triton --devices 0 node --backend cpu --devices 0 task -c result/rmsnorm_without_weight/json/all_rmsnorm_without_weight.json --task accuracy -p triton_rmsnorm_without_weight.py
![image.png](https://raw.gitcode.com/user-images/assets/9612429/2440a7b5-0522-4e2e-a995-26623dc16d46/image.png 'image.png')
atk case -f sinkhorn.yaml -p generate_sinkhorn.py
atk node --backend triton --devices 0 node --backend cpu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task accuracy -p triton_sinkhorn.py
![image.png](https://raw.gitcode.com/user-images/assets/9612429/6bba5494-02a3-427d-b7d4-30997cc8a91a/image.png 'image.png')
atk case -f fused_cross_entropy_loss.yaml -p generate_fused_cross_entropy_loss.py
atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/fused_cross_entropy_loss/json/all_fused_cross_entropy_loss.json --task accuracy -p triton_fused_cross_entropy_loss.py
atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/fused_cross_entropy_loss/json/all_fused_cross_entropy_loss.json --task performance_device -p triton_fused_cross_entropy_loss.py
![image.png](https://raw.gitcode.com/user-images/assets/9612429/ba106c72-2104-4d55-82c4-87203544529d/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/9612429/d7951ef0-e096-430c-9030-3e690e64d807/image.png 'image.png')
See merge request: Ascend/MindSpeed-Ops!25
Last updated: 17 days ago
add mhc ops
Co-authored-by: wangxuefei10<wangxuefei10@huawei.com>
# message auto-generated for no-merge-commit merge:
!35 merge dev_mhc_0525 into master
feat: add mhc triton ops
Created-by: Ling_i
Commit-by: wangxuefei10
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
add mhc triton ops
https://gitcode.com/Ascend/MindSpeed-Ops/issues/1
## Does this PR introduce any user-facing change?
Reference Operator Markdown Description.
## How was this patch tested?
UT and UTK
![image.png](https://raw.gitcode.com/user-images/assets/9612429/f75f1ab2-4d23-4dc6-aacf-3d6e95d335b6/image.png 'image.png')
See merge request: Ascend/MindSpeed-Ops!35
Last updated: 8 days ago
feat: Add Sinkhorn for DS V4
Co-authored-by: liu_zhi_xu<liuzhexu1@huawei.com>
# message auto-generated for no-merge-commit merge:
!24 merge sinkhorn into master
feat: Add Sinkhorn for DS V4
Created-by: liu_zhi_xu
Commit-by: liu_zhi_xu
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
1. New model adaptation operator completion [#related roadmap](https://gitcode.com/Ascend/MindSpeed-Ops/issues/1)
2. Operator Description Enhancement for rmsnorm_without_weight
3. Refactor the common test func related Add/RMS/SINK
## Does this PR introduce any user-facing change?
Reference Operator Markdown Description
## How was this patch tested?
[UT]
pytest test_sinkhorn.py
![image.png](https://raw.gitcode.com/user-images/assets/9612429/d595acb8-cc1d-4b98-b7bc-4c46d3a586a9/image.png 'image.png')
[ATK]
atk case -f sinkhorn.yaml -p generate_input.py
atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task performance_device -p sinkhorn.py
atk node --backend triton --devices 0 node --backend npu --devices 0 task -c result/sinkhorn/json/all_sinkhorn.json --task accuracy -p sinkhorn.py
![image.png](https://raw.gitcode.com/user-images/assets/9612429/fd979bb5-ece5-4eac-adf9-5cca7a236279/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/9612429/b4809f5d-78a1-4673-9251-2915e6055f8b/image.png 'image.png')
See merge request: Ascend/MindSpeed-Ops!24
Last updated: 21 days ago
feat: add RmsNormGated
Co-authored-by: feng0w0<houyufeng4@huawei.com>
# message auto-generated for no-merge-commit merge:
!22 merge master into master
feat: add RmsNormGated
Created-by: feng0w0
Commit-by: feng0w0
Merged-by: ascend-robot
Description:
## What this PR does / why we need it?
add RmsNormGated Triton [#1](https://gitcode.com/Ascend/MindSpeed-Ops/issues/1)
## Does this PR introduce any user-facing change?
Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path.
## How was this patch tested?
UT
![image.png](https://raw.gitcode.com/user-images/assets/9612429/12418371-1f2d-4966-ab96-986077b3c558/image.png 'image.png')
ATK accuracy
![image.png](https://raw.gitcode.com/user-images/assets/9612429/3879ca4d-ab45-42c2-ab46-fa2920736536/image.png 'image.png')
ATK performance
![image.png](https://raw.gitcode.com/user-images/assets/9612429/5efab269-aefb-4d02-8be5-fe36df89b857/image.png 'image.png')
See merge request: Ascend/MindSpeed-Ops!22
Last updated: 8 days ago