[fix]: solve the time degradation of the chunk_bwd_dqkwg Triton kernel.
Co-authored-by: LinShua <707894133@qq.com>
# message auto-generated for no-merge-commit merge:
!3379 merge 26.0.0_core_r0.12.1_dqkwg_triton_time into 26.0.0_core_r0.12.1
[fix]: solve the time degradation of the chunk_bwd_dqkwg Triton kernel.
Created-by: LinShua
Commit-by: LinShua
Merged-by: ascend-robot
Description: What this PR does / why we need it?
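Fixes the variable-length performance regression introduced by the fixed-length performance optimization of the chunk_bwd_dqkwg operator:
chunk_bwd_kernel_dqkwg
Fixed-length: original logic: 26158us, after optimization: 18855us
Variable-length: original logic: 33450us, after optimization: 89597us
This PR: fixed-length: 18769us, variable-length: 33509us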
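The trade-off in these numbers can be checked with a few lines of arithmetic. The sketch below uses only the timings listed above; the names `TIMINGS_US`, `relative_change` and `summarize` are hypothetical helpers for illustration, not part of MindSpeed. It reports each variant's relative time change against the original logic (negative means faster).

```python
# Kernel timings (microseconds) for chunk_bwd_kernel_dqkwg, copied from the description.
# Hypothetical structure for illustration only.
TIMINGS_US = {
    "fixed-length": {"original": 26158, "fixed-length opt": 18855, "this PR": 18769},
    "variable-length": {"original": 33450, "fixed-length opt": 89597, "this PR": 33509},
}


def relative_change(baseline_us: float, new_us: float) -> float:
    """Return the relative time change vs. the baseline (negative = faster)."""
    return (new_us - baseline_us) / baseline_us


def summarize(timings: dict) -> dict:
    """Relative change of each variant against the original logic, per workload."""
    summary = {}
    for workload, runs in timings.items():
        base = runs["original"]
        summary[workload] = {
            name: relative_change(base, t)
            for name, t in runs.items()
            if name != "original"
        }
    return summary


if __name__ == "__main__":
    for workload, changes in summarize(TIMINGS_US).items():
        for name, delta in changes.items():
            print(f"{workload:16s} {name:17s} {delta:+.1%}")
```

With these inputs, the fixed-length case keeps a reduction of roughly 28% versus the original logic, while the variable-length case returns to within about 0.2% of the original instead of the roughly 2.7x slowdown caused by the earlier fixed-length optimization.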
Does this PR introduce any user-facing change?
NA
How was this patch tested?
See PR: [fix]:solve the UT of GDN triton.
See merge request: Ascend/MindSpeed!3379