File | Last commit | Last updated
feat(triton):sort_chunks_by_idx Co-authored-by: guofanfeng<guofanfeng1@huawei.com> # message auto-generated for no-merge-commit merge: !2997 merge master into master feat(triton):sort_chunks_by_idx Created-by: guofanfeng23 Commit-by: guofanfeng Merged-by: ascend-robot Description: integrates the sort_chunks_by_idx Triton operator. Operator validation results: https://wiki.huawei.com/domains/152732/wiki/307991/WIKI202511219117266 See merge request: Ascend/MindSpeed!2997 | 5 months ago
[fix]:solve the problem of chunk_bwd_dqkwg triton's time degradation. Co-authored-by: LinShua<707894133@qq.com> # message auto-generated for no-merge-commit merge: !3379 merge 26.0.0_core_r0.12.1_dqkwg_triton_time into 26.0.0_core_r0.12.1 [fix]:solve the problem of chunk_bwd_dqkwg triton's time degradation. Created-by: LinShua Commit-by: LinShua Merged-by: ascend-robot Description: What this PR does / why we need it? Fixes the variable-length performance regression in the chunk_bwd_dqkwg operator that was introduced by its fixed-length performance optimization. chunk_bwd_kernel_dqkwg, fixed-length: original logic 26158 us, after the optimization 18855 us; variable-length: original logic 33450 us, after the optimization 89597 us. With this PR: fixed-length 18769 us, variable-length 33509 us. Does this PR introduce any user-facing change? NA How was this patch tested? See PR: [fix]:solve the UT of GDN triton. See merge request: Ascend/MindSpeed!3379 | 1 month ago
[fix]:solve the problem of chunk_bwd_dqkwg triton's time degradation. Co-authored-by: LinShua<707894133@qq.com> # message auto-generated for no-merge-commit merge: !3379 merge 26.0.0_core_r0.12.1_dqkwg_triton_time into 26.0.0_core_r0.12.1 [fix]:solve the problem of chunk_bwd_dqkwg triton's time degradation. Created-by: LinShua Commit-by: LinShua Merged-by: ascend-robot Description: What this PR does / why we need it? Fixes the variable-length performance regression in the chunk_bwd_dqkwg operator that was introduced by its fixed-length performance optimization. chunk_bwd_kernel_dqkwg, fixed-length: original logic 26158 us, after the optimization 18855 us; variable-length: original logic 33450 us, after the optimization 89597 us. With this PR: fixed-length 18769 us, variable-length 33509 us. Does this PR introduce any user-facing change? NA How was this patch tested? See PR: [fix]:solve the UT of GDN triton. See merge request: Ascend/MindSpeed!3379 | 1 month ago
chunk_scaled_dot_kkt, chunk_bwd_dv_local, chunk_bwd_dqkwg: adapt the operators for both variable-length input and mbs2 Co-authored-by: xubin<mark19980312@126.com> # message auto-generated for no-merge-commit merge: !3116 merge master into master chunk_scaled_dot_kkt, chunk_bwd_dv_local, chunk_bwd_dqkwg: adapt the operators for both variable-length input and mbs2 Created-by: xubin787 Commit-by: xubin Merged-by: ascend-robot Description: the chunk_scaled_dot_kkt, chunk_bwd_dv_local and chunk_bwd_dqkwg operators are adapted for both variable-length input and mbs2, with accuracy aligned. See merge request: Ascend/MindSpeed!3116 | 5 months ago
fix(gdn): fix dqkwg and cumsum bug Co-authored-by: chy3<843049740@qq.com> # message auto-generated for no-merge-commit merge: !3274 merge master_dqkwg_cumsum into master fix(gdn): fix dqkwg and cumsum bug Created-by: chy3 Commit-by: chy3 Merged-by: ascend-robot Description: fix(gdn): fix dqkwg and cumsum bug See merge request: Ascend/MindSpeed!3274 | 3 months ago
feat(triton): add triton fused_norm_gate kernels and UTs Co-authored-by: iiiiLllllzx<353279568@qq.com> Co-authored-by: mm_abc<mulinhong@huawei.com> Co-authored-by: iiiiLllllzx<zhouyang271@huawei.com> # message auto-generated for no-merge-commit merge: !3183 merge lzx-dev into master feat(triton): add triton fused_norm_gate kernels and UTs Created-by: iiiiLllllzx Commit-by: iiiiLllllzx;mm_abc Merged-by: ascend-robot Description: Add Triton operators fused_norm_gate with corresponding test cases. Open-source implementation of fused_norm_gate: https://github.com/fla-org/flash-linear-attention/blob/main/fla/modules/fused_norm_gate.py See merge request: Ascend/MindSpeed!3183 | 3 months ago
update triton l2norm chunk_scaled_dot_kkt kernels and UTs Co-authored-by: xubin<mark19980312@126.com> # message auto-generated for no-merge-commit merge: !3039 merge master into master update triton l2norm chunk_scaled_dot_kkt kernels and UTs Created-by: xubin787 Commit-by: xubin787;xubin Merged-by: ascend-robot Description: update triton l2norm chunk_scaled_dot_kkt kernels and UTs See merge request: Ascend/MindSpeed!3039 | 5 months ago
feat(triton): sinkhorn operation perf enhance Co-authored-by: liu_zhi_xu<liuzhexu1@huawei.com> # message auto-generated for no-merge-commit merge: !3223 merge master into master feat(triton): sinkhorn operation perf enhance Created-by: liu_zhi_xu Commit-by: liu_zhi_xu Merged-by: ascend-robot Description: perf enhance See merge request: Ascend/MindSpeed!3223 | 4 months ago
feat(triton): optimize ops solve_tril and chunk_delta_h Co-authored-by: wangxuefei10<wangxuefei10@huawei.com> # message auto-generated for no-merge-commit merge: !3190 merge dev_gdn_opt into master feat(triton): optimize ops solve_tril and chunk_delta_h Created-by: Ling_i Commit-by: wangxuefei10 Merged-by: ascend-robot Description: feat(triton): optimize ops solve_tril and chunk_delta_h See merge request: Ascend/MindSpeed!3190 | 4 months ago
docs:fix docs/zh mistakes Co-authored-by: Keilo_W<wangkaiyu11@h-partners.com> # message auto-generated for no-merge-commit merge: !3318 merge master into master docs:fix docs/zh mistakes Created-by: Keilo_W Commit-by: Keilo_W Merged-by: ascend-robot Description: corrected some comments and code that had been changed by mistake. See merge request: Ascend/MindSpeed!3318 | 2 months ago
[fix]:solve the error reported when calling the TA interface with the triton operator. Co-authored-by: LinShua<707894133@qq.com> # message auto-generated for no-merge-commit merge: !3295 merge master_triton_ut_0303 into master [fix]:solve the error reported when calling the TA interface with the triton operator. Created-by: LinShua Commit-by: LinShua Merged-by: ascend-robot Description: ## What this PR does / why we need it? Adapts the code to the interface changes in the new triton-ascend release. ## Does this PR introduce any user-facing change? triton-ascend is updated to version 20260210204207. ## How was this patch tested? Test command: python3 -m pytest --color=no -k "not allocator" -x tests_extend/unit_tests See merge request: Ascend/MindSpeed!3295 | 2 months ago
[fix]:solve the problem of chunk_bwd_dqkwg triton's time degradation. Co-authored-by: LinShua<707894133@qq.com> # message auto-generated for no-merge-commit merge: !3379 merge 26.0.0_core_r0.12.1_dqkwg_triton_time into 26.0.0_core_r0.12.1 [fix]:solve the problem of chunk_bwd_dqkwg triton's time degradation. Created-by: LinShua Commit-by: LinShua Merged-by: ascend-robot Description: What this PR does / why we need it? Fixes the variable-length performance regression in the chunk_bwd_dqkwg operator that was introduced by its fixed-length performance optimization. chunk_bwd_kernel_dqkwg, fixed-length: original logic 26158 us, after the optimization 18855 us; variable-length: original logic 33450 us, after the optimization 89597 us. With this PR: fixed-length 18769 us, variable-length 33509 us. Does this PR introduce any user-facing change? NA How was this patch tested? See PR: [fix]:solve the UT of GDN triton. See merge request: Ascend/MindSpeed!3379 | 1 month ago
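The chunk_bwd_kernel_dqkwg timings quoted in merge request !3379 can be put side by side with a short calculation. This is a minimal Python sketch that uses only the figures from that commit message; the names `TIMINGS_US` and `relative_to_original` are hypothetical helpers for illustration, not part of MindSpeed.

```python
# Kernel times in microseconds for chunk_bwd_kernel_dqkwg, copied from the
# commit message of merge request !3379. The dictionary layout is an
# illustrative assumption, not a MindSpeed data structure.
TIMINGS_US = {
    "fixed-length": {"original": 26158, "previous optimization": 18855, "this PR": 18769},
    "variable-length": {"original": 33450, "previous optimization": 89597, "this PR": 33509},
}


def relative_to_original(timings):
    """Return each variant's time divided by the original time (lower is faster)."""
    return {
        case: {variant: t / times["original"] for variant, t in times.items()}
        for case, times in timings.items()
    }


if __name__ == "__main__":
    for case, ratios in relative_to_original(TIMINGS_US).items():
        for variant, ratio in ratios.items():
            print(f"{case:16s} {variant:22s} {ratio:.3f}x of original")
```

Read this way, the earlier optimization made the variable-length case roughly 2.7x slower than the original, while the fix keeps the fixed-length speedup (about 0.72x of the original time) and brings the variable-length case back to within about 0.2% of the original.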