triton-ascend/third_party/amd/python/test · Ascend/triton-ascend - AtomGit

GitHub · Reland "byte permutes in intra-warp layout conversion" (#7933)

1e6be008 · created on August 22, 2025

File	Last commit	Last updated
address_sanitizer_helper.py	[AMD] Move address sanitizer tests into amd backend (#5524)	1 year ago
attn_fwd.ttir	[AMD][LLVM] Scalarize packed fops in the same mfma/wmma block (#6656) This PR adds an _LLVM Pass_ that scalarizes vector `fmul`s and `fadd`s in basic blocks that contain MFMAs/WMMAs. The reason is that these instructions otherwise get codegened to "packed" ops (`v_pk_mul_f32`/`v_pk_add_f32`), which cannot be co-issued with MFMA instructions and therefore carry a performance cost. Concretely, this eliminates `v_pk_mul_f32`/`v_pk_add_f32` operations from the final assembly in basic blocks that contain MFMAs. Note that these "scalar" floating-point ops are still lowered to vector instructions such as `v_mul_f32_e32` and `v_add_f32_e32`, just not to the "packed" variants. Note also that these packed fops are not emitted by Triton itself; they are introduced by the `VectorCombine::foldPermuteOfBinops` pattern during the `optimize_module` pipeline (which is why this LLVM pass needs to run after that pipeline).	1 year ago
conftest.py	[Tests] Use `tmp_path` pytest fixture instead of `tempfile.NamedTemporaryFile` (#7735) `tmp_path` is the predominant style in the tests, and in addition it does not cause problems when running these tests on Windows. Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>	9 months ago
test_address_sanitizer.py	[AMD] Turn buffer ops support on by default (#5960) This commit turns buffer ops on by default. Under the hood, this means buffer load/store intrinsics are emitted instead of global load/store intrinsics wherever possible. The aim is to improve performance through better out-of-bounds access handling, better register usage, and so on. Buffer ops do have limitations, though: to get the full benefit, some kernels need `tl.assume` annotations to guide the analysis so that it can kick in.	1 year ago
test_convert_op_permlane_swap.py	Reland "byte permutes in intra-warp layout conversion" (#7933) Relands https://github.com/triton-lang/triton/pull/7809, https://github.com/triton-lang/triton/pull/7825, and https://github.com/triton-lang/triton/pull/7861. Adds a workaround for a ptxas bug and a regression test.	9 months ago
test_extract_slice_concat_op.py	[Tests] Use `tmp_path` pytest fixture instead of `tempfile.NamedTemporaryFile` (#7735) `tmp_path` is the predominant style in the tests, and in addition it does not cause problems when running these tests on Windows. Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>	9 months ago
test_scalarize_packed_fops.py	[AMD][LLVM] Scalarize packed fops in the same mfma/wmma block (#6656) This PR adds an _LLVM Pass_ that scalarizes vector `fmul`s and `fadd`s in basic blocks that contain MFMAs/WMMAs. The reason is that these instructions otherwise get codegened to "packed" ops (`v_pk_mul_f32`/`v_pk_add_f32`), which cannot be co-issued with MFMA instructions and therefore carry a performance cost. Concretely, this eliminates `v_pk_mul_f32`/`v_pk_add_f32` operations from the final assembly in basic blocks that contain MFMAs. Note that these "scalar" floating-point ops are still lowered to vector instructions such as `v_mul_f32_e32` and `v_add_f32_e32`, just not to the "packed" variants. Note also that these packed fops are not emitted by Triton itself; they are introduced by the `VectorCombine::foldPermuteOfBinops` pattern during the `optimize_module` pipeline (which is why this LLVM pass needs to run after that pipeline).	1 year ago
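The #6656 commit message describes the scalarization pass only in prose. The toy Python model below is a rough illustration of the idea under stated assumptions, not the actual LLVM pass. The `Inst` representation, the `scalarize_block` function, and the `buildvector` opcode (a stand-in for LLVM's `insertelement` chain) are all hypothetical. The model splits vector `fmul`/`fadd` into per-lane scalar ops, but only in blocks that also contain an `mfma`/`wmma`.

```python
from dataclasses import dataclass, field
from typing import List

# Toy model only: opcode names mirror the commit message, but this is not LLVM IR.
MATRIX_OPS = {"mfma", "wmma"}
PACKABLE_FOPS = {"fmul", "fadd"}


@dataclass
class Inst:
    opcode: str
    dest: str
    operands: List[str] = field(default_factory=list)
    width: int = 1  # number of vector lanes; 1 means scalar


def scalarize_block(block: List[Inst]) -> List[Inst]:
    """Split vector fmul/fadd into per-lane scalar ops, only in blocks that contain an MFMA/WMMA."""
    if not any(inst.opcode in MATRIX_OPS for inst in block):
        # No matrix op to co-issue with, so packed ops are left alone.
        return list(block)

    out: List[Inst] = []
    for inst in block:
        if inst.opcode not in PACKABLE_FOPS or inst.width == 1:
            out.append(inst)
            continue
        lanes: List[str] = []
        for i in range(inst.width):
            scalar_srcs: List[str] = []
            for k, src in enumerate(inst.operands):
                # Extract lane i of operand k into a fresh scalar value.
                name = f"{inst.dest}.in{k}.{i}"
                out.append(Inst("extractelement", name, [src, str(i)]))
                scalar_srcs.append(name)
            lane = f"{inst.dest}.{i}"
            out.append(Inst(inst.opcode, lane, scalar_srcs))  # scalar op, width 1
            lanes.append(lane)
        # Hypothetical "buildvector" reassembles the result (LLVM would use insertelement),
        # so later users of inst.dest still see the same vector value.
        out.append(Inst("buildvector", inst.dest, lanes, inst.width))
    return out
```

Suppose a block holds an `mfma` and a 2-lane `fmul`. The function returns it with two scalar `fmul`s and a `buildvector`. A block that contains only a 2-lane `fadd` comes back unchanged. This mirrors the commit's point: the packed form costs performance only when it competes with MFMA/WMMA issue.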
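Commit #7735 replaces `tempfile.NamedTemporaryFile` with pytest's built-in `tmp_path` fixture. The sketch below shows the general pattern. `dump_ir` is a hypothetical helper standing in for whatever a test writes to disk. pytest gives each test its own fresh `pathlib.Path` directory, so the test can write a file, close it, and reopen it by name. Reopening by name is the step that typically fails on Windows while a `NamedTemporaryFile` is still held open.

```python
from pathlib import Path


def dump_ir(out_dir: Path, name: str, text: str) -> Path:
    """Hypothetical helper: write IR text to <out_dir>/<name>.ttir and return the path."""
    path = out_dir / f"{name}.ttir"
    path.write_text(text)  # opens, writes, and closes the file
    return path


def test_dump_ir_roundtrip(tmp_path: Path) -> None:
    # pytest injects tmp_path: a unique, per-test pathlib.Path directory.
    path = dump_ir(tmp_path, "kernel", "module {}\n")
    # The file is already closed, so reopening it by name works on any OS.
    assert path.read_text() == "module {}\n"
    assert path.parent == tmp_path
```

Under pytest, no import or cleanup code is needed: the fixture is requested simply by naming the `tmp_path` parameter.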