triton-ascend/third_party/amd/include · Ascend/triton-ascend - AtomGit

GitHub: [Backend] Bump to llvm/llvm-project@bc773632355b (#7881)

Commit 6ca2dda9, created on August 19, 2025

File	Last commit	Last updated
Analysis	[BACKEND] Localize the use and definition of `getShapePerCTATile` in the AMD backend and aim for elimination (#7740)	9 months ago
Dialect	[Backend] Bump to llvm/llvm-project@bc773632355b (#7881) * Switched `Constant{Int|Float}Op` type and value order following llvm/llvm-project@a45fda6aeba362926da6cc1b107be92dafb0d490 * Provided triple for `TargetLibraryInfoImpl` following llvm/llvm-project@c91cbafad2119cace85499e8d231b8e5737f3b41 * Fixed atomic sync scope for NVIDIA following llvm/llvm-project@0f1b16dd5f83fd931ecb111bb925ac9e1d56f589 * Updated MLIR lib names following llvm/llvm-project@e68a20e0b7623738d6af736d3aa02625cba6126a * Updated `nvvm.stmatrix` op following llvm/llvm-project@2b27377b0bf72e4524774dedf4b03521b07606d5 * Updated `ROCDL::Mbcnt{Lo|Hi}Op` following llvm/llvm-project@bbe3d64b39d80c2d6132fbad6008b2a6e86fd4d5 Closes https://github.com/triton-lang/triton/pull/7413 Closes https://github.com/triton-lang/triton/pull/7575 Closes https://github.com/triton-lang/triton/pull/7765 --------- Co-authored-by: Yi Qian <yi.qian@amd.com> Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>	9 months ago
TritonAMDGPUToLLVM	[Backend] Bump to llvm/llvm-project@bc773632355b (#7881) * Switched `Constant{Int|Float}Op` type and value order following llvm/llvm-project@a45fda6aeba362926da6cc1b107be92dafb0d490 * Provided triple for `TargetLibraryInfoImpl` following llvm/llvm-project@c91cbafad2119cace85499e8d231b8e5737f3b41 * Fixed atomic sync scope for NVIDIA following llvm/llvm-project@0f1b16dd5f83fd931ecb111bb925ac9e1d56f589 * Updated MLIR lib names following llvm/llvm-project@e68a20e0b7623738d6af736d3aa02625cba6126a * Updated `nvvm.stmatrix` op following llvm/llvm-project@2b27377b0bf72e4524774dedf4b03521b07606d5 * Updated `ROCDL::Mbcnt{Lo|Hi}Op` following llvm/llvm-project@bbe3d64b39d80c2d6132fbad6008b2a6e86fd4d5 Closes https://github.com/triton-lang/triton/pull/7413 Closes https://github.com/triton-lang/triton/pull/7575 Closes https://github.com/triton-lang/triton/pull/7765 --------- Co-authored-by: Yi Qian <yi.qian@amd.com> Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>	9 months ago
TritonAMDGPUTransforms	[AMD] Use single LDS for both transposed and non-transposed access (#7813) This commit introduces a pass that detects a pair of tt.dot ops that both use the same tt.load result, one directly and one via tt.trans, and creates the same shared memory allocation for both. This allows the pipeliner to pick a single LDS layout and enables pipelining of the loads.	9 months ago
Utils	[AMD] Rewrite extract_slice op implementation (#7128) This PR refactors the extract_slice operation to support two major improvements: 1) Relaxed layout constraints: the operation now allows more flexible source and destination layouts, aligning better with linear layouts. 2) Support for arbitrary tensor ranks: extract_slice is no longer limited to 2D tensors and can now handle tensors of any rank. The "extract_slice" operation enables extracting a slice of a tensor in registers. It supports the following arguments: * source: the base tensor on which to create a view tensor * offsets: offsets into the base tensor at which to create the view. In distributed layouts, tensors are divided into CTA tiles. A CTA tile represents the smallest contiguous portion of a tensor that is distributed across all threads and warps within a workgroup. The ExtractSlice operation extracts a portion of the tensor that aligns with CTA tile boundaries. This op is designed to work on logical tensors directly, avoiding the need for complex layout reinterpretation or reshaping. For example, the tt.split operation only supports splitting along the innermost dimension, and requires that the resulting innermost dimension provide 2 elements per thread, distributed across registers. In contrast, the extract_slice op imposes no constraints on the extraction dimension or the size of dimensions. --------- Co-authored-by: Ognjen Plavsic <plognjen@amd.com> Co-authored-by: Lei Zhang <antiagainst@gmail.com>	11 months ago
CMakeLists.txt	[AMD] Add TritonAMDGPU dialect scaffolding (#4685) This PR adds a TritonAMDGPU dialect to host future AMD-specific ops to help with AMD backend CodeGen. --------- Co-authored-by: Ognjen Plavsic <ognjen.plavsic@luxoft.com> Co-authored-by: Lei Zhang <antiagainst@gmail.com>	1 year ago
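
The commit message in the Utils row says that extract_slice takes a slice that aligns with CTA tile boundaries and works for tensors of any rank. The following is a minimal Python sketch of what such an alignment check might look like. It is not the Triton implementation: the function name `check_extract_slice` and the parameters `cta_tile` and `result_shape` are hypothetical, and reading "aligns with CTA tile boundaries" as "offsets and result sizes are multiples of the tile size" is an assumption.

```python
from typing import List, Sequence, Tuple


def check_extract_slice(
    src_shape: Sequence[int],
    cta_tile: Sequence[int],      # hypothetical: CTA tile size per dimension
    offsets: Sequence[int],
    result_shape: Sequence[int],  # hypothetical: shape of the extracted view
) -> List[Tuple[int, int]]:
    """Return the (start, stop) range per dimension, or raise ValueError.

    Assumption: a slice is tile-aligned when, in every dimension, both the
    offset and the result size are multiples of the CTA tile size.
    """
    rank = len(src_shape)
    if not (len(cta_tile) == len(offsets) == len(result_shape) == rank):
        raise ValueError("all arguments must have the same rank")

    ranges = []
    for dim in range(rank):
        size, tile = src_shape[dim], cta_tile[dim]
        off, res = offsets[dim], result_shape[dim]
        if off % tile != 0 or res % tile != 0:
            raise ValueError(f"dimension {dim} is not aligned to the CTA tile")
        if off < 0 or off + res > size:
            raise ValueError(f"dimension {dim} is out of bounds")
        ranges.append((off, off + res))
    return ranges


# A rank-3 example: the slice may be taken along any dimension.
print(check_extract_slice((128, 64, 32), (32, 16, 8), (32, 0, 16), (64, 64, 8)))
```

The check is applied independently per dimension, which mirrors the commit message's point that the op places no constraint on which dimension is sliced.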