File    Last commit message    Last updated
[BACKEND] Localize the use and definition of getShapePerCTATile in the AMD backend and aim for elimination (#7740) 9 months ago
[AMD] Support ExtractSliceOp for AxisInfo (#7094) This commit updates AxisInfo to support backend callbacks so that it can recognize backend ops. One can use ExtractSliceOp to slice tensors of pointers to refine tt.load or tt.store. TritonAMDGPUConvertToBufferOpsBase fails to perform negativity analysis in the presence of ExtractSliceOp, which after rewrites slices tensors of offsets. This PR addresses the issue. 11 months ago
[AMD] Add missing CMake dependency on TritonAMDGPUTableGen (#7824) The library TritonAMDAnalysis includes Dialect.h, which in turn includes Dialect.h.inc. For the library to build successfully, the tablegen target that produces Dialect.h.inc must therefore run first. That target is TritonAMDGPUTableGen. However, TritonAMDAnalysis has no dependency on TritonAMDGPUTableGen, resulting in spurious build breaks. This change adds the missing dependency. Fixes https://github.com/triton-lang/triton/issues/7821 9 months ago
[AMD] Fixed pid range analysis assumption (#7793) Fixes a bug in RangeAnalysis where the assumptions about the max number of programs were wrong for the X dimension. This is the correct information based on rocminfo.
```
Grid Max Size:               4294967295(0xffffffff)
Grid Max Size per Dimension:
  x                          2147483647(0x7fffffff)
  y                          65535(0xffff)
  z                          65535(0xffff)
```
This was leading to an IMA (illegal memory access) in Inductor-generated code when it generated a 1D grid of 72,000 programs. 9 months ago
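To illustrate why the per-dimension limit matters for a 1D grid of 72,000 programs, here is a minimal Python sketch. It is not the RangeAnalysis code. The limits are taken from the rocminfo output quoted in the commit message. The helper names and `ASSUMED_OLD_X_LIMIT` are hypothetical, and the idea that x was previously bounded like y and z is an assumption the commit message does not state.

```python
# Per-dimension grid limits quoted from the rocminfo output in the commit message.
GRID_MAX_PER_DIM = {"x": 0x7FFFFFFF, "y": 0xFFFF, "z": 0xFFFF}

# Hypothetical pre-fix assumption (NOT stated in the commit): x bounded like y and z.
ASSUMED_OLD_X_LIMIT = 0xFFFF


def program_id_range(axis, limits=GRID_MAX_PER_DIM):
    """Return the inclusive (lo, hi) range of program ids along `axis`."""
    return (0, limits[axis] - 1)


def grid_within_limits(grid, limits=GRID_MAX_PER_DIM):
    """Check a launch grid given as a prefix of (x, y, z), e.g. (72000,)."""
    return all(n <= limits[axis] for n, axis in zip(grid, "xyz"))


grid = (72_000,)
old_limits = {**GRID_MAX_PER_DIM, "x": ASSUMED_OLD_X_LIMIT}

print(grid_within_limits(grid))              # checked against the rocminfo limits
print(grid_within_limits(grid, old_limits))  # checked against the assumed 16-bit bound
```

Under the assumed 16-bit bound, an analysis would conclude that a program id along x can never reach 72,000. Any rewrite that relies on that range would then be unsound for such a grid.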