[AMD] Support ExtractSliceOp for AxisInfo (#7094)
This commit updates AxisInfo to support backend callbacks so that
it can recognize backend-specific ops.
One can use ExtractSliceOp to slice tensors of pointers in order to
refine tt.load or tt.store. After rewrites, however, the
ExtractSliceOp slices tensors of offsets instead, and its presence
causes TritonAMDGPUConvertToBufferOpsBase to fail its negativity
analysis. This PR addresses the issue.
[AMD] Add missing CMake dependency on TritonAMDGPUTableGen (#7824)
The library TritonAMDAnalysis includes Dialect.h which in turn includes
Dialect.h.inc. This means that for the library to build successfully,
the tablegen target that produces Dialect.h.inc must run first. That
target is TritonAMDGPUTableGen. However, TritonAMDAnalysis has no
dependency on TritonAMDGPUTableGen, resulting in spurious build breaks.
This change adds the missing dependency.
Fixes https://github.com/triton-lang/triton/issues/7821
[AMD] Fixed pid range analysis assumption (#7793)
Fixes a bug in RangeAnalysis where the assumed maximum number of
programs was wrong for the X dimension. The correct limits, as
reported by rocminfo, are:
```
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 2147483647(0x7fffffff)
y 65535(0xffff)
z 65535(0xffff)
```
This led to an illegal memory access (IMA) in Inductor-generated code
that launched a 1D grid of 72,000 programs.
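As a rough illustration of the limits above (not the RangeAnalysis
implementation itself), the following sketch checks a launch grid
against the per-dimension maxima. The names GRID_MAX_PER_DIM,
pid_range, and grid_fits are hypothetical and exist only for this
example; the numeric limits are the ones from the rocminfo output.
```python
# Per-dimension grid limits copied from the rocminfo output above.
# All names in this sketch are hypothetical, not Triton APIs.
GRID_MAX_PER_DIM = {"x": 0x7FFFFFFF, "y": 0xFFFF, "z": 0xFFFF}
AXES = ("x", "y", "z")


def pid_range(axis):
    """Inclusive (min, max) program id on `axis` for a maximally sized grid."""
    return (0, GRID_MAX_PER_DIM[axis] - 1)


def grid_fits(grid):
    """True if every dimension of `grid` is within its per-dimension limit."""
    return all(1 <= n <= GRID_MAX_PER_DIM[a] for n, a in zip(grid, AXES))


print(grid_fits((72000,)))    # 1D grid of 72,000 programs along x
print(grid_fits((1, 72000)))  # the same count placed along y instead
print(pid_range("x"))
```
A 1D grid of 72,000 programs exceeds 65535 but is well within the X
limit, so any analysis that bounds the X program id must use the
larger X maximum rather than the Y/Z one.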