[AMD] Support ExtractSliceOp for AxisInfo (#7094)
This commit extends AxisInfo with backend callbacks so that the
analysis can recognize backend-specific ops.
ExtractSliceOp can be used to slice tensors of pointers in order to
refine tt.load or tt.store. TritonAMDGPUConvertToBufferOpsBase then
fails to perform its negativity analysis because of the
ExtractSliceOp, which after the rewrites slices tensors of offsets.
This PR addresses the issue.
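
The callback idea can be illustrated with a toy model. The sketch
below is not Triton's AxisInfo API: ToyAnalysis, Op,
register_callback and slice_callback are hypothetical names, and the
toy tracks only an inclusive value range, a much simpler property
than what AxisInfo computes. It only shows why an unrecognized op
blocks a non-negativity check and how a backend-supplied callback
can unblock it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Inclusive (lo, hi) bound that holds for every element of a tensor.
Range = Tuple[int, int]


@dataclass
class Op:
    """Hypothetical stand-in for an IR operation."""
    name: str
    operands: List["Op"] = field(default_factory=list)
    attrs: Dict[str, int] = field(default_factory=dict)


Callback = Callable[[Op, List[Optional[Range]]], Optional[Range]]


class ToyAnalysis:
    """Core analysis: knows "make_range", defers other ops to callbacks."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callback] = {}

    def register_callback(self, op_name: str, cb: Callback) -> None:
        self._callbacks[op_name] = cb

    def visit(self, op: Op) -> Optional[Range]:
        if op.name == "make_range":
            # Values start, start + 1, ..., end - 1.
            return (op.attrs["start"], op.attrs["end"] - 1)
        operand_ranges = [self.visit(o) for o in op.operands]
        cb = self._callbacks.get(op.name)
        # Without a callback the op is opaque and nothing is known.
        return cb(op, operand_ranges) if cb else None


def slice_callback(op: Op, operand_ranges: List[Optional[Range]]) -> Optional[Range]:
    # A slice holds a subset of the source elements, so any bound that
    # holds for the source also holds for the slice.
    return operand_ranges[0]


def is_non_negative(r: Optional[Range]) -> bool:
    return r is not None and r[0] >= 0


offsets = Op("make_range", attrs={"start": 0, "end": 1024})
sliced = Op("extract_slice", operands=[offsets])

analysis = ToyAnalysis()
before = analysis.visit(sliced)  # op not recognized yet
analysis.register_callback("extract_slice", slice_callback)
after = analysis.visit(sliced)   # op recognized via the callback
```

In this toy, is_non_negative(before) is False because the slice is
opaque to the core analysis, while is_non_negative(after) is True
once the callback is registered.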
[AMD] improve RangeAnalysis to support persistent kernel (#6390)
This PR adds the following features to RangeAnalysis
to better support persistent kernels:
1. tl.assume on tt.func args.
2. tl.assume for scf.for bounds, and analysis inside such scf.for loops
(this used to work only with static/constant bounds).
3. Inference through unsigned arith ops. Strictly speaking this isn't
necessary, since integers are signed/signless in Triton, but it is
useful to be able to see how both the signed and unsigned values in
ConstantIntRanges actually change.
Note: tl.assume here means hinting ranges for inputs, e.g.
tl.assume(K > 1), tl.assume(K < 10).