File · Last commit message · Last updated
[AMD] Support ExtractSliceOp for AxisInfo (#7094) This commit updates AxisInfo to support backend callbacks so that backend ops can be recognized. ExtractSliceOp can be used to slice tensors of pointers to refine tt.load or tt.store. TritonAMDGPUConvertToBufferOpsBase fails to perform negativity analysis in the presence of ExtractSliceOp, which after the rewrites slices tensors of offsets. This PR addresses the issue. (Last updated 11 months ago)
[AMD] Fix vmcnt(0) for LocalLoads with loop-carried AsyncToken (#7052) Moves all async-related LLVM workaround functions to a separate utility file. Reuses the LocalLoad annotations introduced by https://github.com/triton-lang/triton/pull/7047 to handle loop-carried tokens in alias computations. The only functional change is better handling of the vmcnt(0) case: before this PR, a vmcnt(0) was emitted before the ds_read in such cases. Co-authored-by: Lei Zhang <antiagainst@gmail.com> (Last updated 11 months ago)
[AMD] improve RangeAnalysis to support persistent kernel (#6390) This PR implements some new features in RangeAnalysis to better support persistent kernels: 1. tl.assume on tt.func args; 2. tl.assume for scf.for bounds, and analysis inside such scf.for loops (this used to work only with static/constant bounds); 3. inference through unsigned arith ops. Strictly speaking, the last item isn't necessary, since integers are signed/signless in Triton, but it is useful to see how both the signed and unsigned values in the ConstantIntRanges actually change. Note: tl.assume here means hinting ranges for inputs using e.g. tl.assume(K > 1), tl.assume(K < 10). (Last updated 1 year ago)
[AMD] Support ExtractSliceOp for AxisInfo (#7094) This commit updates AxisInfo to support backend callbacks so that backend ops can be recognized. ExtractSliceOp can be used to slice tensors of pointers to refine tt.load or tt.store. TritonAMDGPUConvertToBufferOpsBase fails to perform negativity analysis in the presence of ExtractSliceOp, which after the rewrites slices tensors of offsets. This PR addresses the issue. (Last updated 11 months ago)