| [AMD] Use single LDS for both transposed and non-transposed access (#7813)
This commit introduces a pass that detects a pair of tt.dot ops that
both use the same tt.load result, one directly and one via tt.trans, and
makes both use the same shared memory allocation. This allows the pipeliner to
pick a single LDS layout and enables pipelining of the loads. | 9 months ago |