| [BACKEND] Implement generic swizzling when lowering convert_layout (#6982)
We implement a generic swizzling algorithm by @apgoucher that, given two
linear layouts, finds the optimal shared memory layout: one that maximises
read/write vectorisation and, subject to that, minimises bank conflicts.
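For readers unfamiliar with swizzling, here is a minimal sketch of why it reduces bank conflicts. It is not the algorithm in this PR: it uses the classic fixed XOR swizzle on a 32x32 tile of 4-byte elements and assumes 32 shared memory banks. All function names are hypothetical and exist only for this illustration.

```python
# Illustration only: a fixed XOR swizzle, NOT the generic algorithm in this PR.
# Assumption: 32 banks, each one 4-byte word wide, and a 32x32 tile of
# 4-byte elements, so the bank of an element is its word offset mod 32.
NUM_BANKS = 32
TILE = 32


def linear_offset(row, col):
    """Plain row-major offset (in 4-byte words) inside the tile."""
    return row * TILE + col


def xor_swizzled_offset(row, col):
    """Row-major offset with the column XOR-ed by the row index."""
    return row * TILE + (col ^ row)


def max_bank_conflict(offsets):
    """Largest number of accesses that fall into the same bank."""
    counts = {}
    for off in offsets:
        bank = off % NUM_BANKS
        counts[bank] = counts.get(bank, 0) + 1
    return max(counts.values())


def column_read(offset_fn, col):
    """Offsets touched when 32 threads read one column (thread i reads row i)."""
    return [offset_fn(row, col) for row in range(TILE)]


def row_write(offset_fn, row):
    """Offsets touched when 32 threads write one row (thread i writes col i)."""
    return [offset_fn(row, col) for col in range(TILE)]


if __name__ == "__main__":
    print(max_bank_conflict(column_read(linear_offset, 5)))        # 32
    print(max_bank_conflict(column_read(xor_swizzled_offset, 5)))  # 1
    print(max_bank_conflict(row_write(xor_swizzled_offset, 7)))    # 1
```

Without swizzling, a column read sends all 32 accesses to one bank. With the XOR swizzle, both row writes and column reads touch 32 distinct banks. The generic algorithm searches for a layout with this property for an arbitrary pair of source and destination layouts instead of hard-coding one pattern.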
We also implement an algorithm to find the minimum tile size necessary
to perform the convert_layout given the restrictions above, and we use
it to perform the convert_layout iteratively.
This PR does not yet implement a lowering to ldmatrix/stmatrix; we'll do
that in a future PR.
---------
Co-authored-by: Adam P. Goucher <apgoucher@openai.com> | 11 months ago |