File · Last commit message · Last updated
[BACKEND] Implement generic swizzling when lowering convert_layout (#6982) We implement a generic swizzling algorithm by @apgoucher that, given two linear layouts, finds the optimal shared memory layout: one that maximises read/write vectorisation and, subject to that, minimises bank conflicts. We also implement an algorithm that finds the minimum tile size needed to perform the convert_layout under the restrictions above, and we use it to perform the convert_layout iteratively. This PR does not yet implement a lowering to ldmatrix/stmatrix; we'll do that in a future PR. Co-authored-by: Adam P. Goucher <apgoucher@openai.com> · 11 months ago
[Swizzling] Disallow asymmetric vectorisation within generic swizzling (#7833) Previously, we could generate bank conflicts in, say, the load, because we were too eager to use st.shared.b32.vn. Now, in these cases, we do not vectorise the stores, but we also avoid creating bank conflicts on the load. We leave a comment describing how we could do better, but it would take a fair amount of effort for rather modest wins (if any). Fixes https://github.com/triton-lang/triton/issues/7815 · 9 months ago
[BACKEND] Always lower tcgen05.ld/st via generic LinearLayout code (#7862) We add support for TensorMemoryScalesEncoding in toLinearLayout and reuse the generic lowering to lower it. · 9 months ago
[RELAND][BACKEND] Implement generic lowering for tcgen05.ld/st (#7874) Reland https://github.com/triton-lang/triton/pull/7831 This new lowering supports all the instructions, as well as loads/stores from/to arbitrary layouts. It also makes supporting every available instruction trivial. · 9 months ago