File | Last commit message | Last updated
Linear layouts (#3794)

Today we have many different layout objects, representing e.g. MMAv2 operands in registers, MMAv2 results in registers (different thing!), AMD tensor core operands in registers, shared memory swizzled Just Right for Hopper MMAv3, and so on.

CUTLASS v2 had the same problem. In v3, they introduced the notion of a CuTe layout, which unifies all of these special cases into one programmatic thing.

I want to do the same thing for Triton, because

1. we have a bunch of [known bugs](https://github.com/openai/triton/blob/0b46687895f0bc7c4d5216150d8d5cfeb5b4e254/python/test/unit/language/test_core.py#L4771) around layout conversions that have been very hard to fix,
2. there are certain operations (like some reshape + transpose + reshape combinations) that cannot be represented efficiently with today's layouts, and
3. the code for handling layouts is already very complex, and I'm concerned that Blackwell is going to make the problem worse.

One approach I considered is using CuTe inside Triton, but I concluded it's not a great fit for various reasons. As an alternative, @apgoucher proposed this idea of "linear layouts", which seems to work really well and is a lot simpler.

This PR is a first pass at linear layouts. It Appears To Work (tm).

The way this PR uses linear layouts is that before we generate the indices for a Triton BlockedLayout, we convert it to a linear layout and use that to generate the indices instead.

The implementation plan is to do the same thing for the other Triton layouts (i.e. make codegen use only linear layouts). Once that is working, we can start using linear layouts in the Triton middle-end. Eventually the goal is to replace all layouts with just this one.

A few questions are still outstanding and need to be resolved before we can land this.

1. Are linear layouts actually flexible enough to represent all the layouts we care about?
2. What will the textual IR look like for linear layouts? Can we make it as easy to read as the current IR?

2 years ago
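To make the index-generation step in the commit message above more concrete, here is a minimal sketch. It assumes that a linear layout is a map that is linear with respect to bitwise XOR, so it is fully described by one "basis" output per input bit. The function name `apply_linear_layout` and the example bases are hypothetical illustrations and are not taken from the PR.

```python
def apply_linear_layout(bases, index):
    """Map `index` to an output offset.

    Assumption: the layout is linear over XOR, so the output is the XOR
    of the basis values selected by the set bits of `index`.
    `bases[i]` is the output for the input value `1 << i`.
    """
    out = 0
    for bit, basis in enumerate(bases):
        if (index >> bit) & 1:
            out ^= basis
    return out


# Hypothetical 3-bit layout that swaps the two low bits of the index.
swap_low_bits = [2, 1, 4]

# Enumerate the offsets for all 8 input indices.
offsets = [apply_linear_layout(swap_low_bits, i) for i in range(8)]
print(offsets)
```

Under this assumption, a special-purpose layout differs from another only in its list of bases, which is one way a single representation could cover many special cases.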
Add LL::quotient and remove uses of divideRight and sublayoutIsIdentity (#4968)

We add a new abstraction, LL::quotient, that captures the idea that "a linear layout does not permute certain dimensions". Doing so allows us to remove uses of divideRight and sublayoutIsIdentity and subsume them into this higher-level abstraction.

We also fix a bug in isCrossCTAConversion.

We also remove some code duplication from transferWithinThreads and cvtReorderRegisters in favour of a more generic approach.

We fix a bug in sublayout where it would reorder outDims at will because it used a set instead of a vector.

Tests for LL::quotient are still missing; I will add them shortly.

1 year ago
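The idea that "a linear layout does not permute certain dimensions" can be sketched as follows. This is only an illustration under stated assumptions: the dictionary representation, the helper `is_identity_on`, and this Python `quotient` are hypothetical and do not reproduce the actual C++ API of LL::quotient. A layout here maps each input dimension to a list of basis vectors, and each basis vector maps output dimension names to integers.

```python
def is_identity_on(layout, dim):
    """Return True if `layout` leaves `dim` untouched (hypothetical check).

    Bit i of input `dim` must map to bit i of output `dim` and nothing
    else, and no other input dimension may contribute to output `dim`.
    """
    if dim not in layout:
        return False
    for in_dim, bases in layout.items():
        for bit, basis in enumerate(bases):
            if in_dim == dim:
                if basis.get(dim, 0) != (1 << bit):
                    return False
                if any(v != 0 for k, v in basis.items() if k != dim):
                    return False
            elif basis.get(dim, 0) != 0:
                return False
    return True


def quotient(layout, dim):
    """Drop `dim` from the layout if it is not permuted, else return None."""
    if not is_identity_on(layout, dim):
        return None
    return {
        in_dim: [{k: v for k, v in basis.items() if k != dim} for basis in bases]
        for in_dim, bases in layout.items()
        if in_dim != dim
    }


# Hypothetical conversion layout: registers are untouched, lane bits are swapped.
layout = {
    "register": [{"register": 1, "lane": 0}, {"register": 2, "lane": 0}],
    "lane": [{"register": 0, "lane": 2}, {"register": 0, "lane": 1}],
}

without_registers = quotient(layout, "register")
without_lanes = quotient(layout, "lane")
```

In this sketch, removing the register dimension succeeds and leaves only the lane permutation, while removing the lane dimension fails because the lanes are permuted.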