| [AMD] Fix buffer op mask operand removal (#7963) | 9 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [BACKEND] Don't use mmav5 with num warps < 4 (#7928) | 9 months ago |
| [TritonGPU] Enable accum-init optimization for unconditionally zero-ed accumulators (#6395) | 1 year ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [TensorDesc] Cleanup ttng IR representation (#7036) | 11 months ago |
| [Dialect] Layout attr cleanup and tighten invariants (#7714) | 10 months ago |
| [BACKEND] Refactor shared memory layout representation (#5786) | 1 year ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| Add support for Nvidia Blackwell GPUs (#5724) | 1 year ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [LAYOUTS] Implement toLinearLayout for TensorMemoryEncodingAttr (#7748) | 9 months ago |
| [LAYOUTS] Implement toLinearLayout for TensorMemoryEncodingAttr (#7748) | 9 months ago |
| [TritonGPU] Propagate num_stages from outer loop when flattening (#6233) | 1 year ago |
| [DIALECT] Rename triton_gpu to ttg and triton_nvidia_gpu to ttng (#5266) | 1 year ago |
| [PROTON] Intra kernel profiling (#7258) | 10 months ago |
| [BACKEND] Fix wrong check in tmem_alloc canonicalization pattern (#7719) | 10 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [AMD] Enable dot_scaled on gfx11 (#7954) | 9 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [WS] reorder partition-loops and lower-aref (#7927) | 9 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [BACKEND] combineRedundantWaitOps should not combine across loops/branches (#7593) | 10 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [TritonGPU] Split MemDescSubview into MemDescIndex and MemDescSubslice (#7622) | 10 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [LAYOUTS] Enable generic swizzling on AMD (#7225) | 11 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [Pipeliner] Merge warp specialization and pipeliner scheduling (#6887) | 1 year ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [Gluon] Actually enable scf+cf+arith canonicalizers (#7775) | 9 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [DIALECT] Rename triton_gpu to ttg and triton_nvidia_gpu to ttng (#5266) | 1 year ago |
| [NVWS] Assign final try_wait to a partition. (#7757) | 9 months ago |
| [NVWS] Assign final try_wait to a partition. (#7757) | 9 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [TritonGPU] Split MemDescSubview into MemDescIndex and MemDescSubslice (#7622) | 10 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [IR] Improve memdesc_index printing to make it obvious what operand is the index (#7876) | 9 months ago |
| [BACKEND] Add a new pass to insert fence.proxy.async for write after read hazard (#7262) | 11 months ago |
| [Backend] Improve warp-local layout conversion algo using shuffles (#7558) | 10 months ago |
| [Dialect] Actually enable TMEM layout check and fix all the tests (#7723) | 10 months ago |
| [NVWS] Assign final try_wait to a partition. (#7757) | 9 months ago |
| Remove -split-input-file from tests that don't use file splitting (#6257) | 1 year ago |
| [DIALECT] Rename triton_gpu to ttg and triton_nvidia_gpu to ttng (#5266) | 1 year ago |