| [Backend] Bump to llvm/llvm-project@bc773632355b (#7881) | 9 months ago |
| [WS] reorder partition-loops and lower-aref (#7927) | 9 months ago |
| [BACKEND] Don't use mmav5 with num warps < 4 (#7928) | 9 months ago |
| [WS] Reimplement PartitionLoops pass supporting general control flow (#7415) | 10 months ago |
| [NFC] Use RankedTensorType's clone and cloneWithEncoding member functions (#7464) | 10 months ago |
| [AMD] NFC: simplify pass/pattern constructor declaration (#7665) | 10 months ago |
| Add support for Nvidia Blackwell GPUs (#5724) | 1 year ago |
| [AMD] Support scaled dot for gfx12 (#7644) | 10 months ago |
| [BACKEND] bump llvm to 2f7ade4b (#5599) | 1 year ago |
| [RELAND][Blackwell] Optimize MMA warp specialization to allow multiple consumers of MMAv5 result (#6514) | 1 year ago |
| [BACKEND] Fix wrong check in tmem_alloc canonicalization pattern (#7719) | 10 months ago |
| [Utility] fix pass-by-reference addIterArgsToLoop API (#7029) | 1 year ago |
| [LAYOUTS] Fix memdesc_subviews when we don't slice along the swizzling pattern (#7480) | 10 months ago |
| [NFC] Use RankedTensorType's clone and cloneWithEncoding member functions (#7464) | 10 months ago |
| [TritonGPU] Split MemDescSubview into MemDescIndex and MemDescSubslice (#7622) | 10 months ago |
| [LAYOUTS] Move order to LinearEncoding implementation (#6243) | 1 year ago |
| [NFC] Use RankedTensorType's clone and cloneWithEncoding member functions (#7464) | 10 months ago |
| [BACKEND] Fix dereference nullptr (#5944) | 1 year ago |
| [NVWS] Add pass to insert aref for TMA load (#7581) | 10 months ago |