| Fix default FMA implementation for tensors with integer elements (#7419) | 10 months ago |
| [AMD] Introduce specialized Allocation pass (#7328) | 11 months ago |
| [AMD] Introduce specialized Allocation pass (#7328) | 11 months ago |
| [Warp Specialization] Allow worker partitions to steal registers from the default partition (#6798) | 1 year ago |
| [ConSan] Support for WGMMA. Checks on non-async shmem and tmem accesses (#7712) | 10 months ago |
| [AMD] Introduce specialized Allocation pass (#7328) | 11 months ago |
| [PROTON] Intra kernel profiling (#7258) | 10 months ago |
| Reland "byte permutes in intra-warp layout conversion" (#7933) | 9 months ago |
| [Triton][Gluon] Add map_elementwise (#7564) | 10 months ago |
| [PROTON] Intra kernel profiling (#7258) | 10 months ago |
| [NFC] Make toLinearLayout take a RankedTensorType or MemDescType (#7440) | 10 months ago |
| [Backend] Plumb ttg.warp_specialize through LLVM lowering (#5963) | 1 year ago |
| Fix histograms for complex replicated layouts (#7938) | 9 months ago |
| [NFC] replace TritonGPUToLLVM/Utility.h macros with TritonLLVMOpBuilder (#5717) | 1 year ago |
| [BACKEND] Fix vectorization for PaddedSharedEncoding with non default order (#7845) | 9 months ago |
| [Backend][NFC] Switch some inline PTX to NVVM ops/intrinsics (#7725) | 10 months ago |
| NFC: Fix common typos (#6309) | 1 year ago |
| [BACKEND] Move lowering of CF as the last step of conversion to LLVM (#7213) | 11 months ago |
| [Backend][NFC] Switch some inline PTX to NVVM ops/intrinsics (#7725) | 10 months ago |
| [LAYOUTS] Kill getWarpsPerCTA(Attribute) and prefer LinearLayout-based impl (#6252) | 1 year ago |
| Cleanup includes in TritonGPUToLLVM/Utility.h (#6818) | 1 year ago |
| [Backend] Fix various issues with smem base offsets (#7949) | 9 months ago |
| [Backend] Fix various issues with smem base offsets (#7949) | 9 months ago |