File Last commit Last updated
[AMD] Support FP8E5M2 with MFMA FP16 instructions (#4259) Cast dot arguments from unsupported FP8 to supported FP16 in order to use MFMA instructions instead of FMA. This approach is expected to give better performance and be more stable than the FMA implementation. --------- Co-authored-by: Lei Zhang <antiagainst@gmail.com> 1 year ago
[AMD] Support WMMAv2 in AccelerateAMDMatmulPass (#4452) - Specify kWidth parameter according to the version - For the first iteration fp8 operands are unsupported; no new operand configurations are added for now - Added lit tests Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com> 1 year ago
[AMD] Support WMMAv2 in AccelerateAMDMatmulPass (#4452) - Specify kWidth parameter according to the version - For the first iteration fp8 operands are unsupported; no new operand configurations are added for now - Added lit tests Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com> 1 year ago
[AMD] Add pass to convert tt.load/tt.store to buffer operations (#4966) This PR only introduces a ttgir pass to convert tt.load/tt.store to amdgpu.buffer_load/amdgpu.buffer_store, _when this is possible_: this means we need to check for 3 conditions: 1. The pointer arithmetic has been canonicalized (scalarPtr->splat->addptr->load/store) 2. The offsets are 32-bit 3. The offsets are non-negative. We use a mix of analysis and assumptions to verify this condition. Right now the functionality is gated behind AMDGCN_USE_BUFFER_OPS, which now also covers the pointer canonicalization pass that is mostly meant to handle this. 1 year ago
[release/3.2.x] [CHERRY PICK] Add gfx950 target definition (#5452) This PR brings in required LLVM bumps and additional targets for gfx950 support. - https://github.com/triton-lang/triton/pull/5040 - https://github.com/triton-lang/triton/pull/5064 - https://github.com/triton-lang/triton/pull/5180 - https://github.com/triton-lang/triton/pull/5242 - https://github.com/triton-lang/triton/pull/5392 Reverts: - #5347 - #5191 1 year ago
[AMD] Fix kWidth parsing and printing (#3676) This PR fixes parsing and printing of the kWidth attribute for MFMA and WMMA layouts. --------- Co-authored-by: Lei Zhang <antiagainst@gmail.com> 2 years ago
[AMD] unrevert #4901; revert #4823 (#4920) 1 year ago
[release/3.2.x] [CHERRY PICK] [AMD] Fix issue with rank=1 in tryFitCvtIntoLDS (#5084) (#5453) Co-authored-by: Sam Ginzburg <ginzburg@fb.com> (cherry picked from commit f9cdf58276a42b31151debf219c2de1471ed4422) Co-authored-by: Samuel Ginzburg <ginzburg@meta.com> 1 year ago