File, last commit message, last updated
Add support for padding option to TMA loads (#7993)
Closes #7364. Builds on top of #7364 from @jhapradip, addresses the remaining comments, and implements the padding option in the fallback RewriteTensorDescriptorToPointer path.
- support for passing padding = "nan" on TMA descriptor creation for both host and device TMAs
- forwards this argument down to TMA descriptor creation
- implements the NaN "other" value in the TMA fallback path
Co-authored-by: Pradip Jha <pradipjha@hotmail.com>
Last updated: 9 months ago
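To illustrate what a NaN padding option means for a bounds-checked block load, here is a minimal pure-Python sketch. It is not Triton code: `load_block` and its `padding` parameter are hypothetical names introduced only for this illustration, and it assumes the default fill for out-of-bounds elements is zero.

```python
import math


def load_block(data, start, block_size, padding="zero"):
    """Hypothetical helper (not a Triton API).

    Reads block_size elements beginning at index start. Positions that fall
    outside data are filled with 0.0 by default, or with NaN when
    padding == "nan".
    """
    fill = math.nan if padding == "nan" else 0.0
    return [
        data[i] if 0 <= i < len(data) else fill
        for i in range(start, start + block_size)
    ]


data = [1.0, 2.0, 3.0]
zero_padded = load_block(data, 2, 4)                 # one in-bounds element, three fills
nan_padded = load_block(data, 2, 4, padding="nan")   # same block, NaN fills
```

NaN fill makes out-of-bounds reads visible in downstream arithmetic, whereas zero fill silently contributes nothing to sums and products.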
[Backend] Bump to llvm/llvm-project@bc773632355b (#7881)
* Switched Constant{Int|Float}Op type and value order following llvm/llvm-project@a45fda6aeba362926da6cc1b107be92dafb0d490
* Provided triple for TargetLibraryInfoImpl following llvm/llvm-project@c91cbafad2119cace85499e8d231b8e5737f3b41
* Fixed atomic sync scope for NVIDIA following llvm/llvm-project@0f1b16dd5f83fd931ecb111bb925ac9e1d56f589
* Updated MLIR lib names following llvm/llvm-project@e68a20e0b7623738d6af736d3aa02625cba6126a
* Updated nvvm.stmatrix op following llvm/llvm-project@2b27377b0bf72e4524774dedf4b03521b07606d5
* Updated ROCDL::Mbcnt{Lo|Hi}Op following llvm/llvm-project@bbe3d64b39d80c2d6132fbad6008b2a6e86fd4d5
Closes https://github.com/triton-lang/triton/pull/7413
Closes https://github.com/triton-lang/triton/pull/7575
Closes https://github.com/triton-lang/triton/pull/7765
Co-authored-by: Yi Qian <yi.qian@amd.com>
Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>
Last updated: 9 months ago
[AMD] Add fast_tanhf to libdevice (#7780)
This PR adds the fast_tanhf operator to libdevice for AMD hardware. It reuses the same pass as fast_expf to implement fast_tanhf. The optimized tanh uses the following formulation: tanh(X) = (fast_expf(2X) - 1) / (fast_expf(2X) + 1)
Last updated: 9 months ago
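The identity behind that formulation can be checked numerically. The sketch below is plain Python, not the libdevice implementation: `math.exp` stands in for fast_expf, and `tanh_via_exp` is a name chosen only for this example.

```python
import math


def tanh_via_exp(x: float) -> float:
    """tanh(x) = (exp(2x) - 1) / (exp(2x) + 1), with math.exp standing in for fast_expf."""
    e2x = math.exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)


# Compare against the library tanh on a few moderate inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert math.isclose(tanh_via_exp(x), math.tanh(x), rel_tol=1e-9, abs_tol=1e-12)
```

This form needs only one exponential per call, which is presumably why it pairs well with a fast exp. For large positive inputs exp(2x) overflows, so a real implementation would need to handle that range separately; how the PR does so is not stated here.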
[AMD] Enable dot_scaled on gfx11 (#7954)
I saw occasional test failures, but after disabling True16 the tests seem to pass fine.
Co-authored-by: Paul Trojahn <paul.trojahn@amd.com>
Last updated: 9 months ago
Reland "byte permutes in intra-warp layout conversion" (#7933)
Relands https://github.com/triton-lang/triton/pull/7809, https://github.com/triton-lang/triton/pull/7825, and https://github.com/triton-lang/triton/pull/7861. Adds a workaround for a ptxas bug and a regression test.
Last updated: 9 months ago
[AMD] Support ExtractSliceOp for AxisInfo (#7094)
This commit updates AxisInfo to support backend callbacks so that it can recognize backend ops. ExtractSliceOp can be used to slice tensors of pointers in order to refine tt.load or tt.store. After rewrites, ExtractSliceOp slices tensors of offsets, and its presence causes TritonAMDGPUConvertToBufferOpsBase to fail its negativity analysis. This PR addresses the issue.
Last updated: 11 months ago
[AMD] Add needed profile_scratch to AOT compile (#7776)
Needed after https://github.com/triton-lang/triton/pull/7258
Last updated: 9 months ago
[BUILD] Speedup unittests linking (#6796)
Comparison of the sizes of build artifacts:

Old:
```bash
ls build/cmake.linux-x86_64-cpython-3.10/unittest/Dialect/TritonGPU/ -lh
total 1.8G
drwxr-sr-x 6 jovyan users 4.0K May 11 13:29 CMakeFiles
-rw-r--r-- 1 jovyan users 3.3K May 12 21:16 CTestTestfile.cmake
-rwxr-xr-x 1 jovyan users 440M May 12 21:18 Dialect
-rw-r--r-- 1 jovyan users  387 May 12 21:16 'Dialect[1]_include.cmake'
-rw-r--r-- 1 jovyan users  18K May 12 21:18 'Dialect[1]_tests.cmake'
-rwxr-xr-x 1 jovyan users 438M May 12 21:18 DumpLayoutTest
-rw-r--r-- 1 jovyan users  415 May 12 21:16 'DumpLayoutTest[1]_include.cmake'
-rw-r--r-- 1 jovyan users 2.3K May 12 21:18 'DumpLayoutTest[1]_tests.cmake'
-rwxr-xr-x 1 jovyan users 446M May 12 21:18 LinearLayoutConversions
-rw-r--r-- 1 jovyan users  451 May 12 21:16 'LinearLayoutConversions[1]_include.cmake'
-rw-r--r-- 1 jovyan users  60K May 12 21:18 'LinearLayoutConversions[1]_tests.cmake'
-rwxr-xr-x 1 jovyan users 439M May 12 21:18 TestSwizzling
-rw-r--r-- 1 jovyan users  411 May 12 21:16 'TestSwizzling[1]_include.cmake'
-rw-r--r-- 1 jovyan users 8.6K May 12 21:18 'TestSwizzling[1]_tests.cmake'
-rw-r--r-- 1 jovyan users 1.7K May 12 21:16 cmake_install.cmake
```

New:
```bash
ls build/cmake.linux-x86_64-cpython-3.10/unittest/Dialect/TritonGPU/ -lh
total 853M
drwxr-sr-x 6 jovyan users 4.0K May 11 13:29 CMakeFiles
-rw-r--r-- 1 jovyan users 3.3K May 12 21:39 CTestTestfile.cmake
-rwxr-xr-x 1 jovyan users 217M May 12 21:39 Dialect
-rw-r--r-- 1 jovyan users  387 May 12 21:39 'Dialect[1]_include.cmake'
-rw-r--r-- 1 jovyan users  18K May 12 21:39 'Dialect[1]_tests.cmake'
-rwxr-xr-x 1 jovyan users 210M May 12 21:39 DumpLayoutTest
-rw-r--r-- 1 jovyan users  415 May 12 21:39 'DumpLayoutTest[1]_include.cmake'
-rw-r--r-- 1 jovyan users 2.3K May 12 21:39 'DumpLayoutTest[1]_tests.cmake'
-rwxr-xr-x 1 jovyan users 217M May 12 21:39 LinearLayoutConversions
-rw-r--r-- 1 jovyan users  451 May 12 21:39 'LinearLayoutConversions[1]_include.cmake'
-rw-r--r-- 1 jovyan users  60K May 12 21:39 'LinearLayoutConversions[1]_tests.cmake'
-rwxr-xr-x 1 jovyan users 210M May 12 21:39 TestSwizzling
-rw-r--r-- 1 jovyan users  411 May 12 21:39 'TestSwizzling[1]_include.cmake'
-rw-r--r-- 1 jovyan users 8.6K May 12 21:39 'TestSwizzling[1]_tests.cmake'
-rw-r--r-- 1 jovyan users 1.7K May 12 21:39 cmake_install.cmake
```

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Last updated: 1 year ago
[AMD][Build] Fix build issue with AMD lld (#7608)
The PR https://github.com/triton-lang/triton/pull/7548 requires the use of LLD as part of LLVM (as opposed to the system LLD), which causes the following error when
- (1) lld is not built, or
- (2) the location of LLD_DIR is not specified correctly:

```
CMake Error at third_party/amd/CMakeLists.txt:6 (find_package):
  Could not find a package configuration file provided by "LLD" with any of
  the following names:

    LLDConfig.cmake
    lld-config.cmake
```

To fix this issue, this PR makes the following changes:
- For (1), build lld in scripts/build-llvm-project.sh, which is invoked by make dev-install-llvm. The script was added to the README in https://github.com/triton-lang/triton/pull/6709, and building lld was added in https://github.com/triton-lang/triton/pull/6049.
- For (2), make sure that LLVM_BUILD_PATH is an absolute path in the Makefile, so that LLD_DIR and MLIR_DIR are interpreted correctly. Otherwise CMake has a hard time resolving the relative location from a CMakeLists.txt in a subdirectory.
- Introduce LLD_DIR so we don't write "${MLIR_DIR}/../lld".
Last updated: 10 months ago