File | Last commit | Last updated
[Backend] Fix various issues with smem base offsets (#7949) Many ops were using smemObj.getBase() without applying the offsets to get the part of smem to read or write. This is incorrect since the offsets may be nonzero. This PR fixes as many of these issues as I could identify and adds tests for this. 9 months ago
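The class of bug described in #7949 can be illustrated with a toy Python model. This is only a sketch and not Triton's C++ code: `SmemView`, `base`, `offset`, and both read methods are hypothetical names standing in for a shared-memory object whose subview starts at a nonzero offset.

```python
from dataclasses import dataclass


@dataclass
class SmemView:
    """Toy stand-in for a shared-memory subview (hypothetical, not Triton's API)."""
    storage: list  # the backing memory
    base: int      # index where the allocation starts
    offset: int    # where this subview starts, relative to base

    def read_ignoring_offset(self, i):
        # Buggy pattern: the address is computed from the base alone.
        return self.storage[self.base + i]

    def read(self, i):
        # Fixed pattern: apply the subview offset before indexing.
        return self.storage[self.base + self.offset + i]


storage = list(range(100, 116))
view = SmemView(storage, base=4, offset=8)
```

With a zero offset both methods agree, which is presumably why such bugs can go unnoticed until a subview with a nonzero offset is used.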
[BACKEND] Add bar.sync before deallocating tmem (#7994) Without a barrier, some warps may deallocate tmem while it is still in use, causing some other block to overwrite it. 9 months ago
[Gluon] Fix auto_encoding for ops which may infer multiple layouts (#7718) In the failing example we have:

```mlir
%0 = tt.make_range {end = 8192 : i32, start = 0 : i32} : tensor<8192xi32, #gluon.auto_encoding>
%1 = tt.reshape %0 : tensor<8192xi32, #gluon.auto_encoding> -> tensor<64x128xi32, #gluon.auto_encoding>
%2 = gluon.set_auto_layout %1 : tensor<64x128xi32, #gluon.auto_encoding> -> tensor<64x128xi32, #blocked>
```

which currently fails with the error:

```python
/root/code/triton/test.py:43:52: error: 'tt.reshape' op Found conflicting encodings for value
    gl.arange(0, BLOCK_M * BLOCK_N).reshape((BLOCK_M, BLOCK_N)),
```

The issue is that we propagate the blocked layout backwards to get a linear layout for the make_range result, then the algorithm propagates that layout forward to the reshape result. However, it infers a linear layout and errors because it conflicts with the original blocked layout. I fix this by setting a mayVary flag when an encoding comes from an inference result that isn't the only possibility. I then have special rules that resolve conflicts where one or more of the encodings is allowed to vary. 9 months ago
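The mayVary idea from #7718 can be sketched in a few lines of Python. This is a simplified model, not the actual pass: `LayoutCandidate` and `combine` are hypothetical names, encodings are plain strings, and the rule for the case where both sides may vary (keep the first) is an assumption, since the commit message does not spell it out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutCandidate:
    encoding: str   # e.g. "#blocked" or "#linear" (plain strings in this sketch)
    may_vary: bool  # True if inference had more than one valid answer


def combine(lhs, rhs):
    """Merge two candidate encodings for the same value."""
    if lhs.encoding == rhs.encoding:
        # Same encoding: it stays flexible only if both sides were flexible.
        return LayoutCandidate(lhs.encoding, lhs.may_vary and rhs.may_vary)
    if lhs.may_vary and not rhs.may_vary:
        return rhs  # the fixed encoding wins over the flexible one
    if rhs.may_vary and not lhs.may_vary:
        return lhs
    if lhs.may_vary and rhs.may_vary:
        return lhs  # assumption: arbitrary but deterministic choice
    raise ValueError("Found conflicting encodings for value")


inferred = LayoutCandidate("#linear", may_vary=True)    # from reshape inference
requested = LayoutCandidate("#blocked", may_vary=False)  # from set_auto_layout
resolved = combine(inferred, requested)
```

In this model the reshape result from the failing example resolves to the fixed `#blocked` encoding, while two fixed encodings that differ still raise an error.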
[Dialect] Actually enable TMEM layout check and fix all the tests (#7723) I also removed unnecessary noinline=false from the tests. 9 months ago
[BACKEND] Add LLVM pre-processing pass to break struct types (#2285) Add infrastructure to be able to add and test custom LLVM passes in the backend. This will allow us to apply some low-level optimizations and cleanup on LLVM IR. Add a first pass that breaks up phis of struct type created by lowering to LLVM. Those can often pessimise the optimizer, as they block optimizations from going through phi nodes. 2 years ago
[WS] reorder partition-loops and lower-aref (#7927)
* Reorder partition-loops and lower-aref passes
* Split lower-aref into lower-aref and assign-stage-phase passes
* the split separates concerns, as assign-stage-phase is the more complex pass, and testing/debugging can focus on the correctness of stage/phase assignment without the added complexity of aref->mbarrier lowering.
* assign-stage-phase uses the enterOp token to assign to exitOp the same stage variable that enterOp uses, instead of having separate stages for enterOps/exitOps as before.
* lower-aref testing verifies correctness of aref->mbarrier lowerings
* in load-mma-specialization, don't place the final waitOp inside the ws-region; revert to the original behavior before https://github.com/triton-lang/triton/pull/7757, as that change causes a perf regression with this PR. Keeping ws.tag to differentiate partitions in different loops, as it will be relied upon in aref-tmem-insertion (WIP).

This is a prep PR needed for aref-tmem-insertion WIP (will be submitted after this one). 9 months ago
[LAYOUTS] Implement toLinearLayout for TensorMemoryEncodingAttr (#7748) We do so by modelling M/N as describing elements and not the hardware 32-bit registers. This allows us to avoid the issue of having two elements pointing to the same register when unpacked=False. We also tighten the MemDescType verifier and the TensorMemoryEncodingAttr verifier to be consistent with the definition we are using. Doing this forces us to update a ton of lit tests that were silently wrong... 9 months ago
[AMD] NFC: Drop version minor for AMD MFMA layout (#7285) AMD's MFMA layout does not need version minor information like NVIDIA. It always defaults to 0 in the current codebase. The PR drops version minor and changes to a single version parameter for the MFMA layout. 11 months ago
Add support for padding option to TMA loads (#7993) Closes #7364. Builds on top of #7364 from @jhapradip and addresses remaining comments, as well as implementing the padding option in the fallback RewriteTensorDescriptorToPointer path.
- support for passing padding = "nan" on TMA descriptor creation for both host and device TMAs
- forwards this argument down to TMA descriptor creation
- implements the NaN other value in the TMA fallback path

Co-authored-by: Pradip Jha <pradipjha@hotmail.com> 9 months ago
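What a padding option means for an out-of-bounds block load can be sketched in plain Python. This is a toy model, not Triton's API: `load_block` and its parameters are hypothetical, and the name `"zero"` for the default fill is an assumption; only the `"nan"` value comes from the commit message.

```python
import math


def load_block(data, start, block, padding="zero"):
    """Read `block` elements starting at `start`, filling out-of-bounds slots."""
    # "nan" mirrors the option named in the commit; "zero" is an assumed default.
    fill = math.nan if padding == "nan" else 0.0
    return [
        data[i] if 0 <= i < len(data) else fill
        for i in range(start, start + block)
    ]


data = [1.0, 2.0, 3.0]
zero_padded = load_block(data, start=2, block=3)
nan_padded = load_block(data, start=2, block=3, padding="nan")
```

NaN padding makes accidental use of out-of-bounds elements visible in later arithmetic, whereas zero padding can silently blend into the result.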
[AMD] Enable dot_scaled on gfx11 (#7954) I saw some occasional test failures, but after disabling True16, tests seem to pass fine. Co-authored-by: Paul Trojahn <paul.trojahn@amd.com> 9 months ago
[Backend] Fix various issues with smem base offsets (#7949) Many ops were using smemObj.getBase() without applying the offsets to get the part of smem to read or write. This is incorrect since the offsets may be nonzero. This PR fixes as many of these issues as I could identify and adds tests for this. 9 months ago
[AMD] Support ExtractSliceOp for AxisInfo (#7094) This commit updates AxisInfo to support backend callbacks so that backend ops can be recognized. One can use ExtractSliceOp to slice tensors of pointers to refine tt.load or tt.store. TritonAMDGPUConvertToBufferOpsBase fails to perform negativity analysis due to the presence of ExtractSliceOp, which after rewrites slices tensors of offsets. This PR addresses the issue. 11 months ago
[PROTON] Intra kernel profiling (#7258)

### Instrumentation & Runtime
- Introduce a dedicated instrumentation mode
  - proton.start(..., mode="instrumentation", ...)
- Introduce both high- and low-level scope APIs
  - For Gluon DSL: pl.scope, pl.enter_scope, and pl.exit_scope. Profiling API for Triton DSL is disabled by default.
  - For TTGIR: proton.record start and proton.record end
- Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host

### Proton Dialect & Lowering
- Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy.

### Tracing
- proton.start(..., data="trace", ...) is supported for both fine- and coarse-grained events.

Co-authored-by: Yuanwei Fang <fywkevin@gmail.com>
Co-authored-by: Yuanwei Fang <fywkevin@fb.com>
Co-authored-by: Corbin Robeck <corbin.robeck@amd.com>
Co-authored-by: peterbell10 <peterbell10@openai.com>
Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com>
Co-authored-by: Corbin Robeck <robeck@meta.com>
Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com>
Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com>
Co-authored-by: Shawn Zhong <github@shawnzhong.com>
Co-authored-by: Shawn Zhong <shawnzhong@fb.com>
Co-authored-by: 鐘天楽 <a844379248@icloud.com>
9 months ago
Fix lit subfolder tests failing when run via ninja check-triton-lit-tests-<folder> (#7966)

### Summary
This PR fixes an issue where running a specific lit test folder (e.g., Conversion) using ninja check-triton-lit-tests-<folder> fails with the error:

```
(triton) zzl@gtx1660:~/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10$ ninja check-triton-lit-tests-conversion -v
[0/1] cd /home/zzl/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10/test && /home/zzl/triton-project/triton/.venv/bin/python3 /home/zzl/triton-project/triton/.venv/bin/lit -sv /home/zzl/triton-project/triton/test/Conversion
lit: /home/zzl/triton-project/triton/.venv/lib/python3.10/site-packages/lit/TestingConfig.py:152: fatal: unable to parse config file '/home/zzl/triton-project/triton/test/lit.cfg.py', traceback: Traceback (most recent call last):
  File "/home/zzl/triton-project/triton/.venv/lib/python3.10/site-packages/lit/TestingConfig.py", line 140, in load_from_path
    exec(compile(data, path, "exec"), cfg_globals, None)
  File "/home/zzl/triton-project/triton/test/lit.cfg.py", line 17, in <module>
    config.test_format = lit.formats.ShTest(not llvm_config.use_lit_shell)
AttributeError: 'NoneType' object has no attribute 'use_lit_shell'

FAILED: test/CMakeFiles/check-triton-lit-tests-conversion /home/zzl/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10/test/CMakeFiles/check-triton-lit-tests-conversion
cd /home/zzl/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10/test && /home/zzl/triton-project/triton/.venv/bin/python3 /home/zzl/triton-project/triton/.venv/bin/lit -sv /home/zzl/triton-project/triton/test/Conversion
ninja: build stopped: subcommand failed.
```

The issue is that llvm_config ends up being None, because ninja check is executed in ${CMAKE_CURRENT_SOURCE_DIR} instead of ${CMAKE_CURRENT_BINARY_DIR}. As a result, lit.site.cfg.py, which is supposed to call lit.llvm.initialize and properly initialize llvm_config, is never invoked.

### Changes
• Run tests from the binary directory in TRITON-LIT-TESTS.
• Explicitly set MAIN_CONFIG to lit.site.cfg.py so that llvm_config is initialized correctly.

### Related Issue
Fixes #7965

9 months ago
Add llc to the lit tool configuration (#7992) The triton lit tests leverage several llvm & mlir tools such as opt, mlir-translate and llc. These are generally configured to come directly from the llvm build that triton is using, but llc is not. This means that right now the llc version on the system is used instead, and if it doesn't match the rest of the tools, we can end up with failures. An example failure is using a version older than the llvm build that is missing features (such as ptx83). To fix this issue, configure lit to use llc directly from the llvm build. 9 months ago
[BUILD] Some CMake cleanup/modernisation (#5271)
- Prefer find_package over ad-hoc variable passing
- Prefer target_* APIs vs global *_directories APIs
- Use target_link_options to specify link options instead of target_link_libraries

Closes #5270. 1 year ago