triton-ascend/test · wutianyao/triton-ascend - AtomGit

candyhong fix: sync with upstream commit 523a1b2

File	Last commit	Last updated
Analysis	[Backend] Fix various issues with smem base offsets (#7949) Many ops were using `smemObj.getBase()` without applying the offsets to get the part of smem to read or write. This is incorrect since the offsets may be nonzero. This PR fixes as many of these issues as I could identify and adds tests for this.	9 months ago
Conversion	[BACKEND] Add bar.sync before deallocating tmem (#7994) Without a barrier, a warp may deallocate tmem while it is still in use, allowing another block to overwrite it.	9 months ago
Gluon	[Gluon] Fix auto_encoding for ops which may infer multiple layouts (#7718) In the failing example we have: ```mlir %0 = tt.make_range {end = 8192 : i32, start = 0 : i32} : tensor<8192xi32, #gluon.auto_encoding> %1 = tt.reshape %0 : tensor<8192xi32, #gluon.auto_encoding> -> tensor<64x128xi32, #gluon.auto_encoding> %2 = gluon.set_auto_layout %1 : tensor<64x128xi32, #gluon.auto_encoding> -> tensor<64x128xi32, #blocked> ``` which currently fails with the error: ```python /root/code/triton/test.py:43:52: error: 'tt.reshape' op Found conflicting encodings for value gl.arange(0, BLOCK_M * BLOCK_N).reshape((BLOCK_M, BLOCK_N)), ``` The issue is that we propagate the blocked layout backwards to get a linear layout for the `make_range` result, then the algorithm propagates that layout forward to the `reshape` result. However, it infers a linear layout and errors because it conflicts with the original blocked layout. I fix this by setting a `mayVary` flag when an encoding comes from an inference result that isn't the only possibility. I then have special rules that resolve conflicts where one or more of the encodings is allowed to vary.	9 months ago
Hopper	[Dialect] Actually enable TMEM layout check and fix all the tests (#7723) I also removed unnecessary `noinline=false` from the tests.	9 months ago
LLVMIR	[BACKEND] Add LLVM pre-processing pass to break struct types (#2285) Add infrastructure to be able to add and test custom LLVM passes in the backend. This will allow us to apply some low-level optimizations and cleanup on LLVM IR. Add a first pass that breaks up phis of structs created by lowering to LLVM. Those can often pessimise the optimizer, as they block optimizations from going through phi nodes.	2 years ago
NVWS	[WS] reorder partition-loops and lower-aref (#7927) * Reorder `partition-loops` and `lower-aref` passes * Split `lower-aref` into `lower-aref` and `assign-stage-phase` passes * the split separates concerns, as `assign-stage-phase` is a more complex pass and testing/debugging can focus on the correctness of stage/phase assignment without the added complexity of `aref->mbarrier` lowering. * `assign-stage-phase` uses the enterOp token to assign to exitOp the same stage variable that enterOp uses, instead of the previous separate stages for enterOps/exitOps. * `lower-aref` testing verifies correctness of `aref->mbarrier` lowerings * in `load-mma-specialization` don't place the final `waitOp` inside the ws-region; revert to the original behavior before https://github.com/triton-lang/triton/pull/7757, as that change causes a perf regression with this PR. Keeping ws.tag to differentiate partitions in different loops as it will be relied upon in `aref-tmem-insertion` (WIP). This is a prep PR needed for `aref-tmem-insertion` WIP (will be submitted after this one) # New contributor declaration - [x] I am not making a trivial change, such as fixing a typo in a comment. - [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - Select one of the following. - [x] I have added tests. - `/test` for `lit` tests - `/unittest` for C++ tests - `/python/test` for end-to-end tests - [ ] This PR does not need a test because `FILL THIS IN`. - Select one of the following. - [ ] I have not added any `lit` tests. - [x] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)	9 months ago
Proton	[LAYOUTS] Implement toLinearLayout for TensorMemoryEncodingAttr (#7748) We do so by modelling M/N as describing elements and not the hardware 32-bit registers. This allows us to avoid the issue of having two elements pointing to the same register when `unpacked=False`. We also tighten the `MemDescType` verifier and the `TensorMemoryEncodingAttr` verifier to be consistent with the definition we are using. Doing this forces us to update a ton of lit tests that were silently wrong...	9 months ago
Tools	[AMD] NFC: Drop version minor for AMD MFMA layout (#7285) AMD's MFMA layout does not need version minor information like NVIDIA. It always defaults to 0 in the current codebase. The PR drops version minor and changes to a single `version` parameter for MFMA layout.	11 months ago
Triton	Add support for padding option to TMA loads (#7993) Closes #7364 builds on top of #7364 from @jhapradip and addresses remaining comments, as well as implements the padding option in the fallback RewriteTensorDescriptorToPointer path. - support for passing padding = "nan" on TMA descriptor creation for both host and device TMAs - forwards this argument down to tma descriptor creation - implement the NaN other value in the TMA fallback path --------- Co-authored-by: Pradip Jha <pradipjha@hotmail.com>	9 months ago
TritonGPU	[AMD] Enable dot_scaled on gfx11 (#7954) I saw some occasional test failures, but after disabling True16, tests seem to pass fine. --------- Co-authored-by: Paul Trojahn <paul.trojahn@amd.com>	9 months ago
TritonNvidiaGPU	[Backend] Fix various issues with smem base offsets (#7949) Many ops were using `smemObj.getBase()` without applying the offsets to get the part of smem to read or write. This is incorrect since the offsets may be nonzero. This PR fixes as many of these issues as I could identify and adds tests for this.	9 months ago
include	[AMD] Support ExtractSliceOp for AxisInfo (#7094) This commit updates AxisInfo to support backend callbacks so that backend ops can be recognized. One can use `ExtractSliceOp` to slice tensors of pointers to refine `tt.load` or `tt.store`. The `TritonAMDGPUConvertToBufferOpsBase` will fail to perform negativity analysis due to the presence of `ExtractSliceOp`, which after rewrites is going to slice tensors of offsets. This PR addresses the issue.	11 months ago
lib	[PROTON] Intra kernel profiling (#7258) ### Instrumentation & Runtime - Introduce a dedicated instrumentation mode - `proton.start(..., mode="instrumentation", ...)` - Introduce both high- and low- level scope APIs - For Gluon DSL: `pl.scope`, `pl.enter_scope`, and `pl.exit_scope`. Profiling API for Triton DSL is disabled by default. - For TTGIR: `proton.record start` and `proton.record end` - Inject profiling buffers for each triton kernel at codegen time and pass them to the proton runtime so kernels can push data directly from the device to the host ### Proton Dialect & Lowering - Add Proton → ProtonGPU → LLVM pipelines, including passes for shared-memory allocation, profile scratch allocation, and a few optimizations for reduced overhead or improved accuracy. ### Tracing - `proton.start(..., data="trace", ...)` is supported for both fine- and coarse-grained events. --------- Co-authored-by: Yuanwei Fang <fywkevin@gmail.com> Co-authored-by: Yuanwei Fang <fywkevin@fb.com> Co-authored-by: Corbin Robeck <corbin.robeck@amd.com> Co-authored-by: peterbell10 <peterbell10@openai.com> Co-authored-by: Corbin Robeck <corbin.robeck@gmail.com> Co-authored-by: Corbin Robeck <robeck@meta.com> Co-authored-by: robeck <robeck@devgpu284.prn2.facebook.com> Co-authored-by: Srivatsan Ramesh <srivatsan-ramesh@users.noreply.github.com> Co-authored-by: Shawn Zhong <github@shawnzhong.com> Co-authored-by: Shawn Zhong <shawnzhong@fb.com> Co-authored-by: 鐘天楽 <a844379248@icloud.com>	9 months ago
CMakeLists.txt	Fix lit subfolder tests failing when run via ninja check-triton-lit-tests-<folder> (#7966) ### Summary This PR fixes an issue where running a specific lit test folder (e.g., `Conversion`) using `ninja check-triton-lit-tests-<folder>` fails with the error: ``` (triton) zzl@gtx1660:~/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10$ ninja check-triton-lit-tests-conversion -v [0/1] cd /home/zzl/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10/test && /home/zzl/triton-project/triton/.venv/bin/python3 /home/zzl/triton-project/triton/.venv/bin/lit -sv /home/zzl/triton-project/triton/test/Conversion lit: /home/zzl/triton-project/triton/.venv/lib/python3.10/site-packages/lit/TestingConfig.py:152: fatal: unable to parse config file '/home/zzl/triton-project/triton/test/lit.cfg.py', traceback: Traceback (most recent call last): File "/home/zzl/triton-project/triton/.venv/lib/python3.10/site-packages/lit/TestingConfig.py", line 140, in load_from_path exec(compile(data, path, "exec"), cfg_globals, None) File "/home/zzl/triton-project/triton/test/lit.cfg.py", line 17, in <module> config.test_format = lit.formats.ShTest(not llvm_config.use_lit_shell) AttributeError: 'NoneType' object has no attribute 'use_lit_shell' FAILED: test/CMakeFiles/check-triton-lit-tests-conversion /home/zzl/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10/test/CMakeFiles/check-triton-lit-tests-conversion cd /home/zzl/triton-project/triton/build/cmake.linux-x86_64-cpython-3.10/test && /home/zzl/triton-project/triton/.venv/bin/python3 /home/zzl/triton-project/triton/.venv/bin/lit -sv /home/zzl/triton-project/triton/test/Conversion ninja: build stopped: subcommand failed. ``` The issue is that llvm_config ends up being NoneType, because ninja check is executed in ${CMAKE_CURRENT_SOURCE_DIR} instead of ${CMAKE_CURRENT_BINARY_DIR}. As a result, lit.site.cfg.py is not invoked, which is supposed to call lit.llvm.initialize and properly initialize llvm_config. 
### Changes • Run tests from the binary directory in TRITON-LIT-TESTS. • Explicitly set MAIN_CONFIG to lit.site.cfg.py so that llvm_config is initialized correctly. ### Related Issue Fixes #7965 <!--- The core Triton is a small number of people, and we receive many PRs (thank you!). To help us review your code more quickly, if you are a new contributor (less than 3 PRs merged) we ask that you complete the following tasks and include the filled-out checklist in your PR description. Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them. --> # New contributor declaration - [x] I am not making a trivial change, such as fixing a typo in a comment. - [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - Select one of the following. - [ ] I have added tests. - `/test` for `lit` tests - `/unittest` for C++ tests - `/python/test` for end-to-end tests - [x] This PR does not need a test because `FILL THIS IN`. - Select one of the following. - [x] I have not added any `lit` tests. - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)	9 months ago
lit.cfg.py	Add `llc` to the lit tool configuration (#7992) The triton lit tests leverage several llvm & mlir tools such as opt, mlir-translate and llc. These are generally configured to come directly from the llvm build that triton is using, but llc is not. This means that right now the llc version on the system will be used instead, and if it doesn't match the rest of the tools, we can end up with failures. An example failure is using an older version than the llvm build and missing features (such as `ptx83`). To fix this issue, configure lit to use the `llc` directly from the llvm build.	9 months ago
lit.site.cfg.py.in	[BUILD] Some CMake cleanup/modernisation (#5271) - Prefer `find_package` over ad-hoc variable passing - Prefer `target_` api vs global `_directories` apis - Use `target_link_options` to specify link options instead of `target_link_libraries` Closes #5270	1 year ago
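
The failure described in the CMakeLists.txt commit message (#7966) can be illustrated with a small stand-alone simulation. This is only a sketch of the failure mode, not the real lit code: `state`, `initialize`, `load_main_config`, `run_from_source_dir`, and `run_from_binary_dir` are hypothetical names, and the only detail taken from the listing is that `lit.cfg.py` reads `llvm_config.use_lit_shell` while `llvm_config` is still `None` unless the site config has run first.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the module-level `llvm_config` that lit.cfg.py reads.
state = SimpleNamespace(llvm_config=None)


def initialize():
    """Stand-in for the initialization that the generated site config triggers."""
    state.llvm_config = SimpleNamespace(use_lit_shell=False)


def load_main_config():
    """Stand-in for the lit.cfg.py line shown in the traceback."""
    return not state.llvm_config.use_lit_shell


def run_from_source_dir():
    # The site config is skipped, so nothing has initialized llvm_config.
    state.llvm_config = None
    try:
        load_main_config()
    except AttributeError as exc:
        return str(exc)
    return "ok"


def run_from_binary_dir():
    # The site config runs first and only then hands off to the main config.
    initialize()
    return load_main_config()


if __name__ == "__main__":
    print(run_from_source_dir())
    print(run_from_binary_dir())
```

In this sketch the ordering of the two steps is the whole point: the main config is only safe to load after the site config has populated the shared object, which matches the fix of running the tests from the binary directory and pointing lit at `lit.site.cfg.py`.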