triton-ascend/third_party/nvidia/include · Ascend/triton-ascend - AtomGit

jeshd: fix(triton): fix conflict in python/setu

File	Last commit	Last updated
Dialect	[WS] reorder partition-loops and lower-aref (#7927). Reorders the `partition-loops` and `lower-aref` passes and splits `lower-aref` into `lower-aref` and `assign-stage-phase`. The split separates concerns: `assign-stage-phase` is the more complex pass, so testing and debugging can focus on the correctness of stage/phase assignment without the added complexity of the `aref->mbarrier` lowering. `assign-stage-phase` uses the enterOp token to assign to exitOp the same stage variable that enterOp uses, instead of keeping separate stages for enterOps and exitOps as before. `lower-aref` testing verifies the correctness of the `aref->mbarrier` lowerings. In `load-mma-specialization`, the final `waitOp` is not placed inside the ws-region; this reverts to the behavior before https://github.com/triton-lang/triton/pull/7757, because that change causes a perf regression with this PR. ws.tag is kept to differentiate partitions in different loops, as `aref-tmem-insertion` (WIP) will rely on it. This is a prep PR for `aref-tmem-insertion`, which will be submitted after this one.	9 months ago
NVGPUToLLVM	[NFC] Remove remaining uses of deprecated `GEN_PASS_CLASSES` (#6922). Continuation of #6898, #6785, #3971. Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>	1 year ago
TritonNVIDIAGPUToLLVM	[LAYOUTS] Implement generalized swizzling for convert_layout (#7565) We generalize the swizzling algorithm to consider the instructions `ldmatrix/stmatrix` and their transpose versions. To do this, we now require having a dedicated allocator for nvidia, as the required shmem for a convert_layout will now depend on the instructions we can emit. After cleaning up the stmatrix path from the common `convert_layout` lowering, it became clear that we always take the swizzling path. I changed the allocator to reflect this, and I had to change a ton of tests that used it and now don't require padding. We also implement an improved lowering for the indexing of `ldmatrix/stmatrix` following the optimisations from `ld.shared/st.shared`.	9 months ago
CMakeLists.txt	[backend][nvidia] move NVGPU dialect to third_party/nvidia/ (#3773) This commit moves the `NVGPU` dialect to the `third_party/nvidia` directory. To make this possible, we need to remove some unnecessary dependencies on the `NVGPU` dialect in the AMD backend and move some utilities into the `third_party/nvidia` directory too. Overall this gives us a better structure and creates a stricter boundary between core and backends.	2 years ago
cublas_instance.h	[BENCH] Various fixes on `bench_mlp.py` (#7926)	9 months ago
cublas_types.h	Make cublas_types.h standalone (#6383) The header relies on these libraries. Include them so it can be compiled on its own for cleanliness.	1 year ago