triton-ascend/third_party/nvidia/include/Dialect · Ascend/triton-ascend - AtomGit

Jjeshdfix(triton):fix conflict in python/setu

File	Last commit	Last updated
NVGPU	[BACKEND] Support ldmatrix m16n16.b8.trans (#7525)	10 months ago
NVWS	[WS] reorder partition-loops and lower-aref (#7927) * Reorder the `partition-loops` and `lower-aref` passes. * Split `lower-aref` into `lower-aref` and `assign-stage-phase` passes. * The split separates concerns: `assign-stage-phase` is the more complex pass, so its testing and debugging can focus on the correctness of stage/phase assignment without the added complexity of the `aref->mbarrier` lowering. * `assign-stage-phase` uses the enterOp token to give the exitOp the same stage variable that the enterOp uses, instead of the previous separate stages for enterOps and exitOps. * `lower-aref` tests verify the correctness of the `aref->mbarrier` lowerings. * `load-mma-specialization` no longer places the final `waitOp` inside the ws-region, reverting to the behavior before https://github.com/triton-lang/triton/pull/7757, as that change causes a perf regression with this PR. `ws.tag` is kept to differentiate partitions in different loops, because `aref-tmem-insertion` (WIP) will rely on it. This is a prep PR needed for `aref-tmem-insertion` (WIP, to be submitted after this one). # New contributor declaration - [x] I am not making a trivial change, such as fixing a typo in a comment. - [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - Select one of the following. - [x] I have added tests. - `/test` for `lit` tests - `/unittest` for C++ tests - `/python/test` for end-to-end tests - [ ] This PR does not need a test because `FILL THIS IN`. - Select one of the following. - [ ] I have not added any `lit` tests. - [x] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)	9 months ago
CMakeLists.txt	Add an Nvidia Warp Specialization Dialect with an Async Reference Type and Operations (#6288) Per conversations between OpenAI and Nvidia, this represents a first step in integrating some of the internally developed Warp Specialization abstractions with ongoing backend work. The Aref abstraction was developed by @vinodgro, @3gx, @acollins3, @BinFan, and @chhzh123, with design input from @masahi and myself. This PR adds the dialect for warp specialization analysis and abstractions, and the IR definitions for Aref. Future work will focus on lowering to standard ttg/ttng types, along with higher-level passes for warp specialization analysis. Thanks to @Mogball for helping me get over the last couple of humps of dialect implementation; I'd forgotten how much boilerplate is involved :sweat_smile: This also adds a new lit target for third_party/nvidia/test; I'm not sure whether anything else needs to happen to get that target to run in CI.	1 year ago
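The NVWS commit describes stage/phase assignment only in prose. As a rough illustration, the Python sketch below models a ring of `num_buffers` aref slots. It assumes the common multi-buffering scheme in which the stage is the iteration index modulo the buffer count and the phase parity flips each time the ring wraps, as with mbarrier parity waits. `EnterToken`, `stage_phase`, `enter`, and `exit_stage` are hypothetical names for illustration only. This is not the actual `assign-stage-phase` pass, which rewrites MLIR IR rather than computing values in Python.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnterToken:
    """Hypothetical token returned by an enter op; records the slot it acquired."""
    stage: int
    phase: int


def stage_phase(iteration: int, num_buffers: int) -> Tuple[int, int]:
    # The stage (slot index) cycles through the ring buffer.
    stage = iteration % num_buffers
    # The phase parity flips every time the ring wraps around.
    phase = (iteration // num_buffers) % 2
    return stage, phase


def enter(iteration: int, num_buffers: int) -> EnterToken:
    # Assumed model: the enter op computes the stage/phase for this iteration.
    stage, phase = stage_phase(iteration, num_buffers)
    return EnterToken(stage, phase)


def exit_stage(token: EnterToken) -> int:
    # The exit op reuses the stage carried by the enter token instead of
    # computing its own, so the enter/exit pair always refers to the same slot.
    return token.stage


if __name__ == "__main__":
    for i in range(5):
        tok = enter(i, num_buffers=2)
        print(i, tok.stage, tok.phase, exit_stage(tok))
```

Threading the stage through the token, rather than recomputing it separately for each exit op, keeps the enter and exit sides in lockstep even if the two ops end up in different partitions or loop bodies.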