File | Last commit | Last updated
[BACKEND] Support ldmatrix m16n16.b8.trans (#7525) 10 months ago
[WS] reorder partition-loops and lower-aref (#7927)

* Reorder the partition-loops and lower-aref passes.
* Split lower-aref into lower-aref and assign-stage-phase passes.
* The split separates concerns: assign-stage-phase is the more complex pass, so testing and debugging can focus on the correctness of stage/phase assignment without the added complexity of aref->mbarrier lowering.
* assign-stage-phase uses the enterOp token to assign to exitOp the same stage variable that enterOp uses, instead of keeping separate stages for enterOps and exitOps as before.
* lower-aref testing verifies the correctness of the aref->mbarrier lowerings.
* In load-mma-specialization, don't place the final waitOp inside the ws-region. This reverts to the behavior before https://github.com/triton-lang/triton/pull/7757, because that change causes a perf regression with this PR. ws.tag is kept to differentiate partitions in different loops, as aref-tmem-insertion (WIP) will rely on it.

This is a prep PR needed for aref-tmem-insertion (WIP), which will be submitted after this one.

# New contributor declaration

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run pre-commit run --from-ref origin/main --to-ref HEAD.
- Select one of the following.
  - [x] I have added tests.
    - /test for lit tests
    - /unittest for C++ tests
    - /python/test for end-to-end tests
  - [ ] This PR does not need a test because FILL THIS IN.
- Select one of the following.
  - [ ] I have not added any lit tests.
  - [x] The lit tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

9 months ago
Add an Nvidia Warp Specialization Dialect with an Async Reference Type and Operations (#6288) Per conversations between OpenAI and Nvidia, this represents a first step in integrating some of the internally developed Warp Specialization abstractions with ongoing backend work. The Aref abstraction was developed by @vinodgro, @3gx, @acollins3, @BinFan, and @chhzh123, with design input from @masahi and myself. This PR adds the dialect for warp specialization analysis and abstractions, and the IR definitions for Aref. Future work will focus on lowering to standard ttg/ttng types along with higher-level passes for warp specialization analysis. Thanks to @Mogball for helping me get over the last couple of humps of dialect implementation; I'd forgotten how much boilerplate is involved :sweat_smile: This also adds a new lit target for third_party/nvidia/test. I'm not sure if anything else needs to happen to get that target to run in CI. 1 year ago