File | Last commit | Last updated
[PYTHON] Added frontend to print sass using turingas disasm.py (#109) 4 years ago
[FRONTEND] use standard plugin interface for CUDA (#2887) 2 years ago
[TOOLS] Add support for autotuning AOT kernel (#2123) This PR makes the following changes to AOT kernels: - Allow the client to generate AOT kernels with different sets of constexprs and meta-parameters. Each combination of a constexpr set and meta-parameters is referred to as an "algo". Within an algo, the client can still give different hints about integer arguments. - Add an API int ${kernel_name}_get_num_algos() that returns the total number of algos. - Add an algo_id that lets the client of the generated kernel select the algo. - Remove gX, gY and gZ from the kernel parameter list. The launch grid usually differs between algos, and the client should not need to know how to compute the launch grid for each one. Instead, the client passes the expressions for computing gX, gY and gZ to compile.py (when the AOT kernels are generated). The expressions can only use kernel parameters or constant values. - Change the testing flow. The kernels are first built into a shared library libkernel.so, then the client test.c code is built and linked against libkernel.so. This is closer to a typical AOT kernel usage flow. 2 years ago
Refactor compiler specializations to consider backend (#4734) In this PR I refactor the specializations that we apply to the signature of a given function in Triton. Given a kernel, some argument properties can help compilation, e.g. divisibility by 16 or the fact that an integer is equal to 1. In a previous PR, https://github.com/triton-lang/triton/pull/4716, I needed other specializations to add buffer support in the AMD backend (and to get back some performance when we were using unaligned masked loads). So this is my attempt to redesign the specialization support to introduce per-backend specializations. The idea is that AttrsDescriptor is now the class that analyzes the parameters and adds the specializations. It also has a function table where more specializations can be added per backend. 1 year ago
[Tools] Fix get_sass and add a test for it (#4146) This makes it easy to query the sass for a given kernel. 1 year ago
[Frontend][Backend] Add device-side tma descriptor update API (#4633) This adds two new Triton IR operators: 1. ExperimentalTensormapCreateOp, which creates a descriptor and stores it in global memory; 2. ExperimentalTensormapFenceproxyAcquireOp, which produces the fence required to use the updated descriptor. I then use these to expose 3 new functions in tl.extra.cuda: 1. experimental_device_tensormap_create1d 2. experimental_device_tensormap_create2d 3. experimental_tensormap_fenceproxy_acquire, which match up with the existing host-side tensormap creation API. 1 year ago
fix(bug): allow using npu-smi info to get the device info Co-authored-by: 刘风昇 <liufengsheng2@huawei.com> # message auto-generated for no-merge-commit merge: !1338 merge acl2 into main. Created-by: meloliu12327 Commit-by: 刘风昇 Merged-by: ascend-robot Description: Add a way to obtain the hardware model information using npu-smi info. See merge request: Ascend/triton-ascend!1338 2 months ago
fix make_get_num_algos_def bug in link.py (#3330) 2 years ago