File / Last commit / Last updated
[FRONTEND] Unify Interpreter and JIT Compilation Errors (#3355), 2 years ago
feat(runtime/autotune): add AsyncCompileMode to support parallel compilation in autotuning

Co-authored-by: Xuan Peng <pengxuan9@huawei.com>
Created-by: HinPeng
Commit-by: Xuan Peng
Merged-by: ascend-robot
Merge request: Ascend/triton-ascend!1268 (merge feat/async-compile-0210 into main)

## PR description

1. Introduce the async compile mode from triton v3.5.1 (with minor modifications for compatibility with the current branch and torch-2.7.1).
2. Refactor the autotuner to compile triton kernels in parallel.

## Notice

1. Introduce the MLIR_DISABLE_MULTITHREADING environment variable ahead of triton v3.5.1.
2. Add TRITON_AUTOTUNE_PARALLEL_COMPILE to control whether the autotuner compiles kernels in parallel; it defaults to '1'.

3 months ago
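The switch described in the Notice can be sketched roughly as follows. Only the variable name TRITON_AUTOTUNE_PARALLEL_COMPILE and its default of '1' come from the commit message; the helper names, the rule that exactly '1' enables the feature, and the use of a thread pool are assumptions made for illustration, not the merged implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_compile_enabled(env=None):
    """Return True when parallel compilation is switched on.

    Assumption: any value other than "1" disables the feature. The commit
    message only states the variable name and its default.
    """
    env = os.environ if env is None else env
    return env.get("TRITON_AUTOTUNE_PARALLEL_COMPILE", "1") == "1"


def compile_all(configs, compile_fn, env=None, max_workers=4):
    """Compile every config, in parallel when enabled (hypothetical helper).

    compile_fn stands in for whatever compiles a single autotune config.
    """
    if not parallel_compile_enabled(env):
        # Sequential fallback keeps the original ordering.
        return [compile_fn(cfg) for cfg in configs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map also returns results in input order.
        return list(pool.map(compile_fn, configs))
```

Both paths return results in the order of the input configs, so a caller can pair each compiled kernel with its config regardless of the setting.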
fix interpret bug of cast op rtz mode in overflow case. Co-authored-by: zhuxuejie <zhuxuejie8@huawei.com>. Created-by: zhuxuejie. Commit-by: zhuxuejie. Merged-by: ascend-robot. Merge request: Ascend/triton-ascend!1336 (merge int_1 into main). Description: 1. Fix an interpreter-mode bug in the cast op: in rtz mode, an overflowing value is now converted to inf rather than nan. 2. Add support for the scope op, which is simply passed through. 3 months ago
feat(autotune): add autotune function. Co-authored-by: hua_yc <huayanchun@h-partners.com>. Created-by: hua_yc. Commit-by: hua_yc. Merged-by: ascend-robot. Merge request: Ascend/triton-ascend!1329 (merge 0303 into main). Description: add autotune function. 2 months ago
[nvidia] Support passing TMA descriptors by-value (#4498)

## Motivation

Currently, Triton passes TMA descriptors by-ref through global memory. This has a number of problems:

* Significant launch overhead (5-10us) for the host-to-device memcpy
* Users must insert fences for TMA descriptor cache flush (see https://github.com/triton-lang/triton/pull/4342). When users don't insert these fences correctly, they run into very strange bugs: https://github.com/triton-lang/triton/issues/4332
* The memcpy makes it nearly impossible to use cudagraphs

There are two possible solutions:

* [Pass the descriptor by-value](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#using-tma-to-transfer-multi-dimensional-arrays)
* [Create the descriptor on-device](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#encoding-a-tensor-map-on-device)

Because of the tricky memory model for TMA descriptors on H100, creating a descriptor on-device requires moving data back and forth from L2 cache. This is relatively expensive (100s of cycles at least) and requires the user or compiler to correctly insert release/acquire fences.

In some cases, there is no way to avoid creating the descriptor on-device. But for many use-cases, it's perfectly fine to set up the descriptor on the host and pass by-value, avoiding both performance and correctness issues. This PR implements the by-value functionality.

## User-level API

Whenever the user provides a kernel param which implements the method `tma_desc_cpu_ptr()`, Triton will lower that argument to a `__grid_constant__` by-value param. The existing helper methods `create_[1d/2d]_tma_descriptor` were modified to return such a type, so existing code does not need any changes to take advantage of the new feature.

## Implementation details

When a kernel param with `tma_desc_cpu_ptr()` is detected, we attach an attribute to that param at the TTIR level. The attribute is passed through to TTGIR. When lowering TTGIR to LLIR, we use code ported from Mosaic (https://github.com/google/jax/pull/22175) to set up the correct LLVM attributes. The runtime is also modified to pass by-value TMA descriptors properly.

## Limitations

This feature is currently broken when compiling an IRSource directly (which is useful for editing IR and re-compiling). That would require updating some [regexes](https://github.com/triton-lang/triton/blob/edcc2bcb8dd2e9224c94b689df9cbb7d2986ebea/python/triton/compiler/compiler.py#L52-L53) which infer the function signature from the IR. IRSource compilation still works fine for kernels which do not use the new feature. Once the approach I'm taking here is reviewed, I plan to fix that limitation, either in this PR or in a follow-up PR.

1 year ago
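The detection rule in the User-level API section amounts to duck typing on a single method. A minimal sketch, assuming a stand-in descriptor class backed by a 128-byte host buffer: the class, the `classify_param` helper, and its return labels are hypothetical and are not Triton's actual types or signature strings. Only the method name `tma_desc_cpu_ptr` comes from the commit message.

```python
import ctypes


class HostTmaDescriptor:
    """Hypothetical stand-in for a host-side TMA descriptor object."""

    def __init__(self):
        # Assumption: the descriptor is an opaque 128-byte host buffer.
        self._buf = ctypes.create_string_buffer(128)

    def tma_desc_cpu_ptr(self):
        # Address of the host buffer holding the descriptor bytes.
        return ctypes.addressof(self._buf)


def classify_param(arg):
    """Label a kernel argument by how it would be passed (illustrative only)."""
    if callable(getattr(arg, "tma_desc_cpu_ptr", None)):
        return "tma_by_value"
    return "ordinary"
```

Because the check looks only for the method, any user-defined wrapper that exposes `tma_desc_cpu_ptr()` would be treated the same way as the objects returned by the existing helpers.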
[CACHE] Use base64 for shorter cache directories (#4553)

Not sure whether this is worth merging. I was annoyed by long sha256-based cache directory names, mostly 64 chars, so I quickly added base64-based shorter cache directory names. Instead of fixing a dozen places that use hashlib.sha256, I patched the cache manager. 64-char names are mostly reduced to 43-44 chars. A comparison:

```
> % ls -l $TRITON_CACHE_DIR
total 0
drwxr-xr-x 1 minjang users  40 Aug 21 19:02 44ae4aee7ef0ee0dd54e860cf44627e3b6cedabe87a228ac75988301b8a6bf60
drwxr-xr-x 1 minjang users  26 Aug 21 19:02 82dc2c9a5508bf07c72e02353c1e751dc54aae85666f139b2867b0a1e95e0e7b
drwxr-xr-x 1 minjang users 226 Aug 21 19:02 b8e240968a85711ba57b17bf8450f1ffbc85a8de8cd1f47aa87b241b53f9bf60
drwxr-xr-x 1 minjang users  26 Aug 21 19:03 gtwsmlUIvwfHLgI1PB51HcVKroVmbxObKGewoeleDns
drwxr-xr-x 1 minjang users  40 Aug 21 19:03 RK5K7n7w7g3VToYM9EYn47bO2r6HoiisdZiDAbimv2A
drwxr-xr-x 1 minjang users 226 Aug 21 19:03 uOJAloqFcRulexe_hFDx_7yFqN6M0fR6qHskG1P5v2A
```

test_core.py runs without any errors, and the cache directory has all base64-based shorter names.

1 year ago
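The shortening can be reproduced with the standard library. This is a sketch under stated assumptions rather than the patched cache manager: the function name is hypothetical, and the URL-safe alphabet with stripped `=` padding is inferred from the listing (one name contains `_`, and the names are 43 characters long).

```python
import base64


def shorten_hex_key(hex_key: str) -> str:
    """Re-encode a hex digest as unpadded URL-safe base64 (hypothetical helper)."""
    raw = bytes.fromhex(hex_key)  # 64 hex chars -> 32 raw bytes
    encoded = base64.urlsafe_b64encode(raw)  # 44 chars including one '=' pad
    return encoded.rstrip(b"=").decode("ascii")  # 43 chars once padding is removed


short = shorten_hex_key(
    "44ae4aee7ef0ee0dd54e860cf44627e3b6cedabe87a228ac75988301b8a6bf60"
)
print(len(short), short)
```

For the first sha256 name in the listing this yields a 43-character name beginning with `RK5K`, which corresponds to the `RK5K...` directory of the same size (40) in the comparison.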
fix(copyright): Remove the Huawei copyright notices from the extension, runtime, libentry files and OpInterface.h. Co-authored-by: jeshd <chengmaofan@huawei.com>. Created-by: jeshd. Commit-by: jeshd. Merged-by: ascend-robot. Merge request: Ascend/triton-ascend!1346 (merge recover-community-copyright into main). Description: remove the Huawei copyright from extension, runtime, and libentry, and from OpInterface.h. Reason for the change: the code files in extension, runtime, and libentry were newly added by TA as modifications of open-source code snippets, and OpInterface.h was brought in from triton 3.4.0, so the corresponding Huawei copyright is removed. 2 months ago
[RUNTIME] Allow setting active driver (#2973), 2 years ago
[FRONTEND] Revert use of | operator for uniting types (#3366). We still want to support Python 3.9. Writing union types as X | Y was only introduced in Python 3.10. https://peps.python.org/pep-0604/ 2 years ago
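A short illustration of the compatible spelling, using a hypothetical function that is not taken from the Triton code base: `typing.Union` and `typing.Optional` work on Python 3.9, whereas an annotation written as `int | str` is evaluated when the function is defined and raises `TypeError` there unless annotation evaluation is deferred.

```python
from typing import Optional, Union


def describe(value: Union[int, str], suffix: Optional[str] = None) -> str:
    """Format a value; the annotations avoid the PEP 604 `X | Y` syntax."""
    text = str(value)
    return text if suffix is None else text + suffix
```

On Python 3.10 and later the same signature could be written as `value: int | str, suffix: str | None = None`.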
fix interpret bug of cast op rtz mode in overflow case. Co-authored-by: zhuxuejie <zhuxuejie8@huawei.com>. Created-by: zhuxuejie. Commit-by: zhuxuejie. Merged-by: ascend-robot. Merge request: Ascend/triton-ascend!1336 (merge int_1 into main). Description: 1. Fix an interpreter-mode bug in the cast op: in rtz mode, an overflowing value is now converted to inf rather than nan. 2. Add support for the scope op, which is simply passed through. 3 months ago
fix(jit): fix incompatibility problem caused by premature introduction. Co-authored-by: luobaiqing <luobaiqing1@huawei.com>. Created-by: luobaiqing. Commit-by: luobaiqing. Merged-by: ascend-robot. Merge request: Ascend/triton-ascend!1395 (merge fix_bug_main into main). Description: preload introduced some 3.5.1 methods ahead of time, but they were not adapted correctly; this PR fixes that. 2 months ago
fix(copyright): Remove the Huawei copyright notices from the extension, runtime, libentry files and OpInterface.h. Co-authored-by: jeshd <chengmaofan@huawei.com>. Created-by: jeshd. Commit-by: jeshd. Merged-by: ascend-robot. Merge request: Ascend/triton-ascend!1346 (merge recover-community-copyright into main). Description: remove the Huawei copyright from extension, runtime, and libentry, and from OpInterface.h. Reason for the change: the code files in extension, runtime, and libentry were newly added by TA as modifications of open-source code snippets, and OpInterface.h was brought in from triton 3.4.0, so the corresponding Huawei copyright is removed. 2 months ago