triton-ascend/python/triton/runtime · wxlong_ustc/triton-ascend - AtomGit

ascend-robot fix(jit): fix incompatibility problem caused by premature introduction

File	Last commit	Last updated
__init__.py	[FRONTEND] Unify Interpreter and JIT Compilation Errors (#3355)	2 years ago
_async_compile.py	feat(runtime/autotune): add AsyncCompileMode to support parallel compilation in autotuning Co-authored-by: Xuan Peng<pengxuan9@huawei.com> # message auto-generated for no-merge-commit merge: !1268 merge feat/async-compile-0210 into main Created-by: HinPeng Commit-by: Xuan Peng Merged-by: ascend-robot Description: ## PR description 1. Introduce async compile mode from triton v3.5.1 (with minor modifications for compatibility with the current branch and torch-2.7.1) 2. Refactor the autotuner to compile Triton kernels in parallel ## Notice 1. Introduce the `MLIR_DISABLE_MULTITHREADING` environment variable from triton v3.5.1 ahead of time 2. Add `TRITON_AUTOTUNE_PARALLEL_COMPILE` to control whether the autotuner compiles kernels in parallel; defaults to '1' See merge request: Ascend/triton-ascend!1268	3 months ago
ascend_interpreter.py	fix interpret bug of cast op rtz mode in overflow case Co-authored-by: zhuxuejie<zhuxuejie8@huawei.com> # message auto-generated for no-merge-commit merge: !1336 merge int_1 into main Created-by: zhuxuejie Commit-by: zhuxuejie Merged-by: ascend-robot Description: 1. Fix the interpreter-mode bug in the cast op under rtz mode in the overflow case: the result now converts to inf rather than nan. 2. Add support for the scope op, which is handled with a plain pass. See merge request: Ascend/triton-ascend!1336	3 months ago
autotuner.py	feat(autotune): add autotune function Co-authored-by: hua_yc<huayanchun@h-partners.com> # message auto-generated for no-merge-commit merge: !1329 merge 0303 into main Created-by: hua_yc Commit-by: hua_yc Merged-by: ascend-robot Description: add autotune function See merge request: Ascend/triton-ascend!1329	2 months ago
build.py	[nvidia] Support passing TMA descriptors by-value (#4498) ## Motivation Currently, Triton passes TMA descriptors by-ref through global memory. This has a number of problems: * Significant launch overhead (5-10us) for the host-to-device memcpy * Users must insert fences for TMA descriptor cache flush (see https://github.com/triton-lang/triton/pull/4342). When users don't insert these fences correctly, they run into very strange bugs: https://github.com/triton-lang/triton/issues/4332 * The memcpy makes it nearly impossible to use cudagraphs There are two possible solutions: * [Pass the descriptor by-value](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#using-tma-to-transfer-multi-dimensional-arrays) * [Create the descriptor on-device](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#encoding-a-tensor-map-on-device) Because of the tricky memory model for TMA descriptors on H100, creating a descriptor on-device requires moving data back and forth from L2 cache. This is relatively expensive (100s of cycles at least) and requires the user or compiler to correctly insert release/acquire fences. In some cases, there is no way to avoid creating the descriptor on-device. But for many use-cases, it's perfectly fine to set up the descriptor on the host and pass by-value, avoiding both performance and correctness issues. This PR implements the by-value functionality. ## User-level API Whenever the user provides a kernel param which implements the method `tma_desc_cpu_ptr()`, Triton will lower that argument to a `__grid_constant__` by-value param. The existing helper methods `create_[1d/2d]_tma_descriptor` were modified to return such a type, so existing code does not need any changes to take advantage of the new feature. ## Implementation details When a kernel param with `tma_desc_cpu_ptr()` is detected, we attach an attribute to that param at the TTIR level. The attribute is passed through to TTGIR. When lowering TTGIR to LLIR, we use code ported from Mosaic (https://github.com/google/jax/pull/22175) to set up the correct LLVM attributes. The runtime is also modified to pass by-value TMA descriptors properly. ## Limitations This feature is currently broken when compiling an `IRSource` directly (which is useful for editing IR and re-compiling). That would require updating some [regexes](https://github.com/triton-lang/triton/blob/edcc2bcb8dd2e9224c94b689df9cbb7d2986ebea/python/triton/compiler/compiler.py#L52-L53) which infer the function signature from the IR. `IRSource` compilation still works fine for kernels which do not use the new feature. Once the approach I'm taking here is reviewed, I plan to fix that limitation, either in this PR or in a follow-up PR.	1 year ago
cache.py	[CACHE] Use base64 for shorter cache directories (#4553) Not sure this is worthy to make it? I was annoyed by long sha256-based cache directory names, mostly 64 chars. So I quickly added base64-based shorter cache directory names. Instead of fixing a dozen places that use `hashlib.sha256`, I patched the cache manager. 64-char names are mostly reduced to 43-44 chars. A comparison: ``` > % ls -l $TRITON_CACHE_DIR total 0 drwxr-xr-x 1 minjang users 40 Aug 21 19:02 44ae4aee7ef0ee0dd54e860cf44627e3b6cedabe87a228ac75988301b8a6bf60 drwxr-xr-x 1 minjang users 26 Aug 21 19:02 82dc2c9a5508bf07c72e02353c1e751dc54aae85666f139b2867b0a1e95e0e7b drwxr-xr-x 1 minjang users 226 Aug 21 19:02 b8e240968a85711ba57b17bf8450f1ffbc85a8de8cd1f47aa87b241b53f9bf60 drwxr-xr-x 1 minjang users 26 Aug 21 19:03 gtwsmlUIvwfHLgI1PB51HcVKroVmbxObKGewoeleDns drwxr-xr-x 1 minjang users 40 Aug 21 19:03 RK5K7n7w7g3VToYM9EYn47bO2r6HoiisdZiDAbimv2A drwxr-xr-x 1 minjang users 226 Aug 21 19:03 uOJAloqFcRulexe_hFDx_7yFqN6M0fR6qHskG1P5v2A ``` `test_core.py` runs without any errors, and the cache directory has all base64-based shorter names.	1 year ago
code_cache.py	fix(copyright): Remove the Huawei copyright notices from the extension, runtime, libentry files and OpInterface.h. Co-authored-by: jeshd<chengmaofan@huawei.com> # message auto-generated for no-merge-commit merge: !1346 merge recover-community-copyright into main Created-by: jeshd Commit-by: jeshd Merged-by: ascend-robot Description: Remove the Huawei copyright from extension, runtime, and libentry, and remove the Huawei copyright from OpInterface.h. Reason for change: the code files in extension, runtime, and libentry are files newly added by TA, based on modifications of open-source code snippets; OpInterface.h was introduced from triton 3.4.0, so the corresponding Huawei copyright is removed. See merge request: Ascend/triton-ascend!1346	2 months ago
driver.py	[RUNTIME] Allow setting active driver (#2973)	2 years ago
errors.py	[FRONTEND] Revert use of `|` operator for uniting types (#3366) We still want to support Python 3.9. Writing union types as `X | Y` was only introduced in Python 3.10. https://peps.python.org/pep-0604/	2 years ago
interpreter.py	fix interpret bug of cast op rtz mode in overflow case Co-authored-by: zhuxuejie<zhuxuejie8@huawei.com> # message auto-generated for no-merge-commit merge: !1336 merge int_1 into main Created-by: zhuxuejie Commit-by: zhuxuejie Merged-by: ascend-robot Description: 1. Fix the interpreter-mode bug in the cast op under rtz mode in the overflow case: the result now converts to inf rather than nan. 2. Add support for the scope op, which is handled with a plain pass. See merge request: Ascend/triton-ascend!1336	3 months ago
jit.py	fix(jit): fix incompatibility problem caused by premature introduction Co-authored-by: luobaiqing<luobaiqing1@huawei.com> # message auto-generated for no-merge-commit merge: !1395 merge fix_bug_main into main Created-by: luobaiqing Commit-by: luobaiqing Merged-by: ascend-robot Description: preload introduced some 3.5.1 methods ahead of time, but their adaptation had problems. This PR fixes that. See merge request: Ascend/triton-ascend!1395	2 months ago
libentry.py	fix(copyright): Remove the Huawei copyright notices from the extension, runtime, libentry files and OpInterface.h. Co-authored-by: jeshd<chengmaofan@huawei.com> # message auto-generated for no-merge-commit merge: !1346 merge recover-community-copyright into main Created-by: jeshd Commit-by: jeshd Merged-by: ascend-robot Description: Remove the Huawei copyright from extension, runtime, and libentry, and remove the Huawei copyright from OpInterface.h. Reason for change: the code files in extension, runtime, and libentry are files newly added by TA, based on modifications of open-source code snippets; OpInterface.h was introduced from triton 3.4.0, so the corresponding Huawei copyright is removed. See merge request: Ascend/triton-ascend!1346	2 months ago
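
The `cache.py` commit message describes replacing 64-character sha256 hex directory names with 43-44 character base64 names. The sketch below illustrates that idea only; it is not the code in `cache.py`. The helper name `shorten_hex_digest` is hypothetical, and the choice of URL-safe base64 with padding stripped is an assumption inferred from the directory names shown in the commit message.

```python
import base64


def shorten_hex_digest(hex_digest: str) -> str:
    """Re-encode a hex digest as URL-safe base64 without '=' padding.

    Hypothetical helper for illustration; the real cache manager may differ.
    """
    raw = bytes.fromhex(hex_digest)          # 64 hex chars -> 32 raw bytes
    encoded = base64.urlsafe_b64encode(raw)  # 32 bytes -> 44 chars incl. one '='
    return encoded.rstrip(b"=").decode("ascii")


# Directory name taken from the listing in the commit message.
long_name = "44ae4aee7ef0ee0dd54e860cf44627e3b6cedabe87a228ac75988301b8a6bf60"
short_name = shorten_hex_digest(long_name)
print(len(long_name), len(short_name), short_name)
```

Under these assumptions the first hex name in the listing maps to `RK5K7n7w7g3VToYM9EYn47bO2r6HoiisdZiDAbimv2A`, which also appears in the listing. The URL-safe alphabet avoids `/` in the output, which matters for a directory name.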
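
The `_async_compile.py` commit message mentions a `TRITON_AUTOTUNE_PARALLEL_COMPILE` environment variable, defaulting to '1', that controls whether the autotuner compiles kernels in parallel. A minimal sketch of such a switch follows. Only the variable name and its default come from the commit message; `fake_compile`, `compile_all`, the thread pool, and the exact way the value is parsed are assumptions for illustration, not the autotuner's actual implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_compile_enabled() -> bool:
    # Assumption: any value other than "1" disables parallel compilation.
    return os.environ.get("TRITON_AUTOTUNE_PARALLEL_COMPILE", "1") == "1"


def fake_compile(config: dict) -> str:
    # Hypothetical stand-in for compiling one kernel configuration.
    return f"kernel<BLOCK={config['BLOCK']}>"


def compile_all(configs: list) -> list:
    """Compile every config, in parallel when the switch is on."""
    if parallel_compile_enabled():
        with ThreadPoolExecutor() as pool:
            # pool.map returns results in the same order as the inputs.
            return list(pool.map(fake_compile, configs))
    return [fake_compile(c) for c in configs]


configs = [{"BLOCK": 64}, {"BLOCK": 128}, {"BLOCK": 256}]
print(compile_all(configs))
```

Both paths return results in input order, so a caller can match each compiled kernel to its configuration regardless of the switch.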