triton-ascend/python/triton · wxlong_ustc/triton-ascend - AtomGit

ascend-robot feat: Remove tanh op to align with the upstream, and add bf16 fallback to libdevice.tanh

File	Last commit	Last updated
_C	History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space	4 years ago
backends	Add string representation for AttrsDescriptor (#4888) The string representation allows PyTorch Inductor to serialize/deserialize the `AttrsDescriptor` to the `@triton.heuristics` block in the generated code.	1 year ago
compiler	fix: Fix the error caused by missing cache_dir attribute & prioritize using the torch_npu getCurrentRawStreamNoWait API. Co-authored-by: wangzhanpeng5<wangzhanpeng5@huawei.com> # message auto-generated for no-merge-commit merge: !1476 merge 0328_main_FileCache_getCurrentRawStreamNoWait into main Created-by: wangzhanpeng5 Commit-by: wangzhanpeng5 Merged-by: ascend-robot Description: ## Fix issue: [https://gitcode.com/Ascend/triton-ascend/issues/365](https://gitcode.com/Ascend/triton-ascend/issues/365) ### Issue Description When compilation fails, the current exception handler attempts to access `fn_cache_manager.cache_dir` to include the cache path in the error message. However, if `fn_cache_manager` is not an instance of `FileCacheManager` (e.g., another cache implementation), the object lacks the `cache_dir` attribute, causing an AttributeError. This masks the original compilation error and prevents proper error reporting. #### Fix Before adding cache information, check the object type using `isinstance(fn_cache_manager, FileCacheManager)`: if it is a `FileCacheManager`, use the `cache_dir` attribute to display the cache directory. Otherwise, fall back to showing `{file_name}.{ext}` to ensure at least the filename is provided. ## Fix issue: [https://gitcode.com/Ascend/triton-ascend/issues/369](https://gitcode.com/Ascend/triton-ascend/issues/369) ### Issue Description Replace `get_current_stream` with `getCurrentRawStreamNoWait` to resolve hangs when calling `get_current_stream` during asynchronous communication combined with Triton kernel launches. ### Fix Check for the existence of the new interface using `hasattr(torch_npu._C, "_npu_getCurrentRawStreamNoWait")`. If available, import and call `_npu_getCurrentRawStreamNoWait`; otherwise, fall back to the original `_npu_getCurrentRawStream`. Maintain consistent interface behavior to ensure backward compatibility. See merge request: Ascend/triton-ascend!1476	2 months ago
extension	feat: Optimized the CV ssbuf feature Co-authored-by: wuzw_05<wuzhiwei37@huawei.com> # message auto-generated for no-merge-commit merge: !1415 merge ssbuf into main Created-by: wuzw_05 Commit-by: wuzw_05 Merged-by: ascend-robot Description: # Background The ssbuf scalar state is used to store various states in the CV pipeline. Based on the data transmission state, a greedy algorithm is used to minimize the cavitation of the Cube computation. Core goals: 1. Construct a "first-in, first-out" buffer queue to ensure orderly memory allocation. 2. Control is based on buffernum to prevent precision issues caused by control flow. # CheckList - [x] I am not making a trivial change, such as fixing a typo in a comment. - [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - Select one of the following. - [ ] I have added tests. - `/test` for `lit` tests - `/unittest` for C++ tests - `/python/test` for end-to-end tests - [x] This PR does not need a test because `FILL THIS IN`. - Select one of the following. - [x] I have not added any `lit` tests. - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.) See merge request: Ascend/triton-ascend!1415	2 months ago
language	feat: Remove tanh op to align with the upstream, and add bf16 fallback to libdevice.tanh Co-authored-by: jeshd<chengmaofan@huawei.com> # message auto-generated for no-merge-commit merge: !1486 merge remove_tanh into main Created-by: jeshd Commit-by: jeshd Merged-by: ascend-robot Description: ### Summary This PR removes tanh from triton.language.math to keep the fork aligned with upstream, where math does not provide a tanh op, moves tanh usage to triton.language.extra.cann.libdevice.tanh, and adds explicit bf16 handling in the libdevice implementation. To preserve existing user-facing behavior, tanh is now provided by triton.language.extra.cann.libdevice, while the call pattern remains unchanged. In other words, users can still write `from triton.language.math import tanh`, but the actual implementation is resolved to libdevice.tanh. ### Motivation Upstream Triton does not define tanh under triton.language.math. Keeping a fork-specific math.tanh introduces unnecessary divergence and increases maintenance cost. This change removes that divergence and keeps the frontend behavior compatible for existing callers. At the same time, bf16 tanh still needs a workable lowering path. This change makes the call sites explicit and adds a bf16 fallback that computes tanh in fp32 and casts the result back to bf16. ### What Changed Removed the tanh op from triton.language.math. Removed the related builder-side path that depended on the old math.tanh implementation. Switched the underlying implementation to triton.language.extra.cann.libdevice.tanh. Kept the import and call style unchanged for users. Extended libdevice.tanh to support bf16 by casting bf16 inputs to fp32, calling the existing fp32 tanh extern path, and casting the result back to bf16. Added tanh test coverage for fp16 and bf16 in addition to fp32. ### CheckList - [x] I am not making a trivial change, such as fixing a typo in a comment. - [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - Select one of the following. - [ ] I have added tests. - `/test` for `lit` tests - `/unittest` for C++ tests - `/python/test` for end-to-end tests - [x] This PR does not need a test because `corresponding test cases already exist`. - Select one of the following. - [x] I have not added any `lit` tests. - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.) See merge request: Ascend/triton-ascend!1486	2 months ago
runtime	fix(jit): fix incompatibility problem caused by premature introduction Co-authored-by: luobaiqing<luobaiqing1@huawei.com> # message auto-generated for no-merge-commit merge: !1395 merge fix_bug_main into main Created-by: luobaiqing Commit-by: luobaiqing Merged-by: ascend-robot Description: preload introduced some methods from 3.5.1 ahead of time, but they were not adapted correctly. This PR fixes that. The core Triton is a small number of people, and we receive many PRs (thank you!). To help us review your code more quickly, if you are a new contributor (less than 3 PRs merged) we ask that you complete the following tasks and include the filled-out checklist in your PR description. Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them. - [ ] I am not making a trivial change, such as fixing a typo in a comment. - [ ] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [ ] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - Select one of the following. - [ ] I have added tests. - `/test` for `lit` tests - `/unittest` for C++ tests - `/python/test` for end-to-end tests - [ ] This PR does not need a test because `FILL THIS IN`. - Select one of the following. - [ ] I have not added any `lit` tests. - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.) See merge request: Ascend/triton-ascend!1395	2 months ago
tools	fix(bug): allow using npu-smi info to get the device info Co-authored-by: 刘风昇<liufengsheng2@huawei.com> # message auto-generated for no-merge-commit merge: !1338 merge acl2 into main Created-by: meloliu12327 Commit-by: 刘风昇 Merged-by: ascend-robot Description: Add a way to obtain the hardware model information using `npu-smi info`. See merge request: Ascend/triton-ascend!1338	2 months ago
__init__.py	feat(runtime/autotune): add AsyncCompileMode to support parallel compilation in autotuning Co-authored-by: Xuan Peng<pengxuan9@huawei.com> # message auto-generated for no-merge-commit merge: !1268 merge feat/async-compile-0210 into main Created-by: HinPeng Commit-by: Xuan Peng Merged-by: ascend-robot Description: ## PR description 1. Introduce async compile mode from triton v3.5.1 (with minor modifications to be compatible with the current branch and torch-2.7.1) 2. Refactor the autotuner to compile triton kernels in parallel ## Notice 1. Introduce the `MLIR_DISABLE_MULTITHREADING` environment variable early, from triton v3.5.1 2. Add `TRITON_AUTOTUNE_PARALLEL_COMPILE` to control whether the autotuner compiles kernels in parallel; defaults to '1' See merge request: Ascend/triton-ascend!1268	3 months ago
_internal_testing.py	[Frontend] [BC breaking] Implement PyTorch/JAX/NumPy 2.0 typecast semantics for scalars (#4613) The idea here is that if you have a tensor `t` of dtype `uint8` and you want to do `t << 2`, the result should be of dtype `uint8`, not `int32`! We do this for all dunder ops that don't output booleans. This follows roughly the semantics of PyTorch, JAX and NumPy 2.0. I would like to document this behaviour, but it's not clear to me where is the best place to say so. The PR has much more churn than I would like, as I had to move the `to_tensor` method to `semantic` (which is where it belongs anyway). For reviewers, the only two relevant changes are in `computation_type_impl` and in `bitwise_op_type_checking_impl`, where we say that we do perform casting for bitwise ops.	1 year ago
errors.py	[FRONTEND] Unify Interpreter and JIT Compilation Errors (#3355)	2 years ago
testing.py	[AUTOTUNER] A quick follow-up for more device-independent do_bench (#4974) This is a quick follow-up for the recent autotuner/testing changes as in https://github.com/triton-lang/triton/pull/4496. This PR moves the empty cache creation into the driver code to make the code more device independent.	1 year ago
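
The two fixes described in the `compiler` row (guarding the `cache_dir` access with an `isinstance` check, and probing for `_npu_getCurrentRawStreamNoWait` with `hasattr`) can be sketched in plain Python roughly as follows. This is only an illustration of the pattern, not the code from the merge request: the `FileCacheManager` and `RemoteCacheManager` classes, the two helper functions, and the `SimpleNamespace` backends are hypothetical stand-ins, chosen so the sketch runs without `triton` or `torch_npu` installed.

```python
from types import SimpleNamespace


class FileCacheManager:
    """Hypothetical stand-in for a file-backed cache manager with `cache_dir`."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir


class RemoteCacheManager:
    """Hypothetical cache implementation that has no `cache_dir` attribute."""


def describe_cache_location(fn_cache_manager, file_name, ext):
    # Only read `cache_dir` when the manager is known to provide it, so that
    # building the error message cannot itself raise AttributeError and hide
    # the original compilation error.
    if isinstance(fn_cache_manager, FileCacheManager):
        return fn_cache_manager.cache_dir
    # Fallback: at least report the file name.
    return f"{file_name}.{ext}"


def pick_stream_getter(backend):
    # Prefer the newer entry point when the backend exposes it; otherwise
    # fall back to the original one so older installations keep working.
    if hasattr(backend, "_npu_getCurrentRawStreamNoWait"):
        return backend._npu_getCurrentRawStreamNoWait
    return backend._npu_getCurrentRawStream


# Assumed stand-ins for `torch_npu._C`; the return values are made up.
old_backend = SimpleNamespace(
    _npu_getCurrentRawStream=lambda device: ("raw", device),
)
new_backend = SimpleNamespace(
    _npu_getCurrentRawStream=lambda device: ("raw", device),
    _npu_getCurrentRawStreamNoWait=lambda device: ("raw-nowait", device),
)

if __name__ == "__main__":
    print(describe_cache_location(FileCacheManager("/tmp/cache"), "kernel", "ttir"))
    print(describe_cache_location(RemoteCacheManager(), "kernel", "ttir"))
    print(pick_stream_getter(old_backend)(0))
    print(pick_stream_getter(new_backend)(0))
```

Both helpers choose the fallback by inspecting the object at call time rather than by pinning a version, which is how the description keeps older `torch_npu` builds and non-file cache managers working unchanged.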