Auto import backends in triton.language.extra (#4889)
Before 53166efa2494764e11734d57de681cc4d2486981, importing
`triton/language/extra/__init__.py` used to import `cuda` and `hip`:
https://github.com/triton-lang/triton/commit/53166efa2494764e11734d57de681cc4d2486981#diff-20c816668bef8f6f25f90585e3cf909d86c209cbb2281c49938de1a2e6729c5dL1-L4
After the commit it no longer imports them because submodules in
triton.language.extra are moved to /third_party and copied during
setup.py. This change breaks code that relies on such behavior:
```python
# Before commit
>>> from triton.language import extra
>>> extra.cuda
<cuda submodule>
# After commit
>>> from triton.language import extra
>>> extra.cuda
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'triton.language.extra' has no attribute 'cuda'
```
To preserve compatibility, this commit dynamically imports the
submodules of triton.language.extra.
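The dynamic import described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `import_subpackages` and the throwaway `demo_extra` package are hypothetical names, and the sketch assumes that only sub-packages (directories with an `__init__.py`) should be picked up while plain `.py` modules are skipped.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def import_subpackages(package):
    """Import every sub-package of `package` and return their short names."""
    imported = []
    for _finder, name, is_pkg in pkgutil.iter_modules(package.__path__):
        if not is_pkg:
            continue  # skip plain .py modules, keep only sub-packages
        # import_module also binds the submodule as an attribute of the parent
        importlib.import_module(f"{package.__name__}.{name}")
        imported.append(name)
    return sorted(imported)


# Demo on a throwaway package (hypothetical layout, not Triton's real tree).
root = Path(tempfile.mkdtemp())
for sub in ("cuda", "hip"):
    (root / "demo_extra" / sub).mkdir(parents=True)
    (root / "demo_extra" / sub / "__init__.py").write_text("")
(root / "demo_extra" / "__init__.py").write_text("")
(root / "demo_extra" / "helper.py").write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

demo_extra = importlib.import_module("demo_extra")
backends = import_subpackages(demo_extra)
```

After the call, `demo_extra.cuda` and `demo_extra.hip` resolve as attributes without an explicit `import demo_extra.cuda`, which is the behavior the commit restores for `triton.language.extra`.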
The core Triton is a small number of people, and we receive many PRs (thank
you!). To help us review your code more quickly, **if you are a new
contributor (less than 3 PRs merged) we ask that you complete the following
tasks and include the filled-out checklist in your PR description.**
Complete the following tasks before sending your PR, and replace [ ] with
[x] to indicate you have done them.
- [ ] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - /test for lit tests
    - /unittest for C++ tests
    - /python/test for end-to-end tests
  - [x] This PR does not need a test because it is about module structure.
- Select one of the following.
  - [x] I have not added any lit tests.
  - [ ] The lit tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running Python code
    and using the instructions it generates is not minimal.)
fix(bug): fix import acl runtime error
Co-authored-by: 刘风昇<liufengsheng2@huawei.com>
# message auto-generated for no-merge-commit merge:
!1227 merge acl into main
fix(bug): fix import acl runtime error
Created-by: meloliu12327
Commit-by: 刘风昇
Merged-by: ascend-robot
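Description: Fixes a runtime error caused by environment variables being overwritten after `import acl` invokes the runtime. The current fix obtains the machine's environment information from the paths under /sys/bus/pci/devices/.
The main mapping is as follows: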
Device ID: 0xd802 -> Ascend910B
Device ID: 0xd803 -> Ascend910
Device ID: 0xd806 -> Ascend950
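A lookup along these lines could be sketched as below. This is only an illustration under stated assumptions: `detect_ascend_soc` and `DEVICE_ID_TO_SOC` are hypothetical names, the sketch relies on the standard Linux sysfs layout in which each entry under /sys/bus/pci/devices/ exposes a `device` file holding the hexadecimal device ID, and it does not check the vendor ID, which real code would likely also do.

```python
from pathlib import Path

# Mapping copied from the table above.
DEVICE_ID_TO_SOC = {
    "0xd802": "Ascend910B",
    "0xd803": "Ascend910",
    "0xd806": "Ascend950",
}


def detect_ascend_soc(pci_root="/sys/bus/pci/devices"):
    """Return the first known SoC name found under `pci_root`, else None."""
    root = Path(pci_root)
    if not root.is_dir():
        return None
    for dev_dir in sorted(root.iterdir()):
        try:
            # sysfs stores the ID as text such as "0xd802\n"
            device_id = (dev_dir / "device").read_text().strip().lower()
        except OSError:
            continue  # entry without a readable device ID
        soc = DEVICE_ID_TO_SOC.get(device_id)
        if soc is not None:
            return soc
    return None
```

Because the lookup reads only from the filesystem, it does not depend on environment variables that `import acl` may have changed.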
See merge request: Ascend/triton-ascend!1227
inline_asm use builtins.tuple instead of triton 3.4 tuple
Co-authored-by: zhuxuejie<zhuxuejie8@huawei.com>
# message auto-generated for no-merge-commit merge:
!1396 merge tuple into main
inline_asm use builtins.tuple instead of triton 3.4 tuple
Created-by: zhuxuejie
Commit-by: zhuxuejie
Merged-by: ascend-robot
Description: inline_asm uses the builtins.tuple from Triton 3.2 rather than the Triton 3.4 class tuple (class tuple was introduced early for use by other 3.4 ops), which avoids a later convert failure.
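The name clash behind this fix can be illustrated with a small, self-contained sketch. The `tuple` class and `as_builtin_tuple` helper below are hypothetical stand-ins, not the Triton code: they only show how a module-level class named `tuple` shadows the builtin and how `builtins.tuple` reaches the original type.

```python
import builtins


class tuple:  # hypothetical stand-in for a frontend class that shadows the builtin
    def __init__(self, values):
        self.values = list(values)


def as_builtin_tuple(items):
    """Build a real Python tuple even though `tuple` is shadowed in this module."""
    return builtins.tuple(items)


result = as_builtin_tuple([1, 2, 3])
```

Code that later expects a plain Python tuple keeps working because `result` is an instance of the builtin type, not of the shadowing class.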
See merge request: Ascend/triton-ascend!1396
feat: Remove tanh op to align with the upstream, and add bf16 fallback to libdevice.tanh
Co-authored-by: jeshd<chengmaofan@huawei.com>
# message auto-generated for no-merge-commit merge:
!1486 merge remove_tanh into main
feat: Remove tanh op to align with the upstream, and add bf16 fallback to libdevice.tanh
Created-by: jeshd
Commit-by: jeshd
Merged-by: ascend-robot
Description: ### Summary
This PR removes tanh from triton.language.math to keep the fork aligned with upstream, where math does not provide a tanh op. It moves tanh usage to triton.language.extra.cann.libdevice.tanh and adds explicit bf16 handling in the libdevice implementation.
To preserve existing user-facing behavior, tanh is now provided by triton.language.extra.cann.libdevice, while the call pattern remains unchanged. In other words, users can still write:
```
from triton.language.math import tanh
```
but the actual implementation is resolved to libdevice.tanh.
### Motivation
Upstream Triton does not define tanh under triton.language.math. Keeping a fork-specific math.tanh introduces unnecessary divergence and increases maintenance cost.
This change removes that divergence and keeps the frontend behavior compatible for existing callers. At the same time, bf16 tanh still needs a workable lowering path. This change makes the call sites explicit and adds a bf16 fallback that computes tanh in fp32 and casts the result back to bf16.
### What Changed
- Removed the tanh op from triton.language.math
- Removed the related builder-side path that depended on the old math.tanh implementation
- Switched the underlying implementation to triton.language.extra.cann.libdevice.tanh
- Kept the import and call style unchanged for users
- Extended libdevice.tanh to support bf16 by casting bf16 inputs to fp32, calling the existing fp32 tanh extern path, and casting the result back to bf16
- Added tanh test coverage for fp16 and bf16 in addition to fp32
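The bf16 fallback (upcast, compute, downcast) can be emulated on the host in plain Python, as a rough sketch. This is not the libdevice implementation: `to_bf16` and `tanh_bf16` are hypothetical helpers, Python floats stand in for the wider fp32 type, and NaN inputs are not handled.

```python
import math
import struct


def f32_bits(x):
    """Bit pattern of `x` after rounding to IEEE fp32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits):
    """fp32 value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bf16(x):
    """Round to the nearest bf16 value (ties to even); NaN is not handled."""
    bits = f32_bits(x)
    # bf16 keeps the top 16 bits of fp32, so round away the low 16 bits
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return bits_f32((bits + rounding) & 0xFFFF0000)


def tanh_bf16(x_bf16):
    """tanh for a bf16 input: compute in wider precision, then cast back."""
    wide = float(x_bf16)        # upcast is exact for any bf16 value
    return to_bf16(math.tanh(wide))


y = tanh_bf16(to_bf16(1.0))
```

The result `y` is itself representable in bf16, and it differs from the full-precision tanh only by the final rounding step.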
### CheckList
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - /test for lit tests
    - /unittest for C++ tests
    - /python/test for end-to-end tests
  - [x] This PR does not need a test because corresponding test cases already exist.
- Select one of the following.
  - [x] I have not added any lit tests.
  - [ ] The lit tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running Python code
    and using the instructions it generates is not minimal.)
See merge request: Ascend/triton-ascend!1486
fix(load): load make_block_ptr tensor default fill 0 when out of bound
Co-authored-by: zhuxuejie<zhuxuejie8@huawei.com>
# message auto-generated for no-merge-commit merge:
!1462 merge ptr2 into main
fix(load): load make_block_ptr tensor default fill 0 when out of bound
Created-by: zhuxuejie
Commit-by: zhuxuejie
Merged-by: ascend-robot
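Description: When loading a tensor built with make_block_ptr, out-of-bounds elements should be filled with 0 by default, consistent with the release branch; otherwise the qwen3 model shows accuracy problems.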
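The intended semantics can be shown with a small host-side emulation. This is a sketch only: `load_block` is a hypothetical helper operating on nested Python lists, not the Triton lowering, and it assumes a 2D block whose out-of-bounds positions take the fill value.

```python
def load_block(data, row0, col0, block_rows, block_cols, fill=0):
    """Read a block starting at (row0, col0); out-of-bounds cells get `fill`."""
    rows = len(data)
    cols = len(data[0]) if rows else 0
    return [
        [
            data[r][c] if 0 <= r < rows and 0 <= c < cols else fill
            for c in range(col0, col0 + block_cols)
        ]
        for r in range(row0, row0 + block_rows)
    ]


matrix = [[1, 2, 3],
          [4, 5, 6]]
# A 2x2 block anchored at the last element hangs over both edges.
block = load_block(matrix, 1, 2, 2, 2)
```

Here only the top-left cell of the block is in bounds, so the other three cells are zero-filled instead of holding arbitrary values that could skew later reductions.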
See merge request: Ascend/triton-ascend!1462