!1803 merge main_cann into main
feat: Triggering operator recompilation when the CANN version
Created-by: LH_123L
Commit-by: liuhuan
Merged-by: ascend-robot
Description: <!---
The core Triton is a small number of people, and we receive many PRs (thank
you!). To help us review your code more quickly, **if you are a new
contributor (less than 3 PRs merged) we ask that you complete the following
tasks and include the filled-out checklist in your PR description.**
Complete the following tasks before sending your PR, and replace [ ] with
[x] to indicate you have done them.
-->
# New contributor declaration
- [ ] I am not making a trivial change, such as fixing a typo in a comment.
- [ ] I have written a PR description following these
[rules](https://cbea.ms/git-commit/#why-not-how).
- [ ] I have run pre-commit run --from-ref origin/main --to-ref HEAD.
- Select one of the following.
- [ ] I have added tests.
- /test for lit tests
- /unittest for C++ tests
- /python/test for end-to-end tests
- [ ] This PR does not need a test because FILL THIS IN.
- Select one of the following.
- [ ] I have not added any lit tests.
- [ ] The lit tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python code
and using the instructions it generates is not minimal.)
See merge request: Ascend/triton-ascend!1803
!1790 merge sn into main
fix: Enhance offset analysis and change the tensor of the all-one type in the select op to continuous memory access
Created-by: wutianyao
Commit-by: wutianyao
Merged-by: ascend-robot
Description: **bugfix1:** Pointer analysis no longer forcibly treats all-ones tensor types (<1x1x1x...>) in the select op as non-contiguous.
Under the original restriction, tensor<1x> types were expanded into discrete memory accesses. MLIR's built-in for-loop optimization (SimplifyTrivialLoops) then recognized that the loop count was 1 and removed the loop, but it did not remove the discrete memory access label ({DiscreteMemAccess}) that the discrete memory access optimization had applied to the load op.
When the linalg pass later transforms the load and detects the discrete memory access label, it reads the init args of the for loop enclosing the load. Because SimplifyTrivialLoops has already removed the discrete memory access for loop, this can lead to a series of errors (for example, if the discrete memory access for loop has been erased but the kernel itself still contains a for loop, the init args are read as empty, which can cause a core dump).
**bugfix2:** When rewriteforloop recognizes an indextensor among the for-loop init args, it converts it to offset + stride. In some cases, however, an arg recognized as an indextensor is later analyzed as a scalar in blockptranalysis, and rewriteTerminator requires the stride to equal one, so an assertion fails.
**bugfix3:** When the mask is composed of splats, the discrete mask analysis pass identifies it as a continuous mask. However, in the Triton to Linalg pass, if all dimensions are 1, the splat op is converted into an insert op. The mask analysis cannot identify the insert op, so it analyzes the mask as discontinuous. This mismatch causes the failure.
**bugfix4:** In the use analysis, when there is indirect memory access, the 1st load op is initially marked as a meta use. The Post-process handles this by identifying the instruction chain related to the indirect memory access and re-marking it as mixuse, so that it is not eliminated in subsequent conversion stages.
If an op appears in a computation chain involving a set of indirect memory accesses, such as load(1st) -> computeOp -> load(2nd), and this op is also used by an assert or print, it is marked as Mixuse. In this case, the op is cloned into a mixuse op and a metause op before the Post-process phase.
The mixuse op is used for the assert, and the metause op is used for the 2nd loads. However, because the op was initially marked as Mixuse, the 1st load op is also marked as Mixuse and therefore skips the Post-process. Since the split metause op is used for the 2nd loads, its elimination can cause a series of issues.
For now, the temporary workaround is to disable cloning of select ops.
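The failure mode in bugfix1 can be illustrated with a toy model. This is only a sketch under stated assumptions: `Op`, `make_discrete_loop`, `simplify_trivial_loops`, and `lower_load` are hypothetical Python stand-ins written for this illustration, not the actual MLIR passes or their APIs. It shows how a label left on a load becomes stale once the single-iteration loop around it is removed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical, heavily simplified stand-in for an IR operation."""
    name: str
    attrs: set = field(default_factory=set)
    body: List["Op"] = field(default_factory=list)  # used by loops only
    trip_count: int = 0
    init_args: List[str] = field(default_factory=list)
    parent: Optional["Op"] = None


def make_discrete_loop(load: Op, trip_count: int) -> Op:
    """Wrap a load in a loop and label it (models the discrete-access rewrite)."""
    load.attrs.add("DiscreteMemAccess")
    loop = Op("for", body=[load], trip_count=trip_count, init_args=["acc"])
    load.parent = loop
    return loop


def simplify_trivial_loops(ops: List[Op]) -> List[Op]:
    """Inline loops that run exactly once; note that labels are left untouched."""
    result = []
    for op in ops:
        if op.name == "for" and op.trip_count == 1:
            for inner in op.body:
                inner.parent = None
                result.append(inner)
        else:
            result.append(op)
    return result


def lower_load(load: Op) -> List[str]:
    """A labeled load expects an enclosing loop whose init args it can read."""
    if "DiscreteMemAccess" in load.attrs:
        if load.parent is None:
            raise RuntimeError("labeled load has no enclosing loop")
        return load.parent.init_args
    return []


# Before the fix: a tensor<1x> load is labeled, then its loop is simplified away.
stale = Op("load")
ops = simplify_trivial_loops([make_discrete_loop(stale, trip_count=1)])
try:
    lower_load(ops[0])
except RuntimeError as err:
    print("stale label:", err)

# After the fix: the all-ones load is treated as contiguous and never labeled.
print(lower_load(Op("load")))
```

In this toy model the error is raised explicitly; in the real pipeline the description above reports that the equivalent situation can end in a core dump.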
See merge request: Ascend/triton-ascend!1790
!1764 merge conv1d_main into main
feat(conv1d) : add conv1d op
Created-by: gymgit1
Commit-by: gymgit1
Merged-by: ascend-robot
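Description: Add the conv1d op
Location: third_party/ascend/language/cann/extension
The op follows torch's conv1d interface. The ta side passes the call through, parses the parameters, and converts tl.conv1d into hfusion.conv1d. The torch interface is shown in the figure (the ta side drops padding_mode):

Changes in detail:
core.py: registration and parameter parsing
semantic.py: semantic logic
triton_ascend.cc: pybind11 bindings
TritonAscendOps.td: definition of Conv1dOp and its attributes
TritonAscendOps.cpp: verification and shape inference
TritonToHFusion.cpp: registers the conversion pattern, extracts the parameters, and creates hfusion::Conv1DOp
Parameters: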
:param input: Input tensor of shape (N, C_in, L_in) or (C_in, L_in), where N is the batch size, C the number of channels, and L the length of the signal sequence.
:type input: tensor
:param weight: Weight tensor of shape (C_out, C_in // groups, kernel_size).
:type weight: tensor
:param bias: Bias tensor of shape (C_out) or None. Default: None.
:type bias: tensor or None
:param stride: The stride of the convolution kernel. Can be an int or a 1-element tuple.
:type stride: int or Tuple[int]
:param padding_size: Padding added to both sides of the input. Can be an int, a 1-element tuple, or one of the strings {'valid', 'same'}.
padding_size='valid' is the same as no padding.
padding_size='same' pads the input so that the output has the same shape as the input. This mode only supports a stride of 1.
:type padding_size: int, Tuple[int], or str
:param dilation: The spacing between kernel elements. Can be an int or a 1-element tuple.
:type dilation: int or Tuple[int]
:param groups: Number of blocked connections from input to output channels.
:type groups: int
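
The way these parameters determine the output length can be sketched in plain Python. This is a minimal sketch, assuming the op follows the output-length formula documented for torch's conv1d; `conv1d_out_len` and its helpers are hypothetical names introduced here and are not part of this merge request. The left/right split used for 'same' is also an assumption.

```python
from typing import Tuple, Union

IntOrTuple = Union[int, Tuple[int]]


def _as_int(value: IntOrTuple, name: str) -> int:
    """Accept an int or a 1-element tuple, as the parameter list describes."""
    if isinstance(value, int):
        return value
    if isinstance(value, tuple) and len(value) == 1 and isinstance(value[0], int):
        return value[0]
    raise TypeError(f"{name} must be an int or a 1-element tuple, got {value!r}")


def _resolve_padding(padding_size, kernel_size: int, stride: int,
                     dilation: int) -> Tuple[int, int]:
    """Return (left, right) padding for an int, 1-tuple, 'valid', or 'same'."""
    if padding_size == "valid":
        return 0, 0
    if padding_size == "same":
        if stride != 1:
            raise ValueError("padding_size='same' only supports stride 1")
        total = dilation * (kernel_size - 1)
        left = total // 2  # assumption: any odd remainder goes to the right
        return left, total - left
    pad = _as_int(padding_size, "padding_size")
    return pad, pad


def conv1d_out_len(l_in: int, kernel_size: int, stride: IntOrTuple = 1,
                   padding_size: Union[IntOrTuple, str] = 0,
                   dilation: IntOrTuple = 1) -> int:
    """Output length L_out of a 1-D convolution over a signal of length l_in."""
    stride = _as_int(stride, "stride")
    dilation = _as_int(dilation, "dilation")
    left, right = _resolve_padding(padding_size, kernel_size, stride, dilation)
    span = dilation * (kernel_size - 1) + 1  # input positions covered by the kernel
    return (l_in + left + right - span) // stride + 1


print(conv1d_out_len(10, 3))                               # 8
print(conv1d_out_len(10, 3, padding_size="same"))          # 10
print(conv1d_out_len(10, 3, stride=(2,), padding_size=1))  # 5
```

For an input of shape (N, C_in, L_in) and a weight of shape (C_out, C_in // groups, kernel_size), the output shape under these assumptions would be (N, C_out, L_out).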
See merge request: Ascend/triton-ascend!1764
!1836 merge all_ut into main
feat(ssbuf): add ut for AddControlFlowCondition pass
Created-by: ma_journey
Commit-by: m-everglow
Merged-by: ascend-robot
Description: Dynamic CV control: add UTs for the control-flow part of AddControlFlowCondition
See merge request: Ascend/triton-ascend!1836
ci(hitest_cov):Added hitest cov tool
Co-authored-by: wangzhanpeng5<wangzhanpeng5@huawei.com>
# message auto-generated for no-merge-commit merge:
!1406 merge 0318_main_hitest into main
ci(hitest_cov):Added hitest cov tool
Created-by: wangzhanpeng5
Commit-by: wangzhanpeng5
Merged-by: ascend-robot
Description: There are currently two ways to obtain C++ code coverage.
1. HITEST
To enable the hitest coverage tool, set the environment variable TRITON_ENABLE_COVERAGE_HITEST=1.
2. llvm-cov
To enable the llvm-cov coverage tool, pass TRITON_APPEND_CMAKE_ARGS="-DTRITON_ENABLE_COVERAGE_LLVM_COV=ON" during compilation.
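The two switches can be prepared from Python before launching a build. This is a sketch only: `coverage_env` is a hypothetical helper, it assumes TRITON_APPEND_CMAKE_ARGS is read from the environment by the build, and the build command itself is not specified here, so it is left out.

```python
import os
from typing import Dict, Optional


def coverage_env(tool: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with one coverage tool switched on."""
    env = dict(os.environ if base_env is None else base_env)
    if tool == "hitest":
        env["TRITON_ENABLE_COVERAGE_HITEST"] = "1"
    elif tool == "llvm-cov":
        flag = "-DTRITON_ENABLE_COVERAGE_LLVM_COV=ON"
        # Keep any CMake arguments that were already set and append the flag.
        existing = env.get("TRITON_APPEND_CMAKE_ARGS", "")
        env["TRITON_APPEND_CMAKE_ARGS"] = f"{existing} {flag}".strip()
    else:
        raise ValueError(f"unknown coverage tool: {tool!r}")
    return env


# The returned dict can be passed as env= to subprocess.run for the build step.
print(coverage_env("llvm-cov", base_env={})["TRITON_APPEND_CMAKE_ARGS"])
```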
See merge request: Ascend/triton-ascend!1406