triton-ascend/third_party/ascend · Ascend/triton-ascend - AtomGit

File	Last commit	Last updated
backend	!1803 merge main_cann into main feat: Triggering operator recompilation when the CANN version Created-by: LH_123L Commit-by: liuhuan Merged-by: ascend-robot See merge request: Ascend/triton-ascend!1803	20 days ago
include	!1790 merge sn into main fix: Enhance offset analysis and change the tensor of the all-one type in the select op to continuous memory access Created-by: wutianyao Commit-by: wutianyao Merged-by: ascend-robot Description: bugfix1: Pointer analysis no longer forces tensor types whose dimensions are all 1 (<1x1x1x...>) in the select op to be treated as non-contiguous. Under the old restriction, tensor<1x> types were expanded into discrete memory access. When MLIR's built-in for-loop optimization (SimplifyTrivialLoops) then recognized that the loop count was 1, it rolled the loop back, but it did not remove the discrete memory access label ({DiscreteMemAccess}) that the discrete memory access optimization had applied to the load op. When the linalg pass later transforms a load that carries this label, it reads the init args of the load's outer for loop. Because SimplifyTrivialLoops has already rolled back the discrete memory access for loop, this can lead to a series of errors (for example, if the discrete memory access for loop has been erased but the kernel itself still contains a for loop, reading the init args as empty can cause a core dump). bugfix2: When rewriteforloop recognizes an indextensor in the for-loop init args, it converts it to offset + stride. In some cases, however, an arg recognized as an indextensor is later analyzed as a scalar in blockptranalysis, and rewriteTerminator requires the stride to equal one, so an assertion fails. bugfix3: When a mask is composed of splats, the discrete mask analysis pass identifies it as a continuous mask. However, in the Triton to Linalg pass, if all dimensions are 1, the splat op is converted into an insert op. Mask analysis cannot identify the insert op, so the mask is analyzed as discontinuous, which causes the problem. bugfix4: In use analysis, when there is indirect memory access, the 1st load op is initially marked as a meta use. Post-processing handles this by identifying and marking the instruction chain related to the indirect memory access and re-marking it as mixuse, so that it is not eliminated in later conversion stages. If an op in a computation chain involving indirect memory accesses, such as load(1st) -> computeOp -> load(2nd), is also used by assert or print, it is marked as Mixuse. The op is then cloned into a mixuse op and a metause op before the post-process phase: the mixuse op is used for the assert and the metause op is used for the 2nd load. However, because the op was initially marked as Mixuse, the 1st load op is also marked as Mixuse and post-processing is skipped. Since the split metause op is used for the 2nd load, its elimination can cause a series of issues. As a temporary fix, cloning of select ops is disabled. See merge request: Ascend/triton-ascend!1790	19 days ago
language	!1764 merge conv1d_main into main feat(conv1d) : add conv1d op Created-by: gymgit1 Commit-by: gymgit1 Merged-by: ascend-robot Description: Add the conv1d op. Location: third_party/ascend/language/cann/extension. Following torch's conv1d interface, the ta side passes the arguments through, parses them, and converts tl.conv1d to hfusion.conv1d. The torch interface is shown in the image (the ta side drops padding_mode): ![image.png](https://raw.gitcode.com/user-images/assets/7623225/225c2a7e-6163-4067-bf09-391435bb9bf4/image.png 'image.png') Changes: core.py: registration and argument parsing. semantic.py: semantic logic. triton_ascend.cc: pybind11 bindings. TritonAscendOps.td: defines Conv1dOp and its attributes. TritonAscendOps.cpp: verification and shape inference. TritonToHFusion.cpp: registers the conversion pattern, extracts the parameters, and creates hfusion::Conv1DOp. Parameters: :param input: Input tensor of shape (N, C_in, L_in) or (C_in, L_in). N is the batch size, C denotes the number of channels, and L is the length of the signal sequence. :type input: tensor :param weight: Weight tensor of shape (C_out, C_in // groups, kernel_size). :type weight: tensor :param bias: Bias tensor of shape (C_out) or None. Default: None. :type bias: tensor or None :param stride: The stride of the convolution kernel. Can be an int or a 1-element tuple. :type stride: int or Tuple[int] :param padding_size: Padding added to both sides of the input. Can be an int, a 1-element tuple, or one of the strings {'valid', 'same'}. `padding_size='valid'` is the same as no padding. `padding_size='same'` pads the input so the output has the same shape as the input. However, this mode doesn't support any stride values other than 1. :type padding_size: int, Tuple[int], or str :param dilation: The spacing between kernel elements. Can be an int or a 1-element tuple. :type dilation: int or Tuple[int] :param groups: Number of blocked connections from input to output channels. :type groups: int See merge request: Ascend/triton-ascend!1764	20 days ago
lib	!1790 merge sn into main fix: Enhance offset analysis and change the tensor of the all-one type in the select op to continuous memory access Created-by: wutianyao Commit-by: wutianyao Merged-by: ascend-robot See merge request: Ascend/triton-ascend!1790	19 days ago
llvm_patch	feat: update llvm hash and upload fad3272.patch Co-authored-by: candyhong<1102229410@qq.com> # message auto-generated for no-merge-commit merge: !60 merge release/3.5.x-upgrade-candy-dev into release/3.5.x-upgrade feat: update llvm hash and upload fad3272.patch Created-by: candyhong Commit-by: candyhong Merged-by: zhuxuejie See merge request: jeshd/triton-ascend!60	1 month ago
tutorials	Fit 06-fused-attention.py for the latest triton-ascend. The latest triton-ascend no longer supports `triton.language.extra.cann.extension`, so `import triton.language.extra.cann.extension as extension` in the example code raises an error. Fix: remove `import triton.language.extra.cann.extension as extension` and replace `extension.extract_slice` and `extension.insert_slice` with `tl.extract_slice` so that the example script runs normally.	27 days ago
unittest	!1836 merge all_ut into main feat(ssbuf): add ut for AddControlFlowCondition pass Created-by: ma_journey Commit-by: m-everglow Merged-by: ascend-robot Description: Dynamic CV control: add UT for the control-flow part of AddControlFlowCondition. See merge request: Ascend/triton-ascend!1836	19 days ago
AscendNPU-IR @de76a453	feature: Implement for dot scale fp4	1 month ago
.gitignore	change triton-ascend to fork mode Co-authored-by: zhang-chunli01<zhangchunli19@huawei.com> Co-authored-by: Xuan Peng<pengxuan9@huawei.com> Co-authored-by: luobaiqing<luobaiqing1@huawei.com> Co-authored-by: zhuxuejie<zhuxuejie8@huawei.com> Co-authored-by: candyhong<1102229410@qq.com> Co-authored-by: wutianyao<wutianyao1@huawei.com> # message auto-generated for no-merge-commit merge: !1000 merge out-of-tree-refactor into main change triton-ascend to fork mode Created-by: zhang-chunli01 Commit-by: candyhong;luobaiqing;zhang-chunli01;zhuxuejie;wutianyao;Xuan Peng Merged-by: ascend-robot Description: ![image.png](https://raw.gitcode.com/user-images/assets/7623225/d78ef132-1952-4d6e-8d0e-420bd62d0332/image.png 'image.png') See merge request: Ascend/triton-ascend!1000	5 months ago
CMakeLists.txt	ci(hitest_cov):Added hitest cov tool Co-authored-by: wangzhanpeng5<wangzhanpeng5@huawei.com> # message auto-generated for no-merge-commit merge: !1406 merge 0318_main_hitest into main ci(hitest_cov):Added hitest cov tool Created-by: wangzhanpeng5 Commit-by: wangzhanpeng5 Merged-by: ascend-robot Description: There are currently two ways to obtain `C++` code coverage. 1. HITEST: to enable the hitest coverage tool, set the environment variable `TRITON_ENABLE_COVERAGE_HITEST=1`. 2. llvm-cov: to enable the llvm-cov coverage tool, pass the argument `TRITON_APPEND_CMAKE_ARGS="-DTRITON_ENABLE_COVERAGE_LLVM_COV=ON"` during compilation. See merge request: Ascend/triton-ascend!1406	2 months ago
ascend_ir.cc	feat: Support tensor for fixpipe in affinity	1 month ago
isin_by_search_kernel.mlir	feat: Enhance offset analysis and change the tensor of the all-one type in the select op to continuous memory access	22 days ago
triton_ascend.cc	!1764 merge conv1d_main into main feat(conv1d) : add conv1d op Created-by: gymgit1 Commit-by: gymgit1 Merged-by: ascend-robot See merge request: Ascend/triton-ascend!1764	20 days ago
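
The conv1d parameter description in merge request !1764 says the op follows torch's conv1d interface. As a rough illustration of how `stride`, `padding_size`, and `dilation` interact, the sketch below computes the output length with the standard PyTorch conv1d formula. The helper name `conv1d_output_length` is hypothetical and not part of triton-ascend, and it is an assumption that the triton-ascend shape inference matches this formula exactly.

```python
def conv1d_output_length(l_in, kernel_size, stride=1, padding_size=0, dilation=1):
    """Output length along L, using the PyTorch conv1d formula (assumed to apply here)."""

    def as_int(value):
        # The parameter description allows an int or a 1-element tuple.
        return value[0] if isinstance(value, (tuple, list)) else value

    stride = as_int(stride)
    dilation = as_int(dilation)
    # Extent of the dilated kernel beyond its first tap.
    span = dilation * (kernel_size - 1)

    if padding_size == "valid":
        total_pad = 0  # 'valid' is the same as no padding
    elif padding_size == "same":
        if stride != 1:
            raise ValueError("padding_size='same' requires stride == 1")
        total_pad = span  # left + right padding that keeps L_out == L_in
    else:
        total_pad = 2 * as_int(padding_size)  # same padding on both sides

    return (l_in + total_pad - span - 1) // stride + 1


if __name__ == "__main__":
    # Input of length 16 with a kernel of size 3 under a few settings.
    print(conv1d_output_length(16, 3))                       # no padding
    print(conv1d_output_length(16, 3, padding_size="same"))  # length preserved
    print(conv1d_output_length(16, 3, stride=2, padding_size=1))
```

With `padding_size='same'` the total padding equals `dilation * (kernel_size - 1)`, which is why the description restricts that mode to a stride of 1.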