File / Last commit / Last updated
fix(MaskAnalysis): support int1 cmp analysis

Co-authored-by: luobaiqing<luobaiqing1@huawei.com> # message auto-generated for no-merge-commit merge: !1207 merge fixCmpInt1 into main

fix(MaskAnalysis): support int1 cmp analysis

Created-by: luobaiqing
Commit-by: luobaiqing
Merged-by: ascend-robot

Description: Previously, mask analysis handled int1 values as intScalar. This lost mask information and led to precision errors or out-of-bounds memory accesses, which made the diagonal_backward operator fail. In that operator, a cmp first produces an i1 that is then splatted into a tensor. parseIntScalar records only the scalar and no stateInfo, so the splat is treated as a tensor that carries no useful information, and that dimension is dropped when the mask is generated:

```
%49 = arith.cmpi slt, %42, %2 : i64 loc(#loc30)
%54 = tt.splat %49 : i1 -> tensor<1x32x1xi1> loc(#loc32)
%55 = arith.andi %54, %53 : tensor<1x32x1xi1> loc(#loc32)
```

This PR fixes that case. int1 values no longer take the intScalar path. Instead, their definingOp is parsed, and the restriction that prevented cmp from being analyzed on scalars is lifted. Currently only int1 values produced by a cmpOp are analyzed. Other ops that may produce int1 values take a fallback path.

Once the mask analysis was fixed, another problem surfaced. In the case above, when the mask dimension is 1, mask generation is optimized into this form:

```
%splat_lhs = tt.splat %val1 : tensor<128xi32>
%splat_rhs = tt.splat %val2 : tensor<128xi32>
%cmp = arith.cmpi slt, %splat_lhs, %splat_rhs : tensor<128xi32>
```

The tolinalg mask analysis does not support scalar cmp either. One option was to add scalar-cmp handling to the tolinalg mask analysis. After discussion, a new converter was judged the better approach. A converter that hoists binary ops above the splat is therefore added after the linearization pass. After conversion:

```
%cmp_scalar = arith.cmpi slt, %val1, %val2
%splat_cmp = tt.splat %cmp_scalar : tensor<128xi1>
```

For now the converter hoists only cmpi. Any binary operator could be added to optimize the same splat-then-compute pattern, which would also reduce UB usage.

The core Triton is a small number of people, and we receive many PRs (thank you!). To help us review your code more quickly, **if you are a new contributor (less than 3 PRs merged) we ask that you complete the following tasks and include the filled-out checklist in your PR description.**

Complete the following tasks before sending your PR, and replace [ ] with [x] to indicate you have done them.

- [ ] I am not making a trivial change, such as fixing a typo in a comment.
- [ ] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [ ] I have run pre-commit run --from-ref origin/main --to-ref HEAD.
- Select one of the following.
  - [ ] I have added tests.
    - /test for lit tests
    - /unittest for C++ tests
    - /python/test for end-to-end tests
  - [ ] This PR does not need a test because FILL THIS IN.
- Select one of the following.
  - [ ] I have not added any lit tests.
  - [ ] The lit tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

See merge request: Ascend/triton-ascend!1207

4 months ago
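The hoisting converter in the fix(MaskAnalysis) commit relies on a simple identity. An elementwise binary op on two splats of scalars equals a splat of the same op applied to the scalars. The Python sketch below models that rewrite on a toy IR. The classes `Scalar`, `Splat`, and `BinOp`, and the functions `hoist_binary_over_splat` and `evaluate`, are hypothetical illustrations and not triton-ascend or MLIR APIs. The sketch captures the value semantics of the rewrite, not the actual MLIR converter.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Scalar:
    """A scalar SSA value with a known value (toy stand-in for %val1 / %val2)."""
    value: int


@dataclass(frozen=True)
class Splat:
    """Toy tt.splat: broadcast a scalar-valued node to `size` elements."""
    src: "Node"
    size: int


@dataclass(frozen=True)
class BinOp:
    """Toy elementwise binary op, such as arith.cmpi slt."""
    name: str
    fn: Callable[[int, int], object]
    lhs: "Node"
    rhs: "Node"


Node = Union[Scalar, Splat, BinOp]


def hoist_binary_over_splat(node: Node) -> Node:
    """Rewrite binop(splat(a), splat(b)) into splat(binop(a, b)), bottom-up."""
    if isinstance(node, BinOp):
        lhs = hoist_binary_over_splat(node.lhs)
        rhs = hoist_binary_over_splat(node.rhs)
        if isinstance(lhs, Splat) and isinstance(rhs, Splat) and lhs.size == rhs.size:
            # Compute once on the scalars, then broadcast only the result.
            return Splat(BinOp(node.name, node.fn, lhs.src, rhs.src), lhs.size)
        return BinOp(node.name, node.fn, lhs, rhs)
    if isinstance(node, Splat):
        return Splat(hoist_binary_over_splat(node.src), node.size)
    return node


def evaluate(node: Node):
    """Scalars evaluate to a value; splats and ops on splats evaluate to a list."""
    if isinstance(node, Scalar):
        return node.value
    if isinstance(node, Splat):
        return [evaluate(node.src)] * node.size
    lhs, rhs = evaluate(node.lhs), evaluate(node.rhs)
    if isinstance(lhs, list):
        return [node.fn(a, b) for a, b in zip(lhs, rhs)]
    return node.fn(lhs, rhs)


# %cmp = arith.cmpi slt, splat(%val1), splat(%val2) with example values 3 and 7.
cmp = BinOp("cmpi_slt", operator.lt, Splat(Scalar(3), 128), Splat(Scalar(7), 128))
hoisted = hoist_binary_over_splat(cmp)
assert evaluate(hoisted) == evaluate(cmp)  # identical elementwise results
```

In the hoisted form, the comparison runs once on scalars and only the i1 result is broadcast. That is the intuition behind the commit's remark that the same trick could reduce UB usage for other binary ops. The size check is a conservative assumption of this sketch.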
feat(Canonicalizer): add if converter to extract add expr from if body; support constant dense select in linalg.parseSelect

Co-authored-by: luobaiqing<luobaiqing1@huawei.com> # message auto-generated for no-merge-commit merge: !1397 merge ifConverter into main

feat(Canonicalizer): add if converter to extract add expr from if body; support constant dense select in linalg.parseSelect

Created-by: luobaiqing
Commit-by: luobaiqing
Merged-by: ascend-robot

Description:

1. Add a converter that optimizes ifOp by extracting the argument's update computation out of the ifOp, which simplifies the ifOp:

```
%arg = xxx
%if = scf.if %cond {
  %thenYield = arith.addi %arg, %other
  scf.yield %thenYield, %xxx, ...
} else {
  scf.yield %arg, %xxx, ...
}
------->
%arg = xxx
%cst0 = arith.constant dense <0>
%if = scf.if %cond {
  ...
  scf.yield %other, %xxx, ...
} else {
  scf.yield %cst0, %xxx, ...
}
%newIfRes = arith.addi %arg, %if#0
---->
In the special case currently handled, %other is another constant dense defined outside the if, so this is actually optimized into:
%arg = xxx
%cst0 = arith.constant dense <0>
%updateVal = arith.select %cond, %other, %cst0
%if = scf.if %cond {
  ...
  scf.yield %xxx, ...
} else {
  scf.yield %xxx, ...
}
%newIfRes = arith.addi %arg, %updateVal
```

2. Modify parseIf/select in the discrete memory access analysis. If all branch results are scalarlike, set the state to scalarlike, because the access is then still contiguous and only the offset may change. Otherwise, treat it as unstructured.

   Note: after this was enabled, a special case came up. When the shape dimensions are 1 inside a for loop (and the access is indirect), the result is also scalarlike, but rewriteTerminator fails during rewriteloop. This case is temporarily excluded: when every dimension of the select result shape is 1, it is still handled as a discrete access. It will be enabled once the root cause is understood.

3. Modify parseSelect in the linalg pass to support a selectOp whose two operands are both constant dense. In that case, a scalar select is created and used as the scalar offset.

See merge request: Ascend/triton-ascend!1397

2 months ago
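Items 1 and 3 of the feat(Canonicalizer) commit chain two value-preserving rewrites. First, `scf.if` becomes an unconditional add of an `arith.select` result. Second, a select between two constant dense tensors folds into a splat of a scalar select. The Python sketch below checks both equivalences on plain lists. The helper names (`splat`, `add`, `original_if`, `hoisted_add`, `scalar_select_offset`) are hypothetical. The sketch assumes a scalar `%cond` and an elementwise add, and models only the semantics, not the converter.

```python
from typing import List


def splat(value: int, size: int) -> List[int]:
    """Toy tt.splat / arith.constant dense<value>: `size` copies of `value`."""
    return [value] * size


def add(lhs: List[int], rhs: List[int]) -> List[int]:
    """Toy elementwise arith.addi."""
    return [a + b for a, b in zip(lhs, rhs)]


def original_if(cond: bool, arg: List[int], other: List[int]) -> List[int]:
    """Before: scf.if %cond { yield %arg + %other } else { yield %arg }."""
    return add(arg, other) if cond else arg


def hoisted_add(cond: bool, arg: List[int], other: List[int]) -> List[int]:
    """Item 1: %updateVal = select(%cond, %other, %cst0); %newIfRes = %arg + %updateVal."""
    cst0 = splat(0, len(arg))
    update_val = other if cond else cst0
    return add(arg, update_val)


def scalar_select_offset(cond: bool, other_value: int, size: int) -> List[int]:
    """Item 3: select(%cond, dense<c>, dense<0>) folded into splat(select(%cond, c, 0))."""
    return splat(other_value if cond else 0, size)


arg = [1, 2, 3, 4]
for cond in (True, False):
    other = splat(5, len(arg))  # %other is a constant dense<5> in this example
    assert original_if(cond, arg, other) == hoisted_add(cond, arg, other)
    assert hoisted_add(cond, arg, other) == add(arg, scalar_select_offset(cond, 5, len(arg)))
```

The first rewrite is valid only because the else branch yields `%arg` unchanged, so adding `%cst0` (zero) reproduces it. If the else branch computed something else, the sketch's equivalence would not hold.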
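Item 2's merge rule for parseIf/select, together with the temporary exclusion from its note, fits in a small function. This is a hedged Python sketch. The state names mirror the commit text (scalarlike, unstructured), while `AccessState` and `merge_branch_states` with an `is_select` flag are hypothetical. The note names only select for the all-dimensions-1 exclusion, so applying it to select alone is an assumption of this sketch.

```python
from enum import Enum
from typing import Sequence, Tuple


class AccessState(Enum):
    SCALARLIKE = "scalarlike"      # still contiguous; only the offset may vary
    UNSTRUCTURED = "unstructured"  # handled as a discrete memory access


def merge_branch_states(
    states: Sequence[AccessState],
    result_shape: Tuple[int, ...],
    is_select: bool,
) -> AccessState:
    """Merge the access states of all branch results of an if/select."""
    if is_select and all(dim == 1 for dim in result_shape):
        # Temporary workaround from the note: rewriteTerminator can fail in
        # rewriteloop for this case, so keep it on the discrete-access path.
        return AccessState.UNSTRUCTURED
    if all(state is AccessState.SCALARLIKE for state in states):
        return AccessState.SCALARLIKE
    return AccessState.UNSTRUCTURED
```

Checking the exclusion before the scalarlike test makes the workaround easy to remove once the rewriteloop failure is understood: only the first `if` needs to go.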
feat(refactor): unify implicit permute logic

Co-authored-by: candyhong<1102229410@qq.com>
Co-authored-by: LH_123L<liuhuan261@huawei.com>
Co-authored-by: KanuaK<zhouyihan1@huawei.com> # message auto-generated for no-merge-commit merge: !1245 merge main/feat-implicit-permute-candy-test into main

feat(refactor): unify implicit permute logic

Created-by: candyhong
Commit-by: candyhong;KanuaK;LH_123L
Merged-by: ascend-robot

Description:

## Background

### Related ISSUE: [#305](https://gitcode.com/Ascend/triton-ascend/issues/305)

The current implicit permute logic in Triton-Ascend (TA) has flaws that impact maintainability, hardware utilization, and scenario coverage:

| Issue Category | Key Symptoms | Impact |
| --------------------- | ----------------------------------------------------------------- | ------------------------------------------------ |
| Dispersed Logic | No unified entry for adaptation logic | High maintenance cost, risk of logic conflicts |
| Missing HW Adaptation | No differentiated processing logic for hardware types | Failed to leverage hardware capabilities (NDDMA) |
| Incomplete Capability | No support for variable stride analysis/For-loop pointer analysis | Poor coverage of scenarios |

## Optimization Goals

1. **Unify Logic**: Converge all implicit permute logic to a single entry, define unified rules for hardware/scenarios.
2. **Complete Capabilities**: Support constant/variable stride analysis, For-loop pointer analysis, and unify ptr/mask analysis logic.
3. **HW Adaptation**: Implement hardware-specific/software fallback implicit permute solutions.

## Verification

| Test Suite | Test Count | Main Branch Result | PR Result | Status |
| -------------------- | ---------- | ------------------ | --------- | ------ |
| generalization(main) | 91373 | [pre-smoke-108](https://devcloud.cn-north-4.huaweicloud.com/codeci/project/1e20b309fcb34b00a0043a87e461c95a/codeci/detail/workspace/15834d5b21a14cd99ef2ea8651c54f7d/111)<br><br>81773 pass | [per-smoke-111](https://devcloud.cn-north-4.huaweicloud.com/cicd/project/1e20b309fcb34b00a0043a87e461c95a/pipeline/detail/945b705f6f1d4087a1a08d834fbc2c51/cdbf3a4922e241e8a0b1a6dadb3d5a82?v=1)<br><br>81773 pass | Pass |

## Checklist

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run pre-commit run --from-ref origin/main --to-ref HEAD.
- Select one of the following.
  - [x] I have added tests.
    - /test for lit tests
    - /unittest for C++ tests
    - /python/test for end-to-end tests
  - [ ] This PR does not need a test because FILL THIS IN.
- Select one of the following.
  - [x] I have not added any lit tests.
  - [ ] The lit tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

See merge request: Ascend/triton-ascend!1245

3 months ago
Merge Triton-Ascend 425236de into release/3.5.x

2 months ago