pytorch/torch_npu/csrc/inductor · Ascend/pytorch - AtomGit

File	Last commit	Last updated
aoti_package	[FEAT] [Inductor] Add support for sequence length bucketing in dynamic shapes (Ascend/pytorch!28555, merged v2.7.1_shape_2dim into v2.7.1; Co-authored-by: luqichao <luqichao1@huawei.com>; created and committed by luqichao, merged by ascend-robot)	4 months ago
Description: full configuration reference after this change:
```
self.shape_options = {
    "enable_shape_handling": True,
    "shape_handling_configs": [
        {
            "type": "BATCHSIZE",   # kind of dimension to handle (Required)
            "dimensions": 0,       # index of this dimension in the tensor (defaults to 0 for BATCHSIZE) (Optional)
            "indices": [0, 1],     # indices of the tensors to handle (defaults to all tensors) (Optional)
            "value": 0.0,          # fill value used when padding, defaults to 0.0 (Optional)
            "gears": [],           # custom gear (bucket) list (Optional)
            "min_size": 1,         # minimum size (gear) of this dimension, defaults to 1 (Optional)
            "max_size": 1024,      # maximum size (gear) of this dimension, defaults to 1024 (Optional)
            "policy": "TIMES",     # strategy for generating gears from min_size and max_size; defaults to TIMES, which generates the power-of-two gears within the range (Optional)
        },
        {
            "type": "SEQLEN",      # kind of dimension to handle (Required)
            "dimensions": [1, 1],  # index of this dimension in each tensor, matched one-to-one with indices; `dimension` is also accepted to use the same dimension index for all tensors and takes precedence (defaults to 1 for SEQLEN) (Optional)
            "indices": [0, 1],     # indices of the tensors to handle (defaults to all tensors) (Optional)
            "value": 0.0,          # fill value used when padding, defaults to 0.0 (Optional)
            "gears": [],           # custom gear (bucket) list (Optional)
            "min_size": 1,         # minimum size (gear) of this dimension, defaults to 1 (Optional)
            "max_size": 1024,      # maximum size (gear) of this dimension, defaults to 1024 (Optional)
            "policy": "TIMES",     # strategy for generating gears from min_size and max_size; defaults to TIMES, which generates the power-of-two gears within the range (Optional)
        }
    ],
    "shape_handling_dict": {
        "trans_pre_fn": transform_pre_fn,
        "trans_post_fn": transform_post_fn,
        "re_pre_fn": recover_pre_fn,
        "re_post_fn": recover_post_fn,
    }
}
```
The configs may be omitted; by default only the batch size is bucketed.
aoti_runner	【Inductor】add aot inductor for v2.7.1 (Ascend/pytorch!24853, merged aoti_v27 into v2.7.1; Co-authored-by: Zichun Ye <zichun.ye@huawei.com>; created by zichun_ye, committed by Zichun Ye, merged by ascend-robot)	8 months ago
Description:
What type of PR is this? /kind feature
What does this PR do / why do we need it:
- add aot inductor for v2.7.1
- update logic to generate cpp wrapper for v2.7.1
- add the whole cpp utils functions for aot inductor
- fix some bugs
aoti_runtime	[Inductor] add aoti support (Ascend/pytorch!32868, merged v2.7.1-26.0.0 into v2.7.1-26.0.0; Co-authored-by: zhuceHW <zhuce@huawei.com>; created by zhucehw, committed by zhuceHW, merged by ascend-robot)	1 month ago
Description:
# 【Merge source】
add aoti support
- [ ] Requirement
- [ ] Problem ticket
- [ ] Issue/work order
- [x] Refactoring/optimization
- [ ] Documentation update
# 【Changes】
1. Add ffts check, device guard, and dynamic shape support for AOTInductor; make CppWrapperNpu extend CppWrapperGpu; fall back when cpp_wrapper meets mm/bmm/gmm; add utils_npu.h, shim_npu.h, and shim_npu.cpp to csrc\inductor. AOTI now works for v2.7.1 on A2\A3\A5.
2. Refactor the triton heuristic logic: get_heuristic now returns a heuristic type as the community version does ('pointwise', 'reduction', etc.).
3. Add support for cpp_wrapper.
# 【Documentation changes】 Not applicable
# 【Interface changes】 Not applicable
# 【Functional verification】
CI passes
# 【CheckList】
> The PR submitter self-checks every item below and changes [ ] to [x] for items that pass or do not apply.
- [x] Code comments are complete and error logs are recorded correctly
- [x] The code validates return values, null pointers, and the like
- [x] The PR title uses a correct type label, such as feat, fix, refactor, docs, or test
- [x] The PR continuous integration (CI) pipeline passes with no code-check anomalies
aoti_torch	[Inductor] add aoti support (Ascend/pytorch!32868, merged v2.7.1-26.0.0 into v2.7.1-26.0.0; Co-authored-by: zhuceHW <zhuce@huawei.com>; created by zhucehw, committed by zhuceHW, merged by ascend-robot)	1 month ago
dvm	dvm: refactor ws alloc paths and reorder pybind_api functions (Ascend/pytorch!31627, merged fix_inductor_dvm_bug into v2.7.1; Co-authored-by: huangchengnuo <huangchengnuo1@huawei.com>; created by SorryNaCN, committed by huangchengnuo, merged by ascend-robot)	2 months ago
Description:
# 【Merge source】
- [ ] Requirement
- [ ] Problem ticket
- [ ] Issue/work order
- [x] Refactoring/optimization
- [ ] Documentation update
# 【Changes】
1. In `torch_npu/csrc/inductor/dvm/pybind_api.cpp`, add the stream-capture state check `IsCurrentStreamCapturing()` and introduce two workspace allocation/launch paths: `LaunchV2 + AllocWorkspaceV2` (regular path) and `LaunchV1 + AllocWorkspaceV1` (stream-capture path).
2. Add an `if (!IsCurrentStreamCapturing()) ... else ...` branch to `TorchKernelPy::Call` and `GraphSplitKernelPy::Call` so that the stream-capture scenario does not take the `allocate_workspace` path.
3. Split `TorchKernelPy::Launch` into `LaunchV1/LaunchV2`; split `GraphSplitBase::Launch` into `LaunchV1/LaunchV2`, and provide `ExternalWsAllocator` to support external workspace pointers.
4. Add a fake allocator path to `GraphSplitKernelPy::Setup` to probe and record `ws_size_` in advance.
5. Add guards on the dynamic shape path: `DynKernelPy::Call` and `DynGraphSplitKernelPy::Call` gain `TORCH_CHECK(!IsCurrentStreamCapturing(), ...)`, making explicit that dynamic-shape runtime CodeGen/Infer does not support stream capture.
6. The header `torch_npu/csrc/inductor/dvm/pybind_api.h` is updated to match the interface declaration changes (`LaunchV1/LaunchV2`, `SetWorkspaceSize`, `GraphSplitBase::Alloc`, etc.).
# 【Documentation changes】 Not applicable.
# 【Interface changes】 Not applicable (no cross-repository or customer-visible interface changes).
# 【Functional verification】
1. The code change is limited to `torch_npu/csrc/inductor/dvm/pybind_api.cpp` and `torch_npu/csrc/inductor/dvm/pybind_api.h`.
2. A code-level diff review has been completed (aligned with the stream-capture check logic in the target commit).
3. A full local build/UT run was not completed (the `build` directory in the current environment lacks a usable Makefile, so `cmake --build build --target torch_npu` cannot run); the CI result is authoritative.
# 【CheckList】
> The PR submitter self-checks every item below and changes [ ] to [x] for items that pass or do not apply.
- [x] Code comments are complete and error logs are recorded correctly
- [x] The code validates return values, null pointers, and the like
- [x] The PR title uses a correct type label, such as feat, fix, refactor, docs, or test
- [ ] The PR continuous integration (CI) pipeline passes with no code-check anomalies
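mlir	refactor: fix mlir compile (Ascend/pytorch!30363, merged fix_mlir_compile into v2.7.1; Co-authored-by: huangchengnuo <huangchengnuo1@huawei.com>; created by SorryNaCN, committed by huangchengnuo, merged by ascend-robot)	3 months ago
Description:
# 【Merge source】
- [ ] Requirement
- [ ] Problem ticket
- [ ] Issue/work order
- [x] Refactoring/optimization
- [ ] Documentation update
# 【Changes】
1. Remove the Python-side dynamic build of the MLIR extension and the related packaging logic: clean up the pybind11 extension, the libcpp_common compilation, and the file-copy paths in `setup.py`. (`setup.py`)
2. Move the MLIR bindings down into C++: add the `torch_npu._C.mlir` submodule, which provides the `load_kernel_binary` interface and replaces the former `_inductor/ascend_npu_ir/_C` binding implementation. (`torch_npu/csrc/inductor/mlir/mlir_bindings.cpp`)
3. Unify the location of the shared MLIR runtime code: `hacl_rt.h`/`cpp_common` move to `torch_npu/csrc/inductor/mlir/`, with the missing exports and msprof header includes filled in. (`torch_npu/csrc/inductor/mlir/cpp_common.{h,cpp}`)
4. Adapt the call paths: `mlir_compiler.py` uses `torch_npu._C.mlir.load_kernel_binary`, and the `build_ext` initialization logic is deleted. (`torch_npu/_inductor/__init__.py`, `torch_npu/utils/_dynamo.py`, `torch_npu/_inductor/ascend_npu_ir/...`)
5. Update the tests and build scripts: remove the public bindings dependency on `build_ext` and correct the CMake source file list. (`test/npu/test_public_bindings.py`, `torch_npu/csrc/inductor/CMakeLists.txt`)
# 【Documentation changes】 Not applicable
# 【Interface changes】 Not applicable (internal binding implementation migration; no cross-repository or external interface changes)
# 【Functional verification】
- Not run locally; pending CI verification (self-verification screenshots can be added if needed)
# 【CheckList】
- [x] Code comments are complete and error logs are recorded correctly
- [x] The code validates return values, null pointers, and the like
- [x] The PR title uses a correct type label, such as feat, fix, refactor, docs, or test
- [x] The PR continuous integration (CI) pipeline passes with no code-check anomalies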
CMakeLists.txt	refactor: fix mlir compile (Ascend/pytorch!30363, merged fix_mlir_compile into v2.7.1; Co-authored-by: huangchengnuo <huangchengnuo1@huawei.com>; created by SorryNaCN, committed by huangchengnuo, merged by ascend-robot)	3 months ago
array_ref_impl.h	【Inductor】add aot inductor for v2.7.1 (Ascend/pytorch!24853, merged aoti_v27 into v2.7.1; Co-authored-by: Zichun Ye <zichun.ye@huawei.com>; created by zichun_ye, committed by Zichun Ye, merged by ascend-robot)	8 months ago
inductor_ops.h	【Inductor】add aot inductor for v2.7.1 (Ascend/pytorch!24853, merged aoti_v27 into v2.7.1; Co-authored-by: Zichun Ye <zichun.ye@huawei.com>; created by zichun_ye, committed by Zichun Ye, merged by ascend-robot)	8 months ago
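
The aoti_package commit description says the `"TIMES"` policy generates the power-of-two gears between `min_size` and `max_size`, and that `value` is the fill used when padding. The following is a minimal Python sketch of that idea, not the torch_npu implementation. The function names `generate_gears`, `pick_gear`, and `pad_to_gear` are hypothetical, and two behaviors are assumptions rather than facts from the source: the bounds are treated as inclusive, and a length is rounded up to the smallest gear that fits.

```python
from bisect import bisect_left


def generate_gears(min_size=1, max_size=1024):
    """Return the powers of two in [min_size, max_size].

    Assumption: both bounds are inclusive.
    """
    gears = []
    gear = 1
    while gear <= max_size:
        if gear >= min_size:
            gears.append(gear)
        gear *= 2
    return gears


def pick_gear(size, gears):
    """Return the smallest gear that is >= size.

    Assumption: lengths are rounded up to the next gear.
    `gears` must be sorted in ascending order.
    """
    index = bisect_left(gears, size)
    if index == len(gears):
        raise ValueError(f"size {size} exceeds the largest gear {gears[-1]}")
    return gears[index]


def pad_to_gear(values, gears, value=0.0):
    """Pad a 1-D sequence with `value` up to its gear length."""
    gear = pick_gear(len(values), gears)
    return list(values) + [value] * (gear - len(values))


if __name__ == "__main__":
    gears = generate_gears(1, 16)
    print(gears)
    print(pad_to_gear([1.0, 2.0, 3.0], gears))
```

Rounding every length up to one of a small, fixed set of gears limits how many distinct shapes need to be handled, at the cost of some padded elements. A custom `gears` list, as the configuration allows, would simply replace the output of `generate_gears` here.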