pytorch/torch_npu/csrc · Ascend/pytorch - AtomGit

ascend-robot [fix] Update the driver version required by the `aclrtMallocHostWithCfg` interface

File	Last commit	Last updated
afd	!24249 add module afd Merge pull request !24249 from 李宁/v2.7.1	8 months ago
aten	fix: fix from blob bug (merge request Ascend/pytorch!33192, merging pta_fix_from_blob_20260212_v2.7.1-26.0.0 into v2.7.1-26.0.0; co-authored by luochao60 <luochao60@huawei.com>, created and committed by luochao60, merged by ascend-robot). Source: defect ticket. Changes: 1. Fix the handling of `storage_offset_` in `TensorMaker::computeStorageSize()` in `torch_npu/csrc/aten/common/from_blob.cpp`. The old code added `storage_offset_` (counted in elements) directly to a size in bytes, omitting `* itemsize`, so the computed storage size was too small for non-float32 dtypes or tensors with an offset, and tensors created by from_blob had insufficient storage. Both branches now use `storage_size += storage_offset_.value() * itemsize;`. 2. Fix `_weak_ref_tensor` in `torch_npu/csrc/npu/Module.cpp`. The old implementation built the new tensor by calling `from_blob` with `t.data_ptr()`, `t.sizes()`, and `t.strides()`, which dropped the original tensor's `storage_offset`; when the original tensor was a view (with an offset or non-trivial strides), the new tensor's storage was recomputed from the view shape and no longer matched the original storage. The fix calls `from_blob` with the original tensor's full `storage().mutable_data()` and `storage().nbytes() / element_size()` so that the new tensor covers the whole storage, then restores the view with `set_sizes_and_strides` and `set_storage_offset`, so the weak-reference tensor's storage, sizes, strides, and offset match the original exactly. 3. Refactor and extend the tests: move the from_blob tests (`check_from_blob`/`check_from_blob_strides`/`check_from_blob_delete`) out of `test/cpp_extensions/extension.cpp` into a dedicated `test/cpp_extensions/test_from_blob.cpp`, organized by `at_npu::native::from_blob` capability; register the new extension module `torch_test_cpp_extension.npu_from_blob` in `test/cpp_extensions/setup.py`; add a `TestFromBlob` class to `test/cpp_extensions/test/test_cpp_extensions_aot.py` covering the basic / deleter / strides / storage_offset / storage_offset_2d / storage_offset_dtype / storage_offset_contiguous / non_owning / clone scenarios, and restrict `test_storage_sizes` with `@SupportedDevices(['Ascend910B', 'Ascend910C'])`; add `test_weak_ref_tensor_with_storage_offset` to `test/npu/test_npu_format.py`, which builds a view with non-trivial strides and a storage_offset and checks that the tensor returned by `_weak_ref_tensor` matches the original in size, stride, storage_offset, storage().nbytes(), and values. Documentation changes: none. Interface changes: none. Verification: the C++ extension cases in `test/cpp_extensions/test/test_cpp_extensions_aot.py::TestFromBlob` and the Python case `test/npu/test_npu_format.py::TestNPUFormat::test_weak_ref_tensor_with_storage_offset` (a strided view with `view_shape=[2,1,8,64]`, `view_strides=[1536,0,192,1]`, `view_offset=128`); the UTs were submitted with the PR and passed local self-verification.	1 month ago
core	[fix] Update the driver version required by the `aclrtMallocHostWithCfg` interface (merge request Ascend/pytorch!34675, merging v2.7.1-26.0.0_fix_hdk into v2.7.1-26.0.0; co-authored by geyi <geyi2@huawei.com>, created by gleaming-spark, committed by geyi, merged by ascend-robot). Source: defect ticket. Changes: the interface-existence check now also checks the driver version. Documentation changes: https://gitcode.com/Ascend/op-plugin/pull/4870. Interface changes: none. Verification: not applicable.	1 month ago
custom_dtype	Add GetAclDataTypeItemSize (merge request Ascend/pytorch!26902, merging v2.7.1 into v2.7.1; co-authored by chuboning <chuboning1@huawei.com>, created and committed by chuboning, merged by ascend-robot)	6 months ago
distributed	PREMUL_SUM dtype constraints (merge request Ascend/pytorch!34510, merging v2.7.1-26.0.0_fix_premul_sum into v2.7.1-26.0.0; co-authored by jizewei <jizewei@huawei.com>, created and committed by jizewei, merged by ascend-robot). Source: defect ticket. Changes: 1. aclnnInplaceMuls is aligned with CUDA and does not support int32 * float, so the unreasonable test case was modified. 2. PREMUL_SUM gains a dtype check, aligned with NCCL. Verification: covered by the existing UT cases.	1 month ago
flopcount	!24215 Modify the calculation logic Merge pull request !24215 from 郭光浩/v2.7.1	9 months ago
framework	[fix] add USE_FP32_ADD for cube_math_type=4 (merge request Ascend/pytorch!34037, merging v2.7.1_cube_4_26 into v2.7.1-26.0.0; co-authored by adelaideliu <adelaideliu@163.com>, created and committed by adelaideliu, merged by ascend-robot). Source: defect ticket. Changes: because the requirements changed, the operator-side behavior of cube_math_type=4 was modified and the existing enum name no longer matches its semantics; since existing users may already rely on the original enum value for cube_math_type=4, a new enum value for cube_math_type=4 is added alongside it. Documentation changes: yes, submitted in a separate PR. Interface changes: none.	1 month ago
include	!9021 add ops header files Merge pull request !9021 from 赖长铃/v23_add_ops_head	2 years ago
inductor	[Inductor] add aoti support (merge request Ascend/pytorch!32868, merging v2.7.1-26.0.0 into v2.7.1-26.0.0; co-authored by zhuceHW <zhuce@huawei.com>, created by zhucehw, committed by zhuceHW, merged by ascend-robot). Source: refactoring/optimization. Changes: 1. Add an FFTS check, a device guard, and dynamic shape support for AOTInductor; make CppWrapperNpu extend CppWrapperGpu; fall back when cpp_wrapper meets mm/bmm/gmm; add utils_npu.h, shim_npu.h, and shim_npu.cpp to csrc\inductor. AOTI now works for v2.7.1 on A2\A3\A5. 2. Refactor the Triton heuristic logic; get_heuristic now returns a heuristic type as the community version does ('pointwise', 'reduction', etc.). 3. Add support for cpp_wrapper. Documentation changes: none. Interface changes: none. Verification: CI passes.	1 month ago
ipc	Event supports cross-process and cross-device (IPC event) (merge request Ascend/pytorch!28520, merging v2.7.1 into v2.7.1; co-authored by liujunzhu <liujunzhu@huawei.com>, created and committed by liujunzhu, merged by ascend-robot). Kind: feature. Description: align Event capability with CUDA by supporting cross-process and cross-device use. CUDA synchronizes with an Event when sharing memory across processes and when copying memory across devices, whereas torch_npu synchronizes with SynchronizeStream; torch_npu should also use an Event in cross-device and cross-process memory scenarios to improve overall performance. In addition, the Python interface must support passing an Event object or Event handle between processes and using that Event for inter-process synchronization. Scenarios: 1. Cross-process Event use: passing between processes an Event object created with interprocess=True, the Event's IPC handle, or the result of torch.multiprocessing.reductions.reduce_event(event). 2. Cross-process NPU memory sharing: passing a Tensor between processes through arguments or queues, through torch.multiprocessing.reductions.reduce_tensor(), or through _share_npu_. 3. Cross-device NPU memory copy: calling Tensor.to() or Tensor.copy_(). An Event created with interprocess=False cannot be used across devices or across processes.	5 months ago
libs	Fix shutdown process host block destruction bug (merge request Ascend/pytorch!28112, merging shut27 into v2.7.1; co-authored by unknown <chenzihao65@huawei.com>, created by gcw_5tF58QLT, committed by unknown, merged by ascend-robot). Kind: bug.	5 months ago
logging	record filename in plog and delete unused macros (merge request Ascend/pytorch!31953, merging v2.7.1-plog into v2.7.1; co-authored by zhaoyu <nanzhaogang@qq.com>, created by zhaoyu65, committed by zhaoyu, merged by ascend-robot). Source: issue [#1568](https://gitcode.com/Ascend/pytorch/issues/1568). Changes: 1. Remove the `__FILENAME__` and `__FILE__` definitions from CMakeLists.txt, along with the two other places that used them: torch_npu/csrc/profiler/feature_mgr.cpp and torch_npu/csrc/profiler/profiler_mgr.cpp. 2. Add the compile option add_compile_options(-fmacro-prefix-map=${CMAKE_SOURCE_DIR}/=) to CMakeLists.txt. 3. Update the call sites of aclAppLog. 4. Move the logging module macros from torch_npu\csrc\logging\LogContext.h into each module's own header. 5. Delete unused macros. Sample output of the logging module macros: `[223916] [2026-03-12 10:47:24:086] torch_npu.memory: [DEBUG] [223916] Allocating memory: size=64, device=6`; `[223916] [2026-03-12 10:47:24:086] torch_npu.memory: [DEBUG] [223916] Rounded size: 512, alloc size: 2097152, using small pool on device 6`; `[223916] [2026-03-12 10:47:24:086] torch_npu.memory: [DEBUG] [223916] Searching for free block: size=512, stream=0x22e309a0, device=6`. Sample plog output: `[DEBUG] APP(223916,python):2026-03-12-10:47:28.033.940 [log_inner.cpp:87]223916 free:NPUCachingAllocator.cpp:1441: "[PTA]:"Freeing memory block: size=512, ptr=0x12c041200000, device=6""`; `[DEBUG] APP(223916,python):2026-03-12-10:47:28.033.973 [log_inner.cpp:87]223916 free_block:NPUCachingAllocator.cpp:2307: "[PTA]:"Freeing block to cache: size=512, ptr=0x12c041200000, device=6""`; `[DEBUG] APP(223916,python):2026-03-12-10:47:28.033.993 [log_inner.cpp:87]223916 free:NPUCachingAllocator.cpp:1479: "[PTA]:"PTA CachingAllocator free: free = 512, cached = 2097152, allocated = 1024""`; `[DEBUG] APP(223916,python):2026-03-12-10:47:28.289.574 [log_inner.cpp:87]223916 free:NPUCachingAllocator.cpp:1441: "[PTA]:"Freeing memory block: size=512, ptr=0x12c041200200, device=6""`. Documentation changes: none. Interface changes: none. Verification: OK.	2 months ago
npu	Keep the sk header files consistent with CANN (merge request Ascend/pytorch!34583, merging v2.7.1-26.0.0 into v2.7.1-26.0.0; co-authored by zhukkk <zhuke11@huawei.com>, created and committed by zhukkk, merged by ascend-robot). Source: refactoring/optimization. Changes: the acl_sk header files now have the same names and contents as the super_kernel header files under CANN. Documentation changes: none. Interface changes: none.	1 month ago
profiler	record filename in plog and delete unused macros (merge request Ascend/pytorch!31953, merging v2.7.1-plog into v2.7.1; same last commit as the logging row)	2 months ago
sanitizer	Event supports cross-process and cross-device (IPC event) (merge request Ascend/pytorch!28520, merging v2.7.1 into v2.7.1; same last commit as the ipc row)	5 months ago
toolkit	add Bsymbolic-functions for torch_npu and npu_pcofiler (merge request Ascend/pytorch!28396, merging v2.7.0_abi into v2.7.1; co-authored by wangchao430 <wangchao430@huawei.com>, created and committed by wangchao430, merged by ascend-robot)	5 months ago
utils	!22489 cleancode Merge pull request !22489 from SCh-zx/cleancode27	10 months ago
InitNpuBindings.cpp	refactor: fix mlir compile (merge request Ascend/pytorch!30363, merging fix_mlir_compile into v2.7.1; co-authored by huangchengnuo <huangchengnuo1@huawei.com>, created by SorryNaCN, committed by huangchengnuo, merged by ascend-robot). Source: refactoring/optimization. Changes: 1. Remove the Python-side dynamic build of the MLIR extension and the related packaging logic: clean up the pybind11 extension, the libcpp_common compilation, and the file-copy paths in `setup.py`. (`setup.py`) 2. Move the MLIR bindings down into C++: add a `torch_npu._C.mlir` submodule that provides a `load_kernel_binary` interface, replacing the former `_inductor/ascend_npu_ir/_C` binding implementation. (`torch_npu/csrc/inductor/mlir/mlir_bindings.cpp`) 3. Unify the location of the shared MLIR runtime code: move `hacl_rt.h`/`cpp_common` to `torch_npu/csrc/inductor/mlir/` and add the missing exports and msprof header includes. (`torch_npu/csrc/inductor/mlir/cpp_common.{h,cpp}`) 4. Adapt the call paths: `mlir_compiler.py` uses `torch_npu._C.mlir.load_kernel_binary`, and the `build_ext` initialization logic is deleted. (`torch_npu/_inductor/__init__.py`, `torch_npu/utils/_dynamo.py`, `torch_npu/_inductor/ascend_npu_ir/...`) 5. Update tests and build scripts: remove the public-bindings dependency on `build_ext` and correct the CMake source file list. (`test/npu/test_public_bindings.py`, `torch_npu/csrc/inductor/CMakeLists.txt`) Documentation changes: none. Interface changes: none (the internal binding implementation was migrated; there are no cross-repository or externally visible interface changes). Verification: not run locally; pending CI verification.	3 months ago
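
The storage-size fix described in the aten row comes down to unit arithmetic: strides and the storage offset are counted in elements, while the storage size is counted in bytes. The following is a minimal Python sketch of that arithmetic, not the repository's C++ code. The function names are hypothetical, the extent formula (one plus the sum of `(size - 1) * stride`) is the usual strided-tensor convention and is assumed here, and the 2-byte item size is an assumption because the row does not state the dtype used in the test.

```python
from typing import Sequence


def extent_in_elements(sizes: Sequence[int], strides: Sequence[int]) -> int:
    """Count the storage elements spanned by a strided view starting at offset 0."""
    if any(size == 0 for size in sizes):
        return 0  # an empty view touches no storage
    # The farthest reachable element sits at sum((size - 1) * stride);
    # add 1 to turn that index into a count.
    return 1 + sum((size - 1) * stride for size, stride in zip(sizes, strides))


def storage_nbytes_fixed(sizes: Sequence[int], strides: Sequence[int],
                         itemsize: int, storage_offset: int = 0) -> int:
    """Post-fix arithmetic: the element offset is scaled to bytes before adding."""
    return extent_in_elements(sizes, strides) * itemsize + storage_offset * itemsize


def storage_nbytes_buggy(sizes: Sequence[int], strides: Sequence[int],
                         itemsize: int, storage_offset: int = 0) -> int:
    """Pre-fix arithmetic as the row describes it: element offset added to bytes."""
    return extent_in_elements(sizes, strides) * itemsize + storage_offset


if __name__ == "__main__":
    # View parameters quoted in the aten row's verification notes.
    view_shape = [2, 1, 8, 64]
    view_strides = [1536, 0, 192, 1]
    view_offset = 128
    itemsize = 2  # assumption: a 2-byte dtype such as float16

    print("fixed bytes:", storage_nbytes_fixed(view_shape, view_strides, itemsize, view_offset))
    print("buggy bytes:", storage_nbytes_buggy(view_shape, view_strides, itemsize, view_offset))
```

Under these assumptions the pre-fix result is smaller than the post-fix result by `storage_offset * (itemsize - 1)` bytes, so the shortfall grows with both the offset and the element width.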