File | Last commit | Last updated
fix(torch-event): add elapsedTime, queryStream and synchronizeStream methods. Co-authored-by: chenkun<chenkun82@huawei.com>. Merge request Ascend/pytorch!26654 (master_npu_guard_impl into master); created by kuhn7, committed by chenkun, merged by ascend-robot. Type: feature. Description: implements timing and stream control for Event by adding elapsedTime to measure the duration between events, and queryStream and synchronizeStream for NPU stream management. | 6 months ago
[torch-master] Fix a compatibility issue when the profiler collects memory stream ID data and the aclrtStreamGetId interface is unavailable. Co-authored-by: yu-liang-bin<y1416490440@163.com>. Merge request Ascend/pytorch!27671 (bug_fix_memory into master); created by yu_liangbin, committed by yu-liang-bin, merged by ascend-robot. Description: when collecting memory data, the profiler obtains the stream ID through the aclrtStreamGetId interface; if that interface is missing, training is interrupted. The fix adds a check for the aclrtStreamGetId interface before it is called to collect stream ID data. | 5 months ago
add lock for workspaceallocator. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>, zhaoyu65<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26720 (2.10ts into master); created by huangyunlong2022, committed by zhaoyu65 and huangyunlong2022, merged by ascend-robot. Description: 1) An environment variable controls whether each stream gets its own taskqueue; this is off by default. 2) When enabled, the taskqueue is initialized at enqueue time, which avoids starting too many taskqueue threads at stream creation; a lock prevents repeated initialization from multiple threads. 3) Initialization dispatches on the current stream by default. Compute operators are all dispatched on the current stream, while communication operators are dispatched on a communication stream, so that stream is passed to enqueue and used for dispatch. 4) Fetching a stream clears only the queue on that stream, which avoids unnecessary queue-clearing time. 5) An event must be recorded before it is waited on. To preserve ordering with multiple taskqueues, wait must confirm at enqueue time that the record has already been dispatched (checking the record count at the dequeue stage breaks down when events are reused: a record issued after the wait invalidates the count check and causes a hang). 6) An event may be destroyed only after its record and wait have been dispatched; lazy destroy is used to avoid blocking. 7) workspaceallocator is protected by a lock to prevent races between multiple taskqueues. | 6 months ago
[SHMEM] support npu shmem. Co-authored-by: wangchao430<wangchao430@huawei.com>. Merge request Ascend/pytorch!26027 (v2.99.0_shmem1 into master); created by wangchao430, committed by wangchao430, merged by ascend-robot. | 6 months ago
CachingHostAllocator master. Co-authored-by: zihao-intuition<chenzihao65@huawei.com>. Merge request Ascend/pytorch!26498 (master into master); created by gcw_5tF58QLT, committed by zihao-intuition, merged by ascend-robot. Type: task. Description: requirement. | 6 months ago
Align with cuda hostallocator record_event. Co-authored-by: zihao-intuition<chenzihao65@huawei.com>. Merge request Ascend/pytorch!26851 (2.10 into master); created by gcw_5tF58QLT, committed by zihao-intuition, merged by ascend-robot. Type: task. | 6 months ago
!19331 cleancode. Merge pull request !19331 from jiangpengfei/master | 1 year ago
[security]fix 'new' and 'stoi'. Co-authored-by: SCh_zx<1325467101@qq.com>. Merge request Ascend/pytorch!26126 (master into master); created by SCh_zx, committed by SCh-zx and SCh_zx, merged by ascend-robot. Description: adds error handling to new operations and replaces stoi with strtol. | 6 months ago
!21125 Add some functions to get affinity cpu info. Merge pull request !21125 from yuhaiyan/master-dev2 | 1 year ago
Add a new format similar to 8.5.0-alpha.1 and 8.5.0.alpha001. Co-authored-by: yuhaiyan<yuhaiyan8@huawei.com>. Merge request Ascend/pytorch!27250 (cherry-pick-mr-27249-1764330466213-auto into master); created by yuhaiyan, committed by yuhaiyan, merged by ascend-robot. | 6 months ago
!19827 add IsGteCANNDriverVersion. Merge pull request !19827 from 王嘉诚/master_driver_ver | 1 year ago
!24305 Add affinty conf. Merge pull request !24305 from 姜怡文/main_aff | 9 months ago
!24305 Add affinty conf. Merge pull request !24305 from 姜怡文/main_aff | 9 months ago
CachingHostAllocator master. Co-authored-by: zihao-intuition<chenzihao65@huawei.com>. Merge request Ascend/pytorch!26498 (master into master); created by gcw_5tF58QLT, committed by zihao-intuition, merged by ascend-robot. Type: task. Description: requirement. | 6 months ago
CachingHostAllocator master. Co-authored-by: zihao-intuition<chenzihao65@huawei.com>. Merge request Ascend/pytorch!26498 (master into master); created by gcw_5tF58QLT, committed by zihao-intuition, merged by ascend-robot. Type: task. Description: requirement. | 6 months ago
!6756 Add exposed functions. Merge pull request !6756 from 姜怡文/master_TORCH_API | 2 years ago
[Fixbug] add parameter check for PYTORCH_NPU_ALLOC_CONF. Co-authored-by: XDaoHong<xudaohong@huawei.com>. Merge request Ascend/pytorch!27665 (master into master); created by XDaoHong, committed by XDaoHong, merged by ascend-robot. Type: bug. Description: adds parameter validation. | 5 months ago
support ERASE_RECORD_STREAM_WITH_OPTIMIZE. Co-authored-by: yanpengquan<yanpengquan@huawei.com>. Merge request Ascend/pytorch!26962 (master into master); created by yanpengquan, committed by yanpengquan, merged by ascend-robot. | 6 months ago
!13881 Add the test cases to check the fault mode. Merge pull request !13881 from yuhaiyan/master-dev2 | 1 year ago
Compatible with multiple taskqueues. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>. Merge request Ascend/pytorch!27178 (2.10ev into master); created by huangyunlong2022, committed by huangyunlong2022, merged by ascend-robot. | 6 months ago
!20261 [feat] aclGraph task group. Merge pull request !20261 from xudaohong/master | 1 year ago
add lock for workspaceallocator. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>, zhaoyu65<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26720 (2.10ts into master); created by huangyunlong2022, committed by zhaoyu65 and huangyunlong2022, merged by ascend-robot. Description: 1) An environment variable controls whether each stream gets its own taskqueue; this is off by default. 2) When enabled, the taskqueue is initialized at enqueue time, which avoids starting too many taskqueue threads at stream creation; a lock prevents repeated initialization from multiple threads. 3) Initialization dispatches on the current stream by default. Compute operators are all dispatched on the current stream, while communication operators are dispatched on a communication stream, so that stream is passed to enqueue and used for dispatch. 4) Fetching a stream clears only the queue on that stream, which avoids unnecessary queue-clearing time. 5) An event must be recorded before it is waited on. To preserve ordering with multiple taskqueues, wait must confirm at enqueue time that the record has already been dispatched (checking the record count at the dequeue stage breaks down when events are reused: a record issued after the wait invalidates the count check and causes a hang). 6) An event may be destroyed only after its record and wait have been dispatched; lazy destroy is used to avoid blocking. 7) workspaceallocator is protected by a lock to prevent races between multiple taskqueues. | 6 months ago
add lock for workspaceallocator. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>, zhaoyu65<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26720 (2.10ts into master); created by huangyunlong2022, committed by zhaoyu65 and huangyunlong2022, merged by ascend-robot. Description: 1) An environment variable controls whether each stream gets its own taskqueue; this is off by default. 2) When enabled, the taskqueue is initialized at enqueue time, which avoids starting too many taskqueue threads at stream creation; a lock prevents repeated initialization from multiple threads. 3) Initialization dispatches on the current stream by default. Compute operators are all dispatched on the current stream, while communication operators are dispatched on a communication stream, so that stream is passed to enqueue and used for dispatch. 4) Fetching a stream clears only the queue on that stream, which avoids unnecessary queue-clearing time. 5) An event must be recorded before it is waited on. To preserve ordering with multiple taskqueues, wait must confirm at enqueue time that the record has already been dispatched (checking the record count at the dequeue stage breaks down when events are reused: a record issued after the wait invalidates the count check and causes a hang). 6) An event may be destroyed only after its record and wait have been dispatched; lazy destroy is used to avoid blocking. 7) workspaceallocator is protected by a lock to prevent races between multiple taskqueues. | 6 months ago
check oom with error code EL0004, add AclrtGetMemUsageInfo. Co-authored-by: zhaoyu<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26067 (snapshot-master into master); created by zhaoyu65, committed by zhaoyu, merged by ascend-robot. Type: feature. Description: 1) Updates the OOM check logic to use error code EL0004. 2) Adds AclrtGetMemUsageInfo to obtain a CANN memory snapshot. | 6 months ago
check oom with error code EL0004, add AclrtGetMemUsageInfo. Co-authored-by: zhaoyu<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26067 (snapshot-master into master); created by zhaoyu65, committed by zhaoyu, merged by ascend-robot. Type: feature. Description: 1) Updates the OOM check logic to use error code EL0004. 2) Adds AclrtGetMemUsageInfo to obtain a CANN memory snapshot. | 6 months ago
!20733 Add NPUSwapMemoryAllocator and empty_with_swap_memory. Merge pull request !20733 from 姜怡文/main_vm | 1 year ago
!20733 Add NPUSwapMemoryAllocator and empty_with_swap_memory. Merge pull request !20733 from 姜怡文/main_vm | 1 year ago
add hasPrimaryContext. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>. Merge request Ascend/pytorch!27392 (2.10ct into master); created by huangyunlong2022, committed by huangyunlong2022, merged by ascend-robot. | 5 months ago
add hasPrimaryContext. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>. Merge request Ascend/pytorch!27392 (2.10ct into master); created by huangyunlong2022, committed by huangyunlong2022, merged by ascend-robot. | 5 months ago
[bugfix] Fix when taskQueue is closed with aclgraph enabled--master. Co-authored-by: yurongkun<yurongkun@huawei.com>. Merge request Ascend/pytorch!27672 (task_queue_master into master); created by yurongkun, committed by yurongkun, merged by ascend-robot. Type: bug. Description: fixes an NPUgraph destructor error caused by device resources being released too early when taskQueue is disabled and aclgraph is enabled. | 5 months ago
[feat] support npugraph debug_dump API. Co-authored-by: zhukkk<zhuke11@huawei.com>. Merge request Ascend/pytorch!26075 (master into master); created by zhukkk, committed by zhukkk, merged by ascend-robot. Type: feature. Description: enhances the DFX capability of aclgraph by providing dump capability for captured aclgraphs and supporting the NPUGraph debug_dump API. | 7 months ago
!23769 Extended generator to support random number generation in aclgraph scenarios. Merge pull request !23769 from 闫鹏全/master | 9 months ago
!21299 Exposed currentStreamCaptureStatusMayInitCtx api. Merge pull request !21299 from wgb/master | 1 year ago
!19525 [cleancode]core. Merge pull request !19525 from SCh-zx/master | 1 year ago
[fix] Fix NPUHooksInterface::getDeviceFromPtr with aclrtPointerGetAttributes. Co-authored-by: wangchao430<wangchao430@huawei.com>. Merge request Ascend/pytorch!26329 (v2.99.0_getdevice into master); created by wangchao430, committed by wangchao430, merged by ascend-robot. | 6 months ago
[SHMEM] support npu shmem. Co-authored-by: wangchao430<wangchao430@huawei.com>. Merge request Ascend/pytorch!26027 (v2.99.0_shmem1 into master); created by wangchao430, committed by wangchao430, merged by ascend-robot. | 6 months ago
!22300 Add support for custom dtype. Merge pull request !22300 from chuboning/master | 11 months ago
!16686 add memory_swap api and expose host allocator api. Merge pull request !16686 from ChenDonYY/master_swap_api_and_host_allocator | 1 year ago
!16686 add memory_swap api and expose host allocator api. Merge pull request !16686 from ChenDonYY/master_swap_api_and_host_allocator | 1 year ago
!23852 [Bugfix] Optimized the P2P Enable connection limit. Merge pull request !23852 from kuhn/master_fix | 9 months ago
!10010 support single process multiple device. Merge pull request !10010 from 闫鹏全/master | 2 years ago
check oom with error code EL0004, add AclrtGetMemUsageInfo. Co-authored-by: zhaoyu<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26067 (snapshot-master into master); created by zhaoyu65, committed by zhaoyu, merged by ascend-robot. Type: feature. Description: 1) Updates the OOM check logic to use error code EL0004. 2) Adds AclrtGetMemUsageInfo to obtain a CANN memory snapshot. | 6 months ago
Optimize NPUEvent synchronize method. Co-authored-by: 周锐淇<zhouruiqi5@huawei.com>. Merge request Ascend/pytorch!26564 (master into master); created by rich9527, committed by 周锐淇, merged by ascend-robot. Type: task. Description: optimizes the event synchronization method to reduce blocking time. | 6 months ago
!1414 Modify the scope of the libtorch macro. Merge pull request !1414 from 闫鹏全/master | 1 year ago
!1414 Modify the scope of the libtorch macro. Merge pull request !1414 from 闫鹏全/master | 1 year ago
add lock for workspaceallocator. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>, zhaoyu65<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26720 (2.10ts into master); created by huangyunlong2022, committed by zhaoyu65 and huangyunlong2022, merged by ascend-robot. Description: 1) An environment variable controls whether each stream gets its own taskqueue; this is off by default. 2) When enabled, the taskqueue is initialized at enqueue time, which avoids starting too many taskqueue threads at stream creation; a lock prevents repeated initialization from multiple threads. 3) Initialization dispatches on the current stream by default. Compute operators are all dispatched on the current stream, while communication operators are dispatched on a communication stream, so that stream is passed to enqueue and used for dispatch. 4) Fetching a stream clears only the queue on that stream, which avoids unnecessary queue-clearing time. 5) An event must be recorded before it is waited on. To preserve ordering with multiple taskqueues, wait must confirm at enqueue time that the record has already been dispatched (checking the record count at the dequeue stage breaks down when events are reused: a record issued after the wait invalidates the count check and causes a hang). 6) An event may be destroyed only after its record and wait have been dispatched; lazy destroy is used to avoid blocking. 7) workspaceallocator is protected by a lock to prevent races between multiple taskqueues. | 6 months ago
add lock for workspaceallocator. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>, zhaoyu65<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26720 (2.10ts into master); created by huangyunlong2022, committed by zhaoyu65 and huangyunlong2022, merged by ascend-robot. Description: 1) An environment variable controls whether each stream gets its own taskqueue; this is off by default. 2) When enabled, the taskqueue is initialized at enqueue time, which avoids starting too many taskqueue threads at stream creation; a lock prevents repeated initialization from multiple threads. 3) Initialization dispatches on the current stream by default. Compute operators are all dispatched on the current stream, while communication operators are dispatched on a communication stream, so that stream is passed to enqueue and used for dispatch. 4) Fetching a stream clears only the queue on that stream, which avoids unnecessary queue-clearing time. 5) An event must be recorded before it is waited on. To preserve ordering with multiple taskqueues, wait must confirm at enqueue time that the record has already been dispatched (checking the record count at the dequeue stage breaks down when events are reused: a record issued after the wait invalidates the count check and causes a hang). 6) An event may be destroyed only after its record and wait have been dispatched; lazy destroy is used to avoid blocking. 7) workspaceallocator is protected by a lock to prevent races between multiple taskqueues. | 6 months ago
Support to print the tensor create by the interface torch_npu.empty_with_swapped_memory. Co-authored-by: geyi<geyi2@huawei.com>. Merge request Ascend/pytorch!26440 (master into master); created by gleaming-spark, committed by geyi, merged by ascend-robot. Type: feature. | 6 months ago
!20843 Rename swap memory to swapped memory. Merge pull request !20843 from 姜怡文/main_sw | 1 year ago
add lock for workspaceallocator. Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>, zhaoyu65<nanzhaogang@qq.com>. Merge request Ascend/pytorch!26720 (2.10ts into master); created by huangyunlong2022, committed by zhaoyu65 and huangyunlong2022, merged by ascend-robot. Description: 1) An environment variable controls whether each stream gets its own taskqueue; this is off by default. 2) When enabled, the taskqueue is initialized at enqueue time, which avoids starting too many taskqueue threads at stream creation; a lock prevents repeated initialization from multiple threads. 3) Initialization dispatches on the current stream by default. Compute operators are all dispatched on the current stream, while communication operators are dispatched on a communication stream, so that stream is passed to enqueue and used for dispatch. 4) Fetching a stream clears only the queue on that stream, which avoids unnecessary queue-clearing time. 5) An event must be recorded before it is waited on. To preserve ordering with multiple taskqueues, wait must confirm at enqueue time that the record has already been dispatched (checking the record count at the dequeue stage breaks down when events are reused: a record issued after the wait invalidates the count check and causes a hang). 6) An event may be destroyed only after its record and wait have been dispatched; lazy destroy is used to avoid blocking. 7) workspaceallocator is protected by a lock to prevent races between multiple taskqueues. | 6 months ago
!23908 Fix hang on bug while tq=2. Merge pull request !23908 from 姜怡文/main_wk | 9 months ago
[950] Add 950 support. Co-authored-by: chuboning<chuboning1@huawei.com>, lilongqianxi<lilongqianxi@h-partners.com>, 路有兵<luyoubing@huawei.com>. Merge request Ascend/pytorch!26229 (master into master); created by chuboning, committed by chuboning, 路有兵 and lilongqianxi, merged by ascend-robot. | 6 months ago
[950] Add 950 support. Co-authored-by: chuboning<chuboning1@huawei.com>, lilongqianxi<lilongqianxi@h-partners.com>, 路有兵<luyoubing@huawei.com>. Merge request Ascend/pytorch!26229 (master into master); created by chuboning, committed by chuboning, 路有兵 and lilongqianxi, merged by ascend-robot. | 6 months ago
!8153 Modify torch_npu apis. Merge pull request !8153 from 姜怡文/master_apis | 2 years ago
!9301 Close some temporary open interfaces. Merge pull request !9301 from 姜怡文/master_api | 2 years ago
!15776 TORCH MAIN SYNC : macro conflict/int1-7/unified accelerator init/api deprecation. Merge pull request !15776 from dilililiwhy/main_sync_20241031 | 1 year ago