| add lock for workspaceallocator
Co-authored-by: huangyunlong2022<huangyunlong4@h-partners.com>
Co-authored-by: zhaoyu65<nanzhaogang@qq.com>
# message auto-generated for no-merge-commit merge:
!26720 merge 2.10ts into master
add lock for workspaceallocator
Created-by: huangyunlong2022
Commit-by: zhaoyu65;huangyunlong2022
Merged-by: ascend-robot
Description:
**What type of PR is this?**
> Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:
>
> /kind bug
> /kind task
> /kind feature
**What does this PR do / why do we need it**:
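1. An environment variable controls whether each stream gets its own taskqueue. It is off by default.
2. When enabled, the taskqueue is initialized at enqueue time, which avoids starting too many taskqueue threads when streams are created. Initialization is protected by a lock so that multiple threads cannot initialize the same queue more than once.
3. At initialization the current stream is selected for dispatch by default. Compute operators are currently all dispatched on the current stream. Communication operators are dispatched on the communication stream, so that stream is passed to enqueue, which dispatches on the stream it was given.
4. The queue is flushed when a stream is fetched. Only the queue belonging to that stream is flushed, which avoids unnecessary flush time.
5. An event must currently be recorded before it is waited on. To preserve this order with multiple taskqueues, wait must confirm at enqueue time that the record has already been dispatched. (If the check were made in the dequeue stage by counting records, then in the event-reuse case a record issued after the wait would invalidate the count check and cause a hang.)
6. An event may be destroyed only after its record and wait have been dispatched. To avoid blocking, lazy destroy is currently used.
7. workspaceallocator is protected by a lock to prevent races between multiple taskqueues.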
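
For reviewers, items 1, 2 and 4 above can be illustrated with a minimal Python sketch. It is not the code in this PR (the actual change is in the torch_npu runtime); the names `ENV_FLAG`, `TaskQueue` and `QueueRegistry` are hypothetical, and the real environment variable name is not stated in this description. The sketch only shows the pattern: a queue is created lazily on first enqueue under a lock, and a flush touches only the queue of the requested stream.

```python
import os
import queue
import threading

# Hypothetical flag name, used only in this sketch.
ENV_FLAG = "PER_STREAM_TASK_QUEUE_SKETCH"


class TaskQueue:
    """One worker thread that runs submitted callables in FIFO order."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            try:
                task()
            finally:
                # Mark the task finished so that flush() can return.
                self._tasks.task_done()

    def enqueue(self, task):
        self._tasks.put(task)

    def flush(self):
        # Block until every task submitted so far has completed.
        self._tasks.join()


class QueueRegistry:
    """Creates queues lazily, keyed by stream id when the feature is on."""

    def __init__(self, per_stream=None):
        if per_stream is None:
            # Off by default, as in item 1.
            per_stream = os.environ.get(ENV_FLAG, "0") == "1"
        self._per_stream = per_stream
        self._lock = threading.Lock()
        self._queues = {}

    def _key(self, stream_id):
        # With the feature off, every stream shares a single queue.
        return stream_id if self._per_stream else None

    def enqueue(self, task, stream_id):
        key = self._key(stream_id)
        # The lock ensures two threads never create the same queue twice.
        with self._lock:
            q = self._queues.get(key)
            if q is None:
                q = TaskQueue()
                self._queues[key] = q
        q.enqueue(task)

    def flush(self, stream_id):
        # Only the queue of this stream is flushed; others are untouched.
        with self._lock:
            q = self._queues.get(self._key(stream_id))
        if q is not None:
            q.flush()

    def queue_count(self):
        with self._lock:
            return len(self._queues)
```

Because no worker thread is started until a stream first enqueues work, streams that never dispatch anything cost nothing. The sketch does not cover event ordering or lazy event destruction (items 5 and 6).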
**Which issue(s) this PR fixes**:
Fixes #
**Special notes for your reviewers**:
See merge request: Ascend/pytorch!26720 | 6 months ago |