support npu_fusion_attention_v3 with fake register
Co-authored-by: wangchao430 <wangchao430@huawei.com>
# message auto-generated for no-merge-commit merge:
!4123 merge master_fav3cpu into master
support npu_fusion_attention_v3 with fake register
Created-by: wangchao430
Commit-by: wangchao430
Merged-by: ascend-robot
Description:
**What type of PR is this?**
> /kind feature
**What does this PR do / why do we need it**:
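Adds the npu_fusion_attention_v3 operator, which supports graph mode and aclgraph (without update). The main changes relative to npu_fusion_attention are:
1. At the entry of the npu_fusion_attention_v3 adaptation layer, the prefix parameter is converted to a SymInt array, and the actual_seq_qlen and actual_seq_kvlen parameters are converted to CPU Tensors (each holding a one-dimensional int array). Memory copies of actual_seq_qlen and actual_seq_kvlen are avoided wherever possible.
2. In the return values, seed and offset are now tensors, and numel is removed (it can be recomputed in the backward pass). In the aclgraph scenario, seed and offset are NPU Tensors used directly in the dropout computation; otherwise they are CPU Tensors. Because the operator does not yet support get_max_workspace and the update logic has not yet been adapted, aclgraph currently supports only the BNSD layout, which does not require actual_seq_qlen or actual_seq_kvlen.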
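The placement rule for seed and offset, together with the current aclgraph restriction, can be sketched in plain Python as below. This only illustrates the behaviour described above and is not the op-plugin implementation: the helper name `seed_offset_device`, its parameters, and the error message are hypothetical.

```python
from typing import Optional, Sequence


def seed_offset_device(
    in_aclgraph: bool,
    input_layout: str,
    actual_seq_qlen: Optional[Sequence[int]] = None,
    actual_seq_kvlen: Optional[Sequence[int]] = None,
) -> str:
    """Hypothetical helper: return the device on which seed/offset would live."""
    if not in_aclgraph:
        # Outside aclgraph, seed and offset are CPU tensors.
        return "cpu"
    # Under aclgraph, only BNSD without actual sequence lengths is supported,
    # because the update logic for actual_seq_qlen/actual_seq_kvlen is not adapted yet.
    if input_layout != "BNSD" or actual_seq_qlen or actual_seq_kvlen:
        raise NotImplementedError(
            "aclgraph currently supports only BNSD without actual_seq_qlen/actual_seq_kvlen"
        )
    # Under aclgraph, seed and offset are NPU tensors fed directly to dropout.
    return "npu"
```

For example, `seed_offset_device(True, "BNSD")` yields `"npu"`, while a TND call with sequence lengths under aclgraph raises `NotImplementedError`.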
**Special notes for your reviewers**:
See merge request: Ascend/op-plugin!4123