| support npu_fusion_attention_v3 with fake register Co-authored-by: wangchao430<wangchao430@huawei.com> # message auto-generated for no-merge-commit merge: !4123 merge master_fav3cpu into master support npu_fusion_attention_v3 with fake register Created-by: wangchao430 Commit-by: wangchao430 Merged-by: ascend-robot Description: **What type of PR is this?** /kind feature **What does this PR do / why do we need it**: Adds the npu_fusion_attention_v3 operator, which supports graph mode and aclgraph (without update). The main changes relative to npu_fusion_attention are: 1. At the entry of the npu_fusion_attention_v3 adaptation layer, the prefix parameter is converted to a SymInt array, and the actual_seq_qlen and actual_seq_kvlen parameters are converted to CPU Tensors (each holding a one-dimensional int array); memory copies of actual_seq_qlen and actual_seq_kvlen are avoided wherever possible. 2. In the return values, seed and offset are now tensors, and numel is removed (it can be recomputed in the backward pass). In the aclgraph scenario, seed and offset are NPU Tensors used directly in the dropout computation; otherwise they are CPU Tensors. Because the operator does not yet support get_max_workspace and the update logic has not yet been adapted, aclgraph supports only the BNSD layout, which does not require actual_seq_qlen or actual_seq_kvlen. **Special notes for your reviewers**: See merge request: Ascend/op-plugin!4123 | 4 months ago |