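Acceleration Library PagedAttentionOperation C++ Demo

Introduction

This directory contains C++ examples that call the acceleration library's PagedAttentionOperation.

The data generated in the examples does not represent real workloads. For reference data generation, see the Python test case directory under the repository root: tests/apitest/opstest/python/operations/paged_attention/

This op is implemented differently on the Atlas A2/A3 series and on Atlas inference series products.

Each demo below targets one scenario; update the build script to match the demo you want to compile and run:

Parallel decoding disabled, with mask: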

paged_attention_demo.cpp

The default build script compiles and runs this demo. It runs only on the Atlas A2/A3 series.

Parameter settings:

Data specification:

Tensor name	Data type	Data format	Shape	cpu/npu
`query`	float16	nd	[2, 32, 128]	npu
`keyCache`	float16	nd	[16, 128, 32, 128]	npu
`valueCache`	float16	nd	[16, 128, 32, 128]	npu
`blockTables`	int32	nd	[2, 8]	npu
`contextLens`	int32	nd	[2]	cpu
`mask`	int32	nd	[2, 1, 1024]	npu
`attnOut`	float16	nd	[2, 32, 128]	npu
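
The shapes in this table agree with each other if they are read with the usual paged-attention layout. The sketch below checks that reading. The dimension names (`numTokens`, `headNum`, `blockSize` and so on) and the `PaShapes` struct are illustrative assumptions, not identifiers taken from the demo source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: shapes copied from the table above.
// The meaning assigned to each dimension is an assumption, not taken from the demo.
struct PaShapes {
    std::array<int64_t, 3> query{2, 32, 128};          // [numTokens, headNum, headSize]
    std::array<int64_t, 4> keyCache{16, 128, 32, 128}; // [numBlocks, blockSize, kvHeadNum, headSize]
    std::array<int64_t, 2> blockTables{2, 8};          // [batch, maxBlocksPerSeq]
    std::array<int64_t, 3> mask{2, 1, 1024};           // [batch, 1, maxSeqLen]
};

// Longest context one sequence can address through its block table row.
inline int64_t MaxSeqLen(const PaShapes &s)
{
    assert(s.blockTables[1] > 0 && s.keyCache[1] > 0);
    return s.blockTables[1] * s.keyCache[1];
}

// True when the shapes agree with each other under the assumed layout.
inline bool ShapesConsistent(const PaShapes &s)
{
    return s.query[2] == s.keyCache[3]     // same head size
        && s.query[0] == s.blockTables[0]  // one decode token per sequence
        && s.mask[0] == s.blockTables[0]   // one mask row per sequence
        && s.mask[2] == MaxSeqLen(s);      // mask covers every addressable position
}
```

Under this reading, 8 blocks per sequence times 128 positions per block gives the 1024 in the last dimension of `mask`, and the 16 blocks in the cache are exactly enough for 2 sequences of 8 blocks each.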

paged_attention_qwen_demo.cpp

This demo runs only on the Atlas A2/A3 series.

Parameter settings:

Data specification:

Tensor name	Data type	Data format	Shape	cpu/npu
`query`	bf16	nd	[1, 5, 128]	npu
`keyCache`	bf16	nd	[9, 128, 1, 128]	npu
`valueCache`	bf16	nd	[9, 128, 1, 128]	npu
`blockTables`	int32	nd	[1, 8]	npu
`contextLens`	int32	nd	[1]	cpu
`attnOut`	bf16	nd	[1, 5, 128]	npu
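
Paged attention addresses the cache indirectly: a logical token position is split into a block table slot and an offset inside that block. The sketch below shows that lookup on the host, using the block size (128) and table width (8) from the table above, plus the mapping from 5 query heads to 1 KV head. The `CacheSlot`, `Locate` and `KvHeadFor` names and the block table contents are hypothetical, and the reading of each dimension is an assumption; the actual op performs this lookup on the NPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Physical location of one cached token (hypothetical type for illustration).
struct CacheSlot {
    int32_t blockId; // index into dimension 0 of the key/value cache
    int32_t offset;  // index into dimension 1 (position inside the block)
};

// Translate a logical token position into a cache slot via one block table row.
inline CacheSlot Locate(const std::vector<int32_t> &blockTableRow, int32_t tokenPos, int32_t blockSize)
{
    assert(blockSize > 0 && tokenPos >= 0);
    const int32_t logicalBlock = tokenPos / blockSize;
    assert(static_cast<std::size_t>(logicalBlock) < blockTableRow.size());
    return CacheSlot{blockTableRow[logicalBlock], tokenPos % blockSize};
}

// Grouped-query mapping: consecutive query heads share one KV head.
// With headNum = 5 and kvHeadNum = 1, every query head maps to KV head 0.
inline int32_t KvHeadFor(int32_t queryHead, int32_t headNum, int32_t kvHeadNum)
{
    assert(kvHeadNum > 0 && headNum % kvHeadNum == 0);
    return queryHead / (headNum / kvHeadNum);
}
```

For example, with a made-up row `{3, 0, 7, 1, 5, 2, 8, 4}`, token position 300 falls in logical block 2, so `Locate` returns physical block 7 at offset 44.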

Without mask:

paged_attention_inference_demo.cpp

This demo runs only on Atlas inference series products.

Parameter settings:

Data specification:

Tensor name	Data type	Data format	Shape	cpu/npu
`query`	bf16	nd	[2, 32, 128]	npu
`keyCache`	bf16	nd	[16, 1024, 128, 16]	npu
`valueCache`	bf16	nd	[16, 1024, 128, 16]	npu
`blockTables`	int32	nd	[2, 8]	npu
`contextLens`	int32	nd	[2]	cpu
`attnOut`	bf16	nd	[2, 32, 128]	npu
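
When allocating host or device buffers for these tensors, the byte size follows from the shape and the element width. A small sketch, assuming densely packed tensors with 2 bytes per element for float16/bf16 and 4 bytes for int32; `NumElements` and `ByteSize` are hypothetical helpers, not part of the acceleration library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Product of all dimensions; an empty shape is treated as a scalar (1 element).
inline int64_t NumElements(const std::vector<int64_t> &shape)
{
    int64_t n = 1;
    for (int64_t d : shape) {
        assert(d >= 0);
        n *= d;
    }
    return n;
}

// Bytes needed for a densely packed tensor with the given element width.
inline int64_t ByteSize(const std::vector<int64_t> &shape, int64_t bytesPerElement)
{
    return NumElements(shape) * bytesPerElement;
}
```

For the table above, `ByteSize({2, 32, 128}, 2)` gives 16384 bytes for `query`, and each cache tensor of shape [16, 1024, 128, 16] needs 67108864 bytes (64 MiB) at 2 bytes per element.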