
Acceleration Library MlaPreprocessOperation C++ Demo

Introduction

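This directory contains C++ examples that call the acceleration library's MlaPreprocessOperation.

The data generated in the examples does not represent a real workload. For reference data generation, see the Python test cases under the repository root: tests/apitest/opstest/python/operations/mla_preprocess/

Each demo targets a different product or scenario; see the official website for how the scenarios differ. When running a demo binary, you can pass dtype, tokenNum, and headNum to control the data type and the tensor shapes. dtype is the data type of `input` and can be float16 or bf16, which are the two alternatives written as float16/bf16 in the data specification tables below.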
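The three arguments can be read with a few lines of C++. The following is a minimal sketch, not the code used by the demos: the `DemoArgs` struct, the `ParseDemoArgs` function, and the positional order `<dtype> <tokenNum> <headNum>` are assumptions made for illustration. The defaults are the ones this README lists for mlapo_demo.cpp.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical container for the three demo arguments.
struct DemoArgs {
    std::string dtype = "float16";  // default listed in this README
    uint32_t tokenNum = 4;          // mlapo_demo.cpp default (mlapo_ds_demo.cpp uses 32)
    uint32_t headNum = 128;         // default listed in this README
};

// Assumed positional order: <dtype> <tokenNum> <headNum>.
// Arguments that are not supplied keep their default values.
DemoArgs ParseDemoArgs(int argc, const char* const argv[]) {
    DemoArgs args;
    if (argc > 1) {
        args.dtype = argv[1];
        if (args.dtype != "float16" && args.dtype != "bf16") {
            throw std::invalid_argument("dtype must be float16 or bf16");
        }
    }
    if (argc > 2) {
        args.tokenNum = static_cast<uint32_t>(std::stoul(argv[2]));
    }
    if (argc > 3) {
        args.headNum = static_cast<uint32_t>(std::stoul(argv[3]));
    }
    return args;
}
```

Under these assumptions, calling `ParseDemoArgs(argc, argv)` at the top of `main` yields the values that drive every shape containing tokenNum or headNum in the tables below.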

mlapo_demo.cpp:

Scenario: MLAPO int8 quantization combined with rope split
- ctkv and qNope are quantized to int8 with per_head static symmetric quantization; kvcache and query are split using rope, and krope and ctkv are output in NZ format.
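To make "per_head static symmetric quantization" concrete, here is a small host-side sketch. It is only an illustration and not the operator's kernel: the function name `QuantizePerHead` is hypothetical, and the choice to multiply by the scale (rather than divide by it) is an assumption. "Static" means the scales are supplied in advance, "symmetric" means there is no zero-point offset, and "per_head" means one scale per head.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantizes a row-major [headNum, headDim] float buffer to int8.
// One precomputed scale per head, no zero point.
// Assumption: q = round(x * scale), saturated to [-128, 127].
std::vector<int8_t> QuantizePerHead(const std::vector<float>& x,
                                    const std::vector<float>& scalePerHead,
                                    std::size_t headDim) {
    std::vector<int8_t> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t head = i / headDim;            // head that owns element i
        const float rounded = std::nearbyint(x[i] * scalePerHead[head]);
        const float saturated = std::max(-128.0f, std::min(127.0f, rounded));
        q[i] = static_cast<int8_t>(saturated);
    }
    return q;
}
```

With two heads of two elements each, scales `{2.0f, 10.0f}`, and inputs `{1.0f, -2.0f, 100.0f, -100.0f}`, the first head is scaled by 2 and the second head saturates at the int8 limits.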

Parameter settings:

Member name	Value
cacheMode	INT8_NZCACHE

Data specification:

Tensor name	Data type	Data format	Dimensions
Input
`input`	float16/bf16	nd	[tokenNum, 7168]
`gamma0`	float16/bf16	nd	[7168]
`beta0`	float16/bf16	nd	[7168]
`quantScale0`	float16/bf16	nd	[1]
`quantOffset0`	int8	nd	[1]
`wdqkv`	int8	nz	[1, 224, 2112, 32]
`deScale`	int64/float	nd	[2112]
`bias0`	int32	nd	[2112]
`gamma1`	float16/bf16	nd	[1536]
`beta1`	float16/bf16	nd	[1536]
`quantScale1`	float16/bf16	nd	[1]
`quantOffset1`	int8	nd	[1]
`wuq`	int8	nz	[1, 48, 24576, 32]
`deScale1`	int64/float	nd	[24576]
`bias1`	int32	nd	[headNum * 192]
`gamma2`	float16/bf16	nd	[512]
`cos`	float16/bf16	nd	[tokenNum, 64]
`sin`	float16/bf16	nd	[tokenNum, 64]
`wuk`	float16/bf16	nz	[headNum, 32, 128, 16]
`kvCache`	int8	nz	[64, headNum * 512/32, 128, 32]
`kvCacheRope`	float16/bf16	nd	[64, headNum * 64/16, 128, 16]
`slotmapping`	int32	nd	[tokenNum]
`ctkvScale`	float16/bf16	nd	[1]
`qNopeScale`	float16/bf16	nd	[headNum]
Output
`qOut0`	int8	nd	[tokenNum, headNum, 512]
`kvCacheOut0`	int8	nz	[64, headNum * 512/32, 128, 32]
`qOut1`	float16/bf16	nd	[tokenNum, headNum, 64]
`kvCacheOut1`	float16/bf16	nz	[64, headNum * 64/16, 128, 16]

Default values: dtype = float16, tokenNum = 4, headNum = 128
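The NZ shapes of the weight tensors can be derived from their logical 2-D shapes. The sketch below assumes the common NZ convention in which the last dimension is split into blocks of 32 int8 elements or 16 float16/bf16 elements, and it assumes the row count needs no padding. The helper `NdToNzShape` is hypothetical and is not part of the library API.

```cpp
#include <array>
#include <cstdint>

// Maps a logical ND shape [batch, rows, cols] to the 4-D NZ shape
// [batch, ceil(cols / block), rows, block].
// block is 32 for int8 and 16 for float16/bf16 (assumed convention).
std::array<int64_t, 4> NdToNzShape(int64_t batch, int64_t rows, int64_t cols,
                                   int64_t block) {
    const int64_t colBlocks = (cols + block - 1) / block;  // round up
    return {batch, colBlocks, rows, block};
}
```

For example, `wdqkv` is logically [2112, 7168] in int8, so `NdToNzShape(1, 2112, 7168, 32)` gives [1, 224, 2112, 32], and `wuq` at [24576, 1536] gives [1, 48, 24576, 32]. With headNum = 128, a float16 `wuk` of [128, 128, 512] maps to [128, 32, 128, 16].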

mlapo_ds_demo.cpp:

Parameter settings:

Data specification:

Tensor name	Data type	Data format	Dimensions
Input
`input`	float16/bf16	nd	[tokenNum, 7168]
`gamma0`	float16/bf16	nd	[7168]
`beta0`	float16/bf16	nd	[7168]
`quantScale0`	float16/bf16	nd	[1]
`quantOffset0`	int8	nd	[1]
`wdqkv`	int8	nz	[2112, 7168]
`deScale`	int64/float	nd	[2112]
`bias0`	int32	nd	[2112]
`gamma1`	float16/bf16	nd	[1536]
`beta1`	float16/bf16	nd	[1536]
`quantScale1`	float16/bf16	nd	[1]
`quantOffset1`	int8	nd	[1]
`wuq`	int8	nz	[24576, 1536]
`deScale1`	int64/float	nd	[24576]
`bias1`	int32	nd	[headNum * 192]
`gamma2`	float16/bf16	nd	[512]
`cos`	float16/bf16	nd	[tokenNum, 64]
`sin`	float16/bf16	nd	[tokenNum, 64]
`wuk`	float16/bf16	nd	[tokenNum, 128, 512]
`kvCache`	float16/bf16	nd	[161, 128, 1, 512]
`kvCacheRope`	float16/bf16	nd	[161, 128, 1, 64]
`slotmapping`	int32	nd	[tokenNum]
`ctkvScale`	float16/bf16	nd	[1]
`qNopeScale`	float16/bf16	nd	[1]
Output
`qOut0`	int8	nd	[tokenNum, headNum, 512]
`kvCacheOut0`	int8	nz	[161, 128, 1, 512]
`qOut1`	float16/bf16	nd	[tokenNum, headNum, 64]
`kvCacheOut1`	float16/bf16	nz	[161, 128, 1, 64]

Default values: dtype = float16, tokenNum = 32, headNum = 128