RopeWithSinCosCache

Supported Products

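Product	Supported
Ascend 950PR/Ascend 950DT	√
Atlas A3 Training Series Products/Atlas A3 Inference Series Products	√
Atlas A2 Training Series Products/Atlas A2 Inference Series Products	√
Atlas 200I/500 A2 Inference Products	×
Atlas Inference Series Products	×
Atlas Training Series Products	×
Kirin X90 Processor Series Products	√
Kirin 9030 Processor Series Products	√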

Function Description

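Operator function: applies rotary position embedding (RoPE) to query and key. To improve performance in inference networks, the cos and sin values are not computed inside the operator but are passed in through a precomputed cache (cosSinCache).
Formulas (shown for query; key is processed in the same way):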

1. mrope mode: positions has shape [3, numTokens]:
$cosSin[i] = cosSinCache[positions[i]]$
$cos, sin = cosSin.chunk(2, dim=-1)$
$cos0 = cos[0, :, :mropeSection[0]]$
$cos1 = cos[1, :, mropeSection[0]:(mropeSection[0] + mropeSection[1])]$
$cos2 = cos[2, :, (mropeSection[0] + mropeSection[1]):(mropeSection[0] + mropeSection[1] + mropeSection[2])]$
$cos = torch.cat((cos0, cos1, cos2), dim=-1)$
$sin0 = sin[0, :, :mropeSection[0]]$
$sin1 = sin[1, :, mropeSection[0]:(mropeSection[0] + mropeSection[1])]$
$sin2 = sin[2, :, (mropeSection[0] + mropeSection[1]):(mropeSection[0] + mropeSection[1] + mropeSection[2])]$
$sin = torch.cat((sin0, sin1, sin2), dim=-1)$
$queryRot = query[..., :rotaryDim]$
$queryPass = query[..., rotaryDim:]$
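The mrope column selection above can be sketched in plain Python for checking results on the host. This is a hypothetical reference helper (`mrope_cos_sin` is not part of the aclnn API), assuming each cosSinCache row stores the cos values in its first half and the sin values in its second half, as implied by `cosSin.chunk(2, dim=-1)`. Rows 0, 1 and 2 of positions each contribute one consecutive block of columns whose width is given by mropeSection.

```python
from typing import List, Tuple


def mrope_cos_sin(cos_sin_cache: List[List[float]],
                  positions: List[List[int]],
                  mrope_section: List[int]) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical host-side sketch of the mrope cos/sin assembly.

    cos_sin_cache: [maxSeqLen][width]; first half cos, second half sin (assumed layout).
    positions:     [3][numTokens] position ids, one row per section.
    mrope_section: three column widths; section k reads its columns from positions row k.
    Returns (cos, sin), each [numTokens][sum(mrope_section)].
    """
    num_tokens = len(positions[0])
    cos_out, sin_out = [], []
    for t in range(num_tokens):
        cos_row, sin_row = [], []
        start = 0
        for axis, width in enumerate(mrope_section):
            row = cos_sin_cache[positions[axis][t]]   # cosSin[axis, t] = cosSinCache[positions[axis, t]]
            half = len(row) // 2
            cos, sin = row[:half], row[half:]         # cosSin.chunk(2, dim=-1)
            cos_row.extend(cos[start:start + width])  # cos<axis> = cos[axis, t, start:start+width]
            sin_row.extend(sin[start:start + width])  # sin<axis> likewise
            start += width
        cos_out.append(cos_row)                       # torch.cat((cos0, cos1, cos2), dim=-1)
        sin_out.append(sin_row)
    return cos_out, sin_out
```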
(1) rotate_half (GPT-NeoX style) mode:
$x1, x2 = torch.chunk(queryRot, 2, dim=-1)$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.cat((o1, o2), dim=-1)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
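For one token and one head, the rotate_half formulas reduce to the following hypothetical helper, with plain Python lists standing in for tensors. `cos` and `sin` are the rotaryDim/2 values already selected for the token; element i of the first half is rotated together with element i of the second half.

```python
from typing import List


def rotate_half(x_rot: List[float], cos: List[float], sin: List[float]) -> List[float]:
    """Hypothetical sketch of GPT-NeoX style rotation for one rotary slice."""
    half = len(x_rot) // 2
    x1, x2 = x_rot[:half], x_rot[half:]                          # torch.chunk(queryRot, 2, dim=-1)
    o1 = [x1[i] * cos[i] - x2[i] * sin[i] for i in range(half)]  # o1[i]
    o2 = [x2[i] * cos[i] + x1[i] * sin[i] for i in range(half)]  # o2[i]
    return o1 + o2                                               # torch.cat((o1, o2), dim=-1)


# cos = 0, sin = 1 is a 90-degree rotation of each (x1[i], x2[i]) pair.
print(rotate_half([1, 2, 3, 4], [0, 0], [1, 1]))  # [-3, -4, 1, 2]
```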
(2) rotate_interleaved (GPT-J style) mode:
$x1 = queryRot[..., ::2]$
$x2 = queryRot[..., 1::2]$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.stack((o1, o2), dim=-1).flatten(-2)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
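The interleaved variant differs only in how pairs are formed: adjacent elements (2i, 2i+1) are rotated together, and the results are written back interleaved, which is the order that `torch.stack((o1, o2), dim=-1)` produces once the last two dimensions are flattened. A hypothetical plain-Python sketch for one rotary slice:

```python
from typing import List


def rotate_interleaved(x_rot: List[float], cos: List[float], sin: List[float]) -> List[float]:
    """Hypothetical sketch of GPT-J style rotation for one rotary slice."""
    x1 = x_rot[0::2]  # queryRot[..., ::2]
    x2 = x_rot[1::2]  # queryRot[..., 1::2]
    out: List[float] = []
    for i in range(len(x1)):
        out.append(x1[i] * cos[i] - x2[i] * sin[i])  # o1[i] goes to position 2i
        out.append(x2[i] * cos[i] + x1[i] * sin[i])  # o2[i] goes to position 2i + 1
    return out


print(rotate_interleaved([1, 2, 3, 4], [0, 0], [1, 1]))  # [-2, 1, -4, 3]
```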
2. rope mode: positions has shape [numTokens]:
$cosSin[i] = cosSinCache[positions[i]]$
$cos, sin = cosSin.chunk(2, dim=-1)$
$queryRot = query[..., :rotaryDim]$
$queryPass = query[..., rotaryDim:]$
(1) rotate_half (GPT-NeoX style) mode:
$x1, x2 = torch.chunk(queryRot, 2, dim=-1)$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.cat((o1, o2), dim=-1)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
(2) rotate_interleaved (GPT-J style) mode:
$x1 = queryRot[..., ::2]$
$x2 = queryRot[..., 1::2]$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.stack((o1, o2), dim=-1).flatten(-2)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
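Putting the rope-mode steps together, the plain-Python reference below applies the formulas to a 2D query row by row, which can help cross-check device results on small inputs. It is a sketch, not the device kernel. Assumptions: queryIn is laid out as [numTokens, numHeads * headSize] with each row split into heads of headSize elements, rotaryDim equals the last dimension of cosSinCache, and `rope_one_head`/`rope_2d` are hypothetical names. keyIn would go through the same function.

```python
from typing import List


def rope_one_head(query_head: List[float], position: int,
                  cos_sin_cache: List[List[float]], is_neox_style: bool) -> List[float]:
    """Hypothetical rope-mode reference for one token and one head."""
    row = cos_sin_cache[position]                     # cosSin = cosSinCache[positions[i]]
    rotary_dim = len(row)                             # assumed: cache width == rotaryDim
    half = rotary_dim // 2
    cos, sin = row[:half], row[half:]                 # cosSin.chunk(2, dim=-1)
    q_rot, q_pass = query_head[:rotary_dim], query_head[rotary_dim:]
    if is_neox_style:
        x1, x2 = q_rot[:half], q_rot[half:]           # rotate_half pairing
    else:
        x1, x2 = q_rot[0::2], q_rot[1::2]             # rotate_interleaved pairing
    o1 = [x1[i] * cos[i] - x2[i] * sin[i] for i in range(half)]
    o2 = [x2[i] * cos[i] + x1[i] * sin[i] for i in range(half)]
    if is_neox_style:
        rot = o1 + o2                                 # torch.cat((o1, o2), dim=-1)
    else:
        rot = [v for pair in zip(o1, o2) for v in pair]  # stack(..., dim=-1).flatten(-2)
    return rot + q_pass                               # torch.cat((queryRot, queryPass), dim=-1)


def rope_2d(query: List[List[float]], positions: List[int],
            cos_sin_cache: List[List[float]], head_size: int,
            is_neox_style: bool) -> List[List[float]]:
    """Apply rope_one_head to every head of every token (assumed 2D layout)."""
    out = []
    for t, row in enumerate(query):
        new_row: List[float] = []
        for start in range(0, len(row), head_size):
            head = row[start:start + head_size]
            new_row.extend(rope_one_head(head, positions[t], cos_sin_cache, is_neox_style))
        out.append(new_row)
    return out
```

With a standard cache, position 0 has cos = 1 and sin = 0, so both modes return the input unchanged; this makes a convenient first sanity check.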

Parameters

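Parameter	Input/Output/Attribute	Description	Data type	Data format
positions	Input	aclTensor on the device side; the position indices. Shape [numTokens] in rope mode, [3, numTokens] in mrope mode.	INT32, INT64	ND
queryIn	Input	aclTensor on the device side; the first tensor to which rotary position embedding is applied (`query` in the formulas).	BFLOAT16, FLOAT16, FLOAT32	ND
keyIn	Input	aclTensor on the device side; the second tensor to which rotary position embedding is applied.	BFLOAT16, FLOAT16, FLOAT32	ND
cosSinCache	Input	aclTensor on the device side; the precomputed cos/sin position-encoding cache used in the computation.	BFLOAT16, FLOAT16, FLOAT32	ND
mropeSection	Input	In mrope mode, the section sizes used to combine the position-encoding information (`mropeSection` in the formulas).	INT64	-
headSize	Input	Dimension size of each attention head.	INT64	-
isNeoxStyle	Input	true selects the rotate_half (GPT-NeoX style) mode; false selects the rotate_interleaved (GPT-J style) mode.	BOOL	-
queryOut	Output	Result of applying rotary position embedding to query.	BFLOAT16, FLOAT16, FLOAT32	ND
keyOut	Output	Result of applying rotary position embedding to key.	BFLOAT16, FLOAT16, FLOAT32	ND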

Kirin X90/Kirin 9030 Processor Series Products: BFLOAT16 is not supported.
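For host-side testing, a cosSinCache of shape [maxSeqLen, rotaryDim] can be generated as below. This is a minimal sketch under stated assumptions: each row holds cos in its first half and sin in its second half (inferred from `cos, sin = cosSin.chunk(2, dim=-1)`), and the angles use the common RoPE frequencies inv_freq[i] = base^(-2i/rotaryDim) with base = 10000, a convention this page does not specify. `build_cos_sin_cache` is a hypothetical helper.

```python
import math
from typing import List


def build_cos_sin_cache(max_seq_len: int, rotary_dim: int,
                        base: float = 10000.0) -> List[List[float]]:
    """Hypothetical cache builder: row p = [cos(p*f_0..f_{h-1}), sin(p*f_0..f_{h-1})]."""
    half = rotary_dim // 2
    # Assumed standard RoPE frequencies: f_i = base ** (-2i / rotaryDim).
    inv_freq = [1.0 / (base ** (2 * i / rotary_dim)) for i in range(half)]
    cache = []
    for pos in range(max_seq_len):
        angles = [pos * f for f in inv_freq]
        cache.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return cache
```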

Constraints

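queryIn, keyIn, and cosSinCache accept 2D shapes only.
headSize: must be a multiple of 32 when the data type is BFLOAT16 or FLOAT16, and a multiple of 16 when the data type is FLOAT32.
rotaryDim: must always be less than or equal to headSize; must be a multiple of 32 when the data type is BFLOAT16 or FLOAT16, and a multiple of 16 when the data type is FLOAT32; in mrope mode it must satisfy rotaryDim = mropeSection[0] + mropeSection[1] + mropeSection[2].
Every value in the input tensor positions must be less than maxSeqLen, the size of dimension 0 of cosSinCache.
aclnnRopeWithSinCosCache uses a deterministic implementation by default.
mropeSection: only [16, 24, 24] is supported.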
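The numeric constraints above can be collected into a host-side pre-check before launching the operator. A minimal sketch with these assumptions: `check_rope_args` is a hypothetical helper (not part of aclnn), data types are passed as lowercase strings, and the caller supplies rotaryDim (presumably the last dimension of cosSinCache, since the parameter table does not list it separately). The 2D-shape rule is not checked here.

```python
from typing import List, Optional, Sequence


def check_rope_args(dtype: str, head_size: int, rotary_dim: int, max_seq_len: int,
                    positions: Sequence, mrope_section: Optional[Sequence[int]] = None) -> List[str]:
    """Hypothetical pre-check; returns a list of violated constraints (empty means OK)."""
    errors: List[str] = []
    if dtype not in ("bfloat16", "float16", "float32"):
        errors.append(f"unsupported dtype {dtype}")
    align = 16 if dtype == "float32" else 32  # 32 for BFLOAT16/FLOAT16, 16 for FLOAT32
    if head_size % align != 0:
        errors.append(f"headSize={head_size} must be a multiple of {align}")
    if rotary_dim > head_size:
        errors.append(f"rotaryDim={rotary_dim} must not exceed headSize={head_size}")
    if rotary_dim % align != 0:
        errors.append(f"rotaryDim={rotary_dim} must be a multiple of {align}")
    # positions is [numTokens] (rope mode) or [3][numTokens] (mrope mode).
    nested = len(positions) > 0 and isinstance(positions[0], (list, tuple))
    flat = [p for row in positions for p in row] if nested else list(positions)
    if any(p >= max_seq_len for p in flat):
        errors.append(f"positions must be < maxSeqLen={max_seq_len}")
    if mrope_section is not None:
        if list(mrope_section) != [16, 24, 24]:
            errors.append("mropeSection must be [16, 24, 24]")
        if rotary_dim != sum(mrope_section):
            errors.append("rotaryDim must equal sum(mropeSection) in mrope mode")
    return errors
```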

Invocation

Invocation method	Sample code	Description
aclnn API	test_aclnn_rope_with_sin_cos_cache	Calls the RopeWithSinCosCache operator through the aclnnRopeWithSinCosCache API.