
FfnWorkerScheduler

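Product support

| Product | Supported |
| --- | --- |
| Ascend 950PR/Ascend 950DT | |
| Atlas A3 training series products/Atlas A3 inference series products | |
| Atlas A2 training series products/Atlas A2 inference series products | |
| Atlas 200I/500 A2 inference products | × |
| Atlas inference series products | |
| Atlas training series products | |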

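Function description

  • Operator function: In scenarios where Attention and FFN are deployed separately, this is the data-scanning operator on the FFN side. It receives the data sent by the AttentionToFFN operator, scans it, and organizes it.

    Direct use is not recommended; the operator is meant to be used together with AttentionToFFN and FFNWorkerBatching.

    1. Receive the data sent by the AttentionToFFN operator. The data is laid out in memory as a ScheduleContext struct, whose definition is given in the calling example. The struct contains the CommonArea, ControlArea, AttentionArea, and FfnArea regions. This interface uses CommonArea (stores configuration such as session_num, micro_batch_num, micro_batch_size, and selected_expert_num), ControlArea (used by the upper layer to control whether the process exits), and FfnArea (manages the input and output buffers this operator needs during computation; its token_info_buf field stores the operator's input information).

    2. Scan the information stored in token_info_buf. Once the communication data is ready, the operator starts organizing the data. As shown in the diagram below, the layer id, session id, micro batch id, and expert ids are written to device memory in layer_id_buf, session_id_buf, micro_batch_id_buf, and expert_ids_buf, respectively.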

    graph TB
        %% Input buffer
        A[token_info_buf input]
    
        %% Session hierarchy
        A --> Session0
        A --> Session1
    
        %% Session 0 internals
        subgraph Session0[session 0]
            direction TB
            S0_M1[micro batch id 0]:::micro
            S0_L1[layer id 0]:::layer
            S0_S1[session id 0]:::session0
            S0_E1[expert ids 0]:::expert
        end
    
        %% Session 1 internals
        subgraph Session1[session 1]
            direction TB
            S1_M1[micro batch id 0]:::micro
            S1_L1[layer id 0]:::layer
            S1_S1[session id 1]:::session1
            S1_E1[expert ids 0]:::expert
        end
    
        %% Output buffer index region
        subgraph Output[Output region]
            direction TB
            O1[layer_ids_buf]:::layer
            O2[session_ids_buf]:::output
            O3[micro_batch_ids_buf]:::micro
            O4[expert_ids_buf]:::expert
        end
    
        %% Data flow
        S0_L1 -.-> O1
        S0_S1 -.-> O2
        S0_M1 -.-> O3
        S0_E1 -.-> O4
    
        S1_L1 -.-> O1
        S1_S1 -.-> O2
        S1_M1 -.-> O3
        S1_E1 -.-> O4
    
        classDef layer fill:#c8e6c9
        classDef session0 fill:#ffcdd2
        classDef session1 fill:#ffccbc
        classDef output fill:#e3f2fd
        classDef micro fill:#e1f5fe
        classDef expert fill:#bbdefd
        
        %% Subgraph background styles
        style Session0 fill:#fff3e0,stroke:#ff9800,stroke-width:2px
        style Session1 fill:#fce4ec,stroke:#e91e63,stroke-width:2px
        style Output fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
    
    3. After the data has been organized, it is available to the FFNWorkerBatching operator.
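    The scan-and-organize step can be pictured with a small host-side sketch. It is not the operator's implementation: `TokenInfo`, `ScheduleOutput`, and `Reorganize` are hypothetical names, plain vectors stand in for the device buffers, and the real record layout is whatever ScheduleContext defines in the calling example. The sketch only shows the gather pattern from the diagram, where each ready session contributes one layer id, one session id, one micro batch id, and its expert ids to the four output buffers.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-session record. The real layout comes from ScheduleContext
// in the calling example and is not reproduced here.
struct TokenInfo {
    int32_t layer_id;
    int32_t session_id;
    int32_t micro_batch_id;
    std::vector<int32_t> expert_ids;  // experts selected for this session
};

// Hypothetical host-side stand-in for the four output buffers.
struct ScheduleOutput {
    std::vector<int32_t> layer_ids;
    std::vector<int32_t> session_ids;
    std::vector<int32_t> micro_batch_ids;
    std::vector<int32_t> expert_ids;
};

// Gathers the fields of every ready session into the output buffers,
// preserving the order in which the sessions are given.
ScheduleOutput Reorganize(const std::vector<TokenInfo>& ready_sessions) {
    ScheduleOutput out;
    for (const TokenInfo& info : ready_sessions) {
        out.layer_ids.push_back(info.layer_id);
        out.session_ids.push_back(info.session_id);
        out.micro_batch_ids.push_back(info.micro_batch_id);
        // Expert ids of all sessions are concatenated (an assumption about
        // how expert_ids_buf is filled).
        out.expert_ids.insert(out.expert_ids.end(),
                              info.expert_ids.begin(), info.expert_ids.end());
    }
    return out;
}
```

    Calling `Reorganize` with two sessions that share a layer id yields two entries in each id buffer and the concatenated expert ids of both sessions.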
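  • Calculation formula:

  1. Initialization: compute the number of groups from session_num in the input ScheduleContext and from sync_group_size.

  2. If the number of groups is 1, data is processed fully synchronously: the data is organized once all sessions are ready.

  3. If the number of groups is not 1, data is processed without full synchronization: the data is organized once the sessions within a group are ready.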

    $$\text{Initialize:} \quad \text{group\_num} = \frac{\text{session\_num}}{\text{sync\_group\_size}}$$

    $$\text{Process} = \begin{cases} \text{check\_all\_session\_ready()} \quad \text{data\_reorganization()} & \text{if } \text{group\_num} = 1 \\ \text{check\_all\_sessions\_of\_group\_ready()} \quad \text{data\_reorganization()} & \text{otherwise} \end{cases}$$
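
The grouping rule can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the operator's code: `GroupNum`, `AllReady`, and `ReadySessions` are hypothetical helpers, readiness is modeled as one flag per session, groups are assumed to be contiguous ranges of session indices, and the division is assumed to be integer division because the formula does not say how a remainder is handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// group_num = session_num / sync_group_size.
// Assumes sync_group_size is in (0, session_num]; integer division is an assumption.
int32_t GroupNum(int32_t session_num, int32_t sync_group_size) {
    return session_num / sync_group_size;
}

// True when every session with index in [begin, end) is flagged ready.
bool AllReady(const std::vector<bool>& ready, int32_t begin, int32_t end) {
    for (int32_t i = begin; i < end; ++i) {
        if (!ready[static_cast<std::size_t>(i)]) {
            return false;
        }
    }
    return true;
}

// Returns the indices of sessions whose data could be organized now.
// With group_num == 1 this is the fully synchronous case: either all
// sessions are returned or none. Otherwise each group is checked on its own.
std::vector<int32_t> ReadySessions(const std::vector<bool>& ready,
                                   int32_t sync_group_size) {
    const int32_t session_num = static_cast<int32_t>(ready.size());
    const int32_t group_num = GroupNum(session_num, sync_group_size);
    std::vector<int32_t> result;
    for (int32_t g = 0; g < group_num; ++g) {
        const int32_t begin = g * sync_group_size;  // contiguous groups (assumption)
        const int32_t end = begin + sync_group_size;
        if (AllReady(ready, begin, end)) {
            for (int32_t i = begin; i < end; ++i) {
                result.push_back(i);
            }
        }
    }
    return result;
}
```

For four sessions with ready flags `{true, true, false, true}`, a sync_group_size of 2 gives two groups and only sessions 0 and 1 qualify, while a sync_group_size of 4 gives a single group and nothing qualifies until session 2 is ready.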

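Parameter description

  • Parameters:

    | Parameter | Input/Output | Description | Usage notes | Data type | Data format | Shape | Non-contiguous tensor |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | scheduleContextRef | Input/Output | Data to be processed, received on the FFN side. It represents the ScheduleContext information; see the calling example for the detailed structure. | Empty tensors are not supported. | INT8 | ND | 1-D, shape (1024) | × |
    | syncGroupSize | Input | Number of sessions handled by each synchronization group. | Value range is (0, session_num], where session_num is the maximum number of sessions in the data to be processed, that is, the session_num field of the CommonArea region of the ScheduleContext struct in the calling example. | INT32 | - | - | - |
    | executeMode | Input | Execution mode. | Only mode 0 is supported, meaning the operator exits after one execution. | INT32 | - | - | - |
    | workspaceSize | Output | Returns the size of the workspace that must be allocated on the device side. | - | - | - | - | - |
    | executor | Output | Returns the op executor, which contains the operator's computation flow. | - | - | - | - | - |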
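
The limits in the table can be checked on the host before the operator is invoked. The sketch below is a hypothetical pre-check, not part of the aclnn interface: `ValidateArgs` and the two constants are names introduced here for illustration, and `session_num` is assumed to have been read from the CommonArea region by the caller.

```cpp
#include <cstdint>

// Size of scheduleContextRef in bytes: a 1-D INT8 tensor of shape (1024).
constexpr int64_t kScheduleContextBytes = 1024;
// The only execution mode listed as supported: run once, then exit.
constexpr int32_t kExecuteOnce = 0;

// Hypothetical helper: returns true when the arguments satisfy the
// documented limits for scheduleContextRef, syncGroupSize, and executeMode.
bool ValidateArgs(int64_t context_bytes, int32_t sync_group_size,
                  int32_t execute_mode, int32_t session_num) {
    if (context_bytes != kScheduleContextBytes) {
        return false;  // empty or wrongly sized tensor
    }
    if (sync_group_size <= 0 || sync_group_size > session_num) {
        return false;  // outside (0, session_num]
    }
    return execute_mode == kExecuteOnce;
}
```

For example, `ValidateArgs(1024, 2, 0, 4)` passes, while a syncGroupSize of 0 or 5 with four sessions, or an executeMode of 1, is rejected.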

Constraints

None.

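Invocation

| Invocation method | Sample code | Description |
| --- | --- | --- |
| aclnn interface | test_aclnn_inplace_ffn_worker_scheduler | Invokes the FfnWorkerScheduler operator through the aclnnInplaceFfnWorkerScheduler interface. |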