ops-transformer_8242/attention/block_sparse_attention/op_kernel · zhuzemao/ops-transformer_8242 - AtomGit

File	Last commit	Last updated
arch35	attn_infra: fix duplicate-installed header files. Co-authored-by: chenglongyu<chenglongyu@huawei.com>. Auto-generated message for no-merge-commit merge !5682 (refactor_attn_infra_prefix_front into master); created and committed by chenglongyu, merged by cann-robot. Description: the attn_infra directories that the block_sparse_attention, block_sparse_attention_grad, rain_fusion_attention, and fused_infer_attention_score operators each maintain under op_kernel contained header files with duplicate names; each header now carries its operator's name as a prefix. Related issue: https://gitcode.com/cann/ops-transformer/issues/2680. See merge request: cann/ops-transformer!5682	7 days ago
attn_infra	attn_infra: fix duplicate-installed header files. Co-authored-by: chenglongyu<chenglongyu@huawei.com>. Auto-generated message for no-merge-commit merge !5682 (refactor_attn_infra_prefix_front into master); created and committed by chenglongyu, merged by cann-robot. Description: the attn_infra directories that the block_sparse_attention, block_sparse_attention_grad, rain_fusion_attention, and fused_infer_attention_score operators each maintain under op_kernel contained header files with duplicate names; each header now carries its operator's name as a prefix. Related issue: https://gitcode.com/cann/ops-transformer/issues/2680. See merge request: cann/ops-transformer!5682	7 days ago
tla	attn_infra: fix duplicate-installed header files. Co-authored-by: chenglongyu<chenglongyu@huawei.com>. Auto-generated message for no-merge-commit merge !5682 (refactor_attn_infra_prefix_front into master); created and committed by chenglongyu, merged by cann-robot. Description: the attn_infra directories that the block_sparse_attention, block_sparse_attention_grad, rain_fusion_attention, and fused_infer_attention_score operators each maintain under op_kernel contained header files with duplicate names; each header now carries its operator's name as a prefix. Related issue: https://gitcode.com/cann/ops-transformer/issues/2680. See merge request: cann/ops-transformer!5682	7 days ago
block_sparse_attention.cpp	[Feature] Introduce bsa on 950. Co-authored-by: chenyizhou<chenyizhou6@huawei.com>. Auto-generated message for no-merge-commit merge !3323 (bsa_950_poc into master); created and committed by chenyizhou, merged by cann-robot. Description: adds the Ascend950DT/Ascend950PR implementation of aclnnBlockSparseAttention: (1) adds the attn_infra components and kernel files that aclnnBlockSparseAttention depends on; (2) extends the tiling files to support the A2/A3/Ascend950DT/Ascend950PR product series together; (3) revises the documentation's description of supported chips and of which input-parameter configurations each hardware generation supports. Related issue: [#1628](https://gitcode.com/cann/ops-transformer/issues/1628). Testing: a generalization test of 2000 cases within the currently permitted input-parameter range, with normal precision. Documentation: updated the aclnnBlockSparseAttention operator documentation. Type label: ✨ new feature. See merge request: cann/ops-transformer!3323	1 month ago
block_sparse_attention_kernel_common.hpp	attention: fix duplicate-installed header files. Co-authored-by: chenglongyu<chenglongyu@huawei.com>. Auto-generated message for no-merge-commit merge !6020 (repeat_clean_ins into master); created and committed by chenglongyu, merged by cann-robot. Description: header files maintained separately in each operator's directory had duplicate names; each header now carries its operator's name as a prefix. Renames, given as original header name → new header name (file path): common_header.h → sparse_flash_mla_grad_common_header.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_common_header.h); common_header.h → sparse_flash_attention_grad_common_header.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_common_header.h); common_header.h → nsa_selected_attention_grad_common_header.h (attention/nsa_selected_attention_grad/basic_modules/nsa_selected_attention_grad_common_header.h); common_header.h → flash_attention_score_grad_common_header.h (attention/flash_attention_score_grad/op_kernel/arch22/basic_modules/flash_attention_score_grad_common_header.h); common_utils.h → attention_worker_combine_common_utils.h (attention/attention_worker_combine/op_kernel/attention_worker_combine_common_utils.h); gm_to_l1_iterator.h → mla_preprocess_gm_to_l1_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_gm_to_l1_iterator.h); gm_to_ub_iterator.h → mla_preprocess_gm_to_ub_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_gm_to_ub_iterator.h); kernel_common.hpp → rain_fusion_attention_kernel_common.hpp (attention/rain_fusion_attention/op_kernel/rain_fusion_attention_kernel_common.hpp); kernel_common.hpp → fia_kernel_common.hpp (attention/fused_infer_attention_score/op_kernel/fia_kernel_common.hpp); kernel_common.hpp → block_sparse_attention_kernel_common.hpp (attention/block_sparse_attention/op_kernel/block_sparse_attention_kernel_common.hpp); l0c_to_gm_iterator.h → mla_preprocess_l0c_to_gm_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l0c_to_gm_iterator.h); l0c_to_l1_iterator.h → mla_preprocess_l0c_to_l1_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l0c_to_l1_iterator.h); l0c_to_ub_iterator.h → mla_preprocess_l0c_to_ub_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l0c_to_ub_iterator.h); l1_to_bt_iterator.h → mla_preprocess_l1_to_bt_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_bt_iterator.h); l1_to_fb_iterator.h → mla_preprocess_l1_to_fb_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_fb_iterator.h); l1_to_l0_iterator.h → mla_preprocess_l1_to_l0_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_l0_iterator.h); l1_to_ub_iterator.h → mla_preprocess_l1_to_ub_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_ub_iterator.h); mla_common.h → prompt_flash_attention_mla_common.h (attention/prompt_flash_attention/op_kernel/arch22/prompt_flash_attention_mla_common.h); mla_common.h → mla_preprocess_mla_common.h (attention/mla_preprocess/op_kernel/mla_preprocess_mla_common.h); cube_op.h → sparse_flash_mla_grad_cube_op.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_cube_op.h); matmul.h → sparse_flash_mla_grad_matmul.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_matmul.h); vec_op.h → sparse_flash_mla_grad_vec_op.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_vec_op.h); cube_op.h → sparse_flash_attention_grad_cube_op.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_cube_op.h); matmul.h → sparse_flash_attention_grad_matmul.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_matmul.h); vec_op.h → sparse_flash_attention_grad_vec_op.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_vec_op.h). Related issue: https://gitcode.com/cann/ops-transformer/issues/2680. See merge request: cann/ops-transformer!6020	4 days ago
block_sparse_attention_kernel_interface.cpp	cmake: change arch32 to arch22. Co-authored-by: huang-chuhong<huangchuhong1@h-partners.com>. Auto-generated message for no-merge-commit merge !5408 (master into master); created and committed by huang-chuhong, merged by cann-robot. Description: cmake: change arch32 to arch22. Related issue: https://gitcode.com/cann/ops-transformer/issues/1784. See merge request: cann/ops-transformer!5408	15 days ago
block_sparse_attention_kernel_regular_arch22.h	attention: fix duplicate-installed header files. Co-authored-by: chenglongyu<chenglongyu@huawei.com>. Auto-generated message for no-merge-commit merge !6020 (repeat_clean_ins into master); created and committed by chenglongyu, merged by cann-robot. Description: header files maintained separately in each operator's directory had duplicate names; each header now carries its operator's name as a prefix. Renames, given as original header name → new header name (file path): common_header.h → sparse_flash_mla_grad_common_header.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_common_header.h); common_header.h → sparse_flash_attention_grad_common_header.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_common_header.h); common_header.h → nsa_selected_attention_grad_common_header.h (attention/nsa_selected_attention_grad/basic_modules/nsa_selected_attention_grad_common_header.h); common_header.h → flash_attention_score_grad_common_header.h (attention/flash_attention_score_grad/op_kernel/arch22/basic_modules/flash_attention_score_grad_common_header.h); common_utils.h → attention_worker_combine_common_utils.h (attention/attention_worker_combine/op_kernel/attention_worker_combine_common_utils.h); gm_to_l1_iterator.h → mla_preprocess_gm_to_l1_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_gm_to_l1_iterator.h); gm_to_ub_iterator.h → mla_preprocess_gm_to_ub_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_gm_to_ub_iterator.h); kernel_common.hpp → rain_fusion_attention_kernel_common.hpp (attention/rain_fusion_attention/op_kernel/rain_fusion_attention_kernel_common.hpp); kernel_common.hpp → fia_kernel_common.hpp (attention/fused_infer_attention_score/op_kernel/fia_kernel_common.hpp); kernel_common.hpp → block_sparse_attention_kernel_common.hpp (attention/block_sparse_attention/op_kernel/block_sparse_attention_kernel_common.hpp); l0c_to_gm_iterator.h → mla_preprocess_l0c_to_gm_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l0c_to_gm_iterator.h); l0c_to_l1_iterator.h → mla_preprocess_l0c_to_l1_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l0c_to_l1_iterator.h); l0c_to_ub_iterator.h → mla_preprocess_l0c_to_ub_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l0c_to_ub_iterator.h); l1_to_bt_iterator.h → mla_preprocess_l1_to_bt_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_bt_iterator.h); l1_to_fb_iterator.h → mla_preprocess_l1_to_fb_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_fb_iterator.h); l1_to_l0_iterator.h → mla_preprocess_l1_to_l0_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_l0_iterator.h); l1_to_ub_iterator.h → mla_preprocess_l1_to_ub_iterator.h (attention/mla_preprocess/op_kernel/mla_preprocess_l1_to_ub_iterator.h); mla_common.h → prompt_flash_attention_mla_common.h (attention/prompt_flash_attention/op_kernel/arch22/prompt_flash_attention_mla_common.h); mla_common.h → mla_preprocess_mla_common.h (attention/mla_preprocess/op_kernel/mla_preprocess_mla_common.h); cube_op.h → sparse_flash_mla_grad_cube_op.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_cube_op.h); matmul.h → sparse_flash_mla_grad_matmul.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_matmul.h); vec_op.h → sparse_flash_mla_grad_vec_op.h (attention/sparse_flash_mla_grad/op_kernel/arch22/basic_modules/sparse_flash_mla_grad_vec_op.h); cube_op.h → sparse_flash_attention_grad_cube_op.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_cube_op.h); matmul.h → sparse_flash_attention_grad_matmul.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_matmul.h); vec_op.h → sparse_flash_attention_grad_vec_op.h (attention/sparse_flash_attention_grad/basic_modules/sparse_flash_attention_grad_vec_op.h). Related issue: https://gitcode.com/cann/ops-transformer/issues/2680. See merge request: cann/ops-transformer!6020	4 days ago
block_sparse_attention_tilingkey.h	[Feature] Introduce bsa on 950. Co-authored-by: chenyizhou<chenyizhou6@huawei.com>. Auto-generated message for no-merge-commit merge !3323 (bsa_950_poc into master); created and committed by chenyizhou, merged by cann-robot. Description: adds the Ascend950DT/Ascend950PR implementation of aclnnBlockSparseAttention: (1) adds the attn_infra components and kernel files that aclnnBlockSparseAttention depends on; (2) extends the tiling files to support the A2/A3/Ascend950DT/Ascend950PR product series together; (3) revises the documentation's description of supported chips and of which input-parameter configurations each hardware generation supports. Related issue: [#1628](https://gitcode.com/cann/ops-transformer/issues/1628). Testing: a generalization test of 2000 cases within the currently permitted input-parameter range, with normal precision. Documentation: updated the aclnnBlockSparseAttention operator documentation. Type label: ✨ new feature. See merge request: cann/ops-transformer!3323	1 month ago