MindIE-SD/mindiesd/layers/flash_attn · Ascend/MindIE-SD - AtomGit

ascend-robot: [Feature][SLA] Add AscendC backend

File	Last commit	Last updated
__init__.py	[Bugfix][Compilation] import mindiesd crashes with std::bad_alloc when triton is not installed. Co-authored-by: chy3<843049740@qq.com>. Merge request Ascend/MindIE-SD!334 (dev into dev); created by chy3, committed by chy3, merged by ascend-robot. Fixes: #ISSUE 168. Purpose: when triton is not installed, `import mindiesd` fails with ModuleNotFoundError and then corrupts the PyTorch C++ runtime state, so any later tensor operation crashes with std::bad_alloc or std::length_error: vector::reserve. The fix checks whether the triton package is present in the environment to decide whether to use the triton-based fused operators. In outline: `if _HAS_TRITON and _extension_module is not None:` call the triton fused operator (for example SparseLinearAttention); `else:` define `def ops(*args, **kwargs): raise RuntimeError("ops requires Triton-Ascend >= 3.2.1 but it is not available.")` and emit `warnings.warn("Triton-Ascend is not available or is below 3.2.1. " "xxx ops is disabled. Install required dependencies to use it.", UserWarning)`. Test plan: (1) using a triton fused operator in an environment without the required triton should emit a UserWarning and must not affect other operators; (2) in an environment with the required triton, the triton fused operators can be called normally. Test report: (1) environment without the required triton: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/bd275fe2-9812-42b8-ba80-1e2d358f69cb/image.png 'image.png') (2) environment with the required triton: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/2f2262cd-73a3-48d2-8635-9c41150fd517/image.png 'image.png')	21 days ago
ascend_laser_attention.py	[Feature][flash_attn] A5 attention/sparse operator compatibility: automatic routing in the public API and guards on legacy operators. Co-authored-by: daviwang<daviwang@noreply.gitcode.com>. Merge request Ascend/MindIE-SD!315 (dev into dev); created by mazhixin00_00, committed by daviwang, merged by ascend-robot. Fixes #162. Purpose: adapt to the A5 chip. Some attention and sparse-attention operators are no longer available on A5; this PR lets upper-layer code keep running on A5 without changes, and gives clear errors with migration guidance when a retired operator is called directly. Changes: (1) Add `is_a5_device()` (`mindiesd/utils/get_platform.py`) as the single way to detect A5. (2) Public APIs route automatically on A5: `attention_forward` (manual / static / default) maps `ascend_laser_attention` and `prompt_flash_attn` to `fused_attn_score`, with a warning that prompts migration; `sparse_attention` maps `rf_v2` to `rf_v3` (`inner_precise` is forced to 4 as v3 requires, logged at info level), while `ada_bsa` raises `ParametersInvalid` because its successor operator is planned for the next version (v2). (3) When a low-level operator is called directly, bypassing the public API, it raises `ParametersInvalid` on A5 and points to the public API; this covers `ascend_laser_attention`, `ascend_laser_preprocess`, `prompt_flash_attn` (its `register_op_a5` registration is also removed), `rain_fusion_attention`, and `ada_block_sparse_attention` / `get_estimate_mask`. (4) Behavior change: the default op_type of `attention_func.get_attention_function_static` changes from `prompt_flash_attn` to `fused_attn_score` (this affects the default path on non-A5 devices and has been confirmed as intended). Test plan: (1) on A5, call `attention_forward` with LA and PFA and check that both route to fascore; (2) on A5, call `sparse_attention` with ada_bsa and rf_v2: ada_bsa raises an error pending the A5 version, and rf_v2 routes to rf_v3; (3) run the full test suite, skipping cases that cannot run on A5. Test report: (1) ![image.png](https://raw.gitcode.com/user-images/assets/8476587/66a8b561-abad-443b-bd6e-117e00cbff5a/image.png 'image.png') output: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/0f1e73ec-672e-4136-8849-c61d1db67d76/image.png 'image.png') (2) ![image.png](https://raw.gitcode.com/user-images/assets/8476587/0143e02f-d28d-4183-b74f-cb1e5617be35/image.png 'image.png') output: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/435ab5b5-8c71-4b0d-84e8-c47d45d66392/image.png 'image.png')	26 days ago
ascend_laser_preprocess.py	[Bugfix][ops] Fix la_preprocess buffer allocation mismatch with infer-shape contract. Co-authored-by: changetheway<guotaoyuan1@h-partners.com>. Merge request Ascend/MindIE-SD!307 (la_preprocess into dev); created by changetheway, committed by changetheway, merged by ascend-robot. Fixes #<142>. Purpose: (1) C++ plugin (`csrc/plugin/la_preprocess.cpp`): adjusts implementation details. (2) Python implementation (`mindiesd/layers/flash_attn/ascend_laser_preprocess.py`): removes the `AttentionParam` dependency and simplifies the `forward_preprocess` logic; adds 4D shape checks for query/key/value; splits the joint head-dimension check into two independent key and value conditions, each raising a `ParametersInvalid` exception with context, which makes failures easier to debug. (3) Tests (`tests/plugin/test_la_preprocess.py`): restructures and greatly expands coverage, adding tests for shape, consistency, align_len, dtype, layout, memory, and device; adds the missing exception-message assertion to `test_python_entry_value_head__mismatch`, eliminating a false negative; updates the key test assertions to match the new independent error messages. (4) Code style (`pre-commit/pyproject.toml`): adds `per-file-ignores` for `tests/*/` that skip `F401`/`I`/`E402`, so import style in test code does not block commits. Test plan: (1) build check: `cd build && bash build.sh`; (2) regression check: `python3 tests/plugin/test_la_preprocess.py`. Test report: `bash build.sh` builds successfully; `python3 tests/plugin/test_la_preprocess.py -v` passes (18/18). ![image.png](https://raw.atomgit.com/user-images/assets/8476587/9e9dafb5-b3b4-49ff-9bb5-367986eda509/image.png 'image.png')	12 days ago
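attention_forward.py	[Bugfix][log] Unify MindIE SD logging and improve diagnostics. Co-authored-by: guowenna1<guowenna1@huawei.com>. Merge request Ascend/MindIE-SD!328 (0603_log into dev); created by guowenna1, committed by guowenna1, merged by ascend-robot. Fixes part of https://gitcode.com/Ascend/MindIE-SD/issues/158. Purpose: this PR fixes MindIE SD logging problems, mainly: (1) Unify the logging entry point for MindIE SD Python modules, so that modules such as compilation and share_memory no longer call the standard library's `logging.getLogger(__name__)` directly, which caused inconsistent log format, log file path, filter level, and on/off behavior. (2) Improve the logging module's default output format so that both default and verbose modes include the MindIE SD component identifier. (3) Trim logging in the default INFO scenario by demoting normal-flow and debug-state messages to DEBUG, so default runs do not produce unnecessary logs. (4) Enrich WARNING/ERROR messages with a problem description, likely root cause, expected versus actual parameter values, and suggestions for further troubleshooting. (5) Fix issues exposed by pre-commit (log formatting, pylint, bandit, typos, and others), including mismatched log argument counts, spelling errors, annotations for static-check false positives on dynamic APIs, and overly deep nesting in the EPLB scheduler. eplb_scheduler and greedy_algorithm contain many such changes; most are cosmetic and have no functional effect. Test plan: (1) run the full pre-commit check, covering the ruff, pylint, bandit, typos and other static quality gates; (2) run a Python compile check to confirm no syntax errors were introduced; (3) run the git diff whitespace check to confirm there is no trailing whitespace or formatting pollution; (4) white-box check that default INFO logs in the `mindiesd` production code have been cleaned up; (5) white-box check whether any module other than the logging module itself still uses the standard library's `logging.getLogger(__name__)` directly. Test report: executed and passed: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/ff59361f-df29-445c-b41f-a0de7bc44ff2/image.png 'image.png')	23 days ago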
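attention_forward_varlen.py	[docs] Documentation update: add API reference and acceleration API. Co-authored-by: xiao-qing123<xiaoqing14@h-partners.com>. Merge request Ascend/MindIE-SD!263 (dev into dev); created by xiao-qing123, committed by xiao-qing123, merged by ascend-robot. Fixes [#86](https://gitcode.com/Ascend/MindIE-SD/issues/86). Changes: (1) add an API reference (community API interfaces); (2) add the acceleration API (formerly the community layer tier); (3) remove the quick start and the single-card/multi-card parallel examples from the readme (a separate quick_start now covers them); (4) split operator fusion out as its own entry in the features chapter; (5) remove "acceleration features" from the directory name of the features chapter; (6) fix issues reported by the yellow-zone large-model check.	2 months ago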
attention_func.py	[Bugfix][log] Unify MindIE SD logging and improve diagnostics. Co-authored-by: guowenna1<guowenna1@huawei.com>. Merge request Ascend/MindIE-SD!328 (0603_log into dev); created by guowenna1, committed by guowenna1, merged by ascend-robot. Fixes part of https://gitcode.com/Ascend/MindIE-SD/issues/158. Same commit and description as the attention_forward.py row.	23 days ago
attention_operate.py	[docs] Documentation update: add API reference and acceleration API. Co-authored-by: xiao-qing123<xiaoqing14@h-partners.com>. Merge request Ascend/MindIE-SD!263 (dev into dev); created by xiao-qing123, committed by xiao-qing123, merged by ascend-robot. Fixes [#86](https://gitcode.com/Ascend/MindIE-SD/issues/86). Same commit and description as the attention_forward_varlen.py row.	2 months ago
common.py	[docs] Documentation update: add API reference and acceleration API. Co-authored-by: xiao-qing123<xiaoqing14@h-partners.com>. Merge request Ascend/MindIE-SD!263 (dev into dev); created by xiao-qing123, committed by xiao-qing123, merged by ascend-robot. Fixes [#86](https://gitcode.com/Ascend/MindIE-SD/issues/86). Same commit and description as the attention_forward_varlen.py row.	2 months ago
fused_attn_score.py	[docs] Documentation update: add API reference and acceleration API. Co-authored-by: xiao-qing123<xiaoqing14@h-partners.com>. Merge request Ascend/MindIE-SD!263 (dev into dev); created by xiao-qing123, committed by xiao-qing123, merged by ascend-robot. Fixes [#86](https://gitcode.com/Ascend/MindIE-SD/issues/86). Same commit and description as the attention_forward_varlen.py row.	2 months ago
prompt_flash_attn.py	[Feature][flash_attn] A5 attention/sparse operator compatibility: automatic routing in the public API and guards on legacy operators. Co-authored-by: daviwang<daviwang@noreply.gitcode.com>. Merge request Ascend/MindIE-SD!315 (dev into dev); created by mazhixin00_00, committed by daviwang, merged by ascend-robot. Fixes #162. Same commit and description as the ascend_laser_attention.py row.	26 days ago
sparse_flash_attn.py	[feature][bsa] support fp8 bsa. Co-authored-by: hyh_hh<huyinghong1@huawei.com>. Merge request Ascend/MindIE-SD!337 (bsa into dev); created by hyh_hh, committed by hyh_hh, merged by ascend-robot. Purpose: support FP8 BSA (Block Sparse Attention), including: (1) block_sparse_attention.cpp: the operator interface moves from aclnnBlockSparseAttention to aclnnBlockSparseAttentionV2, which remains compatible with the former's BF16 implementation; (2) sparse_flash_attn_rf_v3.py: adds a sparse-attention implementation with an FP8 quantization path; the model side enables FP8-quantized sparse attention by passing q_rot and k_rot, and block_size supports q=128 with kv=256 or 512; (3) csrc/plugin/pytorch_npu_helper.h: adds the FP8 type. Test plan: (1) UT: `pytest tests/plugin/test_rf_v3_attention.py` and `pytest tests/plugin/test_block_sparse_attention.py`; (2) integration check with the Wan2.2 model. Test report: (1) UT: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/cc2b9d32-c5fb-4902-8a58-19bfe6d94261/image.png 'image.png') (2) Wan2.2 model integration check: functionality and accuracy are normal.	14 days ago
sparse_flash_attn_ada_bsa.py	[Feature][flash_attn] A5 attention/sparse operator compatibility: automatic routing in the public API and guards on legacy operators. Co-authored-by: daviwang<daviwang@noreply.gitcode.com>. Merge request Ascend/MindIE-SD!315 (dev into dev); created by mazhixin00_00, committed by daviwang, merged by ascend-robot. Fixes #162. Same commit and description as the ascend_laser_attention.py row.	26 days ago
sparse_flash_attn_rf_v2.py	[Feature][flash_attn] A5 attention/sparse operator compatibility: automatic routing in the public API and guards on legacy operators. Co-authored-by: daviwang<daviwang@noreply.gitcode.com>. Merge request Ascend/MindIE-SD!315 (dev into dev); created by mazhixin00_00, committed by daviwang, merged by ascend-robot. Fixes #162. Same commit and description as the ascend_laser_attention.py row.	26 days ago
sparse_flash_attn_rf_v3.py	[bugfix] update import. Co-authored-by: hyh_hh<huyinghong1@huawei.com>. Merge request Ascend/MindIE-SD!366 (bsa into dev); created by hyh_hh, committed by hyh_hh, merged by ascend-robot. Purpose: fix an import error in the UT. Test plan: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/d49cb6d1-24c7-4d62-8917-781406c4bc70/image.png 'image.png') Test report: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/025bafe9-2979-4cfc-8a80-2a10a713aa73/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/8476587/345774de-6e42-4185-95bf-808b52078b5b/image.png 'image.png')	11 days ago
sparse_linear_attn.py	[Feature][SLA] Add AscendC backend. Co-authored-by: yujunyu2<yujunyu3@huawei.com>. Co-authored-by: openLiBingCI<openlibing-robot@openlibing.com>. Merge request Ascend/MindIE-SD!332 (dev into dev); created by yjy_ac, committed by yujunyu2;openLiBingCI, merged by ascend-robot. Fixes #171. Purpose: to reach higher inference performance, add an optional AscendC Block Sparse Attention operator backend for SLA. Test plan, how to run: for CPU parameter validation (no NPU needed), `export MINDIE_TEST_MODE=CPU` then `python -m unittest tests.layers.flash_attn.test_sparse_linear_attn -v`; for the full run including real NPU hardware (requires Ascend + build/build_plugin.sh + pip install -e .), `export MINDIE_TEST_MODE=ALL` then `python -m unittest tests.layers.flash_attn.test_sparse_linear_attn -v`. Test cases, initialization (CPU): the default backend is triton; supported backends are triton and ascendc; ascendc accepts head_dim 64/128 only; triton accepts head_dim 16/32/64/128/256 only; triton defaults to BLKQ=BLKK=64; triton block sizes are 64/128 only; ascendc BLKK must be a multiple of 128; an invalid backend, head_dim, or block size raises ParametersInvalid at initialization. Test cases, get_block_map (NPU): with head_dim=64 and BLK=128, sparse_map is int8, its shape is correct, and real_topk > 0; with head_dim=128, fp16, and BLK=128, the same holds; with BLKQ=BLKK=64, blocks are partitioned by 64 and the shape is correct. Test cases, forward validation (CPU, get_block_map mocked): for ascendc and triton, a non-NPU device raises an error and get_block_map is not called; for ascendc, a forward input whose head_dim is not 64/128 raises an error; for triton, a forward input whose head_dim is not 16/32/64/128/256 raises an error. Test cases, end-to-end smoke (NPU, no mock): triton with head_dim=64 and BLK=128 gives output shape (1,2,1024,64), fp16; triton with BLKQ=BLKK=64 gives the correct output shape, fp16; triton with head_dim=128 gives output shape (1,2,1024,128), fp16; ascendc with head_dim=64/128 gives the correct output shape and dtype (inner_precise=4 on the 950 series, 1 elsewhere). Test report: the test cases pass, including an accuracy comparison showing that the AscendC and Triton backends agree under the same inputs. ![SLA接入AscendC_BSA测试.png](https://raw.gitcode.com/user-images/assets/8476587/a5be9678-5db6-4769-b695-b3636003b61b/SLA接入AscendC_BSA测试.png 'SLA接入AscendC_BSA测试.png') Compared with the triton backend, the AscendC backend is close to 15 times faster. ![image.png](https://raw.gitcode.com/user-images/assets/8476587/c2b13306-8803-4e50-b0e5-ab01c32e3c7b/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/8476587/b8ea4b77-3f75-4f0f-bb58-eb17e91f740a/image.png 'image.png')	4 days ago
sparse_linear_attn_triton.py	[Feature][SLA] Add Sparse Linear Attention triton operators for A2 and A5. Co-authored-by: chy3<843049740@qq.com>. Merge request Ascend/MindIE-SD!319 (dev into dev); created by chy3, committed by chy3, merged by ascend-robot. Issue: [#100](https://gitcode.com/Ascend/MindIE-SD/issues/100). Purpose: add a sparse_linear_attn module to mindiesd.layers.flash_attn. It contains the SparseLinearAttention operator, whose interface matches the open-source repository https://github.com/thu-ml/SLA. Usage: `import torch; from mindiesd.layers import SparseLinearAttention; attn = SparseLinearAttention(head_dim=128, topk=0.2, feature_map="softmax", BLKQ=64, BLKK=64).npu(); B, H, L, D = 2, 4, 4096, 128; q = torch.randn((B, H, L, D), dtype=torch.bfloat16, device='npu'); k = torch.randn((B, H, L, D), dtype=torch.bfloat16, device='npu'); v = torch.randn((B, H, L, D), dtype=torch.bfloat16, device='npu'); o = attn(q, k, v)` (here `topk` equals 1 - sparsity, and `feature_map` accepts elu, relu, or softmax). Notes: the SparseLinearAttention (SLA) operator in this PR is implemented with triton-ascend and supports the A2, A3, and A5 hardware platforms; SLA supports BLKK in {64, 128} and BLKQ in {64, 128}; SLA currently supports bf16 and fp16 for q, k, and v. Test plan: (1) call SparseLinearAttention with the appropriate parameters and confirm it returns a result normally. Test report: call: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/7eebec23-51c1-499a-8d29-323e32f83a95/image.png 'image.png') output: ![image.png](https://raw.gitcode.com/user-images/assets/8476587/c721447c-3a29-4a18-95d1-ab3014020772/image.png 'image.png')	25 days ago
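
The `__init__.py` row describes guarding triton-based operators behind an availability check, so that a missing package produces a warning and a stub that raises on use instead of breaking the import. A minimal sketch of that pattern follows. The helper name `load_optional_op` and its parameters are hypothetical and are not taken from the repository, and the sketch only tests for the package's presence; it does not check the `>= 3.2.1` version requirement mentioned in the commit.

```python
import importlib.util
import warnings
from typing import Callable


def load_optional_op(module_name: str, op_name: str,
                     factory: Callable[[], Callable]) -> Callable:
    """Return the real operator if `module_name` is installed, else a stub.

    Hypothetical helper for illustration. `factory` builds the real operator
    and is only called when the optional dependency can be found.
    """
    # find_spec looks the package up without importing it, so a missing
    # dependency cannot fail halfway through an import.
    if importlib.util.find_spec(module_name) is not None:
        return factory()

    warnings.warn(
        f"{module_name} is not available. {op_name} is disabled. "
        "Install required dependencies to use it.",
        UserWarning,
        stacklevel=2,
    )

    def _disabled(*args, **kwargs):
        # Fail at call time, not at import time.
        raise RuntimeError(
            f"{op_name} requires {module_name} but it is not available."
        )

    return _disabled
```

With this shape, importing the package stays safe when the dependency is absent, and only callers of the disabled operator see an error.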
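
The A5 compatibility commit (!315) describes two routing rules in the public API: legacy attention operators fall back to `fused_attn_score`, and `rf_v2` is redirected to `rf_v3` with `inner_precise` forced to 4, while `ada_bsa` is rejected. The sketch below restates those rules as plain functions. The function names, the `on_a5` flag, and the local `ParametersInvalid` class are illustrative stand-ins, not the repository's actual code, which detects the device through `is_a5_device()`.

```python
import logging
import warnings
from typing import Tuple

logger = logging.getLogger("a5_routing_sketch")


class ParametersInvalid(ValueError):
    """Local stand-in for the library exception of the same name."""


# Operators described as unavailable on A5, with their stated replacement.
_A5_ATTENTION_FALLBACK = {
    "ascend_laser_attention": "fused_attn_score",
    "prompt_flash_attn": "fused_attn_score",
}


def route_attention_op(op_type: str, on_a5: bool) -> str:
    """Pick the attention operator, rerouting retired ones on A5."""
    if on_a5 and op_type in _A5_ATTENTION_FALLBACK:
        target = _A5_ATTENTION_FALLBACK[op_type]
        warnings.warn(f"{op_type} is not available on A5; using {target}.")
        return target
    return op_type


def route_sparse_op(sparse_type: str, inner_precise: int,
                    on_a5: bool) -> Tuple[str, int]:
    """Pick the sparse operator and inner_precise value."""
    if not on_a5:
        return sparse_type, inner_precise
    if sparse_type == "rf_v2":
        logger.info("rf_v2 is routed to rf_v3 on A5; inner_precise set to 4.")
        return "rf_v3", 4
    if sparse_type == "ada_bsa":
        raise ParametersInvalid("ada_bsa is not supported on A5.")
    return sparse_type, inner_precise
```

Keeping the routing in one place is what lets callers of the public API stay unchanged across devices.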
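
The initialization test cases listed for the AscendC backend commit (!332) amount to a small set of parameter rules. As a rough illustration, they could be checked as below. `validate_sla_config` and the local `ParametersInvalid` class are hypothetical; the real checks live in `SparseLinearAttention` and may differ. The listing gives no BLKQ rule or default block sizes for ascendc, so the sketch leaves BLKQ unchecked for that backend and reuses the triton defaults for its signature.

```python
class ParametersInvalid(ValueError):
    """Local stand-in for the library exception of the same name."""


# head_dim values listed per backend in the test plan.
_HEAD_DIMS = {
    "triton": (16, 32, 64, 128, 256),
    "ascendc": (64, 128),
}
_TRITON_BLOCK_SIZES = (64, 128)


def validate_sla_config(head_dim: int, backend: str = "triton",
                        blkq: int = 64, blkk: int = 64) -> dict:
    """Validate the settings and return them as a dict."""
    if backend not in _HEAD_DIMS:
        raise ParametersInvalid(f"unsupported backend: {backend!r}")
    if head_dim not in _HEAD_DIMS[backend]:
        raise ParametersInvalid(
            f"{backend} supports head_dim {_HEAD_DIMS[backend]}, got {head_dim}"
        )
    if backend == "triton":
        if blkq not in _TRITON_BLOCK_SIZES or blkk not in _TRITON_BLOCK_SIZES:
            raise ParametersInvalid(
                f"triton block sizes must be 64 or 128, got {blkq}/{blkk}"
            )
    elif blkk <= 0 or blkk % 128 != 0:
        # ascendc: BLKK must be a multiple of 128.
        raise ParametersInvalid(
            f"ascendc BLKK must be a multiple of 128, got {blkk}"
        )
    return {"backend": backend, "head_dim": head_dim,
            "BLKQ": blkq, "BLKK": blkk}
```

For example, `validate_sla_config(64)` accepts the triton defaults, while `validate_sla_config(64, backend="ascendc")` is rejected because the default BLKK of 64 is not a multiple of 128.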