msmodeling/tensor_cast/performance_model/builtin_model · Ascend/MindStudio-Modeling - AtomGit

ascend-robot [Bugfix] Fix DeepSeek V4 attention modeling issues

File	Last commit	Last updated
__init__.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development continues on master, with packaging support. See merge request: Ascend/msmodeling!330	16 days ago
deepseek_v4.py	[Bugfix] Fix DeepSeek V4 attention modeling issues Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !357 merge fix-ds-v4-atten into master Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description: PR Type - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation This PR fixes DeepSeek V4 sparse attention modeling issues in the predictive decode and prefill cache-update paths. The previous decode/prefill heuristic could misclassify short MTP decode batches, and the prefill path relied on a full-tensor arithmetic dependency to keep the KV cache update chain from being pruned out of compiled graphs. ------ ## 📝 Modification - Add `_is_decode_attention_batch` to align V4 decode detection with the predictive decoding rule: query length `< 5` is treated as decode. - Replace the prefill full-cache arithmetic anchor with an explicit optional `kv_dependency` argument on `sparse_attn_sharedkv`. - Update the V4 sparse-attention performance model to exclude the optional dependency input from memory accounting. - Add regression tests for the MTP decode heuristic, prefill boundary behavior, the optional `kv_dependency` argument, and the V4 attention forward cache path. ------ ## 📐 Associated Test Results - Added/updated regression tests in `tests/regression/tensor_cast/test_deepseek_v4.py`. - Recommended validation command: `python -m pytest tests/regression/tensor_cast/test_deepseek_v4.py -q` See merge request: Ascend/msmodeling!357	13 days ago
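
The decode rule and the memory-accounting change described in the deepseek_v4.py commit message can be illustrated with a small sketch. This is not the repository's code: the function signatures, the `TensorInfo` type, the per-request interpretation of "query length", and the helper names are all assumptions made for illustration. Only the `< 5` threshold and the `kv_dependency` name come from the merge request description.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# From the merge request text: query length < 5 is treated as decode.
DECODE_QUERY_LEN_THRESHOLD = 5


def is_decode_attention_batch(query_lens: Sequence[int]) -> bool:
    """Hypothetical stand-in for `_is_decode_attention_batch`.

    Assumption: a batch counts as decode only when it is non-empty and
    every request's query length is positive and below the threshold.
    The real function's inputs are not shown in the source.
    """
    if not query_lens:
        return False
    return all(0 < n < DECODE_QUERY_LEN_THRESHOLD for n in query_lens)


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical description of an operator input."""
    name: str
    numel: int      # number of elements
    itemsize: int   # bytes per element


def memory_access_bytes(
    inputs: Sequence[Optional[TensorInfo]],
    excluded: frozenset = frozenset({"kv_dependency"}),
) -> int:
    """Sum input bytes, skipping absent inputs and dependency-only inputs.

    Assumption: the optional dependency input exists only to order graph
    nodes, so it should not be counted as memory traffic.
    """
    return sum(
        t.numel * t.itemsize
        for t in inputs
        if t is not None and t.name not in excluded
    )
```

Under these assumptions, an MTP decode batch with query lengths such as `[2, 2, 2]` is classified as decode, a batch containing a length of 5 or more falls on the prefill side of the boundary, and passing or omitting `kv_dependency` leaves the accounted bytes unchanged.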