| File | Last commit | Last updated |
|---|---|---|
| | [Sync][Non-development code] Sync code from develop to master. Co-authored-by: yydyzr <liuyuncong1@huawei.com>, gcw_61YBRfIt <chuzhenxing@huawei.com>, 孔炳翔 <1120200577@qq.com>, zhengxinqian <qianzhengxin@huawei.com>, hw_whx <wanghexiang7@huawei.com>, jgong5 <steven.gong@gmail.com>, hw_whx <2952154980@qq.com>. # message auto-generated for no-merge-commit merge: !330 merge master into master. [Sync][Non-development code] Sync code from develop to master. Created-by: AvadaKedavrua; Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔; Merged-by: ascend-robot. Description: Sync code from develop to master; future development proceeds from master, and packaging is supported. See merge request: Ascend/msmodeling!330 | 16 days ago |
| | [Bugfix] Fix DeepSeek V4 attention modeling issues. Co-authored-by: ChenHuiwen <chenhuiwen7@huawei.com>. # message auto-generated for no-merge-commit merge: !357 merge fix-ds-v4-atten into master. [Bugfix] Fix DeepSeek V4 attention modeling issues. Created-by: ChenHuiwen; Commit-by: ChenHuiwen; Merged-by: ascend-robot. **PR Type:** Bugfix, Test-Cases. **Motivation:** This PR fixes DeepSeek V4 sparse attention modeling issues in the predictive decode and prefill KV cache update paths. The previous decode/prefill heuristic could misclassify short MTP decode batches, and the prefill path relied on a full-tensor arithmetic dependency to keep KV cache updates alive (that is, not pruned) in compiled graphs. **Modification:** (1) Add `_is_decode_attention_batch` to align V4 decode detection with the predictive decoding rule: a query length < 5 is treated as decode. (2) Replace the prefill full-cache arithmetic anchor with an explicit optional `kv_dependency` argument on `sparse_attn_sharedkv`. (3) Update the V4 sparse-attention performance model to exclude the optional dependency input from memory-access accounting. (4) Add regression tests for the MTP decode heuristic, prefill boundary behavior, the optional `kv_dependency`, and the V4 attention forward cache path. **Associated test results:** Added/updated regression tests in `tests/regression/tensor_cast/test_deepseek_v4.py`. Recommended validation command: `python -m pytest tests/regression/tensor_cast/test_deepseek_v4.py -q`. See merge request: Ascend/msmodeling!357 | 13 days ago |
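
The decode rule from merge request !357 (a query length below 5 counts as decode) can be illustrated with a minimal Python sketch. This is not the repository's `_is_decode_attention_batch`: the name `is_decode_batch`, its signature, and the assumption that a batch is decode only when every request's query length is below the threshold are all hypothetical. What it shows is why short multi-token queries matter: MTP decode steps process a few tokens per request (for example, one new token plus a small number of draft tokens), so such batches should still be classified as decode rather than prefill.

```python
from typing import Sequence

# Threshold from the rule described in the merge request: query length < 5 is decode.
DECODE_QUERY_LEN_THRESHOLD = 5


def is_decode_batch(query_lens: Sequence[int],
                    threshold: int = DECODE_QUERY_LEN_THRESHOLD) -> bool:
    """Hypothetical sketch of a decode/prefill check for one attention batch.

    Assumption: the batch counts as decode only if every request's query
    length is below `threshold`. MTP-style decode steps carry a few tokens
    per request, so lengths such as 2-4 are still classified as decode.
    """
    if not query_lens:
        raise ValueError("query_lens must contain at least one request")
    if any(q <= 0 for q in query_lens):
        raise ValueError("query lengths must be positive")
    # Equivalent to "all requests are below the threshold".
    return max(query_lens) < threshold


if __name__ == "__main__":
    print(is_decode_batch([1, 1, 1]))  # True: ordinary single-token decode
    print(is_decode_batch([4, 4]))     # True: short multi-token (MTP-style) decode
    print(is_decode_batch([5]))        # False: boundary case, treated as prefill
    print(is_decode_batch([1, 128]))   # False: batch contains a prefill-length query
```

The boundary at exactly 5 is the kind of case the merge request's "prefill boundary behavior" regression tests target; a sketch like this makes the strict `<` comparison explicit.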
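
The performance-model change in merge request !357 (excluding the optional `kv_dependency` input from memory-access accounting) can be sketched as byte counting that skips dependency-only inputs. In this hypothetical model, `TensorSpec`, `memory_access_bytes`, `DEPENDENCY_ONLY_INPUTS`, and the input names `q`, `kv`, and `out` are illustrative assumptions; only `kv_dependency` comes from the merge request. The idea is that `kv_dependency` exists so the compiled graph keeps the KV cache update, not because the kernel reads it, so charging its bytes would overstate memory traffic by the full size of whatever tensor is passed.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod
from typing import Iterable, Optional

# Hypothetical set of inputs that only express a graph dependency and are never read.
DEPENDENCY_ONLY_INPUTS = frozenset({"kv_dependency"})


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical tensor descriptor used by this sketch."""
    name: str
    shape: tuple[int, ...]
    itemsize: int  # bytes per element, e.g. 2 for bf16

    @property
    def nbytes(self) -> int:
        return prod(self.shape) * self.itemsize


def memory_access_bytes(inputs: Iterable[Optional[TensorSpec]],
                        outputs: Iterable[TensorSpec],
                        dependency_only: frozenset[str] = DEPENDENCY_ONLY_INPUTS) -> int:
    """Bytes read plus bytes written, skipping absent (None) and dependency-only inputs."""
    read = sum(t.nbytes for t in inputs
               if t is not None and t.name not in dependency_only)
    written = sum(t.nbytes for t in outputs)
    return read + written


if __name__ == "__main__":
    q = TensorSpec("q", (4, 8, 64), 2)                # 4096 bytes
    kv = TensorSpec("kv", (16, 64), 2)                # 2048 bytes actually read
    dep = TensorSpec("kv_dependency", (1024, 64), 2)  # full cache, 131072 bytes
    out = TensorSpec("out", (4, 8, 64), 2)            # 4096 bytes
    print(memory_access_bytes([q, kv, dep], [out]))               # 10240
    print(memory_access_bytes([q, kv, None], [out]))              # 10240 (optional input absent)
    print(memory_access_bytes([q, kv, dep], [out], frozenset()))  # 141312 if counted
```

Treating an absent optional input (`None`) and a present dependency-only input the same way keeps the modeled cost identical whether or not the anchor is passed, which is the behavior the merge request describes for the optional argument.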