| File | Last commit | Last updated |
|---|---|---|
| | [Bugfix] Fix incorrect KV cache calculation for the deepseek-v4 model. Co-authored-by: ChenHuiwen <chenhuiwen7@huawei.com>. Merge request Ascend/msmodeling!321 (ds-kvcache-fix into develop); created and committed by ChenHuiwen, merged by ascend-robot. PR type: Feature, Bugfix, Perf. **Motivation:** This PR fixes inaccurate DeepSeek V4 KV cache sizing and memory estimation in msmodeling. The previous implementation used the full paged KV cache footprint for DeepSeek V4 sparse/compressed attention instead of following the compressed-cache semantics, which over-counted KV cache memory and reduced the accuracy of throughput and memory estimates. **Modification:** (1) Fix DeepSeek V4 main KV cache sizing according to `compress_ratio`, `sliding_window`, batch size, and total KV tokens. (2) Keep the DeepSeek V4 main KV cache dtype as the model dtype, while allowing the indexer cache to follow the attention quantization dtype. (3) Add compressed sizing for the DeepSeek V4 indexer cache, gated explicitly by `model_type == "deepseek_v4"` to avoid affecting other MLA/DSA models. (4) Update input generation paths to pass batch and token information into the KV cache helpers. (5) Calibrate multiple DeepSeek V4 analytic performance model operators to better match the reference fused-kernel behavior and avoid double-counted memory traffic. (6) Add `--quantize-backbone-linear-action` to support different quantization actions for backbone linear layers and routed MoE experts. **Associated test results:** Not run yet in this commit. | 18 days ago |
| | Support user-customized operator performance modeling in the `performance_model/custom_op` directory. Co-authored-by: HongMaoShuiGuai <1120200577@qq.com>, genius52 <taochengcheng@h-partners.com>. Merge request Ascend/msmodeling!50 (custom_op into develop); created by genius52, committed by genius52 and HongMaoShuiGuai, merged by ascend-robot. **Reason for change:** (1) Allow users to provide their own operator performance modeling implementations under `performance_model/custom_op`. (2) Make `performance_model/__init__.py` load `custom_op` first. **Changes:** (1) `OpInvokeInfo.register_op_properties` lets users define new operator performance models or override existing ones. (2) `register_op_estimator` lets users customize performance modeling for communication estimation. (3) To resolve a circular dependency, the classes defined in `__init__.py` were moved to separate files. The functional verification, code review, security self-check, change notification, and smoke-test checklists in the MR template were left unchecked. | 4 months ago |