msmodeling/tensor_cast/core/quantization · Ascend/MindStudio-Modeling - AtomGit

ascend-robot [Bugfix] Fix incorrect KV cache calculation for the deepseek-v4 model

File	Last commit	Last updated
config.py	[Bugfix] Fix incorrect KV cache calculation for the deepseek-v4 model	18 days ago

Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com>
# message auto-generated for no-merge-commit merge: !321 merge ds-kvcache-fix into develop
[Bugfix] Fix incorrect KV cache calculation for the deepseek-v4 model
Created-by: ChenHuiwen
Commit-by: ChenHuiwen
Merged-by: ascend-robot
Description:

PR Type
- [x] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [x] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation
This PR fixes inaccurate DeepSeek V4 KV cache sizing and memory estimation in msmodeling. The previous implementation used the full paged KV cache footprint for DeepSeek V4 sparse/compressed attention instead of following its compressed-cache semantics. This over-counted KV cache memory and reduced the accuracy of throughput and memory estimates.

------
## 📝 Modification
- Fix DeepSeek V4 main KV cache sizing according to `compress_ratio`, `sliding_window`, batch size, and total KV tokens.
- Keep DeepSeek V4 main KV cache dtype as model dtype, while allowing indexer cache to follow attention quantization dtype.
- Add compressed sizing for DeepSeek V4 indexer cache, gated explicitly by `model_type == "deepseek_v4"` to avoid affecting other MLA/DSA models.
- Update input generation paths to pass batch/token information into KV cache helpers.
- Calibrate multiple DeepSeek V4 analytic performance model operators to better match the reference fused-kernel behavior and avoid double-counted memory traffic.
- Add `--quantize-backbone-linear-action` to support different quantization actions for backbone linear layers and routed MoE experts.

------
## 📐 Associated Test Results
Not run yet in this commit.

![image.png](https://raw.gitcode.com/user-images/assets/8428112/ac052839-697c-40c8-adbd-ac845fc33a5f/image.png 'image.png')

See merge request: Ascend/msmodeling!321

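datatypes.py	Support users to customize operator performance modeling in the performance_model/custom_op directory	4 months ago

Co-authored-by: HongMaoShuiGuai<1120200577@qq.com>
Co-authored-by: genius52<taochengcheng@h-partners.com>
# message auto-generated for no-merge-commit merge: !50 merge custom_op into develop
Support users to customize operator performance modeling in the performance_model/custom_op directory
Created-by: genius52
Commit-by: genius52;HongMaoShuiGuai
Merged-by: ascend-robot
Description:

### 1. Change description
- Reason for change:
  1. Support user-defined operator performance modeling implementations under the performance_model/custom_op directory.
  2. performance_model/\_\_init\_\_.py loads custom_op first.
- Changes:
  1. OpInvokeInfo.register_op_properties lets users define performance modeling for new operators or override it for existing ones.
  2. register_op_estimator lets users define custom performance modeling for communication estimation.
  3. To resolve a circular dependency, the classes defined in \_\_init\_\_.py were moved to separate files.
- [ ] Requires a paired merge in another repository (attach the other PR link):

----
### 2. Functional verification
- [ ] Functional self-verification
- [ ] Screenshots of local self-verification cases (make sure no personal information is shown)
- [ ] Smoke test passed

### 3. Code review
- Requirements:
  - Merges of more than 200 lines require a review meeting with at least three people.
  - Review density ≥ 2 comments per 100 lines.
  - If the review defect density does not meet the requirement, an explanation is required.
  - Merges of more than 1000 lines are not allowed in principle and must be filed for the record.
- [ ] Code has been reviewed
- [ ] Covered by UT test cases

----
### 4. Security self-check for typical secure-coding issues
- [ ] If external interfaces are involved, external data is validated
- [ ] MR title and description follow the required format
- [ ] Null pointer checks are performed
- [ ] Return values are checked
- [ ] File permission configuration is handled correctly
- [ ] Exception scenarios of the interfaces are fully considered
- [ ] Error logs are recorded correctly
- [ ] If regular expressions are involved, they are checked for ReDoS
- [ ] If arithmetic is involved, risks such as integer overflow and division by zero are checked

----
### 5. Change notification
- Documentation changes:
- Change notice (message + email):

----
### 6. Smoke test changes
- PR source:
  - [ ] Issue ticket
  - [ ] Feature requirement
  - [ ] Security investigation
  - [ ] Other
- [ ] Is there a case that smoke tests could have caught but did not
- [ ] Do smoke tests need to be added:

----
See merge request: Ascend/msmodeling!50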
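
The config.py commit message (!321) says the earlier code counted the full paged KV cache footprint, whereas the fix sizes the DeepSeek V4 main KV cache from `compress_ratio`, `sliding_window`, batch size, and total KV tokens. The listing does not show the actual formula, so the following is only an illustrative sketch of why a compressed layout can need far fewer cache slots. The function names, the sizing rule (an uncompressed sliding window per sequence plus history compressed by `compress_ratio`), and the `bytes_per_token` parameter are all assumptions, not code from msmodeling.

```python
import math


def full_paged_kv_tokens(total_kv_tokens: int) -> int:
    """Baseline: every KV token occupies one cache slot."""
    return total_kv_tokens


def compressed_kv_tokens(
    batch_size: int,
    total_kv_tokens: int,
    compress_ratio: int,
    sliding_window: int,
) -> int:
    """Hypothetical sizing rule, NOT the formula used in config.py.

    Assumes each sequence keeps up to `sliding_window` recent tokens
    uncompressed and that the history is stored at 1/compress_ratio density.
    """
    if compress_ratio < 1:
        raise ValueError("compress_ratio must be >= 1")
    # Uncompressed window slots, capped by the tokens that actually exist.
    window_tokens = min(total_kv_tokens, batch_size * sliding_window)
    # Compressed slots for the token history.
    compressed_tokens = math.ceil(total_kv_tokens / compress_ratio)
    return window_tokens + compressed_tokens


def kv_cache_bytes(num_tokens: int, bytes_per_token: int) -> int:
    """Convert a slot count to bytes; bytes_per_token is an assumed constant."""
    return num_tokens * bytes_per_token


if __name__ == "__main__":
    full = full_paged_kv_tokens(32768)
    compressed = compressed_kv_tokens(
        batch_size=4, total_kv_tokens=32768, compress_ratio=4, sliding_window=128
    )
    print(f"full paged slots: {full}, compressed slots: {compressed}")
```

Under these assumed numbers the compressed layout needs roughly a quarter of the slots of the full paged layout, which is the kind of gap that would inflate memory estimates if the full footprint were used.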