msmodeling/tensor_cast/performance_model · Ascend/MindStudio-Modeling - AtomGit

ascend-robot perf(tensor_cast): refine sparse attention roofline

File	Last commit	Last updated
builtin_model	[Bugfix] Fix DeepSeek V4 attention modeling issues	12 days ago

Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com>
Message auto-generated for no-merge-commit merge: !357 merge fix-ds-v4-atten into master
Created-by: ChenHuiwen
Commit-by: ChenHuiwen
Merged-by: ascend-robot

Description:

PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

This PR fixes DeepSeek V4 sparse attention modeling issues in the predictive decode and prefill cache-update paths. The previous decode/prefill heuristic could misclassify short MTP decode batches, and the prefill path used a full-tensor arithmetic dependency to keep KV cache updates from being pruned out of compiled graphs.

------

## 📝 Modification

- Add `_is_decode_attention_batch` to align V4 decode detection with the predictive decoding rule: query length `< 5` is treated as decode.
- Replace the prefill full-cache arithmetic anchor with an explicit optional `kv_dependency` argument on `sparse_attn_sharedkv`.
- Update the V4 sparse-attention performance model to exclude the optional dependency input from memory accounting.
- Add regression tests for the MTP decode heuristic, prefill boundary behavior, optional `kv_dependency`, and the V4 attention forward cache path.

------

## 📐 Associated Test Results

- Added/updated regression tests in `tests/regression/tensor_cast/test_deepseek_v4.py`.
- Recommended validation command:

![image.png](https://raw.gitcode.com/user-images/assets/8428112/b193fa22-b5d9-473f-9d2f-9d31bd595da2/image.png 'image.png')

```bash
python -m pytest tests/regression/tensor_cast/test_deepseek_v4.py -q
```

See merge request: Ascend/msmodeling!357
custom_op	[Sync] [Non-development code] Sync code from develop to master	15 days ago

Co-authored-by: yydyzr<liuyuncong1@huawei.com>
Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com>
Co-authored-by: 孔炳翔<1120200577@qq.com>
Co-authored-by: zhengxinqian<qianzhengxin@huawei.com>
Co-authored-by: hw_whx<wanghexiang7@huawei.com>
Co-authored-by: jgong5<steven.gong@gmail.com>
Co-authored-by: hw_whx<2952154980@qq.com>
Message auto-generated for no-merge-commit merge: !330 merge master into master
Created-by: AvadaKedavrua
Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔
Merged-by: ascend-robot

Description: Sync code from develop to master. Subsequent development will proceed on master, and packaging is supported.

See merge request: Ascend/msmodeling!330
profiling_database	fix: align shape grid model ids and database	3 days ago

Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com>
Message auto-generated for no-merge-commit merge: !348 merge codex/fix-shape-grid-profile-db into master
Created-by: Secluded_Ocean
Commit-by: Secluded_Ocean
Merged-by: ascend-robot

Description:

## Summary

- Align `generate_shape_grid.py --target-models` with `text_generate` model_id naming and reject legacy short names such as `dsv3`.
- Keep `--rows` effective when sampling is capped but rng/seed is None.
- Replace `vllm0.18.0_torch2.9.0_cann8.5` with the `shape_generated` database after validating that it has more effective data.

## Validation

- Database comparison before replacement: old = 68 CSV / 823 valid shape rows / 823 positive metric rows; shape_generated = 104 CSV / 36198 valid shape rows / 15008 positive metric rows.
- Final database path: 104 CSV / 36198 shape rows / 15008 positive metric rows; shape_generated path removed.
- `python -m py_compile tools/perf_data_collection/generate_shape_grid.py tools/perf_data_collection/grid_generator/model_configs.py tools/perf_data_collection/grid_generator/theory_router.py tools/perf_data_collection/grid_generator/generators/fused_attention.py`
- `pytest tests/regression/cli/test_shape_grid_model_configs.py tests/regression/cli/test_model_configs.py tests/regression/cli/test_runner.py tests/regression/cli/test_theory_router_pure.py tests/regression/cli/test_generate_shape_grid.py -q`

See merge request: Ascend/msmodeling!348
__init__.py	perf(tensor_cast): refine sparse attention roofline	13 hours ago

Model sparse MLA and dsa_indexer paged-cache traffic with calibrated data-movement efficiency so operator and end-to-end estimates align with GLM-5.1 profiling targets.

Signed-off-by: minghang_c <chiminghang@h-partners.com>
Co-authored-by: minghang_c<chiminghang@h-partners.com>
Message auto-generated for no-merge-commit merge: !421 merge develop-on-upstream-master into master
Created-by: minghang_c
Commit-by: minghang_c
Merged-by: ascend-robot

Description:

PR Type
- [ ] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [x] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

Refine TensorCast roofline modeling for sparse MLA, `dsa_indexer`, and GLM-5-series W4A8 MLA preprocessing so that sparse-attention estimates better match operator profiling and end-to-end latency targets, while keeping the model based on explicit data-movement and compute-efficiency assumptions.

The main modeling gap is that sparse MLA KV reads and `dsa_indexer` historical-cache reads are dominated by random/paged memory access. Treating those bytes as ideal contiguous bandwidth traffic makes the analytic roofline too optimistic, especially for long-context GLM-5.1 prefill/decode scenarios.

The latest GLM-5.1 W4A8 validation also showed that `mlapo_quant` needs to model packed W4 weights carefully: the tensor storage dtype is `torch.uint8`, but the logical MMA throughput should follow the INT8 compute path used by the existing grouped quant matmul modeling. Otherwise the trace can report `mlapo_quant` MMA time as zero even though the op has nonzero projection MMA work.

------

## 📝 Modification

- Add sparse/paged KV traffic accounting for MLA with separate decode and prefill data-movement efficiency.
- Add `dsa_indexer` historical cache read efficiency modeling and separate append cache/scale write traffic.
- Keep `dsa_indexer` block-table traffic covered by generic input memory accounting instead of a separate operator-specific model.
- Use decode-only sparse page count for mixed prefill/decode sparse MLA batches.
- Use raw sparse-index bytes in the quant/physical MLA path so physical KV/block-table/sparse-index accounting is consistent.
- Tighten `dsa_indexer` helper signatures so `request_total_seq_lens` is required where the model depends on it.
- Keep generic `tensor_cast.attention.default` accounting unchanged, so non-MLA attention models do not inherit sparse-attention calibration.
- Extend GLM-5-series compile handling to cover both `GLM-5` and `GLM-5.1`, while excluding `GLM-5.2` because its config has meaningful indexer/long-context differences.
- Refine `mlapo_quant` W4A8 modeling so packed `torch.uint8` weights use the logical INT8 MMA throughput path instead of losing MMA time in trace/statistics.
- Add `mlapo`/`mlapo_quant` intermediate memory and static-cost accounting for the fused MLA preprocessing path.
- Update related performance-model tests for sparse memory breakdowns and `mlapo`/`mlapo_quant` modeling behavior.

------

## 📐 Associated Test Results

- `uvx --python .venv/bin/python pre-commit run --files tensor_cast/performance_model/__init__.py tests/regression/tensor_cast/test_runtime.py`
  - Passed after auto-format rerun.
- `uv run --group ci --with socksio python -m unittest tests.benchmark.models.test_model_regression`
  - Log: `/tmp/msmodeling_model_regression_develop_after_pick.log`
  - `Ran 15 tests in 42.029s`
  - `OK`
  - `Total Cases: 15 | Passed: 15 | Failed: 0 | No Baseline: 0`
  - `* All Operator Checks Passed `
- GLM-5.1 e2e validation across 10 query/context scenarios from 3.5k to 128k after the latest `mlapo_quant` W4A8 modeling update:
  - Log: `/tmp/msmodeling_glm51_e2e_after_user_change_rerun3.log`
  - `e2e_count=10`
  - `mean_e2e_err=28.717478%`, meeting the `≤30%` target.
- Earlier GLM-5.1 sparse-attention e2e validation across the same 10 scenarios:
  - Log: `/tmp/msmodeling_glm51_e2e_26_1_0_latest.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, meeting the `≤30%` target.
- GLM-5 e2e validation after applying the GLM-5-series compile override:
  - Log: `/tmp/msmodeling_glm5_e2e_with_glm5_override.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, matching the GLM-5.1 run with the same parameters.
- Operator-level validation from the sparse MLA / `dsa_indexer` profiling set:
  - `mean_operator_err = 6.487008%`
  - `max_operator_err = 18.658699%`
  - Meets the `≤20%` target.
- Issue #103 2.5K GLM-5.1 scenario:
  - Prefill analytic result: old roofline `182.377 ms` → new roofline `631.874 ms`; real wall `1225.849 ms`; new roofline/wall `51.55%`.
  - Decode analytic result: old roofline `48.685 ms` → new roofline `103.071 ms`; real wall `82.528 ms`; new roofline/wall `124.89%`.
  - Decode compared with kernel sum: new roofline `103.071 ms` vs kernel sum `117.158 ms`, ratio `87.97%`.

------

## 🌟 Use cases (Optional)

GLM-5.1 sparse attention inference latency estimation for prefill and decode scenarios from 3.5k to 128k context length.

The latest e2e analytic results were validated with:

```bash
.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 \
  --device ATLAS_800_A3_752T_128G_DIE \
  --num-devices 16 \
  --tp-size 16 \
  --dp-size 1 \
  --ep-size 16 \
  --num-queries 1 \
  --num-mtp-tokens 3 \
  --compile \
  --quantize-linear-action W4A8_STATIC \
  --dump-input-shapes \
  --context-length <context> \
  --query-length <query>
```

| Scenario | Query length | Context length | Target latency | Analytic latency | Relative error |
|---|---:|---:|---:|---:|---:|
| 3.5k-prefill | 3500 | 0 | `1553.21 ms` | `1010.00 ms` | `34.9734%` |
| 3.5k-decode | 4 | 3500 | `69.90 ms` | `44.79 ms` | `35.9270%` |
| 16k-prefill | 4096 | 12000 | `1867.68 ms` | `1449.00 ms` | `22.4171%` |
| 16k-decode | 4 | 16000 | `68.10 ms` | `47.22 ms` | `30.6637%` |
| 32k-prefill | 4096 | 28000 | `2295.99 ms` | `1807.00 ms` | `21.2976%` |
| 32k-decode | 4 | 32000 | `68.70 ms` | `47.76 ms` | `30.4862%` |
| 64k-prefill | 4096 | 60000 | `3256.48 ms` | `2522.00 ms` | `22.5544%` |
| 64k-decode | 4 | 64000 | `71.70 ms` | `49.63 ms` | `30.7768%` |
| 128k-prefill | 4096 | 124000 | `5341.23 ms` | `3952.00 ms` | `26.0096%` |
| 128k-decode | 4 | 128000 | `78.30 ms` | `53.19 ms` | `32.0690%` |

`mean_e2e_err=28.717478%`

------

## ✅ Checklist

Before PR:

- [ ] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
- [x] The modification is covered by validation runs and targeted regression coverage.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!421
analytic.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
base.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
bound_analyzer.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
comm_analytic.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
empirical.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
memory_tracker.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
metrics_collector.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
op_benchmark.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
op_estimator_registry.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
op_invoke_info.py	[feat] Qwen3.5 accuracy enhancement	12 days ago

Co-authored-by: yuyinkai1<769293914@qq.com>
Message auto-generated for no-merge-commit merge: !349 merge master into master
Created-by: yuyinkai1
Commit-by: yuyinkai1
Merged-by: ascend-robot

Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

Bring Qwen3.5 simulation accuracy relative to measured results to within 30% for prefill and within 20% for decode.

------

## 📝 Modification

Refactor the Qwen3.5 linear attention operator, fix MTP, and fix the quantized operator that was not implemented.

------

## 📐 Associated Test Results

![image.png](https://raw.gitcode.com/user-images/assets/8428112/adf0ef02-e96e-47f8-9984-a3c576f99f7e/image.png 'image.png')

------

## ✅ Checklist

Before PR:

- [ ] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!349
utils.py	[Sync] [Non-development code] Sync code from develop to master (same commit as custom_op; see merge request: Ascend/msmodeling!330)	15 days ago
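
The `__init__.py` entry above says that treating sparse/paged KV reads as ideal contiguous traffic makes the analytic roofline too optimistic. The idea can be illustrated with a minimal sketch. It assumes a plain two-term roofline (compute time versus memory time) in which paged bytes are served at a derated bandwidth. `Device`, `roofline_time`, `paged_efficiency`, and every number below are hypothetical and chosen for illustration only; they are not taken from the `tensor_cast` code or from its calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description (not a tensor_cast type)."""

    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bandwidth: float  # peak contiguous memory bandwidth, bytes/s


def roofline_time(
    flops: float,
    contiguous_bytes: float,
    paged_bytes: float,
    device: Device,
    paged_efficiency: float = 1.0,
) -> float:
    """Return a roofline latency estimate in seconds.

    Paged (random-access) bytes move at mem_bandwidth * paged_efficiency,
    contiguous bytes move at full bandwidth, and the estimate is the larger
    of the compute time and the total memory time.
    """
    if not 0.0 < paged_efficiency <= 1.0:
        raise ValueError("paged_efficiency must be in (0, 1]")
    compute_time = flops / device.peak_flops
    memory_time = (
        contiguous_bytes / device.mem_bandwidth
        + paged_bytes / (device.mem_bandwidth * paged_efficiency)
    )
    return max(compute_time, memory_time)


# Illustrative numbers only: a memory-bound op whose traffic is mostly paged.
device = Device(peak_flops=1e12, mem_bandwidth=1e11)
ideal = roofline_time(1e9, 1e8, 9e8, device)
derated = roofline_time(1e9, 1e8, 9e8, device, paged_efficiency=0.5)
print(f"ideal={ideal * 1e3:.1f} ms, derated={derated * 1e3:.1f} ms")
```

In this sketch, halving the efficiency of the paged portion nearly doubles the estimate for a memory-bound operator. That is the same direction as the old-to-new roofline changes reported for the Issue #103 scenario, although the actual model and its calibrated efficiencies may differ.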