| File | Last commit | Last updated |
|---|---|---|
[Bugfix] Fix DeepSeek V4 attention modeling issues Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !357 merge fix-ds-v4-atten into master [Bugfix] Fix DeepSeek V4 attention modeling issues Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description: **PR Type** - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation This PR fixes DeepSeek V4 sparse attention modeling issues in the predictive decode and prefill cache-update paths. The previous decode/prefill heuristic could misclassify short MTP decode batches, and the prefill path used a full-tensor arithmetic dependency to keep KV cache updates from being pruned in compiled graphs. ------ ## 📝 Modification - Add `_is_decode_attention_batch` to align V4 decode detection with the predictive decoding rule: query length < 5 is treated as decode. - Replace the prefill full-cache arithmetic anchor with an explicit optional `kv_dependency` argument on `sparse_attn_sharedkv`. - Update the V4 sparse-attention performance model to exclude the optional dependency input from memory accounting. - Add regression tests for the MTP decode heuristic, prefill boundary behavior, optional `kv_dependency`, and the V4 attention forward cache path. ------ ## 📐 Associated Test Results - Added/updated regression tests in tests/regression/tensor_cast/test_deepseek_v4.py. - Recommended validation command: `python -m pytest tests/regression/tensor_cast/test_deepseek_v4.py -q` See merge request: Ascend/msmodeling!357 | 12 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
fix: align shape grid model ids and database Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com> # message auto-generated for no-merge-commit merge: !348 merge codex/fix-shape-grid-profile-db into master fix: align shape grid model ids and database Created-by: Secluded_Ocean Commit-by: Secluded_Ocean Merged-by: ascend-robot Description: ## Summary - Align `generate_shape_grid.py --target-models` with `text_generate` model_id naming and reject legacy short names such as `dsv3`. - Keep `--rows` effective when sampling is capped but rng/seed is None. - Replace `vllm0.18.0_torch2.9.0_cann8.5` with the `shape_generated` database after validating that it has more effective data. ## Validation - Database comparison before replacement: old = 68 CSV / 823 valid shape rows / 823 positive metric rows; shape_generated = 104 CSV / 36198 valid shape rows / 15008 positive metric rows. - Final database path: 104 CSV / 36198 shape rows / 15008 positive metric rows; shape_generated path removed. - `python -m py_compile tools/perf_data_collection/generate_shape_grid.py tools/perf_data_collection/grid_generator/model_configs.py tools/perf_data_collection/grid_generator/theory_router.py tools/perf_data_collection/grid_generator/generators/fused_attention.py` - `pytest tests/regression/cli/test_shape_grid_model_configs.py tests/regression/cli/test_model_configs.py tests/regression/cli/test_runner.py tests/regression/cli/test_theory_router_pure.py tests/regression/cli/test_generate_shape_grid.py -q` See merge request: Ascend/msmodeling!348 | 3 days ago | |
perf(tensor_cast): refine sparse attention roofline Model sparse MLA and dsa_indexer paged-cache traffic with calibrated data-movement efficiency so operator and end-to-end estimates align with GLM-5.1 profiling targets. Signed-off-by: minghang_c <chiminghang@h-partners.com> Co-authored-by: minghang_c<chiminghang@h-partners.com> # message auto-generated for no-merge-commit merge: !421 merge develop-on-upstream-master into master perf(tensor_cast): refine sparse attention roofline Created-by: minghang_c Commit-by: minghang_c Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. **PR Type** - [ ] Feature - [ ] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [x] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation Refine TensorCast roofline modeling for sparse MLA, dsa_indexer, and GLM-5-series W4A8 MLA preprocessing so sparse-attention estimates better match operator profiling and end-to-end latency targets while keeping the model based on explicit data-movement and compute-efficiency assumptions. The main modeling gap is that sparse MLA KV reads and dsa_indexer historical-cache reads are dominated by random/paged memory access. Treating those bytes as ideal contiguous bandwidth traffic makes the analytic roofline too optimistic, especially for long-context GLM-5.1 prefill/decode scenarios. The latest GLM-5.1 W4A8 validation also showed that mlapo_quant needs to model packed W4 weights carefully: the tensor storage dtype is `torch.uint8`, but the logical MMA throughput should follow the INT8 compute path used by existing grouped quant matmul modeling. Otherwise the trace can report mlapo_quant MMA time as zero even though the op has nonzero projection MMA work. ------ ## 📝 Modification - Add sparse/paged KV traffic accounting for MLA with separate decode and prefill data-movement efficiency. - Add dsa_indexer historical cache read efficiency modeling and separate append cache/scale write traffic. - Keep dsa_indexer block-table traffic covered by generic input memory accounting instead of a separate operator-specific model. - Use decode-only sparse page count for mixed prefill/decode sparse MLA batches. - Use raw sparse-index bytes in the quant/physical MLA path so physical KV/block-table/sparse-index accounting is consistent. - Tighten dsa_indexer helper signatures so `request_total_seq_lens` is required where the model depends on it. - Keep generic `tensor_cast.attention.default` accounting unchanged, so non-MLA attention models do not inherit sparse-attention calibration. - Extend GLM-5-series compile handling to cover both GLM-5 and GLM-5.1, while excluding GLM-5.2 because its config has meaningful indexer/long-context differences. - Refine mlapo_quant W4A8 modeling so packed `torch.uint8` weights use the logical INT8 MMA throughput path instead of losing MMA time in trace/statistics. - Add mlapo/mlapo_quant intermediate memory and static-cost accounting for the fused MLA preprocessing path. - Update related performance-model tests for sparse memory breakdowns and mlapo/mlapo_quant modeling behavior. ------ ## 📐 Associated Test Results - `uvx --python .venv/bin/python pre-commit run --files tensor_cast/performance_model/__init__.py tests/regression/tensor_cast/test_runtime.py` - Passed after auto-format rerun. - `uv run --group ci --with socksio python -m unittest tests.benchmark.models.test_model_regression` - Log: /tmp/msmodeling_model_regression_develop_after_pick.log - Ran 15 tests in 42.029s - OK - Total Cases: 15; Passed: 15; Failed: 0; No Baseline: 0 - *** All Operator Checks Passed *** - GLM-5.1 e2e validation across 10 query/context scenarios from 3.5k to 128k after the latest mlapo_quant W4A8 modeling update: - Log: /tmp/msmodeling_glm51_e2e_after_user_change_rerun3.log - e2e_count=10 - mean_e2e_err=28.717478%, meeting the ≤30% target. - Earlier GLM-5.1 sparse-attention e2e validation across the same 10 scenarios: - Log: /tmp/msmodeling_glm51_e2e_26_1_0_latest.log - e2e_count=10 - mean_e2e_err=27.678365%, meeting the ≤30% target. - GLM-5 e2e validation after applying the GLM-5-series compile override: - Log: /tmp/msmodeling_glm5_e2e_with_glm5_override.log - e2e_count=10 - mean_e2e_err=27.678365%, matching the GLM-5.1 run with the same parameters. - Operator-level validation from the sparse MLA / dsa_indexer profiling set: - mean_operator_err = 6.487008% - max_operator_err = 18.658699% - Meets the ≤20% target. - Issue #103 2.5K GLM-5.1 scenario: - Prefill analytic result: old roofline 182.377 ms → new roofline 631.874 ms; real wall 1225.849 ms; new roofline/wall 51.55%. - Decode analytic result: old roofline 48.685 ms → new roofline 103.071 ms; real wall 82.528 ms; new roofline/wall 124.89%. - Decode compared with kernel sum: new roofline 103.071 ms vs kernel sum 117.158 ms, ratio 87.97%. ------ ## 🌟 Use cases (Optional) GLM-5.1 sparse attention inference latency estimation for prefill and decode scenarios from 3.5k to 128k context length. The latest e2e analytic results were validated with: `.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --tp-size 16 --dp-size 1 --ep-size 16 --num-queries 1 --num-mtp-tokens 3 --compile --quantize-linear-action W4A8_STATIC --dump-input-shapes --context-length <context> --query-length <query>` Results per scenario (query length / context length: target latency → analytic latency, relative error): - 3.5k-prefill (3500 / 0): 1553.21 ms → 1010.00 ms, 34.9734% - 3.5k-decode (4 / 3500): 69.90 ms → 44.79 ms, 35.9270% - 16k-prefill (4096 / 12000): 1867.68 ms → 1449.00 ms, 22.4171% - 16k-decode (4 / 16000): 68.10 ms → 47.22 ms, 30.6637% - 32k-prefill (4096 / 28000): 2295.99 ms → 1807.00 ms, 21.2976% - 32k-decode (4 / 32000): 68.70 ms → 47.76 ms, 30.4862% - 64k-prefill (4096 / 60000): 3256.48 ms → 2522.00 ms, 22.5544% - 64k-decode (4 / 64000): 71.70 ms → 49.63 ms, 30.7768% - 128k-prefill (4096 / 124000): 5341.23 ms → 3952.00 ms, 26.0096% - 128k-decode (4 / 128000): 78.30 ms → 53.19 ms, 32.0690% - mean_e2e_err=28.717478% ------ ## ✅ Checklist **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by validation runs and targeted regression coverage. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!421 | 13 hours ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago | |
[feat] Qwen3.5 accuracy enhancement Co-authored-by: yuyinkai1<769293914@qq.com> # message auto-generated for no-merge-commit merge: !349 merge master into master [feat] Qwen3.5 accuracy enhancement Created-by: yuyinkai1 Commit-by: yuyinkai1 Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. **PR Type** - [x] Feature - [ ] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [ ] Other ## 🔍 Motivation **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **Qwen3.5 simulation accuracy versus measured results: prefill < 30%, decode < 20%.** ------ ## 📝 Modification **Please briefly describe what modification is made in this PR.** **Refactor the Qwen3.5 linear attention operator, fix MTP, and fix the unimplemented quantization operator.** ------ ## 📐 Associated Test Results **Please provide the related test results, such as test reports, etc.** ------ ## 🌟 Use cases (Optional) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** ------ ## ✅ Checklist **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [ ] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!349 | 12 days ago | |
[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development will be based on master, with packaging support. See merge request: Ascend/msmodeling!330 | 15 days ago |
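
The DeepSeek V4 bugfix (!357) states that a query length below 5 is treated as decode. The rule can be sketched roughly as follows. This is a minimal illustration only: the function name `is_decode_batch`, the constant `DECODE_QUERY_LEN_THRESHOLD`, and the choice to require every request in the batch to be short are assumptions made for the example, not the signature or logic of the repository's `_is_decode_attention_batch`.

```python
from typing import Sequence

# Threshold taken from the commit description: query length < 5 counts as decode.
# The constant name is hypothetical.
DECODE_QUERY_LEN_THRESHOLD = 5


def is_decode_batch(query_lens: Sequence[int]) -> bool:
    """Classify a batch as decode when all query lengths are below the threshold.

    Requiring *all* requests to be short is an assumption of this sketch;
    an empty batch is treated as not-decode.
    """
    if len(query_lens) == 0:
        return False
    return all(q < DECODE_QUERY_LEN_THRESHOLD for q in query_lens)


# A 4-token query (1 token + 3 MTP tokens) stays on the decode path,
# while a 4096-token chunk is classified as prefill.
mtp_decode = is_decode_batch([4, 4])
long_prefill = is_decode_batch([4096])
```

With `--num-mtp-tokens 3`, as in the GLM-5.1 validation command quoted in !421, each decode step carries a query length of 4, which is why a threshold of 5 keeps such batches on the decode path.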