msmodeling/tensor_cast · Ascend/MindStudio-Modeling - AtomGit

ascend-robot perf(tensor_cast): refine sparse attention roofline

File	Last commit	Last updated
adapter	[feat] Qwen3.5 accuracy enhancement
Co-authored-by: yuyinkai1 <769293914@qq.com>
# message auto-generated for no-merge-commit merge: !349 merge master into master
Created-by: yuyinkai1
Commit-by: yuyinkai1
Merged-by: ascend-robot
Description:
PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
Qwen3.5 simulation accuracy against measured results: prefill error below 30%, decode error below 20%.
## 📝 Modification
Refactor the Qwen3.5 linear-attention operator (written `lineattion` in the original message), fix MTP, and fix quantization operators that were not implemented.
## 📐 Associated Test Results
![image.png](https://raw.gitcode.com/user-images/assets/8428112/adf0ef02-e96e-47f8-9984-a3c576f99f7e/image.png 'image.png')
## ✅ Checklist
Before PR:
- [ ] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.
See merge request: Ascend/msmodeling!349	11 days ago
compilation	[Sync][Non-development code] Sync code from develop to master
Co-authored-by: yydyzr <liuyuncong1@huawei.com>
Co-authored-by: gcw_61YBRfIt <chuzhenxing@huawei.com>
Co-authored-by: 孔炳翔 <1120200577@qq.com>
Co-authored-by: zhengxinqian <qianzhengxin@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: jgong5 <steven.gong@gmail.com>
Co-authored-by: hw_whx <2952154980@qq.com>
# message auto-generated for no-merge-commit merge: !330 merge master into master
Created-by: AvadaKedavrua
Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔
Merged-by: ascend-robot
Description: Sync code from develop to master; later development evolves on master, and packaging is supported.
See merge request: Ascend/msmodeling!330	15 days ago
core	perf(tensor_cast): refine sparse attention roofline

Model sparse MLA and dsa_indexer paged-cache traffic with calibrated data-movement efficiency so operator and end-to-end estimates align with GLM-5.1 profiling targets.

Signed-off-by: minghang_c <chiminghang@h-partners.com>
Co-authored-by: minghang_c <chiminghang@h-partners.com>
# message auto-generated for no-merge-commit merge: !421 merge develop-on-upstream-master into master
Created-by: minghang_c
Commit-by: minghang_c
Merged-by: ascend-robot
Description:

PR Type
- [ ] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [x] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

Refine TensorCast roofline modeling for sparse MLA, `dsa_indexer`, and GLM-5-series W4A8 MLA preprocessing so sparse-attention estimates better match operator profiling and end-to-end latency targets while keeping the model based on explicit data-movement and compute-efficiency assumptions.

The main modeling gap is that sparse MLA KV reads and `dsa_indexer` historical-cache reads are dominated by random/paged memory access. Treating those bytes as ideal contiguous bandwidth traffic makes the analytic roofline too optimistic, especially for long-context GLM-5.1 prefill/decode scenarios.

The latest GLM-5.1 W4A8 validation also showed that `mlapo_quant` needs to model packed W4 weights carefully: the tensor storage dtype is `torch.uint8`, but the logical MMA throughput should follow the INT8 compute path used by existing grouped quant matmul modeling. Otherwise the trace can report `mlapo_quant` MMA time as zero even though the op has nonzero projection MMA work.

## 📝 Modification

- Add sparse/paged KV traffic accounting for MLA with separate decode and prefill data-movement efficiency.
- Add `dsa_indexer` historical cache read efficiency modeling and separate append cache/scale write traffic.
- Keep `dsa_indexer` block-table traffic covered by generic input memory accounting instead of a separate operator-specific model.
- Use decode-only sparse page count for mixed prefill/decode sparse MLA batches.
- Use raw sparse-index bytes in the quant/physical MLA path so physical KV/block-table/sparse-index accounting is consistent.
- Tighten `dsa_indexer` helper signatures so `request_total_seq_lens` is required where the model depends on it.
- Keep generic `tensor_cast.attention.default` accounting unchanged, so non-MLA attention models do not inherit sparse-attention calibration.
- Extend GLM-5-series compile handling to cover both `GLM-5` and `GLM-5.1`, while excluding `GLM-5.2` because its config has meaningful indexer/long-context differences.
- Refine `mlapo_quant` W4A8 modeling so packed `torch.uint8` weights use the logical INT8 MMA throughput path instead of losing MMA time in trace/statistics.
- Add `mlapo`/`mlapo_quant` intermediate memory and static-cost accounting for the fused MLA preprocessing path.
- Update related performance-model tests for sparse memory breakdowns and `mlapo`/`mlapo_quant` modeling behavior.

## 📐 Associated Test Results

- `uvx --python .venv/bin/python pre-commit run --files tensor_cast/performance_model/__init__.py tests/regression/tensor_cast/test_runtime.py`
  - Passed after auto-format rerun.
- `uv run --group ci --with socksio python -m unittest tests.benchmark.models.test_model_regression`
  - Log: `/tmp/msmodeling_model_regression_develop_after_pick.log`
  - `Ran 15 tests in 42.029s`
  - `OK`
  - `Total Cases: 15 | Passed: 15 | Failed: 0 | No Baseline: 0`
  - `* All Operator Checks Passed`
- GLM-5.1 e2e validation across 10 query/context scenarios from 3.5k to 128k after the latest `mlapo_quant` W4A8 modeling update:
  - Log: `/tmp/msmodeling_glm51_e2e_after_user_change_rerun3.log`
  - `e2e_count=10`
  - `mean_e2e_err=28.717478%`, meeting the `≤30%` target.
- Earlier GLM-5.1 sparse-attention e2e validation across the same 10 scenarios:
  - Log: `/tmp/msmodeling_glm51_e2e_26_1_0_latest.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, meeting the `≤30%` target.
- GLM-5 e2e validation after applying the GLM-5-series compile override:
  - Log: `/tmp/msmodeling_glm5_e2e_with_glm5_override.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, matching the GLM-5.1 run with the same parameters.
- Operator-level validation from the sparse MLA / `dsa_indexer` profiling set:
  - `mean_operator_err = 6.487008%`
  - `max_operator_err = 18.658699%`
  - Meets the `≤20%` target.
- Issue #103 2.5K GLM-5.1 scenario:
  - Prefill analytic result: old roofline `182.377 ms` → new roofline `631.874 ms`; real wall `1225.849 ms`; new roofline/wall `51.55%`.
  - Decode analytic result: old roofline `48.685 ms` → new roofline `103.071 ms`; real wall `82.528 ms`; new roofline/wall `124.89%`.
  - Decode compared with kernel sum: new roofline `103.071 ms` vs kernel sum `117.158 ms`, ratio `87.97%`.

## 🌟 Use cases (Optional)

GLM-5.1 sparse attention inference latency estimation for prefill and decode scenarios from 3.5k to 128k context length.

The latest e2e analytic results were validated with the following command, run once per scenario with the context and query lengths from the table:

`.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --tp-size 16 --dp-size 1 --ep-size 16 --num-queries 1 --num-mtp-tokens 3 --compile --quantize-linear-action W4A8_STATIC --dump-input-shapes --context-length <context> --query-length <query>`

| Scenario | Query length | Context length | Target latency | Analytic latency | Relative error |
|---|---:|---:|---:|---:|---:|
| 3.5k-prefill | 3500 | 0 | `1553.21 ms` | `1010.00 ms` | `34.9734%` |
| 3.5k-decode | 4 | 3500 | `69.90 ms` | `44.79 ms` | `35.9270%` |
| 16k-prefill | 4096 | 12000 | `1867.68 ms` | `1449.00 ms` | `22.4171%` |
| 16k-decode | 4 | 16000 | `68.10 ms` | `47.22 ms` | `30.6637%` |
| 32k-prefill | 4096 | 28000 | `2295.99 ms` | `1807.00 ms` | `21.2976%` |
| 32k-decode | 4 | 32000 | `68.70 ms` | `47.76 ms` | `30.4862%` |
| 64k-prefill | 4096 | 60000 | `3256.48 ms` | `2522.00 ms` | `22.5544%` |
| 64k-decode | 4 | 64000 | `71.70 ms` | `49.63 ms` | `30.7768%` |
| 128k-prefill | 4096 | 124000 | `5341.23 ms` | `3952.00 ms` | `26.0096%` |
| 128k-decode | 4 | 128000 | `78.30 ms` | `53.19 ms` | `32.0690%` |

`mean_e2e_err=28.717478%`

## ✅ Checklist

Before PR:
- [ ] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
- [x] The modification is covered by validation runs and targeted regression coverage.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!421	2 hours ago
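The reported `mean_e2e_err` can be cross-checked from the scenario table. The sketch below assumes the relative error is `|target - analytic| / target`; the merge request does not state the formula, but this definition reproduces the listed per-scenario values to within the rounding of the two-decimal latencies. The names `SCENARIOS` and `relative_error_pct` are illustrative and not part of the repository.

```python
# (target latency ms, analytic latency ms, listed relative error %) per scenario,
# copied from the GLM-5.1 validation table in merge request !421.
SCENARIOS = {
    "3.5k-prefill": (1553.21, 1010.00, 34.9734),
    "3.5k-decode": (69.90, 44.79, 35.9270),
    "16k-prefill": (1867.68, 1449.00, 22.4171),
    "16k-decode": (68.10, 47.22, 30.6637),
    "32k-prefill": (2295.99, 1807.00, 21.2976),
    "32k-decode": (68.70, 47.76, 30.4862),
    "64k-prefill": (3256.48, 2522.00, 22.5544),
    "64k-decode": (71.70, 49.63, 30.7768),
    "128k-prefill": (5341.23, 3952.00, 26.0096),
    "128k-decode": (78.30, 53.19, 32.0690),
}


def relative_error_pct(target_ms: float, analytic_ms: float) -> float:
    """Absolute relative error in percent, measured against the target latency.

    Assumed definition: the source only lists the resulting percentages.
    """
    return abs(target_ms - analytic_ms) / target_ms * 100.0


# Mean of the percentages exactly as listed in the table.
listed_mean = sum(listed for _, _, listed in SCENARIOS.values()) / len(SCENARIOS)

# Mean recomputed from the rounded latencies; differs slightly due to rounding.
recomputed_mean = sum(
    relative_error_pct(target, analytic) for target, analytic, _ in SCENARIOS.values()
) / len(SCENARIOS)

print(f"mean of listed errors:      {listed_mean:.6f}%")
print(f"mean recomputed from table: {recomputed_mean:.6f}%")
```

Note that the `≤30%` target applies to the mean: six individual scenarios in the table (3.5k-prefill and all five decode cases) sit above 30% on their own.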
custom_model	[Sync][Non-development code] Sync code from develop to master (See merge request: Ascend/msmodeling!330)	15 days ago
device_profiles	[Sync][Non-development code] Sync code from develop to master (See merge request: Ascend/msmodeling!330)	15 days ago
diffusers	fix(security): add model source safety checks
Co-authored-by: jia_ya_nan <jiayanan3@h-partners.com>
# message auto-generated for no-merge-commit merge: !385 merge fix/trust-remote-code-safety into master
Created-by: jia_ya_nan
Commit-by: jia_ya_nan
Merged-by: ascend-robot
Description:
PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [x] Other
## 🔍 Motivation
Security hardening.
## 📝 Modification
Add permission checks for local paths, add risk warnings to the logs, and remove unmaintained legacy interfaces.
## 📐 Associated Test Results
![image.png](https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png 'image.png')
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
See merge request: Ascend/msmodeling!385	3 days ago
layers	fix(tensor_cast): model MTP speculative decode shapes
Co-authored-by: Horacehxw <horacehxw@gmail.com>
# message auto-generated for no-merge-commit merge: !362 merge resolve-issue-130 into master
Created-by: Horacehxw
Commit-by: Horacehxw
Merged-by: ascend-robot
Description:

PR Type
- [x] Bugfix
- [x] Refactor
- [x] Test-Cases

## 🔍 Motivation

In TensorCast simulation of MTP (Multi-Token Prediction) speculative decode, the row selection logic of lm_head and the sampler had three problems:

1. Target and proposal rows were conflated. During MTP decode, lm_head should process only the target+bonus verification rows inside the spec window, not all packed rows. The old logic used `selected_token_indices` for prefill-level token trimming but could not distinguish target rows from proposal rows, so lm_head computed extra rows that take no part in verification.
2. The Sampler did not support the spec-decode output format. The old Sampler only took the last token by greedy argmax and could not return target+bonus tokens of shape `(num_requests, num_speculative_tokens + 1)`.
3. The Kimi K2.5 MTP path ran every row through lm_head. Kimi's monkey patch fed all hidden states into lm_head (163840 vocab) in the MTP text path, so during prefill the 12×7168×163840 matrix multiplication was inflated by about 3500×.

## 📝 Modification

### Core: unified row selection path

- Add a `SpecDecodeMetadata` dataclass that records each batch's `logits_indices`, `num_active_requests`, and `num_speculative_tokens`.
- Add `select_lm_head_hidden_states(hidden_states, sampling_metadata, mode)`:
  - `mode="target"`: selects all rows of the verification window (for lm_head)
  - `mode="proposal"`: selects only the last row of each request (for the MTP predictor)
- `CausalLmWrapper`, `VLModelWrapper`, and `MultiTokenPredictor` all call this function, replacing the `index_select` calls previously scattered across the code.

### Input Generator

- `generate_inputs` / `generate_inputs_varlen` build `SpecDecodeMetadata` during MTP decode, covering only the spec window rows at the tail of each request.
- Short windows (`query_len < num_mtp_tokens + 1`) automatically fall back to ordinary decode selection.

### Sampler

- When `spec_decode_metadata` is present, reshape the verification logits to `(num_requests, spec_window, vocab)`, apply greedy argmax to target and bonus separately, and return a result of shape `(num_requests, spec_window)`.
- Remain compatible with proposal rows (later MTP layers) and the old `selected_token_indices` prefill path.

### Kimi K2.5

- Split the MTP text path: run the language model body first to obtain the full hidden states (needed by rotary/proposal), then trim to the target rows with `select_lm_head_hidden_states` before lm_head. This avoids the full 163840-vocab projection.
- Remove the old `MultiTokenPredictorLayer` tuple-unpack monkey patch (already fixed upstream in mtp.py).

## 📐 Associated Test Results

`pytest: 26 passed, 2 warnings in 3.72s`

Coverage:
- target/proposal row selection is not mixed
- the default `selected_token_indices=-1` sentinel is correctly ignored
- wrong logits_indices length → ValueError
- spec-decode sampler returns target+bonus tokens
- CausalLmWrapper target row projection
- MTP wrapper prefill fallback / spec-decode bonus token forward
- fixed/varlen input generator MTP metadata generation
- Kimi text path target rows are trimmed before the internal lm_head
- Kimi default sentinel does not trigger the fast path

## ✅ Checklist

- [x] Linting tools used (lintrunner)
- [x] Bug fixes covered by unit tests
- [x] Modification covered by unit tests
- [ ] Documentation updated
- [x] No Chinese comments in code files

See merge request: Ascend/msmodeling!362	23 hours ago
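The target/proposal row selection described in merge request !362 can be illustrated with a small pure-Python sketch. It is not the repository's `select_lm_head_hidden_states`: the helper name `select_rows_sketch` is hypothetical, it takes the metadata directly rather than a `sampling_metadata` object, and it assumes that `logits_indices` is laid out request by request with `num_speculative_tokens + 1` rows per request, so that the last index of each window is the request's last row.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SpecDecodeMetadata:
    """Field names follow the merge request; the layout is an assumption."""

    logits_indices: List[int]      # packed-row indices of the verification windows
    num_active_requests: int
    num_speculative_tokens: int


def select_rows_sketch(hidden_states: Sequence, meta: SpecDecodeMetadata, mode: str) -> list:
    """Pick verification rows ("target") or one row per request ("proposal")."""
    window = meta.num_speculative_tokens + 1  # target rows plus the bonus row
    expected = meta.num_active_requests * window
    if len(meta.logits_indices) != expected:
        raise ValueError(
            f"expected {expected} logits indices, got {len(meta.logits_indices)}"
        )
    if mode == "target":
        picked = meta.logits_indices
    elif mode == "proposal":
        # Last index of each request's window (assumed request-major layout).
        picked = meta.logits_indices[window - 1 :: window]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [hidden_states[i] for i in picked]


# Two requests of 6 packed rows each, with 3 speculative tokens (window of 4).
hidden = [f"h{i}" for i in range(12)]
meta = SpecDecodeMetadata(
    logits_indices=[2, 3, 4, 5, 8, 9, 10, 11],
    num_active_requests=2,
    num_speculative_tokens=3,
)
target_rows = select_rows_sketch(hidden, meta, "target")
proposal_rows = select_rows_sketch(hidden, meta, "proposal")
print(target_rows)
print(proposal_rows)
```

In this toy batch lm_head would see 8 of the 12 packed rows in target mode, and the MTP predictor would see 2 rows in proposal mode.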
ops	[feat] Qwen3.5 accuracy enhancement (See merge request: Ascend/msmodeling!349)	11 days ago
performance_model	perf(tensor_cast): refine sparse attention roofline (See merge request: Ascend/msmodeling!421)	2 hours ago
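The sparse-attention roofline change in merge request !421 rests on one idea: bytes read through random or paged access should not be charged at ideal contiguous bandwidth. A minimal sketch of that idea follows, assuming a plain `max(compute, memory)` roofline. The `Device` fields, the `paged_efficiency` parameter, and every number below are hypothetical; none of them are TensorCast APIs or calibrated values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description, not a TensorCast device profile."""

    peak_flops: float     # FLOP/s
    mem_bandwidth: float  # bytes/s for ideal contiguous traffic


def roofline_time_s(
    flops: float,
    contiguous_bytes: float,
    paged_bytes: float,
    device: Device,
    paged_efficiency: float = 1.0,
) -> float:
    """Roofline estimate where paged traffic achieves only a fraction of bandwidth."""
    if not 0.0 < paged_efficiency <= 1.0:
        raise ValueError("paged_efficiency must be in (0, 1]")
    compute_s = flops / device.peak_flops
    memory_s = (
        contiguous_bytes / device.mem_bandwidth
        + paged_bytes / (device.mem_bandwidth * paged_efficiency)
    )
    # The operator is bound by whichever resource takes longer.
    return max(compute_s, memory_s)


device = Device(peak_flops=1e14, mem_bandwidth=1e12)

# Same work, first with paged reads treated as ideal, then derated to 25%.
ideal_s = roofline_time_s(1e9, 1e8, 1e9, device)
derated_s = roofline_time_s(1e9, 1e8, 1e9, device, paged_efficiency=0.25)
print(f"ideal:   {ideal_s * 1e3:.3f} ms")
print(f"derated: {derated_s * 1e3:.3f} ms")
```

Using separate efficiency values for decode and prefill, as the merge request describes, would amount to calling the same function with a different `paged_efficiency` per phase.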
transformers	fix(tensor_cast): model MTP speculative decode shapes (See merge request: Ascend/msmodeling!362)	23 hours ago
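The spec-decode sampler behaviour in merge request !362 (reshape the verification logits to `(num_requests, spec_window, vocab)`, then take a greedy argmax per row) can be sketched without any tensor library. The function name `greedy_spec_decode_tokens` is hypothetical, and the sketch assumes the logits arrive flattened request by request as `num_requests * spec_window` rows.

```python
from typing import List, Sequence


def greedy_spec_decode_tokens(
    flat_logits: Sequence[Sequence[float]],
    num_requests: int,
    spec_window: int,
) -> List[List[int]]:
    """Return greedy token ids with shape (num_requests, spec_window)."""
    if len(flat_logits) != num_requests * spec_window:
        raise ValueError(
            f"expected {num_requests * spec_window} logit rows, got {len(flat_logits)}"
        )
    tokens = []
    for request in range(num_requests):
        # Rows belonging to this request's verification window.
        rows = flat_logits[request * spec_window : (request + 1) * spec_window]
        # Greedy argmax over the vocabulary for every target/bonus row.
        tokens.append([max(range(len(row)), key=row.__getitem__) for row in rows])
    return tokens


# Two requests, a window of two rows each, and a toy vocabulary of three tokens.
logits = [
    [0.1, 0.9, 0.0],
    [0.3, 0.2, 0.5],
    [0.7, 0.1, 0.2],
    [0.0, 0.4, 0.6],
]
sampled = greedy_spec_decode_tokens(logits, num_requests=2, spec_window=2)
print(sampled)
```

With `spec_window = num_speculative_tokens + 1`, the result has the `(num_requests, num_speculative_tokens + 1)` shape that the merge request says the old sampler could not return.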
__init__.py	[Sync][Non-development code] Sync code from develop to master (See merge request: Ascend/msmodeling!330)	15 days ago
config.py	【同步】【非开发代码】代码从 develop 同步到 master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master 【同步】【非开发代码】代码从 develop 同步到 master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: 代码从 develop 同步到 master，后续基于 master 演进，并支持打包 See merge request: Ascend/msmodeling!330	15 天前
device.py	【同步】【非开发代码】代码从 develop 同步到 master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master 【同步】【非开发代码】代码从 develop 同步到 master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: 代码从 develop 同步到 master，后续基于 master 演进，并支持打包 See merge request: Ascend/msmodeling!330	15 天前
model_config.py	fix(tensor_cast): fix shared expert tensor parallelism mismatch and routing alignment for DeepSeek V4 Co-authored-by: jhon-117<fangkai15@huawei.com> # message auto-generated for no-merge-commit merge: !414 merge bugfix/20260624-text-generate-fix into master fix(tensor_cast): fix shared expert tensor parallelism mismatch and routing alignment for DeepSeek V4 Created-by: jhon-117 Commit-by: jhon-117 Merged-by: ascend-robot Description: ## Description This PR fixes an execution crash and incorrect computation behavior when running DeepSeek V4 with `--enable-shared-expert-tp`. ### Background & Root Causes When simulating DeepSeek V4 with shared expert tensor parallelism enabled, the model threw a shape mismatch error (expected `1564672 elements, got 12517376`, an 8x difference matching `tp_size=8`) and caused a TorchDynamo graph break. This was traced back to three interrelated issues: 1. Routing Slice Misalignment (`moe_layer.py`): DeepSeek V4 uses a hash-based gate router that requires `route_after_dp_transform=False`. In the manual routing branch for shared expert TP, `route` was correctly invoked before the `hidden_states` DP slice (`_dp_transform_enter`), but the resulting `topk_indices` and `topk_weights` were never sliced afterwards. The token dispatching logic therefore received 1/8 of the tokens together with the full, unsliced routing matrix, which inflated the combined tensor shape. 2. Shared Experts TP Match Failure (`transformations.py`): The `tp_plan` matching pattern for shared experts explicitly looked for `..mlp.fused_moe.shared_experts.gate_proj`. However, DeepSeek V4 mounts its shared experts directly under `mlp.shared_experts` as a standard MLP block. Because the regex failed to match, the shared experts were executed densely (unsliced) and mistakenly accumulated across the DP/TP domains. 3. Graph Break (`moe_layer.py`): The fallback `logger.warning` that checks for the shape mismatch above triggered a `torch._dynamo.exc.Unsupported` graph break, preventing full-graph compilation. ### Proposed Changes - Fix topk tensor slicing: Invoke `_dp_transform_enter(topk_indices)` and `_dp_transform_enter(topk_weights)` when `route_after_dp` is False, so that token alignment strictly matches the `hidden_states` sequences. - Broaden shared experts match pattern: Updated the shared experts matching rule in `tp_plan` from `f"{prefix}..mlp.fused_moe.shared_experts.gate_proj"` to `f"{prefix}..shared_expert.gate_proj"`, which accommodates both DeepSeek V4 and standard architectural layouts. - Safeguard Dynamo compilation: Wrapped the shape mismatch `logger.warning` in an `if not torch.compiler.is_compiling():` block. This eliminates compilation graph breaks while preserving the log for eager-mode debugging. ### Testing - [x] Verified that `cli.inference.text_generate "deepseek-ai/DeepSeek-V4-Flash"` with `--enable-shared-expert-tp` compiles successfully and outputs correct TPS metrics. - [x] Confirmed that `torch.compile` finishes without graph breaks related to the `logger.warning` call. #### About additional test cases Existing tests need no modification: the change does not alter the interface contract (API) or the expected output shape rules of any operator; it fixes an internal alignment error under one specific configuration. Whether to add new tests: for the highly specific combination of enable_shared_expert_tp=True and route_after_dp_transform=False, the existing end-to-end (E2E) full simulation flow of cli.inference.text_generate already provides coverage. Overall recommendation: no dedicated tests need to be added for now. The fix has passed the complete end-to-end simulation, and the regression tests of the related modules all pass. ![image.png](https://raw.gitcode.com/user-images/assets/8428112/9f0000d9-e6c9-49ab-80f6-a5cca5dbfa09/image.png 'image.png') See merge request: Ascend/msmodeling!414	1 day ago
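The routing misalignment described in this commit message can be reproduced in miniature. The sketch below is pure Python and makes several assumptions that are not confirmed by the source: `dp_slice` is a hypothetical stand-in for `_dp_transform_enter` that takes an even contiguous share per rank, and `route` is a toy position-based router rather than the real hash-based gate.

```python
def dp_slice(rows, rank, world_size):
    """Return this rank's contiguous share of the rows.

    Hypothetical stand-in for `_dp_transform_enter`; the real slicing
    layout is an assumption here.
    """
    per_rank = len(rows) // world_size
    return rows[rank * per_rank:(rank + 1) * per_rank]


def route(hidden_states, top_k=2, num_experts=4):
    """Toy router: expert ids derived from the token position only."""
    return [[(i + j) % num_experts for j in range(top_k)]
            for i in range(len(hidden_states))]


hidden_states = [[float(i)] * 4 for i in range(16)]  # 16 tokens before the DP slice
topk_indices = route(hidden_states)                  # routing runs on the full batch

rank, world_size = 3, 8
local_hidden = dp_slice(hidden_states, rank, world_size)

# Bug pattern: dispatch sees 2 local tokens but 16 routing rows (8x too many).
mismatch_factor = len(topk_indices) // len(local_hidden)

# Fix pattern: slice the routing outputs with the same transform as hidden_states.
local_topk = dp_slice(topk_indices, rank, world_size)
```

After the extra slice, every local token has exactly one routing row, which is the alignment the commit message says the dispatch logic expects.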
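model_hub.py	[REFACTOR] Refactor the CI gate and test_map sync infrastructure Co-authored-by: AvadaKedavrua<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !394 merge fix into master [REFACTOR] Refactor the CI gate and test_map sync infrastructure Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: ## Reason for change The CI gate's diff classification, policy validation, PR comments, and test_map sync logic were coupled inside monolithic modules such as `gate_policy.py`, which made them hard to evolve and unit-test independently; `test_map` lacked a standalone sync entry point; and the `ast_utils` / `test_map_loader` / nightly report pipeline also needed to be aligned with the gate policy. This PR contains only commit `82757732b6b6beb79b7083f6046e9cd9c72005f3` (`refactor`) and involves no wheel/CLI/OptiX changes. --- ## test_map schema refactor: background and benefits ### Background The old `test_map` used product source files as top-level keys, with values of the form `symbol → [test_nodes]`: `json { "tensor_cast/foo.py": { "Widget::run": ["tests/regression/.../test_x.py::test_foo"] } }` The new `test_map` uses pytest node ids as top-level keys, with values of the form `source_file → [symbols]`, matching the collection direction of `coverage.py --cov-context=test`: `json { "tests/regression/.../test_x.py::test_foo": { "tensor_cast/foo.py": ["Widget::run"] } }` Both store the same bipartite graph of "test node ↔ (source file, symbol)"; only the primary-key direction is reversed. The edge count is identical and the semantics are unchanged. ### Why the line count grows while the size shrinks \| Metric \| Old schema (source-oriented) \| New schema (node-oriented) \| \|------\|------------------------------\|----------------------------\| \| JSON lines \| ~70,000 \| ~100,000 \| \| File/memory footprint \| 8,355,104 B (~8.0 MB) \| 3,748,221 B (~3.6 MB, about -55%) \| Line count is not a reliable proxy for size. The old format stores the full pytest node id (`tests/regression/.../test_xxx.py::Class::test_yyy`, typically 60–80 characters) repeatedly under every symbol; when one test covers N symbols, that long string appears N times. In the new format each test node id appears only once, as a top-level key, and the arrays hold short canonical symbols (such as `Widget::run` or `%`). The edge count E is unchanged, but the number of long-string repetitions drops from O(E) to O(T) (T = number of tests with coverage, T ≪ E). Typical case: a large number of smoke/regression cases share, through imports, the module symbol `%` of modules such as `tensor_cast/` and `cli/`. The old format aggregates thousands of test ids under a single symbol; the new format records one short symbol per test, so the total byte count drops significantly. ### Engineering benefits 1. Zero-pivot build: `build_test_map.collect_from_coverage` aggregates `by_test[nid][source].add(symbol)` directly by coverage context (test node), consistent with the nightly phase1 collection path. 2. More natural incremental sync: `sync.apply_incremental_test_map_update` replaces whole nodes for touched test files and merges by `(test_node, source_path)` for touched product files, with no conversion back and forth between the two index layouts. 3. More direct deleted-test and redundancy detection: `gate_deleted_tests` and `detect_redundant_cases` iterate by test footprint; `test_map_loader` enforces top-level keys of the form `tests/...::...` and can reject dirty data mistakenly written with source keys. 4. No regression in CI gate queries: the reverse `symbol → tests` lookup builds its index once in O(N) via `build_test_map_index`, which is equivalent at runtime to the old schema. 5. Lighter storage and transfer: the measured file size is roughly halved, and both OBS download cost and `json.loads` peak memory are lower. --- ## Changes ### CI gate module split - Slimmed down `gate_policy.py`: policy loading/validation moved to `policy.py`, diff classification to `classifier.py`, GitCode PR comments to `comments.py`, and test_map queries to `test_map_query.py` - Added `sync.py` + `scripts/run_test_map_sync.sh`: maintain the authoritative `test_map` against the target branch HEAD (`--once` / `--watch`) - Trimmed `diff.py` to keep only git diff and branch checkout capabilities - Adapted `main.py` / `rules.py` / `models.py` / `errors.py` to the new module boundaries - Adjusted the policy fields in `tests/.ci/gate_policy.yaml` accordingly ### Common helpers enhancements - `_config.py`: replaced hand-written env parsing with `pydantic-settings`, unifying validation and error messages - `ast_utils.py`: extended symbol extraction to support test_map granularity mapping - `build_test_map.py` / `test_map_loader.py`: refactored the collection and loading logic (node-oriented schema) - Added `test_map_report.py`: test_map coverage summary and expired-exemption detection - Aligned `coverage_symbol_check.py` / `pytest_runner.py` / `test_map_config.py` with the new data structure - `common/_logging.py`: added the `log_env_audit` environment audit log ### nightly report pipeline - `report_builder.py` / `pytest_parser.py` / `main.py`: adapted to node-oriented test_map reports - `report_models.py` / `feishu_notifier.py`: minor alignment ### symbol validation and exemption drift (new) - last-wins canonical symbols: for a repeated `def foo`, only the last definition is gated; a shadowed def gets a non-blocking GitCode comment - `def _` disambiguation: no decorator → `_`; with decorator → `_@<suffix>` - exemption validation: the whole `path::symbol` string must be an AST canonical name; coverage omit paths must not be written into `exemptions.sources` - exemption drift is blocking: when a PR deletes or renames a product/test file without updating `gate_policy.yaml` → `[ED]` hard block + GitCode comment - Expected/Got: unified format for Config / policy / loader type and value errors - Docs: `scripts/README.md` now covers `build.sh` and `MSMODELING_WHEEL_OUTPUT_DIR`; `tests/README.md` now covers the symbol contract ### Other - `scripts/run_ci_gate.sh`: aligned entry parameters - `scripts/prefetch_model_configs.py`: adapted to the new config loading - `tests/README.md` / `tests/SKILL.md`: docs synced - Added/updated regression tests: `test_classifier.py`, `test_comments.py`, `test_policy.py`, `test_sync.py`, `test_test_map_query.py`, `test_test_map_report.py`, etc. --- ## Self-verification ### CI gate / test_map / nightly regression tests Purpose: confirm that behavior is unchanged after the module split and that the new modules have unit test coverage. Steps: 1. From the repository root, run: `sh uv run pytest tests/regression/scripts/helpers/ci_gate/ \ tests/regression/scripts/helpers/common/test_ast_utils.py \ tests/regression/scripts/helpers/common/test_test_map_loader.py \ tests/regression/scripts/helpers/nightly/test_report_builder.py -q` 2. Check the exit code and the pass count. Result: `206 passed, 7 warnings in 0.43s` ### symbol validation / exemption drift regression `sh uv run pytest tests/regression/scripts/helpers/ci_gate/ \ tests/regression/scripts/helpers/common/test_ast_utils.py \ tests/regression/scripts/helpers/common/test_test_map_loader.py \ tests/regression/scripts/helpers/test_config.py -q` Result: `239 passed, 8 warnings in 1.05s` ### test_map sync entry point Purpose: confirm that `run_test_map_sync.sh` can launch `sync.py`. Steps: 1. Set `MSMODELING_TEST_MAP_PATH` to point to a valid test_map JSON 2. Run `bash scripts/run_test_map_sync.sh --once` Result: the script entry point and the `sync.py` CLI were committed with this PR; a full sync requires the CI/nightly environment to provide a valid test_map file See merge request: Ascend/msmodeling!394	3 days ago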
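The relationship between the two `test_map` layouts described in this commit message can be sketched in a few lines of Python. This is an illustration under assumptions, not the repository's code: `to_node_oriented` and `build_symbol_index` are hypothetical names, and the sample paths are made up (the source elides the real test paths).

```python
from collections import defaultdict

# Hypothetical sample in the old, source-oriented layout:
# source file -> symbol -> [pytest node ids]
old_map = {
    "tensor_cast/foo.py": {
        "Widget::run": [
            "tests/regression/test_x.py::test_foo",
            "tests/regression/test_x.py::test_bar",
        ],
        "%": ["tests/regression/test_x.py::test_foo"],
    }
}


def to_node_oriented(source_map):
    """Pivot source -> symbol -> [tests] into test -> source -> [symbols]."""
    by_test = defaultdict(lambda: defaultdict(set))
    for source, symbols in source_map.items():
        for symbol, nodes in symbols.items():
            for nid in nodes:
                by_test[nid][source].add(symbol)
    return {nid: {src: sorted(syms) for src, syms in srcs.items()}
            for nid, srcs in by_test.items()}


def build_symbol_index(node_map):
    """Single pass over the node-oriented map: (source, symbol) -> set of tests."""
    index = defaultdict(set)
    for nid, sources in node_map.items():
        for source, symbols in sources.items():
            for symbol in symbols:
                index[(source, symbol)].add(nid)
    return index


def count_edges_old(source_map):
    """Number of (test, source, symbol) edges in the source-oriented layout."""
    return sum(len(nodes) for syms in source_map.values() for nodes in syms.values())


def count_edges_new(node_map):
    """Number of (test, source, symbol) edges in the node-oriented layout."""
    return sum(len(syms) for srcs in node_map.values() for syms in srcs.values())


node_map = to_node_oriented(old_map)
index = build_symbol_index(node_map)
```

Both layouts carry the same three edges here; the long node id is written three times in the old layout and twice (once per test) in the new one, which is the O(E) versus O(T) effect the message describes.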
parallel_group.py	[feat] qwen3.5 accuracy enhancement Co-authored-by: yuyinkai1<769293914@qq.com> # message auto-generated for no-merge-commit merge: !349 merge master into master [feat] qwen3.5 accuracy enhancement Created-by: yuyinkai1 Commit-by: yuyinkai1 Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. PR Type - [✅️ ] Feature - [ ] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [ ] Other ## 🔍 Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. QWEN3.5 simulation accuracy relative to measured results: prefill <30%, decode <20%. ------ ## 📝 Modification Please briefly describe what modification is made in this PR. Refactored the QWEN3.5 linear attention operator, fixed MTP, and fixed the quantization operator that was not implemented. ------ ## 📐 Associated Test Results Please provide the related test results, such as test reports, etc. ![image.png](https://raw.gitcode.com/user-images/assets/8428112/adf0ef02-e96e-47f8-9984-a3c576f99f7e/image.png 'image.png') ------ ## 🌟 Use cases (Optional) If this PR introduces a new feature, it is better to list some use cases here and update the documentation. ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [ ] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!349	11 days ago
patch_torch.py	[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	15 days ago
quantize_utils.py	[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	15 days ago
runtime.py	[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	15 days ago
utils.py	[Sync] [Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync] [Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	15 days ago