msmodeling/tensor_cast/transformers · Ascend/MindStudio-Modeling - AtomGit

ascend-robot · fix(tensor_cast): model MTP speculative decode shapes

File	Last commit	Last updated
builtin_model	fix(tensor_cast): model MTP speculative decode shapes	1 day ago

Co-authored-by: Horacehxw<horacehxw@gmail.com>
# message auto-generated for no-merge-commit merge: !362 merge resolve-issue-130 into master
Created-by: Horacehxw
Commit-by: Horacehxw
Merged-by: ascend-robot

Description:

PR Type
- [x] Bugfix
- [x] Refactor
- [x] Test-Cases

## 🔍 Motivation

In the TensorCast simulation of MTP (Multi-Token Prediction) speculative decode, the row selection logic of lm_head and the sampler has three problems:

1. Target and proposal rows are conflated. During MTP decode, lm_head should process only the target+bonus verification rows inside the spec window, not all packed rows. The old logic used `selected_token_indices` for prefill-level token trimming, which cannot tell target rows from proposal rows, so lm_head computed extra rows that take no part in verification.
2. The sampler does not support the spec-decode output format. The old Sampler only takes the greedy argmax of the last token and cannot return target+bonus tokens of shape `(num_requests, num_speculative_tokens + 1)`.
3. The Kimi K2.5 MTP path runs every row through lm_head. Kimi's monkey patch sends all hidden states into lm_head (163840 vocab) in the MTP text path; at prefill, the 12×7168×163840 matrix multiplication is inflated by ~3500×.

------

## 📝 Modification

### Core: unified row selection path
- Add a `SpecDecodeMetadata` dataclass that records each batch's `logits_indices`, `num_active_requests`, and `num_speculative_tokens`.
- Add `select_lm_head_hidden_states(hidden_states, sampling_metadata, mode)`:
  - `mode="target"`: select every row of the verification window (for lm_head)
  - `mode="proposal"`: select only the last row of each request (for the MTP predictor)
- `CausalLmWrapper`, `VLModelWrapper`, and `MultiTokenPredictor` all call this function, replacing the `index_select` calls that were previously scattered across the code.

### Input Generator
- `generate_inputs` / `generate_inputs_varlen` build `SpecDecodeMetadata` during MTP decode, covering only the spec window rows at the tail of each request.
- Short windows (query_len < num_mtp_tokens + 1) automatically fall back to ordinary decode selection.

### Sampler
- When `spec_decode_metadata` is present, reshape the verification logits to `(num_requests, spec_window, vocab)`, take the greedy argmax for target and bonus separately, and return shape `(num_requests, spec_window)`.
- Stay compatible with proposal rows (later MTP layers) and the old `selected_token_indices` prefill path.

### Kimi K2.5
- Split the MTP text path: first run the language model body to get the full hidden states (needed for rotary/proposal), then trim to the target rows with `select_lm_head_hidden_states` before lm_head. This avoids the full 163840-vocab projection.
- Remove the old `MultiTokenPredictorLayer` tuple-unpack monkey patch (already fixed upstream in mtp.py).

------

## 📐 Associated Test Results

`pytest: 26 passed, 2 warnings in 3.72s`

Coverage:
- target/proposal row selection is not mixed
- the default `selected_token_indices=-1` sentinel is correctly ignored
- wrong logits_indices length → ValueError
- the spec-decode sampler returns target+bonus tokens
- CausalLmWrapper target row projection
- MTP wrapper prefill fallback / spec-decode bonus token forward
- fixed/varlen input generator MTP metadata generation
- Kimi text path target rows are trimmed before the internal lm_head
- the Kimi default sentinel does not trigger the fast path

------

## ✅ Checklist
- [x] Linting tools used (lintrunner)
- [x] Bug fixes covered by unit tests
- [x] Modification covered by unit tests
- [ ] Documentation updated
- [x] No Chinese comments in code files

See merge request: Ascend/msmodeling!362
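The row selection described in the commit message above can be illustrated with a small pure-Python sketch. It is not the repository's implementation: plain lists stand in for tensors, and `build_spec_metadata`, `select_rows`, and `greedy_spec_sample` are hypothetical names. Only the `SpecDecodeMetadata` field names, the two modes, the ValueError on a wrong `logits_indices` length, and the output shapes come from the commit message. Returning `None` for the short-window fallback and reading "last row of each request" as the last row of its window are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SpecDecodeMetadata:
    """Illustrative stand-in; only the field names come from the commit message."""
    logits_indices: List[int]
    num_active_requests: int
    num_speculative_tokens: int


def build_spec_metadata(query_lens: Sequence[int],
                        num_speculative_tokens: int) -> Optional[SpecDecodeMetadata]:
    """Collect the tail spec-window rows of every request in a packed batch.

    Returns None when a query is shorter than the window (assumption: this
    stands in for the fallback to ordinary decode selection).
    """
    window = num_speculative_tokens + 1
    if any(q < window for q in query_lens):
        return None
    indices: List[int] = []
    offset = 0  # start row of the current request in the packed batch
    for q in query_lens:
        indices.extend(range(offset + q - window, offset + q))
        offset += q
    return SpecDecodeMetadata(indices, len(query_lens), num_speculative_tokens)


def select_rows(hidden_states, meta: SpecDecodeMetadata, mode: str):
    """Pick verification rows ("target") or one row per request ("proposal")."""
    window = meta.num_speculative_tokens + 1
    if len(meta.logits_indices) != meta.num_active_requests * window:
        raise ValueError("logits_indices length does not match requests * window")
    if mode == "target":
        picked = meta.logits_indices
    elif mode == "proposal":
        picked = meta.logits_indices[window - 1::window]  # last row of each window
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [hidden_states[i] for i in picked]


def greedy_spec_sample(logits, meta: SpecDecodeMetadata):
    """Greedy argmax per verification row, grouped as (num_requests, spec_window)."""
    window = meta.num_speculative_tokens + 1
    tokens = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return [tokens[r * window:(r + 1) * window]
            for r in range(meta.num_active_requests)]


# Two packed requests of 4 and 3 rows, one speculative token (window of 2).
meta = build_spec_metadata(query_lens=[4, 3], num_speculative_tokens=1)
hidden = [[float(i)] for i in range(7)]
target_rows = select_rows(hidden, meta, "target")      # rows 2, 3, 5, 6
proposal_rows = select_rows(hidden, meta, "proposal")  # rows 3, 6
```

In this toy batch only 4 of the 7 packed rows would reach lm_head, which is the saving the commit message describes for the verification path.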
__init__.py	[Sync][Non-development code] Sync code from develop to master	15 days ago

Co-authored-by: yydyzr<liuyuncong1@huawei.com>
Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com>
Co-authored-by: 孔炳翔<1120200577@qq.com>
Co-authored-by: zhengxinqian<qianzhengxin@huawei.com>
Co-authored-by: hw_whx<wanghexiang7@huawei.com>
Co-authored-by: jgong5<steven.gong@gmail.com>
Co-authored-by: hw_whx<2952154980@qq.com>
# message auto-generated for no-merge-commit merge: !330 merge master into master
Created-by: AvadaKedavrua
Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔
Merged-by: ascend-robot

Description: Sync code from develop to master. Subsequent development will be based on master, and packaging is supported.

See merge request: Ascend/msmodeling!330
custom_model_registry.py	[Sync][Non-development code] Sync code from develop to master (same commit as __init__.py; see merge request: Ascend/msmodeling!330)	15 days ago
model.py	fix(tensor_cast): model MTP speculative decode shapes (same commit as builtin_model; see merge request: Ascend/msmodeling!362)	1 day ago
transformations.py	fix(tensor_cast): fix shared expert tensor parallelism mismatch and routing alignment for DeepSeek V4	2 days ago

Co-authored-by: jhon-117<fangkai15@huawei.com>
# message auto-generated for no-merge-commit merge: !414 merge bugfix/20260624-text-generate-fix into master
Created-by: jhon-117
Commit-by: jhon-117
Merged-by: ascend-robot

Description:

## Description

This PR fixes an execution crash and incorrect computation when running DeepSeek V4 with `--enable-shared-expert-tp`.

### Background & Root Causes

When simulating DeepSeek V4 with shared expert tensor parallelism enabled, the model raised a shape mismatch error, `expected 1564672 elements, got 12517376` (an 8x difference matching `tp_size=8`), and caused a TorchDynamo graph break. This was traced back to three interrelated issues:

1. Routing Slice Misalignment (`moe_layer.py`): DeepSeek V4 uses a hash-based gate router that requires `route_after_dp_transform=False`. In the manual routing branch for shared expert TP, `route` was correctly invoked before the `hidden_states` DP slice (`_dp_transform_enter`), but the resulting `topk_indices` and `topk_weights` were never sliced afterwards. The token dispatching logic therefore received 1/8 of the tokens alongside an unsliced 8/8 routing matrix, which blew up the combined tensor shape.
2. Shared Experts TP Match Failure (`transformations.py`): The `tp_plan` matching pattern for shared experts explicitly looked for `..mlp.fused_moe.shared_experts.gate_proj`. However, DeepSeek V4 mounts its shared experts directly under `mlp.shared_experts` as a standard MLP block. Because the regex failed to match, the shared experts were executed densely (unsliced) and mistakenly accumulated across the DP/TP domains.
3. Graph Break (`moe_layer.py`): The fallback `logger.warning` that checks for the shape mismatch above triggered a `torch._dynamo.exc.Unsupported` graph break, preventing full-graph compilation.

### Proposed Changes

- Fix topk tensor slicing: Invoke `_dp_transform_enter(topk_indices)` and `_dp_transform_enter(topk_weights)` when `route_after_dp` is False, so that the routing tensors cover exactly the same tokens as the sliced `hidden_states`.
- Broaden shared experts match pattern: Update the shared experts matching rule in `tp_plan` from `f"{prefix}..mlp.fused_moe.shared_experts.gate_proj"` to `f"{prefix}..shared_expert.gate_proj"`, which accommodates both DeepSeek V4 and the standard architectural layouts.
- Safeguard Dynamo compilation: Wrap the shape mismatch `logger.warning` in an `if not torch.compiler.is_compiling():` block. This removes the graph break during compilation while preserving the log for eager-mode debugging.

### Testing

- [x] Verified that `cli.inference.text_generate "deepseek-ai/DeepSeek-V4-Flash"` with `--enable-shared-expert-tp` compiles successfully and outputs correct TPS metrics.
- [x] Confirmed that `torch.compile` finishes without graph breaks related to the `logger.warning` call.

#### On additional test cases

Existing tests need no changes: the fix does not alter the interface contract (API) or the expected output shape rules of any operator; it corrects an internal alignment error under one specific configuration.

Whether to add new tests: for the very specific combination of enable_shared_expert_tp=True and route_after_dp_transform=False, the existing end-to-end (E2E) full simulation flow of cli.inference.text_generate already provides coverage.

Overall recommendation: no dedicated test additions are needed for now. The fix has passed the complete end-to-end simulation, and the regression tests of the related modules all pass.

![image.png](https://raw.gitcode.com/user-images/assets/8428112/9f0000d9-e6c9-49ab-80f6-a5cca5dbfa09/image.png 'image.png')

See merge request: Ascend/msmodeling!414
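The routing slice misalignment described in the commit message above can be reproduced in miniature. The following is a sketch under stated assumptions, not the code in `moe_layer.py`: `dp_slice` is a hypothetical stand-in for `_dp_transform_enter` that keeps one contiguous chunk per rank, `route` is a toy deterministic router rather than the hash-based gate, and lists stand in for tensors. The only point it illustrates is that when routing runs before the DP slice, the top-k tensors must be sliced the same way as `hidden_states`.

```python
from typing import List, Sequence, Tuple


def dp_slice(rows: Sequence, rank: int, dp_size: int) -> List:
    """Hypothetical stand-in for `_dp_transform_enter`: keep this rank's chunk."""
    if len(rows) % dp_size != 0:
        raise ValueError("row count must be divisible by dp_size")
    chunk = len(rows) // dp_size
    return list(rows[rank * chunk:(rank + 1) * chunk])


def route(hidden_states: Sequence, top_k: int,
          num_experts: int) -> Tuple[List[List[int]], List[List[float]]]:
    """Toy router: expert ids depend only on the token position."""
    indices = [[(t + k) % num_experts for k in range(top_k)]
               for t in range(len(hidden_states))]
    weights = [[1.0 / top_k] * top_k for _ in hidden_states]
    return indices, weights


def prepare_dispatch(hidden_states: Sequence, rank: int, dp_size: int,
                     route_after_dp: bool, top_k: int = 2, num_experts: int = 4):
    """Return per-rank tokens with routing tensors covering the same tokens."""
    if route_after_dp:
        local = dp_slice(hidden_states, rank, dp_size)
        topk_indices, topk_weights = route(local, top_k, num_experts)
    else:
        # Route on the full batch first, then slice all three tensors alike.
        topk_indices, topk_weights = route(hidden_states, top_k, num_experts)
        local = dp_slice(hidden_states, rank, dp_size)
        topk_indices = dp_slice(topk_indices, rank, dp_size)
        topk_weights = dp_slice(topk_weights, rank, dp_size)
    if not (len(local) == len(topk_indices) == len(topk_weights)):
        raise ValueError("routing tensors are misaligned with hidden_states")
    return local, topk_indices, topk_weights


tokens = list(range(16))
local, topk_indices, topk_weights = prepare_dispatch(
    tokens, rank=3, dp_size=8, route_after_dp=False)

# Skipping the two dp_slice calls on the routing tensors reproduces the mismatch:
unsliced_indices, _ = route(tokens, top_k=2, num_experts=4)
mismatch_factor = len(unsliced_indices) // len(local)
```

With 16 tokens and `dp_size=8`, the unsliced routing matrix has 8 times as many rows as the local tokens, the same factor as the 8x element-count mismatch quoted in the error message.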
utils.py	fix(security): add model source safety checks	3 days ago

Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com>
# message auto-generated for no-merge-commit merge: !385 merge fix/trust-remote-code-safety into master
Created-by: jia_ya_nan
Commit-by: jia_ya_nan
Merged-by: ascend-robot

Description:

# PR Template

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [x] Other

## 🔍 Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Security hardening.

------

## 📝 Modification

Please briefly describe what modification is made in this PR.

Add permission checks for local paths; add risk warnings to the logs; remove old interfaces that are no longer maintained.

------

## 📐 Associated Test Results

Please provide the related test results, such as test reports, etc.

![image.png](https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png 'image.png')

------

## 🌟 Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

------

## ✅ Checklist

Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!385