| File | Last commit | Last updated |
|---|---|---|
| | [Sync][Non-development code] Sync code from develop to master. Co-authored-by: yydyzr<liuyuncong1@huawei.com>, gcw_61YBRfIt<chuzhenxing@huawei.com>, 孔炳翔<1120200577@qq.com>, zhengxinqian<qianzhengxin@huawei.com>, hw_whx<wanghexiang7@huawei.com>, jgong5<steven.gong@gmail.com>, hw_whx<2952154980@qq.com>. Message auto-generated for no-merge-commit merge: !330 merge master into master. Created-by: AvadaKedavrua. Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔. Merged-by: ascend-robot. Description: code is synced from develop to master; subsequent development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | fix(tensor_cast): fix shared expert tensor parallelism mismatch and routing alignment for DeepSeek V4. Co-authored-by: jhon-117<fangkai15@huawei.com>. Message auto-generated for no-merge-commit merge: !414 merge bugfix/20260624-text-generate-fix into master. Created-by: jhon-117. Commit-by: jhon-117. Merged-by: ascend-robot. **Description:** This PR fixes an execution crash and incorrect computation when running DeepSeek V4 with `--enable-shared-expert-tp`. **Background & Root Causes:** When simulating DeepSeek V4 with shared expert tensor parallelism enabled, the model raised a shape mismatch error, `expected 1564672 elements, got 12517376` (an 8x difference matching `tp_size=8`), and caused a TorchDynamo graph break. This was traced to three interrelated issues. (1) **Routing Slice Misalignment (`moe_layer.py`)**: DeepSeek V4 uses a hash-based gate router that requires `route_after_dp_transform=False`. In the manual routing branch for shared expert TP, `route` was correctly invoked before the `hidden_states` DP slice (`_dp_transform_enter`), but the resulting `topk_indices` and `topk_weights` were never sliced afterwards. The token dispatching logic therefore received 1/8 of the tokens alongside an unsliced 8/8 routing matrix, which inflated the combined tensor shape. (2) **Shared Experts TP Match Failure (`transformations.py`)**: The `tp_plan` matching pattern for shared experts looked explicitly for `.*.mlp.fused_moe.shared_experts.gate_proj`. DeepSeek V4, however, mounts its shared experts directly under `mlp.shared_experts` as a standard MLP block. Because the pattern failed to match, the shared experts ran densely (unsliced) and were mistakenly accumulated across the DP/TP domains. (3) **Graph Break (`moe_layer.py`)**: The fallback `logger.warning` that checks for the shape mismatch triggered a `torch._dynamo.exc.Unsupported` graph break, preventing full-graph compilation. **Proposed Changes:** (a) **Fix topk tensor slicing:** call `_dp_transform_enter(topk_indices)` and `_dp_transform_enter(topk_weights)` when `route_after_dp` is False, so that token alignment matches the `hidden_states` sequences. (b) **Broaden the shared experts match pattern:** update the shared experts rule in `tp_plan` from `f"{prefix}.*.mlp.fused_moe.shared_experts.gate_proj"` to `f"{prefix}.*.shared_expert*.gate_proj"`, which accommodates both DeepSeek V4 and the standard layout. (c) **Safeguard Dynamo compilation:** wrap the shape mismatch `logger.warning` in an `if not torch.compiler.is_compiling():` block. This removes the graph break during compilation while preserving the log for eager-mode debugging. **Testing:** [x] Verified that `cli.inference.text_generate "deepseek-ai/DeepSeek-V4-Flash"` with `--enable-shared-expert-tp` compiles successfully and outputs correct TPS metrics. [x] Confirmed that `torch.compile` finishes without graph breaks related to the `logger.warning` call. **On supplementing test cases:** Existing tests need no changes, because the fix does not alter the interface contract (API) or the expected output-shape rules of operators; it corrects an internal alignment error under one specific configuration. As for new tests, the highly specific combination of `enable_shared_expert_tp=True` and `route_after_dp_transform=False` is already covered by the end-to-end (E2E) full simulation flow of `cli.inference.text_generate`. Overall recommendation: no additional tests are needed for now. The fix has passed the complete end-to-end simulation, and the regression tests of the related modules all pass. See merge request: Ascend/msmodeling!414 | 2 days ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | fix(tensor_cast): fix shared expert tensor parallelism mismatch and routing alignment for DeepSeek V4. See merge request: Ascend/msmodeling!414 | 2 days ago |
| | fix(tensor_cast): model MTP speculative decode shapes. Co-authored-by: Horacehxw<horacehxw@gmail.com>. Message auto-generated for no-merge-commit merge: !362 merge resolve-issue-130 into master. Created-by: Horacehxw. Commit-by: Horacehxw. Merged-by: ascend-robot. **Description:** **PR Type:** [x] Bugfix, [x] Refactor, [x] Test-Cases. **Motivation:** In the TensorCast simulation of MTP (Multi-Token Prediction) speculative decode, the row selection logic of `lm_head` and the sampler had three problems. (1) **Target and proposal rows were confused:** during MTP decode, `lm_head` should process only the target+bonus verification rows inside the spec window, not all packed rows. The old logic used `selected_token_indices` for prefill-level token trimming but could not distinguish target from proposal rows, so `lm_head` computed extra rows that take no part in verification. (2) **The Sampler did not support the spec decode output format:** the old Sampler only took the greedy argmax of the last token and could not return target+bonus tokens of shape `(num_requests, num_speculative_tokens + 1)`. (3) **The Kimi K2.5 MTP path ran everything through `lm_head`:** Kimi's monkey-patch sent all hidden states into `lm_head` (163840 vocab) in the MTP text path; during prefill, the 12×7168×163840 matrix multiplication was amplified by about 3500×. **Modification:** *Core: a unified row selection path.* Added a `SpecDecodeMetadata` dataclass that records `logits_indices`, `num_active_requests`, and `num_speculative_tokens` for each batch. Added `select_lm_head_hidden_states(hidden_states, sampling_metadata, mode)`: `mode="target"` selects all rows of the verification window (for `lm_head`), and `mode="proposal"` selects only the last row of each request (for the MTP predictor). `CausalLmWrapper`, `VLModelWrapper`, and `MultiTokenPredictor` all call this function, replacing the `index_select` calls previously scattered across the code. *Input Generator.* `generate_inputs` / `generate_inputs_varlen` build `SpecDecodeMetadata` during MTP decode, covering only the spec window rows at the tail of each request. Short windows (`query_len < num_mtp_tokens + 1`) automatically fall back to ordinary decode selection. *Sampler.* Once `spec_decode_metadata` is detected, the verification logits are reshaped to `(num_requests, spec_window, vocab)`, greedy argmax is applied to the target and bonus rows separately, and the result has shape `(num_requests, spec_window)`. The Sampler remains compatible with proposal rows (later MTP layers) and with the old `selected_token_indices` prefill path. *Kimi K2.5.* The MTP text path is split: the language model body runs first to obtain the full hidden states (needed by rotary/proposal), then `select_lm_head_hidden_states` trims them to the target rows before `lm_head`, which avoids the full 163840-vocab projection. The old `MultiTokenPredictorLayer` tuple-unpack monkey patch is removed (already fixed upstream in `mtp.py`). **Associated Test Results:** pytest: 26 passed, 2 warnings in 3.72s. Coverage: target/proposal row selection is not mixed; the default `selected_token_indices=-1` sentinel is correctly ignored; a wrong `logits_indices` length raises `ValueError`; the spec-decode sampler returns target+bonus tokens; `CausalLmWrapper` target row projection; MTP wrapper prefill fallback and spec-decode bonus token forward; MTP metadata generation in the fixed and varlen input generators; Kimi text path target rows are trimmed before the internal `lm_head`; the Kimi default sentinel does not trigger the fast path. **Checklist:** [x] Linting tools used (lintrunner). [x] Bug fixes covered by unit tests. [x] Modification covered by unit tests. [ ] Documentation updated. [x] No Chinese comments in code files. See merge request: Ascend/msmodeling!362 | 1 day ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
| | fix(tensor_cast): model MTP speculative decode shapes. See merge request: Ascend/msmodeling!362 | 1 day ago |
| | [Sync][Non-development code] Sync code from develop to master. See merge request: Ascend/msmodeling!330 | 15 days ago |
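
The pattern change described in the !414 commit message can be illustrated with a small matching check. This is only a sketch: it assumes glob-style matching through Python's `fnmatch`, which may differ from how `tp_plan` actually resolves its patterns, and the `PREFIX` value and the two module paths are hypothetical names chosen to mirror the two layouts the message describes.

```python
from fnmatch import fnmatchcase

# Hypothetical prefix; the real value is defined in the repository, not here.
PREFIX = "model.layers"

# Patterns quoted from the commit message, with the assumed prefix filled in.
OLD_PATTERN = f"{PREFIX}.*.mlp.fused_moe.shared_experts.gate_proj"
NEW_PATTERN = f"{PREFIX}.*.shared_expert*.gate_proj"

# Assumed module paths for the two layouts mentioned in the message.
FUSED_LAYOUT = "model.layers.3.mlp.fused_moe.shared_experts.gate_proj"
DIRECT_LAYOUT = "model.layers.3.mlp.shared_experts.gate_proj"


def matches(pattern: str, module_name: str) -> bool:
    """Return True if module_name matches the glob-style pattern."""
    return fnmatchcase(module_name, pattern)


if __name__ == "__main__":
    for name in (FUSED_LAYOUT, DIRECT_LAYOUT):
        print(name, matches(OLD_PATTERN, name), matches(NEW_PATTERN, name))
```

Under these assumptions the old pattern matches only the `fused_moe` layout, while the broadened pattern matches both, which is consistent with the failure mode the commit message reports for shared experts mounted directly under `mlp.shared_experts`.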
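
The sampler behaviour described in the !362 commit message (reshape the verification logits to `(num_requests, spec_window, vocab)`, then take a greedy argmax per row) can be sketched in plain Python. The function name `greedy_spec_decode_sample` and the list-of-lists representation are assumptions made for illustration; the actual Sampler operates on tensors and is not reproduced here.

```python
from typing import List


def greedy_spec_decode_sample(
    logits: List[List[float]],
    num_requests: int,
    num_speculative_tokens: int,
) -> List[List[int]]:
    """Greedy-sample one token per verification row.

    logits holds num_requests * spec_window rows of vocab scores, packed
    request by request, where spec_window = num_speculative_tokens + 1
    (the target rows plus one bonus row).
    """
    spec_window = num_speculative_tokens + 1
    if len(logits) != num_requests * spec_window:
        raise ValueError(
            f"expected {num_requests * spec_window} rows, got {len(logits)}"
        )
    tokens = []
    for r in range(num_requests):
        # Slice out this request's window: the "reshape" step.
        window = logits[r * spec_window:(r + 1) * spec_window]
        # Argmax over the vocab dimension for each row in the window.
        tokens.append(
            [max(range(len(row)), key=row.__getitem__) for row in window]
        )
    return tokens


if __name__ == "__main__":
    example = [
        [0.1, 0.9, 0.0, 0.0], [0.5, 0.2, 0.1, 0.0], [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 2.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0],
    ]
    print(greedy_spec_decode_sample(example, 2, 2))
```

The result has one list per request and `num_speculative_tokens + 1` token ids per list, matching the `(num_requests, spec_window)` output shape stated in the commit message.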