msmodeling/tensor_cast/transformers · Ascend/MindStudio-Modeling - AtomGit

ascend-robot refactor(tensor_cast): unify word embedding tp config

File	Last commit	Last updated
builtin_model	fix(tensor_cast): support GLM5 DSA tuple returns

Co-authored-by: minghang_c<chiminghang@h-partners.com>
Message auto-generated for no-merge-commit merge: !332 merge glm5-transformers-fix into develop
Created-by: minghang_c
Commit-by: minghang_c
Merged-by: ascend-robot

Description:

## Background

Running TensorCast inference modeling on the GLM-5 (`glm_moe_dsa`) / GLM-5.1 models originally failed while unpacking the decoder layer's return value:

```bash
python -m cli.inference.text_generate zai-org/GLM-5 \
  --device ATLAS_800_A3_752T_128G_DIE \
  --num-devices 16 \
  --tp-size 16 \
  --dp-size 1 \
  --ep-size 16 \
  --context-length 0 \
  --query-length 3500 \
  --num-queries 1 \
  --compile \
  --quantize-linear-action W4A8_STATIC \
  --dump-input-shapes
```

The failure is a mismatch in the number of tuple return values:

```text
ValueError: not enough values to unpack (expected 3, got 2)
```

Once the attention return protocol is fixed, the repetition copy-layer path exposes a second mismatch, this time in the decoder layer's return arity:

```text
ValueError: not enough values to unpack (expected 2, got 1)
```

With MTP enabled on GLM-5.1, two further MTP adaptation problems surface:

```bash
python -m cli.inference.text_generate zai-org/GLM-5.1 \
  --device ATLAS_800_A3_752T_128G_DIE \
  --num-devices 16 \
  --tp-size 16 \
  --dp-size 1 \
  --ep-size 16 \
  --context-length 0 \
  --query-length 3500 \
  --num-queries 1 \
  --num-mtp-tokens 3 \
  --compile \
  --quantize-linear-action W4A8_STATIC \
  --dump-input-shapes
```

First, synthetic MTP layers use `layer_idx >= num_hidden_layers`, so indexing the GLM DSA per-layer config goes out of range:

```text
IndexError: list index out of range  # config.indexer_types[layer_idx]
```

Second, the GLM DSA decoder block returns a tuple, while the generic MTP flow expects to keep working on a tensor:

```text
torch._dynamo.exc.Unsupported: Dynamo does not know how to trace method `index_select` of class `tuple`
```

## Root cause

The HuggingFace decoder layers of GLM-5 / GLM-5.1 follow a cross-layer top-k hand-off protocol for DSA sparse attention:

- attention returns a triple: `(attn_output, attn_weights, topk_indices)`
- the decoder layer returns a pair: `(hidden_states, topk_indices)`

During model conversion, TensorCast:

1. uses `mla_module_class_type` to replace the HF `GlmMoeDsaAttention` with the TensorCast sparse attention implementation;
2. in the repetition optimization, wraps the representative layer in `RegionMarkerWrapper` and replaces the subsequent repeated layers with `CopyLayerWrapper`;
3. when MTP is enabled, builds synthetic MTP layers from the decoder layer class.

The original generic implementation did not fully preserve the GLM DSA return values and per-layer config protocol:

- `DeepseekSparseAttention` returns only `(attn_output, attn_weights)`, but the GLM decoder expects attention to return 3 values;
- `CopyLayerWrapper` builds only `(hidden_states,)` for tuple returns, but the GLM decoder layer expects a repeated layer to return 2 values as well;
- `maybe_enable_mtp()` extends only `layer_types` / `mlp_layer_types`, not the GLM DSA-specific `indexer_types`;
- `MultiTokenPredictorLayer` does not handle model families whose MTP block returns a tuple.

In short, the TensorCast wrapper, replacement, and synthetic MTP layers did not fully keep the return contract and the per-layer config contract of the HF modules they replace.

## Changes

### 1. Add a GLM-specific sparse attention wrapper

New file `tensor_cast/layers/glm5.py`:

```python
class Glm5SparseAttention(DeepseekSparseAttention):
    def forward(self, *args, **kwargs):
        attn_output, attn_weights = super().forward(*args, **kwargs)
        return attn_output, attn_weights, None
```

The GLM profile in `tensor_cast/transformers/builtin_model/glm5.py` switches `mla_module_class_type` from the generic `DeepseekSparseAttention` to `Glm5SparseAttention`. The three-value attention return protocol is therefore handled only in the GLM adapter layer; the generic `DeepseekSparseAttention` is unchanged, so other built-in models are not affected.

`tests/.ci/gate_policy.yaml` is not modified. The `builtin_model` path is omitted in the coverage configuration, so placing the new implementation directly in `builtin_model/glm5.py` would prevent the new tests from generating a test_map. The testable wrapper therefore lives in `tensor_cast/layers/glm5.py`, which lets the CI gate associate it with `tests/regression/tensor_cast/test_glm5.py` through the normal coverage/test_map path.

### 2. Make the repetition copy wrapper keep the representative layer's tuple length

In `tensor_cast/layers/internal.py`:

- `RegionMarkerWrapper` records the actual length of the tuple returned by the representative layer;
- `CopyLayerWrapper` pads with `None` according to that length, so the copy layer's tuple arity matches the representative layer's.

This change contains no GLM-specific field checks; for example, it does not read `prev_topk_indices`. It only guarantees that the generic wrapper's return structure has the same length as the representative layer's. For GLM, a copied decoder layer returns `(hidden_states, None)`; when the next layer receives `prev_topk_indices=None`, it recomputes top-k following the original HF logic, so the semantics are safe.

### 3. Complete the GLM DSA per-layer config for MTP

In `tensor_cast/transformers/transformations.py`, enabling MTP now extends `indexer_types` the same way as `layer_types` / `mlp_layer_types`:

```python
if hasattr(hf_config, "indexer_types") and isinstance(hf_config.indexer_types, list) and hf_config.indexer_types:
    hf_config.indexer_types.extend([hf_config.indexer_types[-1]] * mtp_config.num_mtp_layers)
```

Synthetic MTP layers with `layer_idx=78,79,80` can then read a valid GLM DSA indexer type, which avoids the `IndexError`.

### 4. Make the MTP layer accept tuple block outputs

In `tensor_cast/layers/mtp.py`, if `mtp_block` returns a tuple, the first element is used as the hidden states from then on:

```python
if isinstance(hidden_states, tuple):
    hidden_states = hidden_states[0]
```

This matches the decoder layer tuple protocol: the first element is `hidden_states`, and the remaining elements are auxiliary return values specific to the model family.

### 5. Add lightweight regression tests

`tests/regression/tensor_cast/test_glm5.py` is added/extended to cover:

- `Glm5SparseAttention.forward` pads the two-value attention output to the triple that the GLM decoder needs;
- `maybe_enable_mtp()` extends the GLM DSA `indexer_types`;
- `MultiTokenPredictorLayer` takes `hidden_states` from a tuple MTP block output.

## Verification

The GLM adapter / MTP regression tests and the existing repetition wrapper tests pass:

```bash
/home/minghang/workspace/msmodeling-upstream/.venv/bin/python -m pytest \
  tests/regression/tensor_cast/test_glm5.py \
  tests/regression/tensor_cast/test_repetition_wrappers.py -q
```

Result:

```text
4 passed in 0.02s
```

The GLM-5.1 + MTP command that originally failed now runs and prints the performance statistics:

```bash
/home/minghang/workspace/msmodeling-upstream/.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 \
  --device ATLAS_800_A3_752T_128G_DIE \
  --num-devices 16 \
  --tp-size 16 \
  --dp-size 1 \
  --ep-size 16 \
  --context-length 0 \
  --query-length 3500 \
  --num-queries 1 \
  --num-mtp-tokens 3 \
  --compile \
  --quantize-linear-action W4A8_STATIC \
  --dump-input-shapes
```

Result summary:

```text
Model compilation and execution time: 8.125 s
Total time for analytic: 283.311ms
[analytic] TPS/Device: 772.1 token/s
```

The symbols in the new layer file are recognized by the CI gate AST logic:

```text
top-level: ['Glm5SparseAttention']
spans: [('Glm5SparseAttention.forward', 5, 7)]
```

## Impact

- The three-value adaptation of the GLM attention return protocol is confined to `Glm5SparseAttention` in `tensor_cast/layers/glm5.py`;
- the generic `DeepseekSparseAttention` is not modified, so other MLA/DSA models are not affected;
- the `CopyLayerWrapper` change is generic tuple-arity preservation and introduces no GLM-specific field checks;
- `maybe_enable_mtp()` extends the list only for HF configs that have `indexer_types`, consistent with the existing `layer_types` / `mlp_layer_types` extension logic;
- `MultiTokenPredictorLayer` takes the first element of a tuple block output, which is compatible with the standard decoder layer tuple return protocol;
- `tests/.ci/gate_policy.yaml` is not modified, which avoids a configuration change that would make the CI gate run the full suite.

See merge request: Ascend/msmodeling!332	16 days ago
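The tuple-arity and MTP tuple handling described in the `builtin_model` entry can be illustrated with a small, framework-free sketch. The classes `RegionMarker` and `CopyLayer` and the helper `first_hidden_states` are hypothetical stand-ins written for this illustration; they are not the repository's `RegionMarkerWrapper`, `CopyLayerWrapper`, or `MultiTokenPredictorLayer`, and plain Python values replace tensors.

```python
from typing import Any, Callable, Optional


class RegionMarker:
    """Hypothetical stand-in for a representative-layer wrapper."""

    def __init__(self, layer: Callable[..., Any]) -> None:
        self.layer = layer
        # Number of values the representative layer returned; None = bare value.
        self.tuple_len: Optional[int] = None

    def __call__(self, hidden_states: Any, **kwargs: Any) -> Any:
        out = self.layer(hidden_states, **kwargs)
        self.tuple_len = len(out) if isinstance(out, tuple) else None
        return out


class CopyLayer:
    """Hypothetical stand-in for a repeated layer that skips real computation."""

    def __init__(self, marker: RegionMarker) -> None:
        self.marker = marker

    def __call__(self, hidden_states: Any, **kwargs: Any) -> Any:
        n = self.marker.tuple_len
        if n is None:
            return hidden_states
        # Pad with None so the arity matches the representative layer.
        return (hidden_states,) + (None,) * (n - 1)


def glm_like_layer(hidden_states: Any, prev_topk_indices: Any = None) -> tuple:
    """Toy decoder layer following the (hidden_states, topk_indices) contract."""
    return hidden_states, [0, 1]


def first_hidden_states(block_output: Any) -> Any:
    """MTP-style handling: keep only hidden_states when a block returns a tuple."""
    return block_output[0] if isinstance(block_output, tuple) else block_output


marker = RegionMarker(glm_like_layer)
rep_out = marker("h0")              # representative layer runs for real
copy_out = CopyLayer(marker)("h1")  # copy layer mirrors the recorded arity
```

In this sketch the copy layer returns `("h1", None)`, so a caller that unpacks two values keeps working, and a `None` in the second slot plays the role of `prev_topk_indices=None` for the next layer.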
__init__.py	Supports a plugin-based mechanism for custom model

Co-authored-by: HongMaoShuiGuai<1120200577@qq.com>
Co-authored-by: genius52<taochengcheng@h-partners.com>
Message auto-generated for no-merge-commit merge: !61 merge custom_model into develop
Created-by: genius52
Commit-by: genius52;HongMaoShuiGuai
Merged-by: ascend-robot

Description:

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [x] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

To improve the framework's extensibility and ease of use, this change introduces a model plugin mechanism. Users can register custom models, conversion logic, and execution pipelines from standalone files without modifying the framework's core code. New models can be plugged in and extended flexibly, which lowers adaptation cost and improves the maintainability of the architecture.

------

## 📝 Modification

Please briefly describe what modification is made in this PR.

- Introduce a model plugin mechanism that extends the framework with new models through a registry, without modifying core code.
- Add a complete conversion pipeline and stage execution system, so that model wrapping, patching, quantization, sharding, and similar steps can be customized.

------

## 📐 Associated Test Results

Please provide the related test results, such as test reports, etc.

![图像2026-3-4 15.40.png](https://raw.gitcode.com/user-images/assets/8428112/bf296ec7-3f30-4949-bb5d-86f432749ff8/图像2026-3-4_15.40.png '图像2026-3-4 15.40.png')

------

## 🌟 Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

------

## ✅ Checklist

Before PR:

- [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!61	3 months ago
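The registry-based plugin idea in the `__init__.py` entry can be pictured with a minimal decorator registry. This is only a sketch of the general pattern: `register_model`, `build_model`, and `MyCustomModel` are hypothetical names invented here, and the repository's actual registry API may look quite different.

```python
from typing import Callable, Dict

ModelFactory = Callable[[], object]

# Hypothetical registry: maps a model type string to a factory.
_REGISTRY: Dict[str, ModelFactory] = {}


def register_model(model_type: str) -> Callable[[ModelFactory], ModelFactory]:
    """Return a decorator that registers a factory under model_type."""

    def decorator(factory: ModelFactory) -> ModelFactory:
        if model_type in _REGISTRY:
            raise ValueError(f"model type {model_type!r} is already registered")
        _REGISTRY[model_type] = factory
        return factory

    return decorator


def build_model(model_type: str) -> object:
    """Look up a registered factory and call it."""
    try:
        factory = _REGISTRY[model_type]
    except KeyError:
        known = sorted(_REGISTRY)
        raise KeyError(f"unknown model type {model_type!r}; known: {known}") from None
    return factory()


# A plugin file only needs to import the decorator; core code stays untouched.
@register_model("my_custom_model")
class MyCustomModel:
    name = "my_custom_model"


model = build_model("my_custom_model")
```

The design choice worth noting is that registration happens as a side effect of importing the plugin module, so adding a model means adding a file rather than editing a central dispatch table.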
custom_model_registry.py	feat: add DeepSeek-V4 model adaptation to simulation modeling

Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com>
Message auto-generated for no-merge-commit merge: !166 merge deepseek-v4 into develop
Created-by: ChenHuiwen
Commit-by: ChenHuiwen
Merged-by: ascend-robot

Description:

PR Type

- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

Add end-to-end support for the DeepSeek V4 (Flash/Pro) models to msmodeling/tensor_cast. The performance-modeling pipeline should cover the new structures that V4 introduces: sparse attention (multi-layer-type routing across NSA / Window / Compressed / Heavily-Compressed), HC (Head Compression) mixing, Sinkhorn splitting, and Hash Routing MoE. The change also adds the corresponding fake-tensor semantic operators and cost models, so that V4 models run through the existing analytic / multistream tracing flow directly.

------

## 📝 Modification

New files

- tensor_cast/transformers/builtin_model/deepseek_v4.py: the DeepSeek V4 builtin model profile, including DeepseekV4Config / DeepseekV4Model registration, layer-type validation ({0, 4, 128} correspond to sliding_attention / compressed_sparse_attention / heavily_compressed_attention), and safe registration with transformers AutoConfig / AutoModel.
- tests/test_tensor_cast/test_deepseek_v4.py and tests/test_tensor_cast/data/deepseek_v4/*.json: test data and cases for the V4 model (valid, invalid, missing, and truncated ratios configurations).

Attention (tensor_cast/layers/mla.py, tensor_cast/ops/mla.py, tensor_cast/ops/rotary_embedding.py)

- Add DeepseekV4SparseAttention and the MultiheadLatentAttentionTensorCast adaptation (including requires_legacy_kv_b_decomposition, the KV-cache window write path, and so on).
- Add the index generation utilities get_window_topk_idxs / get_compress_topk_idxs.
- Add semantic operators for the HC path: hc_pre_inv_rms and hc_pre_sinkhorn, corresponding to the inverse-RMS scaling and the Sinkhorn-weighted reduction in the reference implementation.
- Add cost models for KV write operators such as scatter_nd_update_mla. Following the reference implementation, they count only source-row reads and updated-row writes, not slot_mapping or the whole cache tensor.

MoE / Gate (tensor_cast/layers/moe_layer.py, tensor_cast/ops/fused_moe.py)

- MoELayer gains a unified V4 gating path: it recognizes the is_v4 / hash flags on the gate and emits the matmul + score func + indices + gather/normalize/route_scale operators in the order of the reference Gate.forward, so that each step is costed separately at its real dtype (the gate matmul runs in fp32).
- Add two semantic operators: moe_gating_top_k (V4 non-hash layers, with optional bias) and moe_gating_top_k_hash (hash routing layers based on the tid2eid table).

Performance Model (tensor_cast/performance_model/__init__.py)

- Introduce the _safe_max_int utility: when tensor.max().item() is unavailable on fake / meta / functional tensors, it falls back to None so that the caller uses shape-based estimation.
- Register PerformanceProperties for the new V4 operators (scatter_nd_update_mla, the HC family, the new MoE gating tail, and so on), aligned with the memory-access semantics of the reference implementation.

Misc

- tensor_cast/core/config_resolver.py, input_generator.py, model_runner.py, device.py, transformers/transformations.py, transformers/custom_model_registry.py, layers/utils.py, model_config.py, compilation/passes/multistream_pass.py: wire V4 into config resolution, input construction, runner scheduling, the device profile, model transformation, and operator registration.

------

## 📐 Associated Test Results

![image.png](https://raw.gitcode.com/user-images/assets/8428112/4dbd32d5-6f6d-4b84-a840-a06eec62fc40/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/fda50383-9b30-4453-bfd1-391889bebb47/image.png 'image.png')

------

## ✅ Checklist

Before PR:

- [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!166	23 days ago
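The layer-type validation mentioned in the `custom_model_registry.py` entry maps the values {0, 4, 128} to three attention layer types. The sketch below shows one way such a check could look, reading the mapping in the order the description lists it. The function name `layer_types_from_ratios`, its parameters, and the exact error handling for missing or truncated ratio lists are assumptions made for illustration, not the repository's code.

```python
from typing import List, Optional, Sequence

# Mapping as read from the description: {0, 4, 128} in listed order.
RATIO_TO_LAYER_TYPE = {
    0: "sliding_attention",
    4: "compressed_sparse_attention",
    128: "heavily_compressed_attention",
}


def layer_types_from_ratios(
    ratios: Optional[Sequence[int]], num_hidden_layers: int
) -> List[str]:
    """Translate per-layer ratios into layer-type names, rejecting bad input.

    Hypothetical helper: raising ValueError for missing, truncated, or
    unknown ratios is an assumption about the desired behavior.
    """
    if ratios is None:
        raise ValueError("ratios are missing from the config")
    if len(ratios) < num_hidden_layers:
        raise ValueError(
            f"ratios cover {len(ratios)} layers, expected {num_hidden_layers}"
        )
    layer_types: List[str] = []
    for idx, ratio in enumerate(ratios[:num_hidden_layers]):
        if ratio not in RATIO_TO_LAYER_TYPE:
            raise ValueError(f"layer {idx}: unsupported ratio {ratio!r}")
        layer_types.append(RATIO_TO_LAYER_TYPE[ratio])
    return layer_types


example = layer_types_from_ratios([0, 4, 128, 4], num_hidden_layers=4)
```

The four cases exercised by the PR's test data (valid, invalid, missing, truncated) correspond to the normal return and the three `ValueError` branches here.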
model.py	perf: optimize repetition representative layers

Co-authored-by: yaohan404<yaohan8@huawei.com>
Message auto-generated for no-merge-commit merge: !222 merge develop into develop
Created-by: yaohan404
Commit-by: yaohan404
Merged-by: ascend-robot

Description:

PR Type

- [ ] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [x] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

Improve model construction performance for large Transformer models with many structurally repeated layers. Existing copy-region based layer reuse already reduces runtime and compile cost, but host-side transformations such as MoE patching, quantization, and sharding still traversed the internals of repeated layers, causing significant startup overhead for large MoE models like DeepSeek-V3.2.

------

## 📝 Modification

Introduce representative-layer processing for structurally repeated Transformer layers. Repeated copy layers now behave as leaf modules during module, parameter, and buffer traversal, so subsequent model transformations only process representative layers. Add repeat-count based weight accounting to preserve full-model memory estimation, and extend the repetition tests to verify representative layer counts, full layer count preservation, and modeling-result consistency.

------

## 📐 Associated Test Results

### throughput_optimizer disagg case:

`cli.inference.throughput_optimizer deepseek-ai/DeepSeek-V3.2 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 3500 --output-length 1500 --disagg --tpot-limits 50 --compile`

before:
![image.png](https://raw.gitcode.com/user-images/assets/8428112/3ceef0f4-ddf5-44a6-9ba7-af88e4d1fb36/image.png 'image.png')

after:
![image.png](https://raw.gitcode.com/user-images/assets/8428112/e0a1d4c1-ea15-4a79-897c-9eacb17cd145/image.png 'image.png')

### throughput_optimizer agg case:

`cli.inference.throughput_optimizer deepseek-ai/DeepSeek-V3.2 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 3500 --output-length 1500 --ttft-limits 2000 --tpot-limits 100 --compile`

before:
![image.png](https://raw.gitcode.com/user-images/assets/8428112/974c8d65-2130-496a-b612-34e42764f96a/image.png 'image.png')

after:
![image.png](https://raw.gitcode.com/user-images/assets/8428112/1a019cba-1d3e-43a4-b218-9b02acce5acd/image.png 'image.png')

### text_generate case:

`cli.inference.text_generate deepseek-ai/DeepSeek-V3.2 --num-queries 16 --query-length 1 --context-length 4096 --device ATLAS_800_A3_752T_128G_DIE --quantize-linear-action FP8 --num-devices 16 --tp-size 1 --ep-size 16 --compile`

before:
![image.png](https://raw.gitcode.com/user-images/assets/8428112/de70b160-975a-4b8b-a060-6f0a65c6869a/image.png 'image.png')

after:
![image.png](https://raw.gitcode.com/user-images/assets/8428112/c0a505f2-ccd8-4e57-9aca-82e2cdc43148/image.png 'image.png')

------

## ✅ Checklist

Before PR:

- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!222	23 days ago
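The repeat-count based weight accounting in the `model.py` entry boils down to multiplying each representative layer's weight size by the number of layers it stands for. A minimal sketch follows; the `LayerGroup` dataclass, its field names, and the byte figures are all hypothetical and chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LayerGroup:
    """Hypothetical record: one representative layer plus its repeat count."""

    name: str
    weight_bytes: int   # weights of the representative layer only
    repeat_count: int   # representative itself + structurally identical copies


def total_weight_bytes(groups: Iterable[LayerGroup]) -> int:
    """Full-model weight size when only representatives are traversed."""
    return sum(g.weight_bytes * g.repeat_count for g in groups)


# Illustrative numbers only: 3 dense layers and 58 MoE layers.
groups = [
    LayerGroup("dense_decoder", weight_bytes=1_000, repeat_count=3),
    LayerGroup("moe_decoder", weight_bytes=20_000, repeat_count=58),
]
full_model_bytes = total_weight_bytes(groups)
visited_layers = len(groups)
```

Only two layers are visited during traversal, yet the estimate still reflects all 61 layers, which is the property the PR says its tests verify (representative layer counts versus full layer count preservation).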
transformations.py	refactor(tensor_cast): unify word embedding tp config

Co-authored-by: Kudo__shinichi<liuning119@huawei.com>
Message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop
Created-by: Kudo__shinichi
Commit-by: Kudo__shinichi
Merged-by: ascend-robot

Description:

PR Type

- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [x] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

`word_embedding_tp` and `word_embedding_tp_mode` represented the same configuration concept in two fields: one field toggled word embedding TP, and the other selected the TP mode. This PR reduces the public and internal configuration shape to a single parameter, so users only need to configure `word_embedding_tp` as disabled, `col`, or `row`.

------

## 📝 Modification

- Make `UserInputConfig.word_embedding_tp` the single nullable word embedding TP mode field.
- Remove `word_embedding_tp_mode` and `embedding_parallel_mode` from the config model.
- Pass the normalized `word_embedding_tp` mode directly into `ParallelConfig.embedding_parallel` and the embedding transformation.
- Keep legacy bool input normalization for compatibility: `True -> col`, `False/None -> disabled`.
- Remove redundant CLI-side bool/mode conversion and update related benchmark cases and user guide docs.
- Add regression coverage for single-field config, legacy bool normalization, and invalid `word_embedding_tp` values.

------

## 📐 Associated Test Results

- `python -m pytest tests/regression/tensor_cast/test_user_config.py -q`: 6 passed
- `python -m pytest tests/regression/tensor_cast/test_user_config.py tests/regression/web_ui/test_command_builder.py tests/regression/tensor_cast/test_adapter_automation.py -q`: 98 passed
- `python -m pytest tests/regression/tensor_cast/test_text_generate.py -k word_embedding_parallel -q`: 2 passed, 113 deselected
- `python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -o addopts= -m "nightly and not npu and not network" -q`: 2 passed
- `python -m pytest tests/benchmark/models/test_model_regression.py --collect-only -q`: 15 tests collected
- `python -m ruff check <changed python files>`: All checks passed
- `python -m pre_commit run --from-ref origin/develop --to-ref HEAD`: passed
- `git diff --check HEAD~1 HEAD`: passed

------

## 🌟 Use cases (Optional)

- Disable word embedding TP: `word_embedding_tp=None`
- Enable column mode: `word_embedding_tp="col"`
- Enable row mode: `word_embedding_tp="row"`
- CLI usage: `--word-embedding-tp col` or `--word-embedding-tp row`

------

## ✅ Checklist

Before PR:

- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!344	16 days ago
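The normalization rules stated in the `transformations.py` entry (`True -> col`, `False/None -> disabled`, plus the `col` and `row` modes) can be written down as a short function. `normalize_word_embedding_tp` is a hypothetical name for this sketch; how the real `UserInputConfig` performs the normalization, and which exception it raises for invalid values, is not stated in the description and is assumed here.

```python
from typing import Optional, Union

VALID_MODES = ("col", "row")


def normalize_word_embedding_tp(value: Union[None, bool, str]) -> Optional[str]:
    """Collapse legacy bool input and mode strings into one nullable mode.

    Hypothetical helper: returns None for "disabled", otherwise "col" or "row".
    Raising ValueError for anything else is an assumption.
    """
    if value is None or value is False:
        return None            # disabled
    if value is True:
        return "col"           # legacy bool: True used to mean column mode
    if isinstance(value, str) and value in VALID_MODES:
        return value
    raise ValueError(
        f"word_embedding_tp must be None, a bool, or one of {VALID_MODES}, got {value!r}"
    )


normalized = [normalize_word_embedding_tp(v) for v in (None, False, True, "col", "row")]
```

Using identity checks (`is True`, `is False`) rather than truthiness keeps values such as `1` or an empty string from being silently treated as legacy bools.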
utils.py	bugfix: fix kimi-K2.6 remote code trust(like kimi-2.5)

Co-authored-by: Elrond G<elrondgcn@gmail.com>
Message auto-generated for no-merge-commit merge: !260 merge bugfix/develop/kimi_k26_trust_remote_code into develop
Created-by: elrond-g
Commit-by: Elrond G
Merged-by: ascend-robot

Description:

PR Type

- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

Make Kimi-K2.6 usable.

------

## 📝 Modification

1. Extend the "Windows patch" into an "all-platform patch".
2. Fix the side effect of the probing branch being polluted by the global patch.

------

## 📐 Associated Test Results

The following command runs successfully:

`python -m tensor_cast.scripts.text_generate moonshotai/Kimi-K2.6 --num-queries 8 --query-length 1 --context-length 4096 --tp-size 8 --dp-size 2 --ep-size 16 --quantize-linear-action W8A8_STATIC --word-embedding-tp row --device ATLAS_800_A3_752T_128G_DIE --world-size 16 --performance-model profiling --compile --profiling-database tensor_cast/performance_model/profiling_database/data/ATLAS_800_A3_752T_128G_DIE/vllm_ascend/vllm0.18.0_torch2.9.0_cann8.5 --enable-shared-expert-tp --enable-dispatch-ffn-combine`

------

## ✅ Checklist

Before PR:

- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!260	17 days ago