msmodeling/tensor_cast/ops · Ascend/MindStudio-Modeling - AtomGit

File	Last commit	Last updated
__init__.py	feat: add deepseek-V4 model adaptation to simulation modeling
Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com>
# message auto-generated for no-merge-commit merge: !166 merge deepseek-v4 into develop
Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot

PR Type: Feature

## 🔍 Motivation

Add end-to-end support for the DeepSeek V4 (Flash/Pro) models to msmodeling/tensor_cast, so that the performance-modeling pipeline covers the new structures introduced in V4: sparse attention (NSA / Window / Compressed / Heavily-Compressed multi-layer-type routing), HC (Head Compression) mixing, the Sinkhorn split, and Hash Routing MoE. The PR also adds the corresponding fake-tensor semantic operators and cost models, so that V4 models run directly through the existing analytic / multistream tracing flow.

------

## 📝 Modification

New files
- tensor_cast/transformers/builtin_model/deepseek_v4.py: the DeepSeek V4 builtin model profile, including DeepseekV4Config / DeepseekV4Model registration, layer-type validation ({0, 4, 128} map to sliding_attention / compressed_sparse_attention / heavily_compressed_attention), and safe registration with transformers AutoConfig / AutoModel.
- tests/test_tensor_cast/test_deepseek_v4.py and tests/test_tensor_cast/data/deepseek_v4/.json: test data and cases for the V4 model (valid, invalid, missing, and truncated ratios configurations).

Attention (tensor_cast/layers/mla.py, tensor_cast/ops/mla.py, tensor_cast/ops/rotary_embedding.py)
- Add DeepseekV4SparseAttention and the MultiheadLatentAttentionTensorCast adaptation (including requires_legacy_kv_b_decomposition, the KV-cache window write path, etc.).
- Add the index-generation helpers get_window_topk_idxs / get_compress_topk_idxs.
- Add the HC-path semantic operators hc_pre_inv_rms and hc_pre_sinkhorn, which correspond to the inverse-RMS scaling and the Sinkhorn-weighted reduction in the reference implementation.
- Add cost models for KV write operators such as scatter_nd_update_mla. Following the reference implementation, they count only the read of the source rows and the write of the updated rows, not slot_mapping or the whole cache tensor.

MoE / Gate (tensor_cast/layers/moe_layer.py, tensor_cast/ops/fused_moe.py)
- MoELayer gains a unified V4 gating path: it recognizes the is_v4 / hash flags on the gate and emits the matmul + score func + indices + gather/normalize/route_scale operators in the order of the reference Gate.forward, so that each step is costed separately at its real dtype (the gate matmul runs in fp32).
- Add two semantic operators: moe_gating_top_k (V4 non-hash layers, with optional bias) and moe_gating_top_k_hash (hash-routing layers based on the tid2eid table).

Performance Model (tensor_cast/performance_model/__init__.py)
- Introduce the _safe_max_int helper: when tensor.max().item() is unavailable on a fake / meta / functional tensor, it falls back to None so that the caller uses a shape-based estimate.
- Register PerformanceProperties for the new V4 operators (scatter_nd_update_mla, the HC family, the new MoE gating tail, etc.), aligned with the memory-access semantics of the reference implementation.

Misc
- tensor_cast/core/config_resolver.py, input_generator.py, model_runner.py, device.py, transformers/transformations.py, transformers/custom_model_registry.py, layers/utils.py, model_config.py, compilation/passes/multistream_pass.py: wire V4 into config resolution, input construction, runner scheduling, device profiles, model transformations, and operator registration.

------

## 📐 Associated Test Results

![image.png](https://raw.gitcode.com/user-images/assets/8428112/4dbd32d5-6f6d-4b84-a840-a06eec62fc40/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/8428112/fda50383-9b30-4453-bfd1-391889bebb47/image.png 'image.png')

------

See merge request: Ascend/msmodeling!166	20 days ago
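The `_safe_max_int` fallback described in the !166 entry can be sketched roughly as follows. This is a minimal, torch-free illustration under assumptions: the function body, the `estimate_rows` caller, and the two stand-in classes are hypothetical and are not taken from the repository. Only the behavior "return None when `tensor.max().item()` is unavailable, then estimate from shape" comes from the description.

```python
from typing import Any, Optional


def safe_max_int(tensor: Any) -> Optional[int]:
    """Return int(tensor.max().item()), or None if the value cannot be read."""
    try:
        return int(tensor.max().item())
    except Exception:
        # Assumption: tensors without real data raise when their values are read.
        return None


def estimate_rows(indices: Any, fallback_rows: int) -> int:
    """Hypothetical caller: use the real max index if known, else a shape-based bound."""
    max_index = safe_max_int(indices)
    return fallback_rows if max_index is None else max_index + 1


class _Scalar:
    """Stand-in for a zero-dimensional tensor holding one value."""

    def __init__(self, value: int) -> None:
        self._value = value

    def item(self) -> int:
        return self._value


class ConcreteIndices:
    """Stand-in for a tensor that holds real data."""

    def __init__(self, values) -> None:
        self._values = list(values)

    def max(self) -> _Scalar:
        return _Scalar(max(self._values))


class FakeIndices:
    """Stand-in for a tensor with shape only and no data."""

    def max(self):
        raise RuntimeError("no data available on a fake tensor")
```

With real data the caller gets an exact row count; with a data-free tensor it silently falls back to the shape-based value it was given.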
attention.py	[bugfix] Fix the prefill-or-decode check for a single token on QWEN3.5
Co-authored-by: AvadaKedavrua<anonymousdev@163.com> Co-authored-by: yuyinkai1<769293914@qq.com>
# message auto-generated for no-merge-commit merge: !154 merge develop into develop
Created-by: yuyinkai1 Commit-by: yuyinkai1;AvadaKedavrua Merged-by: ascend-robot

PR Type: Bugfix

See merge request: Ascend/msmodeling!154	1 month ago
cat.py	chore(ci): adopt pre-commit and retire legacy lintrunner adapters
Co-authored-by: liujiawang<anonymousdev@163.com>
# message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop
Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot

PR Type: Docs, CI/CD

------

## Motivation

Continue the pre-commit migration: tighten Pylint so only high-signal messages run (`disable=all` + explicit `enable` list), fix real issues that remained under that profile, and translate hook/config comments to English.

------

## Configuration changes (tooling & comments only)

| Path | What changed |
|------|----------------|
| `pre-commit/pyproject.toml` | Pylint: `[tool.pylint."messages control"]` with `disable = ["all"]` and a short allowlist of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). Ruff: unchanged behavior; comments translated to English. Bandit: comments translated; rule allowlist/skip lists unchanged. |
| `.pre-commit-config.yaml` | Comments translated to English; Bandit hook display name set to bandit (Python security checks). Hook versions and args unchanged except for comment text. |

------

## Source code changes (application code)

| Area | Files | Purpose |
|------|--------|---------|
| `serving_cast` | `communication.py`, `engine.py`, `instance.py`, `kv_cache_manager.py`, `load_gen.py`, `main.py`, `model_runner.py`, `request.py`, `serving.py`, `utils.py` | Replace `from . import stime` with `import serving_cast.stime as stime` so Pylint resolves imports (fixes E0611). |
| `serving_cast` | `stime.py` | Singleton salabim `Environment` via `_get_sim_env()` so type checkers/Pylint see `sim.Environment` (fixes E1101 on `SimulationEnv`). |
| `serving_cast/service` | `base_throughput_optimizer.py` | `__init__` defaults + `assert runner is not None` before `run_inference` (fixes E1101 on base class). |
| `tensor_cast` | `diffusers/diffusers_model.py`, `diffusers/diffusers_utils.py`, `runtime.py` | Add `encoding="utf-8"` to `open()` / trace export (fixes W1514). |
| `web_ui` | `callbacks.py` | `refresh_optimizer_detail`: call `_optimizer_detail_view(rows, None, device)` and unpack five return values (fixes E1120). |

------

## Recent commits on `pre-commit` branch

- `ci(pre-commit): fix pylint message selection with disable=all`
- `fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui`
- `docs(pre-commit): translate comments to English and add all-files run log`

------

![](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/b22b18aa-4c84-4dc0-85f5-1e7e0715350e/pre-commit-all-files-run.svg)

------

## Checklist

- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!176	1 month ago
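The W1514 fix listed in the !176 entry (pass `encoding="utf-8"` to `open()`) can be illustrated with a small round trip. The helper names `export_trace` and `load_trace` are hypothetical and are not the functions in `tensor_cast/runtime.py`. The sketch only shows why an explicit encoding matters: without it, `open()` uses the platform's default encoding, which is what Pylint's W1514 warns about.

```python
import json
import os
import tempfile
from typing import Any, List


def export_trace(events: List[Any], path: str) -> None:
    """Write events as JSON with an explicit encoding (what W1514 asks for)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(events, handle, ensure_ascii=False)


def load_trace(path: str) -> List[Any]:
    """Read the events back with the same explicit encoding."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def round_trip(events: List[Any]) -> List[Any]:
    """Export to a temporary file, load it again, and clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "trace.json")
        export_trace(events, path)
        return load_trace(path)
```

Because both sides name the encoding, non-ASCII event names survive the round trip regardless of the locale of the machine that runs it.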
communication.py	support mm allreduce fusion
Co-authored-by: lutean<lutean1@huawei.com>
# message auto-generated for no-merge-commit merge: !89 merge develop into develop
Created-by: lutean Commit-by: lutean Merged-by: ascend-robot

PR Type: Feature, Test-Cases

## 🔍 Motivation

To balance compute efficiency against communication overhead, cube compute operators and collective-communication operators can be co-partitioned and executed in parallel to improve performance. This PR implements the fusion of matmul and allreduce, including the fusion for the corresponding quantized scenarios.

------

## 📝 Modification

1. Pattern registration for matmul and allreduce
2. Registration of each fused operator
3. Latency estimation for each fused operator (the maximum of the compute operator and the communication operator)
4. Unit tests that check whether the fusion takes effect

------

## 📐 Associated Test Results

![noquant.png](https://raw.gitcode.com/user-images/assets/8428112/35dce015-6966-4b95-8b8a-e7cddffe1e98/noquant.png 'noquant.png') ![w8a8.png](https://raw.gitcode.com/user-images/assets/8428112/c7ba1943-5978-4c07-ae46-c738c233a86a/w8a8.png 'w8a8.png') ![w4a8.png](https://raw.gitcode.com/user-images/assets/8428112/c950e764-d6ce-40ca-8e55-cde5936a2019/w4a8.png 'w4a8.png') ![FP8.png](https://raw.gitcode.com/user-images/assets/8428112/0029666d-4bfb-46f6-844d-5f384781b01d/FP8.png 'FP8.png') ![MXFP4.png](https://raw.gitcode.com/user-images/assets/8428112/11feb970-4e5d-457b-9891-5373a754a497/MXFP4.png 'MXFP4.png')

------

See merge request: Ascend/msmodeling!89	3 months ago
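The latency rule in the !89 entry (a fused operator costs the maximum of its compute part and its communication part) can be written out as a short sketch. The function names are illustrative assumptions, not the estimator functions in the repository, and the unfused sum is shown only for contrast.

```python
def fused_latency(compute_s: float, comm_s: float) -> float:
    """Fused matmul + allreduce: the two parts overlap, so the slower one dominates."""
    return max(compute_s, comm_s)


def unfused_latency(compute_s: float, comm_s: float) -> float:
    """Back-to-back execution without overlap: the two parts add up."""
    return compute_s + comm_s


def overlap_saving(compute_s: float, comm_s: float) -> float:
    """Time hidden by the fusion, which equals the shorter of the two parts."""
    return unfused_latency(compute_s, comm_s) - fused_latency(compute_s, comm_s)
```

Under this model the saving is largest when compute and communication take about the same time, and it vanishes when one of them is negligible.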
deepseek_v4.py	feat: add deepseek-V4 model adaptation to simulation modeling (same commit and description as the __init__.py entry). See merge request: Ascend/msmodeling!166	20 days ago
fused_moe.py	[bugfix] Fix DFC quant fusion residuals by internalizing activation quant args
Co-authored-by: lutean<lutean1@huawei.com>
# message auto-generated for no-merge-commit merge: !191 merge develop into develop
Created-by: lutean Commit-by: lutean Merged-by: ascend-robot

PR Type: Bugfix

## 🔍 Motivation

Symptom: under dynamic quantization, after the DFC fusion pass takes effect, init_routing, all to all, and gmm_quant_swiglu remain in the graph, causing performance accuracy to degrade.

Root cause: the quant case currently performs a "structural replacement", but the parameter interface still depends on the original intermediate activation-quantization nodes. The grouped_matmul_quant_swiglu_default chain therefore still produces live inputs for the fused op and cannot be removed by eliminate_dead_code(), which leaves a live grouped_matmul_quant_swiglu_default node in the graph.

------

## 📝 Modification

1. Change the operator semantics of dispatch_ffn_combine_quant / dispatch_ffn_combine_quant_int4: gmm2_x_scale/gmm2_x_offset are no longer external inputs; the fused op handles them internally.
2. Change how the pass collects arguments for grouped quant: in the grouped case it no longer uses gmm_plain_node.args[1:] directly. It extracts only the static weight-side parameters of gmm2 and no longer carries gmm2_x_scale/gmm2_x_offset in from the graph.
3. Update the meta op / estimator signatures to match: the parameter lists of dispatch_ffn_combine_quant / quant_int4 in tensor_cast/ops/fused_moe.py and tensor_cast/performance_model/__init__.py.

------

## 📐 Associated Test Results

Before the fix: ![修复前.png](https://raw.gitcode.com/user-images/assets/8428112/7f38c958-08ba-4d04-9038-34f5f07dc63d/修复前.png '修复前.png')

After the fix: ![修复后.png](https://raw.gitcode.com/user-images/assets/8428112/601b480a-6e5d-4196-84f8-343a06dd71bc/修复后.png '修复后.png')

------

See merge request: Ascend/msmodeling!191	1 month ago
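The root cause in the !191 entry (a node survives dead-code elimination as long as the fused op still reads from it) can be reproduced on a toy graph. This is a pure-Python sketch, not torch.fx: the `eliminate_dead_code` function below is a simplified stand-in, and the node wiring is a hypothetical reduction of the real graph that reuses only the operator names from the description.

```python
from typing import Dict, List


def eliminate_dead_code(graph: Dict[str, List[str]], outputs: List[str]) -> Dict[str, List[str]]:
    """Keep only the nodes reachable from the outputs (toy stand-in for FX DCE)."""
    live = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(graph[node])  # every input of a live node is live too
    return {name: inputs for name, inputs in graph.items() if name in live}


# Each key is a node; each value lists the nodes it reads from (assumed wiring).
before_fix = {
    "x": [],
    "w1": [],
    "w2": [],
    "grouped_matmul_quant_swiglu_default": ["x", "w1"],
    "gmm2_x_scale": ["grouped_matmul_quant_swiglu_default"],
    # The fused op still takes gmm2_x_scale as an external input.
    "dispatch_ffn_combine_quant": ["x", "w1", "w2", "gmm2_x_scale"],
}

# After the fix the fused op computes the activation scale internally.
after_fix = dict(before_fix, dispatch_ffn_combine_quant=["x", "w1", "w2"])

kept_before = eliminate_dead_code(before_fix, ["dispatch_ffn_combine_quant"])
kept_after = eliminate_dead_code(after_fix, ["dispatch_ffn_combine_quant"])
```

In `kept_before` the original chain is still present because the fused op consumes its result; in `kept_after` nothing reads it, so the elimination removes it.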
gmm.py	Implement fusion of gmm + swiglu
Co-authored-by: HongMaoShuiGuai<1120200577@qq.com> Co-authored-by: genius52<taochengcheng@h-partners.com>
# message auto-generated for no-merge-commit merge: !83 merge gmm_swiglu into develop
Created-by: genius52 Commit-by: genius52;HongMaoShuiGuai Merged-by: ascend-robot

PR Type: Feature

## 📐 Associated Test Results

`python -m tensor_cast.scripts.text_generate Qwen/Qwen3-235B-A22B --num-queries 2 --query-length 3500 --context-length 4500 --device TEST_DEVICE --quantize-attention-action INT8 --compile --num-hidden-layers-override 3`

![image.png](https://raw.gitcode.com/user-images/assets/8428112/01a10652-1d65-4052-a954-b54a7a4e9c26/image.png 'image.png')

------

## 🌟 Use cases (Optional)

![image.png](https://raw.gitcode.com/user-images/assets/8428112/24cf3a41-38b4-4cb2-b2a8-1f4a6f0945b2/image.png 'image.png')

------

See merge request: Ascend/msmodeling!83	3 months ago
internal.py	feat(multistream): add compile-time multistream scheduling (core only)
Co-authored-by: Kudo__shinichi<liuning119@huawei.com>
# message auto-generated for no-merge-commit merge: !117 merge feat/multistream-design into develop
Created-by: Kudo__shinichi Commit-by: Kudo__shinichi Merged-by: ascend-robot

PR Type: Feature, Perf, Test-Cases

## 🔍 Motivation

The `torch.compile` path currently lacks a general multistream scheduling capability. Overlap between communication and compute relies mainly on local modeling of a few existing fused operators, so ordinary compute / collective nodes in the FX graph cannot be scheduled uniformly at compile time. The goals of this PR are:

1. Introduce controllable multistream scheduling in the `torch.compile` path;
2. Shorten the critical path in scenarios where communication and compute have an overlap window;
3. Fall back automatically through a benefit guard when no gain is predicted, leaving the original single-stream behavior unchanged;
4. Keep the implementation simple and reuse the existing compile / runtime / performance model infrastructure as far as possible;
5. Fix the distorted activation memory statistics caused by multistream control anchors taking part in memory tracking.

## 📝 Modification

1. `tensor_cast/config.py`
   - Add multistream configuration options;
   - Support role-based stream mapping;
   - Keep compatibility with the legacy fields;
   - Remove the pass-local hard-coded bandwidth defaults; scheduling cost now prefers the analytic performance model and device profile information.
2. `tensor_cast/core/model_builder.py`
   - Pass the current device information when building the compile backend;
   - Let the multistream pass do cost estimation based on the current device profile.
3. `tensor_cast/compilation/compile_backend.py`
   - Hook the multistream pass into the compile rewrite flow;
   - As suggested by the reviewer, run the multistream pass before `decompose_auto_functionalized_pass`;
   - The reason is that the multistream pass calls DCE internally and must run on a pure-functional graph, so that the mutation-style graph produced by defunctionalization does not affect semantic correctness.
4. `tensor_cast/compilation/passes/multistream_pass.py`
   - Introduce the compile-time multistream schedule pass;
   - Classify nodes by execution resource into `COMM_ONLY`, `HYBRID`, and `COMPUTE`;
   - Model collective nodes such as `all_reduce / all_gather / reduce_scatter / all_to_all` as communication nodes;
   - Model fused nodes such as `matmul_all_reduce / static_quant_linear_all_reduce` as hybrid nodes;
   - Perform lowering through `_internal_wait_and_bind` / `_internal_record`;
   - Add a benefit guard: the rewrite is applied only when the predicted multistream makespan beats the single-stream baseline;
   - Helper nodes that are not `OpOverload` do not enter analytic cost estimation, so helpers such as `operator.getitem` are not wrongly modeled as device operators.
5. `tensor_cast/runtime.py`
   - Record stream / dependency tokens in multistream runtime events;
   - The memory tracker replays events in a multistream-dependency-aware order, which reflects the longer activation lifetimes under multistream more accurately;
   - Internal multistream anchor ops do not count as model activations in memory statistics, so control anchors no longer inflate the memory result.
6. `tests`
   - Add basic coverage for the multistream pass;
   - Add coverage for the runtime critical path and anchor memory;
   - Cover key behaviors such as the benefit guard, anchor lowering, helper node handling, and multistream memory accounting.

## 📐 Associated Test Results

Single-stream example

`python -m tensor_cast.scripts.text_generate deepseek-ai/DeepSeek-V3.1 --device ATLAS_800_A3_560T_128G_DIE --num-queries 64 --query-length 1 --context-length 1024 --world-size 16 --tp-size 8 --dp-size 2 --moe-tp-size 4 --moe-dp-size 1 --ep-size 4 --decode --compile --compile-allow-graph-break --disable-repetition --num-hidden-layers-override 4 --quantize-attention-action INT8 --chrome-trace trace_ds_single_l4_q64_ctx1024.json --log-level info`

![image.png](https://raw.gitcode.com/user-images/assets/8428112/06a071a9-09e8-4df3-ad69-be50de74296d/image.png 'image.png')

Multistream example

`python -m tensor_cast.scripts.text_generate deepseek-ai/DeepSeek-V3.1 --device ATLAS_800_A3_560T_128G_DIE --num-queries 64 --query-length 1 --context-length 1024 --world-size 16 --tp-size 8 --dp-size 2 --moe-tp-size 4 --moe-dp-size 1 --ep-size 4 --decode --compile --compile-allow-graph-break --disable-repetition --num-hidden-layers-override 4 --quantize-attention-action INT8 --chrome-trace trace_ds_multi_l4_q64_ctx1024_current.json --log-level info`

![c438e5d0a40bb5de8d33d565d2196c94.png](https://raw.gitcode.com/user-images/assets/8428112/0a697207-4b9a-4015-a7df-11135740b70f/c438e5d0a40bb5de8d33d565d2196c94.png 'c438e5d0a40bb5de8d33d565d2196c94.png')

Key results:

| Scenario | Total time for analytic | Execution time | TPS/Device | Notes |
|---|---:|---:|---:|---|
| Single-stream | 20.729ms | 0.020729 s | 193 token/s | baseline |
| Multistream | 20.687ms | 0.019750 s | 202.5 token/s | multistream enabled |

Performance comparison:

- With multistream, `Execution time` drops from `0.020729 s` to `0.019750 s`, a latency reduction of about `4.72%`.
- `TPS/Device` rises from `193 token/s` to `202.5 token/s`, an improvement of about `4.92%`.

------

## 🌟 Use cases (Optional)

Scenarios suited to validating multistream gains in the current version:

1. Decode scenarios with a high share of communication;
2. Scenarios with many TP/EP collectives and independent compute/comm overlap windows;
3. Scenarios that want a conservative scheduling attempt on the compile side and require automatic fallback when there is no gain.

Known limits of the current version:

1. In dense / memory-bound scenarios, multistream may be skipped outright by the benefit guard;
2. `HYBRID` fused operators are still modeled as black-box nodes on the main stream, which leaves room for further refinement.

------

## ✅ Checklist

Before PR:

- [x] Linting tools are used to fix the potential lint issues.
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!117	1 month ago
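The benefit guard and the reported percentages in the !117 entry can be checked with a few lines of arithmetic. This is a sketch under assumptions: `apply_benefit_guard` and `percent_change` are hypothetical names, and the guard is reduced to a single comparison of two predicted makespans, whereas the actual pass derives those makespans from the analytic performance model.

```python
def apply_benefit_guard(single_stream_s: float, multistream_s: float) -> str:
    """Pick multistream only when its predicted makespan beats the baseline."""
    return "multistream" if multistream_s < single_stream_s else "single-stream"


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Figures taken from the results table of the entry above.
latency_drop = -percent_change(0.020729, 0.019750)  # execution time, seconds
tps_gain = percent_change(193.0, 202.5)             # token/s per device
```

Rounded to two decimals, `latency_drop` and `tps_gain` reproduce the 4.72% and 4.92% quoted in the performance comparison, and a tie between the two makespans keeps the single-stream schedule.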
layernorm.py	chore(ci): adopt pre-commit and retire legacy lintrunner adapters (same commit and description as the cat.py entry). See merge request: Ascend/msmodeling!176	1 month ago
linear.py	Use _ for names of the ops and compute properties functions. Always return the graph module for all graph passes. Move stable_topo_sort to its own file. Move sink_split_pass to freezing passes since it depends on graph freezing. Co-authored-by: Jiong Gong<steven.gong@gmail.com>	6 months ago
mla.py	chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> See merge request: Ascend/msmodeling!176	1 month ago
mtp.py	Use _ for names of the ops and compute properties functions. Always return the graph module for all graph passes. Move stable_topo_sort to its own file. Move sink_split_pass to freezing passes since it depends on graph freezing. Co-authored-by: Jiong Gong<steven.gong@gmail.com>	6 months ago
quantization.py	Use _ for names of the ops and compute properties functions. Always return the graph module for all graph passes. Move stable_topo_sort to its own file. Move sink_split_pass to freezing passes since it depends on graph freezing. Co-authored-by: Jiong Gong<steven.gong@gmail.com>	6 months ago
rotary_embedding.py	feat: add DeepSeek-V4 model adaptation to simulation modeling
Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com>
# message auto-generated for no-merge-commit merge: !166 merge deepseek-v4 into develop
Created-by: ChenHuiwen
Commit-by: ChenHuiwen
Merged-by: ascend-robot

PR Type: Feature

## 🔍 Motivation
Add end-to-end support for the DeepSeek V4 (Flash/Pro) models to msmodeling/tensor_cast, so that its performance-modeling pipeline covers the new structures introduced by V4: sparse attention (multi-layer-type routing across NSA / Window / Compressed / Heavily-Compressed), HC (Head Compression) mixing, the Sinkhorn split, and Hash Routing MoE. Also add the corresponding fake-tensor semantic operators and cost models, so that V4 models run directly through the existing analytic / multistream tracing flows.

## 📝 Modification
New files
- tensor_cast/transformers/builtin_model/deepseek_v4.py: the DeepSeek V4 builtin model profile, including DeepseekV4Config / DeepseekV4Model registration, layer-type validation ({0, 4, 128} correspond to sliding_attention / compressed_sparse_attention / heavily_compressed_attention), and safe registration with transformers AutoConfig / AutoModel.
- tests/test_tensor_cast/test_deepseek_v4.py and tests/test_tensor_cast/data/deepseek_v4/.json: test data and test cases for the V4 models (covering valid, invalid, missing, and truncated ratios configurations).

Attention (tensor_cast/layers/mla.py, tensor_cast/ops/mla.py, tensor_cast/ops/rotary_embedding.py)
- Add DeepseekV4SparseAttention and the MultiheadLatentAttentionTensorCast adaptation (including requires_legacy_kv_b_decomposition, the KV-cache window write path, and so on).
- Add the index-generation utilities get_window_topk_idxs / get_compress_topk_idxs.
- Add the HC-path semantic operators hc_pre_inv_rms and hc_pre_sinkhorn, which correspond to the inverse-RMS scaling and the Sinkhorn-weighted reduction in the reference implementation, respectively.
- Add cost models for KV-write operators such as scatter_nd_update_mla. Following the reference implementation, only source-row reads and updated-row writes are counted; slot_mapping and the whole cache tensor are not.

MoE / Gate (tensor_cast/layers/moe_layer.py, tensor_cast/ops/fused_moe.py)
- MoELayer gains a unified V4 gating path: it recognizes the is_v4 / hash flags on the gate and emits the matmul + score func + indices + gather/normalize/route_scale operators in the order of the reference Gate.forward, so that each step is costed separately at its real dtype (the gate matmul runs in fp32).
- Add two semantic operators: moe_gating_top_k (V4 non-hash layers, with optional bias) and moe_gating_top_k_hash (hash-routing layers based on the tid2eid table).

Performance Model (tensor_cast/performance_model/__init__.py)
- Introduce the _safe_max_int utility: when tensor.max().item() is unavailable on fake / meta / functional tensors, it falls back to None so that the caller uses a shape-based estimate.
- Register PerformanceProperties for the new V4 operators (scatter_nd_update_mla, the HC family, the new MoE gating tail, and so on), aligned with the memory-access semantics of the reference implementation.

Misc
- tensor_cast/core/config_resolver.py, input_generator.py, model_runner.py, device.py, transformers/transformations.py, transformers/custom_model_registry.py, layers/utils.py, model_config.py, compilation/passes/multistream_pass.py: wire V4 into config resolution, input construction, runner scheduling, the device profile, model transformations, and operator registration.

## 📐 Associated Test Results
![image.png](https://raw.gitcode.com/user-images/assets/8428112/4dbd32d5-6f6d-4b84-a840-a06eec62fc40/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/fda50383-9b30-4453-bfd1-391889bebb47/image.png 'image.png')

See merge request: Ascend/msmodeling!166	20 days ago
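The `_safe_max_int` fallback described in merge request !166 can be sketched as follows. This is not the repository's implementation: `safe_max_int`, `estimate_rows`, and the two stub classes are hypothetical names, and the set of exceptions caught is an assumption about how data-free tensors fail. Stubs are used instead of real tensors so the sketch runs without any tensor library installed.

```python
from typing import Any, Optional


def safe_max_int(tensor: Any) -> Optional[int]:
    """Return int(tensor.max().item()), or None if no concrete value exists."""
    try:
        return int(tensor.max().item())
    except (RuntimeError, TypeError):
        # Assumption: fake/meta-style tensors raise one of these on .item().
        return None


def estimate_rows(indices: Any, shape_based_default: int) -> int:
    """Use the concrete maximum index when available, else a shape-based value."""
    concrete = safe_max_int(indices)
    return shape_based_default if concrete is None else concrete + 1


class ConcreteStub:
    """Stand-in for a tensor that holds real data (hypothetical)."""

    def __init__(self, values):
        self._values = list(values)

    def max(self):
        return ConcreteStub([max(self._values)])

    def item(self):
        return self._values[0]


class MetaStub:
    """Stand-in for a data-free tensor whose values cannot be read (hypothetical)."""

    def max(self):
        return self

    def item(self):
        raise RuntimeError("no data available on a data-free tensor")
```

Returning `None` rather than a sentinel integer keeps the "value unknown" case explicit, so the caller has to choose the shape-based path deliberately.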
swiglu.py	chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> See merge request: Ascend/msmodeling!176	1 month ago