msmodeling/tensor_cast · Ascend/MindStudio-Modeling - AtomGit

ascend-robot refactor(tensor_cast): unify word embedding tp config

File	Last commit	Last updated
adapter	decouple ModelRunner metrics from runtime capture & refine documentation
Co-authored-by: jhon-117<fangkai15@huawei.com>
# message auto-generated for no-merge-commit merge: !291 merge codex/pr-282-review-fixes into develop
decouple ModelRunner metrics from runtime capture & refine documentation
Created-by: jhon-117
Commit-by: jhon-117
Merged-by: ascend-robot
Description:
# PR Template
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
- Fixed jgong5's concern about metrics depending on the entire `Runtime`.
- Replied that the `--device` argument is required for adapter simulation/verification flows.
- Replied that adapter Python files are product modules reused outside the skill, so they should remain under `tensor_cast/adapter`.
------
## 📝 Modification
- Remove the full `Runtime` object from `ModelRunnerMetrics`.
- Add an explicit `runtime_observer` hook to `ModelRunner.run_inference`.
- Update adapter actual-summary collection to use the hook instead of reading `metrics.runtime`.
- Keep `--device` in `model_adapter` because doctor/verify builds `UserInputConfig` and may run target-device simulation or verification.
- Keep adapter automation modules under `tensor_cast/adapter` because they are reused by the CLI and regression tests, not only by the skill.
------
## 📐 Associated Test Results
- `pytest tests/regression/tensor_cast/test_adapter_automation.py -q`
  - `38 passed`
- `python -m compileall -q tensor_cast/adapter tensor_cast/core/model_runner.py cli/inference/model_adapter.py`
  - passed
------
## 🌟 Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
------
## ✅ Checklist
Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!291	20 days ago
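The decoupling described in this entry (metrics no longer hold the `Runtime`; callers that need it pass a `runtime_observer` hook) can be illustrated with a minimal sketch. Everything below is hypothetical: the class bodies, the `execution_time_s` field, the `events` attribute, and the exact `run_inference` signature are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class Runtime:
    """Hypothetical stand-in for the real runtime object."""

    def __init__(self) -> None:
        self.events: List[str] = ["matmul", "all_reduce"]


@dataclass
class ModelRunnerMetrics:
    """Holds plain values only; it keeps no reference to the Runtime."""

    execution_time_s: float


class ModelRunner:
    """Hypothetical runner exposing an explicit observer hook."""

    def run_inference(
        self, runtime_observer: Optional[Callable[[Runtime], None]] = None
    ) -> ModelRunnerMetrics:
        runtime = Runtime()
        # The caller inspects the runtime through the hook while it is alive,
        # instead of reading it back from the metrics afterwards.
        if runtime_observer is not None:
            runtime_observer(runtime)
        return ModelRunnerMetrics(execution_time_s=0.001)


captured: List[int] = []
metrics = ModelRunner().run_inference(
    runtime_observer=lambda rt: captured.append(len(rt.events))
)
```

With this shape, a caller that only needs numbers never touches the runtime, and a caller that needs a runtime-derived summary collects it inside the observer.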
compilation	fix(tensor_cast): guard unsafe sequence parallel rewrites
Co-authored-by: Horacehxw<horacehxw@gmail.com>
# message auto-generated for no-merge-commit merge: !326 merge issue-90-v2 into develop
fix(tensor_cast): guard unsafe sequence parallel rewrites
Created-by: Horacehxw
Commit-by: Horacehxw
Merged-by: ascend-robot
Description:
PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] Refactor
- [x] Test-Cases
- [ ] Other
## 🔍 Motivation
Fix the root cause of GitCode issue #90 in TensorCast Sequence Parallel (SP).
GLM5 TP>1 compile graphs can match `all_reduce -> add_rms_norm2` while the residual side is still full-shape. Rewriting only the all-reduce side to `reduce_scatter` creates mixed full/local inputs and may later double-expand the sequence dimension through `all_gather`, causing compile-time reshape failures such as `shape '[1, 128, 6144]' is invalid for input of size 1572864`.
The pass also lacked a shardability precheck, so non-divisible sequence lengths such as `query_length=127, TP=2` reached the fake `reduce_scatter` exact-division assertion.
This PR is intentionally scoped to the TensorCast SP pass root cause and focused regression tests. Throughput optimizer / ServingCast CLI exposure is not included in this PR.
------
## 📝 Modification
The final diff changes only 2 files:
- `tensor_cast/compilation/passes/sequence_parallel_pass.py`
- `tests/regression/tensor_cast/test_sp_pass_unit.py`
Main changes:
- Add shape/provenance helpers for SP-local values and the expected `reduce_scatter` output shape.
- Guard P2 so `add_rms_norm2` rewrites only happen when the non-communication input is proven SP-local and shape-compatible with the `reduce_scatter` result.
- Mark successfully localized `add_rms_norm2` nodes with `tensor_cast_sp_local`.
- Allow P3 to consume only residuals from `add_rms_norm2` nodes already localized by P2.
- Add a shardability check before SP rewrites to skip non-divisible shard dimensions instead of reaching `exact_division` assertions.
- Share reduce-scatter insertion logic across P1/P2/P3, including view repair for 2-D/3-D metadata mismatches.
- Add focused regression coverage for unsafe GLM5-style P2 residuals, all-gathered full residuals, P3 tail skips, non-divisible shard dimensions, and existing local Qwen3-style SP paths.
------
## 📐 Associated Test Results
Environment timestamp: `2026-06-11 12:52 +08`, local worktree `issue-90-v2`, commit `e29c4b49fd697d0472974ddc91aa0b40f0737a97`.
### UT / Regression
- [x] `python -m pytest tests/regression/tensor_cast/test_sp_pass_unit.py -q`
  - Result: `32 passed in 0.06s`
- [x] `python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -q -m nightly`
  - Result: `2 passed in 2.48s`
- [x] `python -m pytest tests/regression/tensor_cast -q`
  - Result: `605 passed, 125 deselected, 13 warnings, 150 subtests passed in 310.83s (0:05:10)`
- [x] `python -m py_compile tensor_cast/compilation/passes/sequence_parallel_pass.py tests/regression/tensor_cast/test_sp_pass_unit.py`
  - Result: passed
- [x] `python -m ruff check tensor_cast/compilation/passes/sequence_parallel_pass.py tests/regression/tensor_cast/test_sp_pass_unit.py`
  - Result: `All checks passed!`
- [x] `git diff --check gitcode-ascend/develop...HEAD`
  - Result: passed
### Issue #90 Direct Repro And Bad Cases
All commands below exited with code 0. A log scan found no old failure signatures: no `BackendCompilerFailed`, no old invalid reshape, no `AssertionError`, no `RuntimeError`, no `TypeError: cannot pickle`.
- [x] GLM5 q=128, TP=2, compile, SP, W8A8_DYNAMIC, 1 layer
  - Command: `python -X utf8 -m tensor_cast.scripts.text_generate zai-org/GLM-5 --device ATLAS_800_A3_752T_128G_DIE --num-queries 16 --query-length 128 --context-length 0 --compile --world-size 32 --tp-size 2 --dp-size 16 --ep-size 32 --moe-tp-size 1 --moe-dp-size 1 --quantize-linear-action W8A8_DYNAMIC --quantize-attention-action DISABLED --enable-sequence-parallel --num-hidden-layers-override 1 --log-level debug`
  - SP log: `SP ordered rewrites: 0 P1, 0 P2 matches`; `SP ordered rewrites: 0 P3 matches`
  - Result: `[analytic] Execution time: 0.001284 s`, `TPS/Device: 4.984e+04 token/s`
- [x] GLM5 q=128, TP=2, compile, SP, linear quant disabled, 1 layer
  - SP log: `SP ordered rewrites: 0 P1, 0 P2 matches`; `SP ordered rewrites: 0 P3 matches`
  - Result: `[analytic] Execution time: 0.001468 s`, `TPS/Device: 4.359e+04 token/s`
- [x] GLM5 q=127, TP=2, compile, SP, linear quant disabled, 1 layer
  - SP log: `SP pass: skipping because shard dimension is not divisible by 2`
  - Result: `[analytic] Execution time: 0.001468 s`, `TPS/Device: 4.326e+04 token/s`
- [x] GLM5 q=128, TP=2, compile, SP + DFC, W8A8_DYNAMIC, 1 layer
  - SP log: `SP ordered rewrites: 0 P1, 0 P2 matches`; `SP ordered rewrites: 0 P3 matches`
  - Result: `[analytic] Execution time: 0.001284 s`, `TPS/Device: 4.984e+04 token/s`
- [x] GLM5 q=128, TP=2, compile, DFC on, SP off, W8A8_DYNAMIC, 1 layer
  - Result: `[analytic] Execution time: 0.001284 s`, `TPS/Device: 4.984e+04 token/s`
- [x] GLM5 q=5120, TP=2, compile, SP, W8A8_DYNAMIC, 1 layer
  - SP log: `SP ordered rewrites: 0 P1, 0 P2 matches`; `SP ordered rewrites: 0 P3 matches`
  - Result: `[analytic] Execution time: 0.009156 s`, `TPS/Device: 2.796e+05 token/s`
- [x] GLM5 q=5120, TP=2, compile, SP off, W8A8_DYNAMIC, 1 layer
  - Result: `[analytic] Execution time: 0.009156 s`, `TPS/Device: 2.796e+05 token/s`
- [x] GLM5 q=5120, TP=1, compile, SP on, W8A8_DYNAMIC, 1 layer
  - Result: `[analytic] Execution time: 0.015624 s`, `TPS/Device: 1.639e+05 token/s`
- [x] GLM5 q=5120, TP=2, compile off, SP on, W8A8_DYNAMIC, 1 layer
  - Result: `[analytic] Execution time: 0.013055 s`, `TPS/Device: 1.961e+05 token/s`
- [x] Qwen3-32B q=128, TP=2, compile, SP, 1 layer
  - SP log: `SP ordered rewrites: 1 P1, 1 P2 matches`; `SP ordered rewrites: 1 P3 matches`
  - Result: `[analytic] Execution time: 0.001414 s`, `TPS/Device: 4.527e+04 token/s`
### Qwen3 Prefill Performance Non-Regression
- [x] Qwen3-32B prefill, `query-length=4112`, TP=16, compile, profiling model, SP on
  - Command uses profiling database: `tensor_cast/performance_model/profiling_database/data/ATLAS_800_A3_752T_128G_DIE/vllm_ascend/vllm0.18.0_torch2.9.0_cann8.5`
  - Mapping kept as expected:
    - `tensor_cast.reduce_scatter.default->hcom_reduceScatter_ (x2)`
    - `tensor_cast.all_gather.default->hcom_allGather_ (x2)`
  - Coverage: `Simulated Latency Coverage: 99.4% (2.474ms / 2.490ms)`
  - Metrics JSON:
    - M1 raw op count HR: `96.43%`
    - M2 fused op HR: `92.00%`
    - M3 fused op HR excluding zero-cost: `81.82%`
    - M4 per-shape HR: `89.47%`
    - M5 simulated latency coverage: `99.38%`
  - Result: `[empirical] Execution time: 0.154645 s`
------
## ✅ Checklist
- [x] Linting tools (lintrunner) used
- [x] Bug fixes covered by unit tests
- [x] Modification covered by unit tests
- [ ] Documentation updated
- [x] No Chinese comments in code files
See merge request: Ascend/msmodeling!326	13 days ago
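The two guards this entry describes (skip the rewrite when the shard dimension is not divisible by the TP size, and rewrite `add_rms_norm2` only when the residual already has the `reduce_scatter` output shape) can be sketched on plain shape tuples. The function names and signatures below are hypothetical and operate on shapes only; the actual pass works on graph nodes and also tracks provenance, which this sketch does not model.

```python
from typing import Optional, Sequence, Tuple


def expected_reduce_scatter_shape(
    shape: Sequence[int], shard_dim: int, tp_size: int
) -> Optional[Tuple[int, ...]]:
    """Return the per-rank shape after reduce_scatter, or None if not shardable."""
    if tp_size <= 1:
        return None  # nothing to shard
    if shape[shard_dim] % tp_size != 0:
        return None  # skip instead of hitting an exact-division assertion
    local = list(shape)
    local[shard_dim] //= tp_size
    return tuple(local)


def residual_is_sp_compatible(
    residual_shape: Sequence[int],
    full_shape: Sequence[int],
    shard_dim: int,
    tp_size: int,
) -> bool:
    """True only if the residual already matches the SP-local shape."""
    local = expected_reduce_scatter_shape(full_shape, shard_dim, tp_size)
    return local is not None and tuple(residual_shape) == local


local_shape = expected_reduce_scatter_shape((1, 128, 6144), shard_dim=1, tp_size=2)
odd_shape = expected_reduce_scatter_shape((1, 127, 6144), shard_dim=1, tp_size=2)
full_residual_ok = residual_is_sp_compatible((1, 128, 6144), (1, 128, 6144), 1, 2)
local_residual_ok = residual_is_sp_compatible((1, 64, 6144), (1, 128, 6144), 1, 2)
```

A full-shape residual fails the compatibility check, which corresponds to the GLM5 case where the log reports 0 P2 matches; a residual that is already local passes, as in the Qwen3-style path.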
core	refactor(tensor_cast): unify word embedding tp config
Co-authored-by: Kudo__shinichi<liuning119@huawei.com>
# message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop
refactor(tensor_cast): unify word embedding tp config
Created-by: Kudo__shinichi
Commit-by: Kudo__shinichi
Merged-by: ascend-robot
Description:
# PR Template
PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [x] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other
## 🔍 Motivation
`word_embedding_tp` and `word_embedding_tp_mode` represented the same configuration concept in two fields: one field toggled word embedding TP, and the other selected the TP mode.
This PR reduces the public and internal configuration shape to a single parameter so users only need to configure `word_embedding_tp` as disabled, `col`, or `row`.
------
## 📝 Modification
- Make `UserInputConfig.word_embedding_tp` the single nullable word embedding TP mode field.
- Remove `word_embedding_tp_mode` and `embedding_parallel_mode` from the config model.
- Pass the normalized `word_embedding_tp` mode directly into `ParallelConfig.embedding_parallel` and the embedding transformation.
- Keep legacy bool input normalization for compatibility: `True -> col`, `False/None -> disabled`.
- Remove redundant CLI-side bool/mode conversion and update related benchmark cases and user guide docs.
- Add regression coverage for single-field config, legacy bool normalization, and invalid `word_embedding_tp` values.
------
## 📐 Associated Test Results
- `python -m pytest tests/regression/tensor_cast/test_user_config.py -q`: 6 passed
- `python -m pytest tests/regression/tensor_cast/test_user_config.py tests/regression/web_ui/test_command_builder.py tests/regression/tensor_cast/test_adapter_automation.py -q`: 98 passed
- `python -m pytest tests/regression/tensor_cast/test_text_generate.py -k word_embedding_parallel -q`: 2 passed, 113 deselected
- `python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -o addopts= -m "nightly and not npu and not network" -q`: 2 passed
- `python -m pytest tests/benchmark/models/test_model_regression.py --collect-only -q`: 15 tests collected
- `python -m ruff check <changed python files>`: All checks passed
- `python -m pre_commit run --from-ref origin/develop --to-ref HEAD`: passed
- `git diff --check HEAD~1 HEAD`: passed
------
## 🌟 Use cases (Optional)
- Disable word embedding TP: `word_embedding_tp=None`
- Enable column mode: `word_embedding_tp="col"`
- Enable row mode: `word_embedding_tp="row"`
- CLI usage: `--word-embedding-tp col` or `--word-embedding-tp row`
------
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!344	12 days ago
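The normalization rule stated in this entry (`True -> col`, `False/None -> disabled`, otherwise `col` or `row`, with invalid values rejected) can be written as a small standalone function. This is a sketch under assumptions: the function name `normalize_word_embedding_tp` and the error message are hypothetical, and the real project may implement the rule as a config-model validator instead.

```python
from typing import Optional, Union

VALID_MODES = ("col", "row")


def normalize_word_embedding_tp(value: Union[bool, str, None]) -> Optional[str]:
    """Map legacy bool input and mode strings to None, "col", or "row"."""
    if value is None or value is False:
        return None  # word embedding TP disabled
    if value is True:
        return "col"  # legacy bool toggle maps to column mode
    if isinstance(value, str) and value in VALID_MODES:
        return value
    raise ValueError(
        f"word_embedding_tp must be None, a bool, or one of {VALID_MODES}; got {value!r}"
    )


modes = [normalize_word_embedding_tp(v) for v in (None, False, True, "col", "row")]
```

Keeping the legacy bool branch in one place means callers such as a CLI layer can pass the raw value through without their own bool/mode conversion.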
custom_model	Supports a plugin-based mechanism for custom model
Co-authored-by: HongMaoShuiGuai<1120200577@qq.com>
Co-authored-by: genius52<taochengcheng@h-partners.com>
# message auto-generated for no-merge-commit merge: !61 merge custom_model into develop
Supports a plugin-based mechanism for custom model
Created-by: genius52
Commit-by: genius52;HongMaoShuiGuai
Merged-by: ascend-robot
Description:
PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [x] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
To improve the framework's extensibility and ease of use, this change introduces a model plugin mechanism. Users can register custom models, transformation logic, and execution pipelines through standalone files without modifying the framework's core code. This allows new models to be integrated and extended flexibly, substantially lowers adaptation cost, and improves the maintainability of the architecture.
------
## 📝 Modification
Please briefly describe what modification is made in this PR.
Introduce a model plugin mechanism that supports extending new models through a registry without modifying core code. Add a complete transformation pipeline and stage execution system so that model wrapping, patching, quantization, sharding, and similar steps can be customized flexibly.
------
## 📐 Associated Test Results
Please provide the related test results, such as test reports, etc.
![图像2026-3-4 15.40.png](https://raw.gitcode.com/user-images/assets/8428112/bf296ec7-3f30-4949-bb5d-86f432749ff8/图像2026-3-4_15.40.png '图像2026-3-4 15.40.png')
------
## 🌟 Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
------
## ✅ Checklist
Before PR:
- [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!61	3 months ago
device_profiles	docs: refine tensorcast quick starts and results
Co-authored-by: minghang_c<chiminghang@h-partners.com>
# message auto-generated for no-merge-commit merge: !116 merge docs/tensorcast-results into develop
docs: refine tensorcast quick starts and results
Created-by: minghang_c
Commit-by: minghang_c
Merged-by: ascend-robot
Description:
PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
Make the TensorCast docs easier to follow for first-time users by putting the context up front and showing clearer, more complete output examples.
------
## 📝 Modification
Summary:
- Move Introduction to the top, then Installation, then Quick Starts
- Expand Result examples with more real output lines for text/video generation
- Document the supported device profiles and provide a proper custom device guide link
Files Changed:
- docs/en/tensor_cast_instruct.md
- tensor_cast/device_profiles/README.md
------
## 📐 Associated Test Results
Not run (docs-only changes)
------
## 🌟 Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
------
## ✅ Checklist
Before PR:
- [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!116	2 months ago
diffusers	chore(ci): adopt pre-commit and retire legacy lintrunner adapters
Co-authored-by: liujiawang<anonymousdev@163.com>
# message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop
chore(ci): adopt pre-commit and retire legacy lintrunner adapters
Created-by: AvadaKedavrua
Commit-by: liujiawang;AvadaKedavrua
Merged-by: ascend-robot
Description:
PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [x] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
------
## Motivation
Continue the pre-commit migration: tighten Pylint so only high-signal messages run (`disable=all` + explicit `enable` list), fix real issues that remained under that profile, and translate hook/config comments to English.
------
## Configuration changes (tooling & comments only)
| Path | What changed |
|------|----------------|
| `pre-commit/pyproject.toml` | Pylint: `[tool.pylint."messages control"]` with `disable = ["all"]` and a short allowlist of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). Ruff: unchanged behavior; comments translated to English. Bandit: comments translated; rule allowlist/skip lists unchanged. |
| `.pre-commit-config.yaml` | Comments translated to English; Bandit hook display name set to bandit (Python security checks). Hook versions and args unchanged except for comment text. |
------
## Source code changes (application code)
| Area | Files | Purpose |
|------|--------|---------|
| `serving_cast` | `communication.py`, `engine.py`, `instance.py`, `kv_cache_manager.py`, `load_gen.py`, `main.py`, `model_runner.py`, `request.py`, `serving.py`, `utils.py` | Replace `from . import stime` with `import serving_cast.stime as stime` so Pylint resolves imports (fixes E0611). |
| `serving_cast` | `stime.py` | Singleton salabim `Environment` via `_get_sim_env()` so type checkers/Pylint see `sim.Environment` (fixes E1101 on `SimulationEnv`). |
| `serving_cast/service` | `base_throughput_optimizer.py` | `__init__` defaults + `assert runner is not None` before `run_inference` (fixes E1101 on base class). |
| `tensor_cast` | `diffusers/diffusers_model.py`, `diffusers/diffusers_utils.py`, `runtime.py` | Add `encoding="utf-8"` to `open()` / trace export (fixes W1514). |
| `web_ui` | `callbacks.py` | `refresh_optimizer_detail`: call `_optimizer_detail_view(rows, None, device)` and unpack five return values (fixes E1120). |
------
## Recent commits on `pre-commit` branch
- `ci(pre-commit): fix pylint message selection with disable=all`
- `fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui`
- `docs(pre-commit): translate comments to English and add all-files run log`
------
![](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/b22b18aa-4c84-4dc0-85f5-1e7e0715350e/pre-commit-all-files-run.svg)
------
## Checklist
- [x] Please ensure code files contain no Chinese comments.
See merge request: Ascend/msmodeling!176	1 month ago
layers	fix(tensor_cast): support GLM5 DSA tuple returns
Co-authored-by: minghang_c<chiminghang@h-partners.com>
# message auto-generated for no-merge-commit merge: !332 merge glm5-transformers-fix into develop
fix(tensor_cast): support GLM5 DSA tuple returns
Created-by: minghang_c
Commit-by: minghang_c
Merged-by: ascend-robot
Description:
## Background
When running TensorCast inference modeling on the GLM-5 (`glm_moe_dsa`) / GLM-5.1 models, the original problem fails where the decoder layer return value is unpacked:
```bash
python -m cli.inference.text_generate zai-org/GLM-5 \
  --device ATLAS_800_A3_752T_128G_DIE \
  --num-devices 16 \
  --tp-size 16 \
  --dp-size 1 \
  --ep-size 16 \
  --context-length 0 \
  --query-length 3500 \
  --num-queries 1 \
  --compile \
  --quantize-linear-action W4A8_STATIC \
  --dump-input-shapes
```
The error shows up as a mismatch in the number of tuple return values:
```text
ValueError: not enough values to unpack (expected 3, got 2)
```
After the attention return protocol is fixed, the repetition copy layer path exposes a further mismatch in the number of decoder layer return values:
```text
ValueError: not enough values to unpack (expected 2, got 1)
```
With MTP enabled on GLM-5.1, two more MTP adaptation problems appear:
```bash
python -m cli.inference.text_generate zai-org/GLM-5.1 \
  --device ATLAS_800_A3_752T_128G_DIE \
  --num-devices 16 \
  --tp-size 16 \
  --dp-size 1 \
  --ep-size 16 \
  --context-length 0 \
  --query-length 3500 \
  --num-queries 1 \
  --num-mtp-tokens 3 \
  --compile \
  --quantize-linear-action W4A8_STATIC \
  --dump-input-shapes
```
The first is an out-of-range access to the GLM DSA per-layer config when a synthetic MTP layer uses `layer_idx >= num_hidden_layers`:
```text
IndexError: list index out of range  # config.indexer_types[layer_idx]
```
The second is that the GLM DSA decoder block returns a tuple, while the generic MTP flow expects to keep processing a tensor:
```text
torch._dynamo.exc.Unsupported: Dynamo does not know how to trace method `index_select` of class `tuple`
```
## Root cause
The HuggingFace decoder layer of GLM-5 / GLM-5.1 has a cross-layer top-k passing protocol for DSA sparse attention:
- the attention return protocol is a 3-tuple: `(attn_output, attn_weights, topk_indices)`
- the decoder layer return protocol is a 2-tuple: `(hidden_states, topk_indices)`
During model transformation, TensorCast:
1. uses `mla_module_class_type` to replace the HF `GlmMoeDsaAttention` with the TensorCast sparse attention implementation;
2. in the repetition optimization, wraps the representative layer with `RegionMarkerWrapper` and replaces the subsequent repeated layers with `CopyLayerWrapper`;
3. when MTP is enabled, constructs synthetic MTP layers based on the decoder layer class.
The original generic implementation did not fully preserve the GLM DSA return values and per-layer config protocol:
- `DeepseekSparseAttention` returns only `(attn_output, attn_weights)`, but the GLM decoder expects attention to return 3 values;
- `CopyLayerWrapper` builds only `(hidden_states,)` for tuple returns, but the GLM decoder layer expects repeated layers to return 2 values as well;
- `maybe_enable_mtp()` extends only `layer_types` / `mlp_layer_types`, not the GLM DSA-specific `indexer_types`;
- `MultiTokenPredictorLayer` does not handle model families whose MTP block returns a tuple.
The essence of the problem is therefore that TensorCast's wrapper / replacement / synthetic MTP layers did not fully preserve the return contract and per-layer config contract of the HF modules they replace.
## Changes
### 1. Add a GLM-specific sparse attention wrapper
Add `tensor_cast/layers/glm5.py`:
```python
class Glm5SparseAttention(DeepseekSparseAttention):
    def forward(self, *args, **kwargs):
        attn_output, attn_weights = super().forward(*args, **kwargs)
        return attn_output, attn_weights, None
```
Also switch the `mla_module_class_type` of the GLM profile in `tensor_cast/transformers/builtin_model/glm5.py` from the generic `DeepseekSparseAttention` to `Glm5SparseAttention`.
This way the 3-tuple attention return protocol of GLM is handled only in the GLM adapter layer, and the generic `DeepseekSparseAttention` is left unchanged, which avoids affecting other built-in models.
`tests/.ci/gate_policy.yaml` is not modified here: the `builtin_model` path is omitted in the coverage configuration, so placing the new implementation directly in `builtin_model/glm5.py` would prevent the new tests from generating a test_map. The testable wrapper is therefore placed in `tensor_cast/layers/glm5.py`, so the CI gate can associate it with `tests/regression/tensor_cast/test_glm5.py` through the normal coverage/test_map.
### 2. Make the repetition copy wrapper preserve the representative layer's tuple length
In `tensor_cast/layers/internal.py`:
- `RegionMarkerWrapper` records the actual return tuple length of the representative layer;
- `CopyLayerWrapper` pads with `None` according to the representative layer's return length, so the tuple arity of the copy layer matches the representative layer.
This change contains no GLM-specific field checks; for example, it does not read `prev_topk_indices`. It only guarantees that the return structure length of the generic wrapper matches the representative layer.
For GLM, a copied decoder layer returns `(hidden_states, None)`. If the next layer receives `prev_topk_indices=None`, it recomputes top-k following the original HF logic, so this is semantically safe.
### 3. Fill in the GLM DSA MTP per-layer config
In `tensor_cast/transformers/transformations.py`, when MTP is enabled, extend `indexer_types` in the same way as `layer_types` / `mlp_layer_types`:
```python
if hasattr(hf_config, "indexer_types") and isinstance(hf_config.indexer_types, list) and hf_config.indexer_types:
    hf_config.indexer_types.extend([hf_config.indexer_types[-1]] * mtp_config.num_mtp_layers)
```
This lets the synthetic MTP layers with `layer_idx=78,79,80` access a valid GLM DSA indexer type and avoids the `IndexError`.
### 4. Make the MTP layer compatible with tuple block output
In `tensor_cast/layers/mtp.py`, if `mtp_block` returns a tuple, take the first element as the subsequent hidden states:
```python
if isinstance(hidden_states, tuple):
    hidden_states = hidden_states[0]
```
This is consistent with the decoder layer tuple protocol: the first element is `hidden_states`, and later elements are model-family-specific auxiliary return values.
### 5. Add lightweight regression tests
Add/extend `tests/regression/tensor_cast/test_glm5.py`, covering:
- `Glm5SparseAttention.forward` pads the 2-tuple attention output to the 3-tuple the GLM decoder needs;
- `maybe_enable_mtp()` extends the GLM DSA `indexer_types`;
- `MultiTokenPredictorLayer` takes `hidden_states` from a tuple MTP block output.
## Verification
Verified that the GLM adapter / MTP regression tests and the existing repetition wrapper tests pass:
```bash
/home/minghang/workspace/msmodeling-upstream/.venv/bin/python -m pytest \
  tests/regression/tensor_cast/test_glm5.py \
  tests/regression/tensor_cast/test_repetition_wrappers.py -q
```
Result:
```text
4 passed in 0.02s
```
Verified that the originally failing GLM-5.1 + MTP command runs and completes the performance statistics output:
```bash
/home/minghang/workspace/msmodeling-upstream/.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 \
  --device ATLAS_800_A3_752T_128G_DIE \
  --num-devices 16 \
  --tp-size 16 \
  --dp-size 1 \
  --ep-size 16 \
  --context-length 0 \
  --query-length 3500 \
  --num-queries 1 \
  --num-mtp-tokens 3 \
  --compile \
  --quantize-linear-action W4A8_STATIC \
  --dump-input-shapes
```
Result summary:
```text
Model compilation and execution time: 8.125 s
Total time for analytic: 283.311ms
[analytic] TPS/Device: 772.1 token/s
```
Verified that the symbols of the new layer file can be recognized by the CI gate AST logic:
```text
top-level: ['Glm5SparseAttention']
spans: [('Glm5SparseAttention.forward', 5, 7)]
```
## Impact scope
- The 3-tuple adaptation of the GLM attention return protocol is confined to `Glm5SparseAttention` in `tensor_cast/layers/glm5.py`;
- the generic `DeepseekSparseAttention` is not modified, which avoids affecting other MLA/DSA models;
- the `CopyLayerWrapper` change is generic tuple-arity preservation logic and introduces no GLM-specific field checks;
- `maybe_enable_mtp()` extends the list only for HF configs that have `indexer_types`, consistent with the existing `layer_types` / `mlp_layer_types` extension logic;
- `MultiTokenPredictorLayer` takes the first element of a tuple block output, which is compatible with the standard decoder layer tuple return protocol;
- `tests/.ci/gate_policy.yaml` is not modified, which avoids a configuration change that would make the CI gate run the full suite.
See merge request: Ascend/msmodeling!332	13 days ago
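The three generic rules in this entry (pad a copied layer's output with `None` up to the representative layer's tuple arity, extend a per-layer config list by repeating its last value, and take the first element of a tuple block output) can be exercised without PyTorch. The helper names below are hypothetical and work on plain Python values; they are a sketch of the stated behavior, not the project's wrapper classes.

```python
from typing import Any, List, Tuple


def pad_to_arity(hidden_states: Any, arity: int) -> Tuple[Any, ...]:
    """Build a tuple of the representative layer's length, padded with None."""
    if arity < 1:
        raise ValueError("arity must be at least 1")
    return (hidden_states,) + (None,) * (arity - 1)


def extend_per_layer_list(values: List[Any], num_extra_layers: int) -> None:
    """Repeat the last per-layer entry so synthetic layers get a valid index."""
    if isinstance(values, list) and values:
        values.extend([values[-1]] * num_extra_layers)


def first_if_tuple(output: Any) -> Any:
    """Treat the first tuple element as hidden states; pass tensors through."""
    return output[0] if isinstance(output, tuple) else output


copied = pad_to_arity("hidden", arity=2)
indexer_types = ["full", "shared"]  # placeholder values, not real GLM config entries
extend_per_layer_list(indexer_types, num_extra_layers=3)
hidden = first_if_tuple(copied)
```

Because the padding depends only on the recorded arity, the same helper serves a model family whose decoder layer returns a single-element tuple as well as one that returns two values.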
ops	feat: support DeepSeek-V4 model adaptation in simulation modeling Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !166 merge deepseek-v4 into develop feat: support DeepSeek-V4 model adaptation in simulation modeling Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation
Add end-to-end support for DeepSeek V4 (Flash/Pro) models to msmodeling/tensor_cast so that its performance-modeling pipeline covers the new structures introduced by V4: sparse attention (NSA / Window / Compressed / Heavily-Compressed multi-layer-type routing), HC (Head Compression) mixing, the Sinkhorn split, and Hash Routing MoE. The PR also adds the corresponding fake-tensor semantic operators and cost models, so that V4 models run through the existing analytic / multistream tracing flow directly.

------

## 📝 Modification
New files
- tensor_cast/transformers/builtin_model/deepseek_v4.py: DeepSeek V4 builtin model profile, including DeepseekV4Config / DeepseekV4Model registration, layer-type validation ({0, 4, 128} map to sliding_attention / compressed_sparse_attention / heavily_compressed_attention), and safe registration with transformers AutoConfig / AutoModel.
- tests/test_tensor_cast/test_deepseek_v4.py and tests/test_tensor_cast/data/deepseek_v4/.json: test data and cases for the V4 model (valid, invalid, missing, and truncated ratios configurations).

Attention (tensor_cast/layers/mla.py, tensor_cast/ops/mla.py, tensor_cast/ops/rotary_embedding.py)
- Add DeepseekV4SparseAttention and the MultiheadLatentAttentionTensorCast adaptation (including requires_legacy_kv_b_decomposition and the KV-cache window write path).
- Add the index-generation utilities get_window_topk_idxs / get_compress_topk_idxs.
- Add semantic operators for the HC path: hc_pre_inv_rms and hc_pre_sinkhorn, which correspond to the inverse-RMS scaling and the Sinkhorn-weighted reduction in the reference implementation.
- Add cost models for KV write operators such as scatter_nd_update_mla. Following the reference implementation, they count only the source-row reads plus the updated-row writes, and exclude slot_mapping and the full cache tensor.

MoE / Gate (tensor_cast/layers/moe_layer.py, tensor_cast/ops/fused_moe.py)
- MoELayer gains a unified V4 gating path: it recognizes the is_v4 / hash flags on the gate and emits the matmul + score func + indices + gather/normalize/route_scale operators in the order of the reference Gate.forward, so that each step is costed separately at its real dtype (the gate matmul runs in fp32).
- Add two semantic operators: moe_gating_top_k (V4 non-hash layers, with optional bias) and moe_gating_top_k_hash (hash-routing layers based on the tid2eid table).

Performance Model (tensor_cast/performance_model/__init__.py)
- Introduce the _safe_max_int utility: when tensor.max().item() is unavailable on fake / meta / functional tensors, it falls back to None so that the caller uses a shape-based estimate.
- Register PerformanceProperties for the new V4 operators (scatter_nd_update_mla, the HC family, the new MoE gating tail, and others), aligned with the memory-access semantics of the reference implementation.

Misc
- tensor_cast/core/config_resolver.py, input_generator.py, model_runner.py, device.py, transformers/transformations.py, transformers/custom_model_registry.py, layers/utils.py, model_config.py, compilation/passes/multistream_pass.py: wire V4 into config resolution, input construction, runner scheduling, the device profile, model transformation, and operator registration.

------

## 📐 Associated Test Results
![image.png](https://raw.gitcode.com/user-images/assets/8428112/4dbd32d5-6f6d-4b84-a840-a06eec62fc40/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/8428112/fda50383-9b30-4453-bfd1-391889bebb47/image.png 'image.png')

------

## ✅ Checklist
Before PR:
- [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!166	20 days ago
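The `_safe_max_int` fallback described in this PR can be illustrated with a minimal sketch. It is not the project's code: `safe_max_int`, `estimate_max_seq_len`, and both stub classes are hypothetical names, and the exception types caught are an assumption about how fake or meta tensors report that `.item()` is unavailable.

```python
from typing import Any, Optional


def safe_max_int(tensor: Any) -> Optional[int]:
    """Return the maximum element as an int, or None if it cannot be read.

    Hypothetical stand-in for the `_safe_max_int` helper named in the PR.
    The caught exception types are an assumption, not the project's list.
    """
    try:
        return int(tensor.max().item())
    except (RuntimeError, NotImplementedError, TypeError):
        return None


def estimate_max_seq_len(seq_lens: Any, shape_upper_bound: int) -> int:
    """Prefer the real maximum; otherwise fall back to a shape-based bound."""
    value = safe_max_int(seq_lens)
    return shape_upper_bound if value is None else value


class ConcreteStub:
    """Mimics a real tensor whose values are readable."""

    def __init__(self, values):
        self._values = list(values)

    def max(self):
        return self

    def item(self):
        return max(self._values)


class FakeStub(ConcreteStub):
    """Mimics a fake/meta tensor: the shape is known, the values are not."""

    def item(self):
        raise RuntimeError("data-dependent value is not available")
```

The stubs keep the sketch free of a PyTorch dependency; with real tensors the same `try`/`except` shape applies, but the exact exceptions to catch should be checked against the installed PyTorch version.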
performance_model	【feat】Add operator bound breakdown reporting to text_generate Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !246 merge develop into develop 【feat】Add operator bound breakdown reporting to text_generate Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation
When using text_generate, users and developers need each operator's bound information, whether they are locating a problem or judging whether a result is reasonable. Currently this information is only visible in the --chrome-trace output. This PR adds the --dump-op-bound-results flag; when it is enabled, the output additionally reports each operator's communication, compute, and memory-access shares.

------

## 📐 Associated Test Results
![image.png](https://raw.gitcode.com/user-images/assets/8428112/419a7ee2-878a-4eec-8ec5-2ab031928e66/image.png 'image.png')

------

## ✅ Checklist
Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!246	12 days ago
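A per-operator breakdown of this kind can be sketched as follows. The PR does not state how the shares are computed, so this is one plausible definition (each component's time divided by their sum), and `bound_breakdown` with its parameter names is hypothetical rather than part of text_generate.

```python
from typing import Dict, Optional, Union


def bound_breakdown(
    compute_s: float, memory_s: float, comm_s: float
) -> Dict[str, Union[Optional[str], Dict[str, float]]]:
    """Return the dominant component and the share of each component.

    Assumption: a share is the component time divided by the sum of the
    three component times; the real tool may define it differently.
    """
    parts = {"compute": compute_s, "memory": memory_s, "communication": comm_s}
    total = sum(parts.values())
    if total <= 0:
        # Nothing was measured, so no component can be called the bound.
        return {"bound": None, "shares": {name: 0.0 for name in parts}}
    shares = {name: value / total for name, value in parts.items()}
    bound = max(parts, key=parts.get)
    return {"bound": bound, "shares": shares}


# Example: an operator that is mostly limited by memory access.
report = bound_breakdown(compute_s=1.0, memory_s=3.0, comm_s=0.0)
print(report["bound"], report["shares"])
```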
scripts	refactor(tensor_cast): unify word embedding tp config Co-authored-by: Kudo__shinichi<liuning119@huawei.com> # message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop refactor(tensor_cast): unify word embedding tp config Created-by: Kudo__shinichi Commit-by: Kudo__shinichi Merged-by: ascend-robot Description:

PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [x] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation
`word_embedding_tp` and `word_embedding_tp_mode` represented the same configuration concept in two fields: one field toggled word embedding TP, and the other selected the TP mode. This PR reduces the public and internal configuration shape to a single parameter so users only need to configure `word_embedding_tp` as disabled, `col`, or `row`.

------

## 📝 Modification
- Make `UserInputConfig.word_embedding_tp` the single nullable word embedding TP mode field.
- Remove `word_embedding_tp_mode` and `embedding_parallel_mode` from the config model.
- Pass the normalized `word_embedding_tp` mode directly into `ParallelConfig.embedding_parallel` and the embedding transformation.
- Keep legacy bool input normalization for compatibility: `True -> col`, `False/None -> disabled`.
- Remove redundant CLI-side bool/mode conversion and update related benchmark cases and user guide docs.
- Add regression coverage for single-field config, legacy bool normalization, and invalid `word_embedding_tp` values.

------

## 📐 Associated Test Results
- `python -m pytest tests/regression/tensor_cast/test_user_config.py -q`: 6 passed
- `python -m pytest tests/regression/tensor_cast/test_user_config.py tests/regression/web_ui/test_command_builder.py tests/regression/tensor_cast/test_adapter_automation.py -q`: 98 passed
- `python -m pytest tests/regression/tensor_cast/test_text_generate.py -k word_embedding_parallel -q`: 2 passed, 113 deselected
- `python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -o addopts= -m "nightly and not npu and not network" -q`: 2 passed
- `python -m pytest tests/benchmark/models/test_model_regression.py --collect-only -q`: 15 tests collected
- `python -m ruff check <changed python files>`: All checks passed
- `python -m pre_commit run --from-ref origin/develop --to-ref HEAD`: passed
- `git diff --check HEAD~1 HEAD`: passed

------

## 🌟 Use cases (Optional)
- Disable word embedding TP: `word_embedding_tp=None`
- Enable column mode: `word_embedding_tp="col"`
- Enable row mode: `word_embedding_tp="row"`
- CLI usage: `--word-embedding-tp col` or `--word-embedding-tp row`

------

## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!344	12 days ago
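The normalization rules listed in this PR (`True -> col`, `False/None -> disabled`, invalid values rejected) can be sketched as a small function. This is an illustrative sketch only: `normalize_word_embedding_tp` is a hypothetical name, and representing "disabled" as `None` and raising `ValueError` for invalid input are assumptions based on the "nullable" wording, not the repository's actual validator.

```python
from typing import Optional, Union

VALID_MODES = ("col", "row")


def normalize_word_embedding_tp(value: Union[None, bool, str]) -> Optional[str]:
    """Map user input onto None (disabled), "col", or "row".

    Legacy bool inputs are accepted for compatibility:
    True becomes "col"; False and None both mean disabled.
    """
    # Check identity first so that ints such as 0 or 1 are not
    # silently treated as legacy bools.
    if value is None or value is False:
        return None
    if value is True:
        return "col"
    if isinstance(value, str) and value in VALID_MODES:
        return value
    raise ValueError(
        f"word_embedding_tp must be None, a bool, or one of {VALID_MODES}; "
        f"got {value!r}"
    )
```

Doing the normalization once at the config boundary means downstream code only ever sees three states, which is the simplification the PR describes.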
transformers	refactor(tensor_cast): unify word embedding tp config Co-authored-by: Kudo__shinichi<liuning119@huawei.com> # message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop Created-by: Kudo__shinichi Commit-by: Kudo__shinichi Merged-by: ascend-robot See merge request: Ascend/msmodeling!344	12 days ago
__init__.py	initial import of tensor_cast	10 months ago
config.py	feat: profiling-based empirical performance model with CSV data source Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !123 merge pr/perf-db-a into develop feat: profiling-based empirical performance model with CSV data source Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [x] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation
TensorCast's existing Roofline analytic model (`AnalyticPerformanceModel`) predicts Ascend NPU performance with limited accuracy: fused operators (SwiGlu, AddRmsNorm, DispatchFFNCombine) cannot be modeled, HCCL collective communication deviates significantly from theoretical bandwidth, and hardware characteristics such as the FRACTAL_NZ format cannot be captured by Roofline. This PR implements a measured operator performance estimation system based on real NPU profiling data, and feeds measured kernel latencies into the TensorCast simulation framework.

Relationship to PR#96: PR#96 has already been merged into develop; it defined the `DataSourcePerformanceModel` interface skeleton (stub) and the CLI integration. This PR provides the full implementation: the CSV query engine (9 TC-vs-NPU shape matching rules), the op_mapping mapping (60+ operators), interpolation, the M1-M6 metric system, and the DFC/FlashComm compilation passes. The interface is fully compatible.

> 📌 The companion offline data-collection toolchain will be submitted in a follow-up PR (tools/perf_data_collection/, no code dependency on this PR).

------

## 📝 Modification

### 1. Profiling Data Source core implementation (replaces the PR#96 stub)

| File | Description |
|------|------|
| `profiling_database/profiling_data_source.py` (+1,885) | `ProfilingDataSource`: CSV query engine driven by op_mapping.yaml, supporting 9 kinds of TC-vs-NPU shape-difference handling (batch dim stripping, seq padding, FRACTAL_NZ, ND transpose, SwiGlu concat, RoPE layout/kernel, composite decomposition, flatten batch) |
| `profiling_database/interpolating_data_source.py` (+702) | `InterpolatingDataSource`: nearest-neighbor + linear interpolation wrapper |
| `profiling_database/data_source.py` (modified) | `DataSourcePerformanceModel` ABC extension (new `EXTRAPOLATED` enum, `details` field) |

### 2. EmpiricalPerformanceModel enhancements (+436)

Adds M1-M6 metric tracking on top of PR#96:
- M1-M4: coverage metrics (raw count → fused → compute-only → per-shape)
- M5: latency-weighted coverage
- M6 input: empirical hit total (used for offline E2E ratio computation)
- `log_stats()`: structured HIT/MISS logging
- `export_hit_miss_report()`: metric export in JSON format

### 3. Compilation passes (+875)

| Pass | Description |
|------|------|
| `dispatch_ffn_combine_pass.py` | DispatchFFNCombine super-fusion (init_routing_v2 + GroupedMatmul + unpermute_tokens → single op), supporting 5 quantization variants |
| `flashcomm_v1_pass.py` | FlashComm V1 graph rewrite (matmul_all_reduce → communication hiding), corresponding to vLLM-ascend `ENABLE_FLASHCOMM1=1` |

### 4. op_mapping.yaml (3 versions, ~3,600 lines in total)

| Version | Operator count |
|------|:------:|
| `vllm0.13.0_torch2.8.0_cann8.3` | ~45 |
| `vllm0.15.0_torch2.9.0_cann8.5` | ~55 |
| `vllm0.18.0_torch2.9.0_cann8.5` | ~60 |

### 5. CSV Profiling Data (~250 files, Git LFS)

Data for the ATLAS 800 A3 752T 128G device: HCCL communication benchmarks, kernel data for 3 vLLM versions, and supplementary micro-benchmark data.

### 6. Integration changes

| File | Change |
|------|------|
| `model_runner.py` | Profiling-mode integration (`perf_models[]` + `log_stats` + `ProfilingDataSource` creation) |
| `user_config.py` | `--profiling-database` argument |
| `scripts/text_generate.py` | `--export-metrics` CLI + FlashComm configuration |
| `ops/fused_moe.py` | New `dispatch_ffn_combine` op |
| `compile_backend.py` | Register the DFC + FlashComm passes |

------

## 📐 Associated Test Results

### Unit tests

```
$ pytest tests/perf_database/ -q
266 passed, 3 warnings in 1.94s
$ pytest tests/test_tensor_cast/test_empirical.py tests/test_tensor_cast/test_dfc_pass.py -q
8 passed, 1 skipped in 120.75s
$ lintrunner -a
ok No lint issues.
```

### Functional verification

```bash
# Analytic mode (behavior unchanged)
$ python -m tensor_cast.scripts.text_generate Qwen/Qwen3-32B \
    --num-queries 2 --query-length 3500 --device TEST_DEVICE
→ [analytic] Execution time: 1.744s, TPS/Device: 4013 token/s ✅

# Profiling mode (new feature)
$ python -m tensor_cast.scripts.text_generate Qwen/Qwen3-32B \
    --num-queries 1 --query-length 4112 --word-embedding-tp row \
    --device ATLAS_800_A3_752T_128G_DIE --world-size 16 --tp-size 16 \
    --quantize-linear-action DISABLED \
    --performance-model profiling --compile \
    --profiling-database tensor_cast/performance_model/profiling_database/data/ATLAS_800_A3_752T_128G_DIE/vllm_ascend/vllm0.18.0_torch2.9.0_cann8.5
→ [empirical] Execution time: 0.156s, TPS/Device: 1651 token/s ✅
```

### M1-M5 metrics

| Scenario | M3 (compute-operator HR) | M5 (latency coverage) |
|------|:---------------:|:------------:|
| Qwen3-32B Prefill (BF16) | 61.5% ✅ (>50%) | 89.0% ✅ (>80%) |
| Qwen3-32B Decode (BF16) | 38.5% | 80.1% ✅ (>80%) |
| DeepSeek-V3 Prefill (W8A8) | 52.6% ✅ (>50%) | 68.9% |
| DeepSeek-V3 Decode (W8A8) | 15.8% | 54.3% |

------

## 🌟 Use cases (Optional)

```bash
# 1. Use measured data instead of the Roofline estimate
python -m tensor_cast.scripts.text_generate <model_id> \
    --performance-model profiling --compile \
    --profiling-database <path_to_data_dir>

# 2. Export the M1-M5 metrics as JSON (for offline M6 computation)
python -m tensor_cast.scripts.text_generate <model_id> \
    --performance-model profiling --compile \
    --profiling-database <path_to_data_dir> \
    --export-metrics results/metrics.json

# 3. Run analytic and profiling together for comparison
python -m tensor_cast.scripts.text_generate <model_id> \
    --performance-model analytic --performance-model profiling --compile \
    --profiling-database <path_to_data_dir>
```

------

## ✅ Checklist
Before PR:
- [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!123	1 month ago
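The "nearest-neighbor + linear interpolation" idea behind `InterpolatingDataSource` can be illustrated in one dimension. This is a sketch under stated assumptions, not the project's implementation: `interpolate_latency`, the `INTERPOLATED` label, and the single-integer shape key are hypothetical (the PR itself names only HIT/MISS logging and an `EXTRAPOLATED` enum value), and real lookups presumably match on full operator shapes.

```python
from bisect import bisect_left
from typing import Dict, Tuple


def interpolate_latency(samples: Dict[int, float], size: int) -> Tuple[float, str]:
    """Estimate latency for `size` from measured (size -> seconds) samples.

    Exact match: return the measurement. Between two measurements:
    interpolate linearly. Outside the measured range: reuse the nearest
    measurement and flag the result as extrapolated.
    """
    if not samples:
        raise ValueError("no profiling samples available")
    if size in samples:
        return samples[size], "HIT"
    keys = sorted(samples)
    if size < keys[0]:
        return samples[keys[0]], "EXTRAPOLATED"
    if size > keys[-1]:
        return samples[keys[-1]], "EXTRAPOLATED"
    # keys[idx - 1] < size < keys[idx] is guaranteed at this point.
    idx = bisect_left(keys, size)
    lo, hi = keys[idx - 1], keys[idx]
    weight = (size - lo) / (hi - lo)
    return samples[lo] + weight * (samples[hi] - samples[lo]), "INTERPOLATED"


# Example with made-up measurements for sequence lengths 1024 and 2048.
measured = {1024: 1.0, 2048: 2.0}
print(interpolate_latency(measured, 1536))
```

Returning a status label alongside the value lets a caller count how many estimates came from real measurements, which is the kind of bookkeeping the coverage metrics above depend on.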
device.py	feat: support performance evaluation for the A5 hardware 350 series Co-authored-by: hqx<1343153389@qq.com> # message auto-generated for no-merge-commit merge: !336 merge ms4 into develop feat: support performance evaluation for the A5 hardware 350 series Created-by: h7star Commit-by: hqx Merged-by: ascend-robot Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation
Support modeling on A5 hardware.

------

## 📝 Modification
Adds support for A5 hardware. `--device` now accepts the following devices: ATLAS_350_425T_112G and ATLAS_350_425T_84G.

------

## 📐 Associated Test Results
`python -m cli.inference.text_generate Qwen/Qwen3-8B --num-queries 8 --query-length 1024 --device ATLAS_350_425T_112G --num-devices 2 --tp-size 2 --compile --quantize-linear-action MXFP4`

![image.png](https://raw.gitcode.com/user-images/assets/8428112/104cb5d4-5977-49a8-a061-651ad7a403ab/image.png 'image.png')

------

## ✅ Checklist
Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!336	13 days ago
model_config.py	refactor(tensor_cast): unify word embedding tp config Co-authored-by: Kudo__shinichi<liuning119@huawei.com> # message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop Created-by: Kudo__shinichi Commit-by: Kudo__shinichi Merged-by: ascend-robot See merge request: Ascend/msmodeling!344	12 days ago
parallel_group.py	chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description:

PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [x] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

------

## Motivation
Continue the pre-commit migration: tighten Pylint so only high-signal messages run (`disable=all` + explicit `enable` list), fix real issues that remained under that profile, and translate hook/config comments to English.

------

## Configuration changes (tooling & comments only)

| Path | What changed |
|------|----------------|
| `pre-commit/pyproject.toml` | Pylint: `[tool.pylint."messages control"]` with `disable = ["all"]` and a short allowlist of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). Ruff: unchanged behavior; comments translated to English. Bandit: comments translated; rule allowlist/skip lists unchanged. |
| `.pre-commit-config.yaml` | Comments translated to English; Bandit hook display name set to bandit (Python security checks). Hook versions and args unchanged except for comment text. |

------

## Source code changes (application code)

| Area | Files | Purpose |
|------|--------|---------|
| `serving_cast` | `communication.py`, `engine.py`, `instance.py`, `kv_cache_manager.py`, `load_gen.py`, `main.py`, `model_runner.py`, `request.py`, `serving.py`, `utils.py` | Replace `from . import stime` with `import serving_cast.stime as stime` so Pylint resolves imports (fixes E0611). |
| `serving_cast` | `stime.py` | Singleton salabim `Environment` via `_get_sim_env()` so type checkers/Pylint see `sim.Environment` (fixes E1101 on `SimulationEnv`). |
| `serving_cast/service` | `base_throughput_optimizer.py` | `__init__` defaults + `assert runner is not None` before `run_inference` (fixes E1101 on base class). |
| `tensor_cast` | `diffusers/diffusers_model.py`, `diffusers/diffusers_utils.py`, `runtime.py` | Add `encoding="utf-8"` to `open()` / trace export (fixes W1514). |
| `web_ui` | `callbacks.py` | `refresh_optimizer_detail`: call `_optimizer_detail_view(rows, None, device)` and unpack five return values (fixes E1120). |

------

## Recent commits on `pre-commit` branch
- `ci(pre-commit): fix pylint message selection with disable=all`
- `fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui`
- `docs(pre-commit): translate comments to English and add all-files run log`

------

![](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/b22b18aa-4c84-4dc0-85f5-1e7e0715350e/pre-commit-all-files-run.svg)

------

## Checklist
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!176	1 month ago
patch_torch.py	fix(tensor_cast): restore torch patches after runtime exit Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !327 merge develop into develop fix(tensor_cast): restore torch patches after runtime exit Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description:

PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation
Fix the `RecursionError: maximum recursion depth exceeded` that occurs after chunk prefill is enabled.

------

## 📝 Modification
The root cause is that Runtime.__exit__ did not close exit_stack, so every Runtime entry wrapped torch._prims_common.dtype_to_type in one more layer. After many simulation rounds during throughput optimization, the wrapper nesting grew until it exceeded the recursion limit.

------

## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!327	14 days ago
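The failure mode and its fix can be reproduced in miniature with `contextlib.ExitStack`. The sketch below is illustrative only: this `Runtime` class is a toy, `fake_module` stands in for `torch._prims_common`, and the `depth` attribute is added purely to make the wrapper nesting observable; none of it is the project's actual code.

```python
import contextlib
import types

# Stand-in for the patched module; the PR names
# torch._prims_common.dtype_to_type as the real target.
fake_module = types.SimpleNamespace(dtype_to_type=lambda dtype: dtype)


class Runtime:
    """Toy context manager that wraps a function on entry."""

    def __enter__(self) -> "Runtime":
        self._exit_stack = contextlib.ExitStack()
        original = fake_module.dtype_to_type

        def wrapper(dtype):
            return original(dtype)

        # Track nesting so a leaked wrapper is easy to detect.
        wrapper.depth = getattr(original, "depth", 0) + 1
        fake_module.dtype_to_type = wrapper
        # Register the undo step; it only runs when the stack is closed.
        self._exit_stack.callback(setattr, fake_module, "dtype_to_type", original)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # The fix described in the PR: close the stack so patches are undone.
        # Without this call, each entry would wrap the previous wrapper.
        self._exit_stack.close()
        return False


for _ in range(3):
    with Runtime():
        assert fake_module.dtype_to_type.depth == 1
```

If the `close()` call is removed, `depth` grows by one on every entry, and a call through the function eventually recurses past Python's limit, which matches the symptom reported above.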
quantize_utils.py	chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot See merge request: Ascend/msmodeling!176	1 month ago
runtime.py	【feat】Add operator bound breakdown reporting to text_generate Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !246 merge develop into develop 【feat】Add operator bound breakdown reporting to text_generate Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献，我们非常重视。以下说明将使您的拉取请求更健康，更易于获得反馈。如果您不理解某些项目，请不要担心，只需提交拉取请求并从维护人员那里寻求帮助即可。 PR Type / PR类型 - [x] Feature（功能新增） - [ ] Bugfix（Bug 修复） - [ ] Docs（文档更新） - [ ] CI/CD（持续集成/持续部署） - [ ] Refactor（代码重构） - [ ] Perf（性能优化） - [ ] Test-Cases（测试用例更新） - [ ] Other（其他） ## 🔍 Motivation / 变更动机 Please describe the motivation of this PR and the goal you want to achieve through this PR. 请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。用户/开发者在使用text_generate时，不管是定位问题还是分析结果合理性，都需要获取该算子的bound信息，当前该信息只能通过--chrome-trace打印查看。现增加--dump-op-bound-results参数，若开启，增加每个算子的通信、计算、访存占比。 ------ ## 📝 Modification / 修改内容 Please briefly describe what modification is made in this PR. 请简要描述此拉取请求中进行的修改。 ------ ## 📐 Associated Test Results / 关联测试结果 Please provide the related test results, such as test reports, etc. 请提供相关测试结果，例如测试报告等。 ![image.png](https://raw.gitcode.com/user-images/assets/8428112/419a7ee2-878a-4eec-8ec5-2ab031928e66/image.png 'image.png') ------ ## 🌟 Use cases (Optional) / 使用案例（可选） If this PR introduces a new feature, it is better to list some use cases here and update the documentation. 如果此拉取请求引入了新功能，最好在此处列出一些用例并更新文档。 ------ ## ✅ Checklist / 检查列表 Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖，导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. 
If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [ ] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!246	12 days ago
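The runtime.py entry above mentions reporting each operator's communication, compute, and memory-access proportions. A rough sketch of one way such a breakdown could be computed is shown below. The `OpTiming` record, the `bound_breakdown` helper, and the choice to normalise by the sum of the three components are all assumptions for illustration; the PR does not show how text_generate derives its figures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OpTiming:
    """Hypothetical per-operator timing record, in seconds per component."""

    name: str
    compute_s: float
    memory_s: float
    communication_s: float


def bound_breakdown(op: OpTiming) -> Dict[str, object]:
    """Return each component's share of the summed time and the dominant bound."""
    parts = {
        "compute": op.compute_s,
        "memory": op.memory_s,
        "communication": op.communication_s,
    }
    total = sum(parts.values())
    if total <= 0:
        # No measurable time: report zero shares and no dominant bound.
        return {"shares": {key: 0.0 for key in parts}, "bound": None}
    shares = {key: value / total for key, value in parts.items()}
    bound: Optional[str] = max(shares, key=lambda key: shares[key])
    return {"shares": shares, "bound": bound}
```

Under these assumptions, an operator with 3.0 s of compute and 1.0 s of memory time would be classified as compute-bound with a 0.75 compute share.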
utils.py	feat(deps): adopt uv lockfile for reproducible dependency management Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !267 merge uv into develop feat(deps): adopt uv lockfile for reproducible dependency management Created-by: AvadaKedavrua Commit-by: liujiawang Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. PR Type - [x] Feature - [ ] Bugfix - [x] Docs - [x] CI/CD - [x] Refactor - [ ] Perf - [ ] Test-Cases - [ ] Other ## 🔍 Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. msmodeling is a CPU simulation framework, but it is tightly coupled to Python, PyTorch, and Transformers, and the existing `pip install -r requirements.txt` flow causes many environment problems: \| Pain point \| Current state \| \| --- \| --- \| \| Non-reproducible versions \| No lockfile; every person and every CI run resolves differently, so issues are hard to reproduce \| \| CUDA wheels installed by mistake \| CPU simulation pulls the multi-GB CUDA PyTorch build by default, so installation is slow \| \| Python version split \| The previous `>=3.9` bound conflicts with Transformers 5.x (requires ≥3.10) \| \| Dependency source drift \| `requirements.txt` does not match runtime behavior; `check_dependencies()` silently runs `pip install` and overwrites the user's environment \| \| PyTorch ecosystem version matching \| After installing `torch`, installing `torchvision` and similar components means looking up compatible versions online, which is slow and error-prone \| \| Hand-written manifest \| Adding a dependency means hand-editing `pyproject.toml` / `requirements.txt`, which invites typos and missing constraints \| Goal of this PR: introduce uv with a version-controlled `uv.lock` as the reproducible dependency contract, while keeping `requirements.txt` as the pip-compatible path. Value of `uv` / `uv.lock` (the core of this PR) 1. Reproducible: `uv.lock` pins the full transitive dependency tree (with hashes); development, CI, and maintainers work against the same version baseline. 2. CPU-only PyTorch: `pyproject.toml` routes `torch` / `torchvision` to the official CPU index through `[tool.uv.sources]`. 3. Controlled upgrades: the manifest declares bounds; exact versions are fixed by the lock; upgrades require `uv lock` plus PR review. 4. 
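CI drift protection: every `run_.sh` script runs `uv sync --frozen --group ci` through `common.sh`. 5. Dual-path compatibility: the README recommends uv; pip + `requirements.txt` still works (no lock, drift accepted deliberately). Value of the `uv add` workflow \| Scenario \| Before \| Now \| \| --- \| --- \| --- \| \| Adding a dependency \| Hand-edit the manifest and look up compatible torch/torchvision versions yourself \| `uv add <pkg>` resolves, writes the manifest, and updates the lock in one step \| \| Local package development \| Configure `-e` by hand \| `uv add --editable ./path` \| \| Temporary experiments \| Pollute the venv or forget to uninstall \| `uv run --with <pkg> …` leaves the lockfile unchanged \| Example: this PR adds the dependency with `uv add torchvision`. The tool layer does not call it directly, but some models `import torchvision` internally; the resolver automatically picks `0.25.0`, which is compatible with the locked `torch`. Why commit `uv.lock` to the repository (rather than only resolving automatically) msmodeling is a clone-and-run application/tool repository, not a library published to PyPI for downstream users to lock themselves. If the lock is not committed and every `uv sync` re-resolves: \| \| Committed `uv.lock` (this PR) \| Not committed, auto-resolved \| \| --- \| --- \| --- \| \| Reproducibility \| Whole tree pinned; issues can be reproduced against the lock \| Changes with PyPI and time; "works on my machine" issues recur \| \| CI \| `--frozen` guarantees the manifest and lock are in sync \| `--frozen` must be dropped, so CI results are non-deterministic \| \| Upgrades \| Change bounds → `uv lock` → review the lock diff \| Implicit upgrades that are hard to trace \| \| Relation to the pip path \| lock = source of truth for development/CI; requirements.txt = lockless compatibility fallback \| Neither path is reproducible \| Conclusion: `uv.lock` must be committed; otherwise uv only offers faster installs, cannot address the root cause of version drift, and contradicts the existing `--frozen` CI. Lockfile maintenance convention: dependency-change PRs must commit `pyproject.toml` and `uv.lock` together; prefer `uv add`; on conflict, rebase and then run `uv lock`, and do not hand-edit the lock. Related: [#69](https://gitcode.com/Ascend/msmodeling/issues/69), RFC [`docs/RFC/rfc_uv_dependency_management_en.md`](docs/RFC/rfc_uv_dependency_management_en.md) Out of scope: the NPU `torch-npu` dependency, which will be bound uniformly through the Huawei Cloud index in a follow-up. ------ ## 📝 Modification Please briefly describe what modification is made in this PR. 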
- `pyproject.toml` + `uv.lock` (committed): runtime dependencies; CPU index routing for `torch` / `torchvision`; `lint` / `ci` groups - Add `torchvision>=0.25.0`: required by model import paths; resolved via `uv add` and aligned with the locked `torch` - `.pre-commit-config.yaml`: exclude `uv.lock` from the large-file check - Remove `check_dependencies()`: versions are managed by uv and the lockfile - `scripts/lib/common.sh`: `uv sync --frozen --group ci` (all `run_.sh`) - `requirements.txt`: bounds aligned, includes `torchvision`; pip fallback - RFC: adds `uv add` / `--editable` / `--with` and the PyTorch ecosystem matching pain point ------ ## 📐 Associated Test Results Please provide the related test results, such as test reports, etc. - Local `uv sync --frozen --group ci` passes - CodeArts pipeline verification triggered (`docs-ci-pipeline-success`) ------ ## 🌟 Use cases (Optional) If this PR introduces a new feature, it is better to list some use cases here and update the documentation. Recommended (uv + committed lockfile) `bash pip install uv uv venv --python 3.13 .venv && source .venv/bin/activate uv sync uv add torchvision # add a dependency: matches the torch version automatically and updates the lock uv run python -m cli.inference.text_generate --help` Temporary experiment (lock unchanged) `bash uv run --with some-package python -c "import some_package"` CI / local test entry point `bash bash scripts/run_smoke.sh # common.sh → uv sync --frozen --group ci` pip-compatible (no lock) `bash pip install "torch>=2.7,<=2.10" "torchvision>=0.25.0" --index-url https://download.pytorch.org/whl/cpu pip install -r requirements.txt` ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests; the case that causes the bug should be added in the unit tests. - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. 
- [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!267	21 days ago