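MindStudio-Modeling (msmodeling) is the MindStudio modeling and optimization tool. It estimates theoretical performance for scenarios such as model inference and serving, and on that basis searches for better-performing parameters such as deployment strategies.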

File	Last commit	Last updated
.agents	Improve the msModeling README and the Chinese/English documentation structure
Co-authored-by: eveyin1<qianyin2022@hotmail.com>
# message auto-generated for no-merge-commit merge: !329 merge doc_fix1 into develop
Created-by: eveyin1
Commit-by: eveyin1
Merged-by: ascend-robot
Description:
PR Type: Docs
## 📝 Modification
1. Reworked the README to follow the latest template.
2. Completed the installation and quick-start content.
3. Organized the English documents under docs.
4. Scored the documentation for usability.
5. Passed the docs-tool aidd check.
## 📐 Associated Test Results
![20260611-101033.jpg](https://raw.gitcode.com/user-images/assets/8428112/db108d98-9c1e-4eec-9508-2aacccb1d507/20260611-101033.jpg '20260611-101033.jpg')
See merge request: Ascend/msmodeling!329	15 days ago
.gitcode	chore(ci): adopt pre-commit and retire legacy lintrunner adapters
Co-authored-by: liujiawang<anonymousdev@163.com>
# message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop
Created-by: AvadaKedavrua
Commit-by: liujiawang;AvadaKedavrua
Merged-by: ascend-robot
Description:
PR Type: Docs, CI/CD
## Motivation
Continue the pre-commit migration: tighten Pylint so only high-signal messages run (`disable=all` + explicit `enable` list), fix real issues that remained under that profile, and translate hook/config comments to English.
## Configuration changes (tooling & comments only)
| Path | What changed |
|------|----------------|
| `pre-commit/pyproject.toml` | Pylint: `[tool.pylint."messages control"]` with `disable = ["all"]` and a short allowlist of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). Ruff: unchanged behavior; comments translated to English. Bandit: comments translated; rule allowlist/skip lists unchanged. |
| `.pre-commit-config.yaml` | Comments translated to English; Bandit hook display name set to bandit (Python security checks). Hook versions and args unchanged except for comment text. |
## Source code changes (application code)
| Area | Files | Purpose |
|------|--------|---------|
| `serving_cast` | `communication.py`, `engine.py`, `instance.py`, `kv_cache_manager.py`, `load_gen.py`, `main.py`, `model_runner.py`, `request.py`, `serving.py`, `utils.py` | Replace `from . import stime` with `import serving_cast.stime as stime` so Pylint resolves imports (fixes E0611). |
| `serving_cast` | `stime.py` | Singleton salabim `Environment` via `_get_sim_env()` so type checkers/Pylint see `sim.Environment` (fixes E1101 on `SimulationEnv`). |
| `serving_cast/service` | `base_throughput_optimizer.py` | `__init__` defaults + `assert runner is not None` before `run_inference` (fixes E1101 on base class). |
| `tensor_cast` | `diffusers/diffusers_model.py`, `diffusers/diffusers_utils.py`, `runtime.py` | Add `encoding="utf-8"` to `open()` / trace export (fixes W1514). |
| `web_ui` | `callbacks.py` | `refresh_optimizer_detail`: call `_optimizer_detail_view(rows, None, device)` and unpack five return values (fixes E1120). |
## Recent commits on `pre-commit` branch
- `ci(pre-commit): fix pylint message selection with disable=all`
- `fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui`
- `docs(pre-commit): translate comments to English and add all-files run log`

![](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/b22b18aa-4c84-4dc0-85f5-1e7e0715350e/pre-commit-all-files-run.svg)
## Checklist
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!176	1 month ago
cli	refactor(tensor_cast): unify word embedding tp config
Co-authored-by: Kudo__shinichi<liuning119@huawei.com>
# message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop
Created-by: Kudo__shinichi
Commit-by: Kudo__shinichi
Merged-by: ascend-robot
Description:
PR Type: Docs, Refactor, Test-Cases
## 🔍 Motivation
`word_embedding_tp` and `word_embedding_tp_mode` represented the same configuration concept in two fields: one field toggled word embedding TP, and the other selected the TP mode. This PR reduces the public and internal configuration shape to a single parameter so users only need to configure `word_embedding_tp` as disabled, `col`, or `row`.
## 📝 Modification
- Make `UserInputConfig.word_embedding_tp` the single nullable word embedding TP mode field.
- Remove `word_embedding_tp_mode` and `embedding_parallel_mode` from the config model.
- Pass the normalized `word_embedding_tp` mode directly into `ParallelConfig.embedding_parallel` and the embedding transformation.
- Keep legacy bool input normalization for compatibility: `True -> col`, `False/None -> disabled`.
- Remove redundant CLI-side bool/mode conversion and update related benchmark cases and user guide docs.
- Add regression coverage for single-field config, legacy bool normalization, and invalid `word_embedding_tp` values.
## 📐 Associated Test Results
- `python -m pytest tests/regression/tensor_cast/test_user_config.py -q`: 6 passed
- `python -m pytest tests/regression/tensor_cast/test_user_config.py tests/regression/web_ui/test_command_builder.py tests/regression/tensor_cast/test_adapter_automation.py -q`: 98 passed
- `python -m pytest tests/regression/tensor_cast/test_text_generate.py -k word_embedding_parallel -q`: 2 passed, 113 deselected
- `python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -o addopts= -m "nightly and not npu and not network" -q`: 2 passed
- `python -m pytest tests/benchmark/models/test_model_regression.py --collect-only -q`: 15 tests collected
- `python -m ruff check <changed python files>`: All checks passed
- `python -m pre_commit run --from-ref origin/develop --to-ref HEAD`: passed
- `git diff --check HEAD~1 HEAD`: passed
## 🌟 Use cases (Optional)
- Disable word embedding TP: `word_embedding_tp=None`
- Enable column mode: `word_embedding_tp="col"`
- Enable row mode: `word_embedding_tp="row"`
- CLI usage: `--word-embedding-tp col` or `--word-embedding-tp row`
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!344	14 days ago
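The legacy bool normalization described in this PR (`True -> col`, `False/None -> disabled`, anything else outside `col`/`row` rejected) can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name `normalize_word_embedding_tp` and the exact error message are assumptions made for the example.

```python
from typing import Optional, Union

# The two TP modes named in the PR description.
VALID_MODES = ("col", "row")


def normalize_word_embedding_tp(value: Union[bool, str, None]) -> Optional[str]:
    """Hypothetical helper: map user input to None (disabled), "col", or "row"."""
    # Legacy bool input and None: False/None mean "disabled".
    if value is None or value is False:
        return None
    # Legacy bool input: True maps to column mode.
    if value is True:
        return "col"
    # New single-field form: the mode is given directly.
    if isinstance(value, str) and value in VALID_MODES:
        return value
    # Invalid values are rejected rather than silently ignored.
    raise ValueError(
        f"invalid word_embedding_tp {value!r}; expected None, a bool, or one of {VALID_MODES}"
    )
```

With a helper of this shape, a caller only ever sees one nullable mode value, which is the single-field design the PR argues for.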
docs	refactor(tensor_cast): unify word embedding tp config
Co-authored-by: Kudo__shinichi<liuning119@huawei.com>
# message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop
Created-by: Kudo__shinichi
Commit-by: Kudo__shinichi
Merged-by: ascend-robot
See merge request: Ascend/msmodeling!344	14 days ago
experimental	Add the combined test script run_throughput_optimizer_cases.py, which combines multiple devices × card counts × models × input/output lengths and writes a summary CSV, improving overall usability
Co-authored-by: wangjin<wangjin171@huawei.com>
# message auto-generated for no-merge-commit merge: !247 merge develop-w00637429 into develop
Created-by: gcw_hasgjVbP
Commit-by: wangjin
Merged-by: ascend-robot
Description:
PR Type: Feature, Docs, Test-Cases
## 🔍 Motivation
Follows up on pull request https://gitcode.com/Ascend/msmodeling/pull/106.
## 📝 Modification
### PR Review Comments & Changes
### Overview
PR function: add `run_throughput_optimizer_cases.py`, which drives multiple benchmark cases from a CSV file, runs them sequentially through the `ParallelRunner` of `throughput_optimizer`, and writes the results to a single CSV. This document records the review comments received and the corresponding changes.

---
#### Comment 1: Rename the CLI file
Review comment: The CLI name could be changed, for example to `run_throughput_optimizer_cases`. The old benchmark CLI has been replaced by `throughput_optimizer`, so the file can be renamed to `run_throughput_optimizer_cases.py`.
Change:
- File renamed: `cli/inference/run_benchmark_cases.py` → `cli/inference/run_throughput_optimizer_cases.py`
- All `run_benchmark_cases` references in the module docstring updated to `run_throughput_optimizer_cases`
- Usage example: `python -m cli.inference.run_benchmark_cases` → `python -m cli.inference.run_throughput_optimizer_cases`

---
#### Comment 2: Add UT
Review comment: Please add UT.
Change: added `tests/test_run_throughput_optimizer_cases.py`, containing 15 test classes and 71 test cases in total:

| Test class | Coverage |
|--------|---------|
| `TestParseListFloat` | Semicolon-separated float parsing, empty value, None, whitespace |
| `TestParseListInt` | Integer list parsing, empty value returns None |
| `TestParseBool` | true/1/yes → True, anything else → False |
| `TestParseOptionalBool` | Empty value → None, valid value parsing, invalid value → None |
| `TestParseMode` | agg/disagg/empty value/invalid value |
| `TestParseParallel` | tp/pp/dp parsing (compact format + verbose disagg format), invalid format |
| `TestSingleLimit` | Empty list → None, single value returned, multiple values raise ValueError |
| `TestLoadCasesFromCsv` | CSV loading, defaults, error on invalid quantization, empty rows skipped, automatic naming when case_name is missing |
| `TestWriteTemplateCsv` | Template header is correct, example row exists, tpot default is 50ms |
| `TestBuildOptimizerArgs` | agg/disagg modes, _single_limit applied, error on multi-value ttft/tpot |
| `TestBenchmarkResult` | CSV header and row length match, CSV read/write roundtrip, no error attributes |
| `TestParseArgs` | argparse parsing of each argument, defaults, --help exits |
| `TestSaveResultsToCsv` | Results written correctly, FLUSH_BATCH_SIZE constant verified |
| `TestDefaultTpotLimitMs` | Default is 50.0ms |
| `TestIntegrationExampleCase` | End-to-end integration test based on the example_cases.csv input and the result.csv output |

`TestIntegrationExampleCase` contains 5 test cases:
- `test_csv_load_parses_example_case`: verifies that a CSV row is parsed into all fields of BenchmarkCase
- `test_build_optimizer_args_from_example_case`: verifies the BenchmarkCase → Namespace conversion and the default attributes required by ParallelRunner
- `test_result_row_from_example_output`: verifies that the BenchmarkResult → _result_row output matches the actual result.csv
- `test_full_csv_roundtrip`: end-to-end CSV write → load → build result → save → read back and verify
- `test_parse_parallel_disagg_output`: verifies parsing of the disagg-mode format `"TP=1 | PP=1 | DP=64"`

---
#### Comment 3: Confirm the unit of tpot_limits
Review comment: In throughput_optimizer the unit of tpot_limits should be ms; please confirm whether the unit is wrong.
Change: confirmed that both ttft_limits and tpot_limits in throughput_optimizer are in milliseconds (ms) (see the `parallel_runner.py` log `"Run Aggregation with ttft %r ms, tpot %r ms."`). The original default of `0.05` was in seconds, inconsistent with the ms semantics of the pipeline; this is fixed under Comment 5.

---
#### Comment 4: Use the argparse module for argument parsing
Review comment: Use the argparse module for argument parsing to improve maintainability and extensibility.
Change: replaced the hand-written while-loop parsing in `_parse_args()` with `argparse.ArgumentParser`:
```python
def _parse_args():
    parser = argparse.ArgumentParser(
        prog="run_throughput_optimizer_cases",
        description="...",
    )
    parser.add_argument("--input-csv", ...)
    parser.add_argument("--write-template", ...)
    parser.add_argument("--output-csv", ...)
    parser.add_argument("--test-conversion", action="store_true", ...)
    args = parser.parse_args()
    return args.input_csv, args.write_template, args.output_csv, args.test_conversion
```
The return type (4-tuple) is unchanged. The `--num-workers` argument has been removed (only sequential execution is currently supported, so a meaningless argument should not be kept), and the `__main__` block is updated accordingly. `--help` now generates help text automatically.

---
#### Comment 5: Correct the tpot_limits default
Review comment: When tpot_limits is empty the default is written as 0.05, but the pipeline treats it as ms, which turns the default SLO into an extremely strict value (0.05ms). It should be unified into an explicit millisecond constant with the value 50.0.
Change:
- Added the module-level constant `DEFAULT_TPOT_LIMIT_MS = 50.0`
- In `load_cases_from_csv`: `tpot_limits = [0.05]` → `tpot_limits = [DEFAULT_TPOT_LIMIT_MS]`
- In `run_example`: `tpot_limits=[50]` → `tpot_limits=[DEFAULT_TPOT_LIMIT_MS]`
- In `_test_result_conversion`: `tpot_limits=[0.05]` → `tpot_limits=[DEFAULT_TPOT_LIMIT_MS]`
- In `_test_result_conversion`: mock data `tpot=500` → `tpot=40` (the original value 500 > 0.05 made the SLO filter always fail, so the test never actually verified a pass; with 40 ≤ 50ms the test logic is correct)
- In `write_template_csv`: the example row's `tpot_limits` uses `str(int(DEFAULT_TPOT_LIMIT_MS))`

---
#### Comment 6: Raise an error instead of silently degrading when quantize action parsing fails
Review comment: When parsing quantize_linear_action / quantize_attention_action fails, the value is silently degraded to None and the default quantization config is used afterwards, so user configuration errors are "swallowed". Raise an error directly for invalid configuration and list the valid options.
Change:
```python
# Before
try:
    linear_action = QuantizeLinearAction(q_linear) if q_linear else None
except ValueError:
    linear_action = None  # silent degradation

# After
if q_linear:
    try:
        linear_action = QuantizeLinearAction(q_linear)
    except ValueError:
        valid = ", ".join(e.value for e in QuantizeLinearAction)
        raise ValueError(
            f"Row case_name={case_name}: invalid quantize_linear_action '{q_linear}'. "
            f"Valid options: {valid}"
        ) from None
else:
    linear_action = None
```
`quantize_attention_action` is handled the same way. In the UT, `test_invalid_quantize_linear_raises`, `test_invalid_quantize_attention_raises`, and `test_error_message_lists_valid_quantize_options` cover this change.

---
#### Comment 7: Remove the error fields from BenchmarkResult
Review comment: BenchmarkResult does not need best_decode_error and best_prefill_error. Keep the code concise and efficient.
Change:
- Removed the `best_decode_error` and `best_prefill_error` fields from the `BenchmarkResult` dataclass
- Removed the `Decode_Error Message` and `Prefill_Error Message` columns from the output header in `_csv_header_and_ref_row`
- Removed the corresponding fields from `_result_row`
- In the UT, `test_no_error_fields_in_result` verifies that BenchmarkResult has no error attributes

---
#### Comment 8: Correct the project_root path level
Review comment: The current project_root calculation is one level short and yields `.../cli` instead of the repository root. When the script is launched from outside the repository root, packages such as tensor_cast may fail to import. Locate the root by going up three levels from Path(\_\_file\_\_) instead.
Change:
```python
# Before
project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Result: .../msmodeling/cli/ (one level short)

# After
project_root = str(Path(__file__).resolve().parents[2])
# Path: run_throughput_optimizer_cases.py -> inference/ -> cli/ -> msmodeling/ (correct)
```
Added `from pathlib import Path` and removed the no-longer-used `import os`.

---
#### Comment 9: Companion documentation and CI gate
Review comment: The CLI script is used externally, so add a companion guide section in the documentation and guard it with a CI gate.
Change:
- Added the usage guide `docs/en/run_throughput_optimizer_cases.md`
- CI gate as a follow-up: add a step to the CI pipeline that runs `tests/test_run_throughput_optimizer_cases.py`

---
#### Comment 10: Multiple ttft_limits/tpot_limits values were silently ignored
Review comment: ttft_limits/tpot_limits are declared as list inputs in the CSV, but only the first element is actually used, which can mislead users into thinking multiple SLO groups are covered. Under the current single-value execution model, explicitly restrict the input to a single value and raise an error otherwise.
Change: added the `_single_limit` helper:
```python
def _single_limit(values: List[float], name: str) -> Optional[float]:
    if not values:
        return None
    if len(values) > 1:
        raise ValueError(
            f"{name} accepts at most one value, got {len(values)}: {values}"
        )
    return values[0]
```
`_build_optimizer_args` and `_summary_results_to_benchmark_result` now use `_single_limit` in place of the original `case.ttft_limits[0] if case.ttft_limits else None`. Multi-value input now raises an explicit error that guides users to split it into multiple rows. In the UT, `TestSingleLimit` and `TestBuildOptimizerArgs` cover this change.

---
#### Comment 11: Batch flush optimization
Review comment: The loop writes to the CSV and flushes after every case; frequent I/O may hurt performance. If there are many cases (>100), consider batched writes or buffering.
Change:
- Added the constant `FLUSH_BATCH_SIZE = 10`
- In `run_cases_and_save`, the `f.flush()` after every case is replaced with one flush every `FLUSH_BATCH_SIZE` cases, plus a final flush after the loop:
```python
for idx, case in enumerate(cases, 1):
    result = run_benchmark_case(case)
    all_results.append(result)
    writer.writerow(_result_row(result))
    if idx % FLUSH_BATCH_SIZE == 0:
        f.flush()
# Final flush after all cases
f.flush()
```
This reduces the number of I/O syscalls while still allowing recovery at every 10 cases. In the UT, `test_batch_flush_constant` verifies the constant value.

---
## 📐 Associated Test Results
![image.png](https://raw.gitcode.com/user-images/assets/8428112/e4d0e506-b1b7-4d95-b032-5ab78ed9aebf/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/41dab5eb-f190-4a78-9a1f-6b2470400900/image.png 'image.png')
## 🌟 Use cases (Optional)
See the documentation associated with this PR.
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!247	15 days ago
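The CSV cell parsing rules listed in the Comment 2 test table (semicolon-separated floats; `true/1/yes` → True, anything else → False) might look roughly like the sketch below. The function names `parse_list_float` and `parse_bool` are hypothetical, and two behaviours are assumptions not stated in the PR: empty or None input returns an empty list, and bool matching is case-insensitive.

```python
from typing import List, Optional


def parse_list_float(raw: Optional[str]) -> List[float]:
    """Hypothetical helper: parse a cell such as "50; 100" into floats."""
    # Assumption: None, empty, and whitespace-only cells yield an empty list.
    if raw is None or not raw.strip():
        return []
    # float() tolerates surrounding whitespace, so only empty parts are skipped.
    return [float(part) for part in raw.split(";") if part.strip()]


def parse_bool(raw: Optional[str]) -> bool:
    """Hypothetical helper: true/1/yes map to True, everything else to False."""
    if raw is None:
        return False
    # Assumption: matching ignores case and surrounding whitespace.
    return raw.strip().lower() in {"true", "1", "yes"}
```

A list produced this way could then be passed to a single-value check such as the `_single_limit` helper quoted under Comment 10.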
pre-commit	feat: pipe pre-commit output through LLM renderer for compact diagnostics
Co-authored-by: liujiawang<anonymousdev@163.com>
# message auto-generated for no-merge-commit merge: !197 merge pre-commit into develop
Created-by: AvadaKedavrua
Commit-by: liujiawang
Merged-by: ascend-robot
Description:
PR Type: Feature
## 🔍 Motivation
The idea comes from rtk: make pre-commit error messages better suited for an LLM to resolve, saving 80% of tokens.
Usage:
1. `uv run pre-commit install` registers the git hooks
2. `patch .git/hooks/pre-commit pre-commit/llm_render.patch`

The first step registers pre-commit and is required for any use of pre-commit. The second step patches the hook so that its output is piped to llm_render.py. Filtering only happens when `PRE_COMMIT_LLM_FILTER=1` is in effect, and the performance cost is negligible. To run it manually, execute the following from the repository root: `PRE_COMMIT_LLM_FILTER=1 uv run pre-commit run | pre-commit/llm_render.py`
## 📐 Associated Test Results
Before:
```
[WARNING] Unstaged files detected.
[INFO] Stashing unstaged files to /root/.cache/pre-commit/patch1778825692-1277080.
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing test.py
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check for merge conflicts................................................Passed
detect private key.......................................................Passed
check json...........................................(no files to check)Skipped
ruff check...............................................................Failed
- hook id: ruff-check
- exit code: 1
::error title=Ruff (invalid-syntax),file=/root/workspace/gitcode/msmodeling/test.py,line=1,col=8,endLine=1,endColumn=9::test.py:1:8: invalid-syntax: Expected `:`, found `=`
::error title=Ruff (invalid-syntax),file=/root/workspace/gitcode/msmodeling/test.py,line=1,col=10,endLine=1,endColumn=11::test.py:1:10: invalid-syntax: Invalid annotated assignment target
::error title=Ruff (invalid-syntax),file=/root/workspace/gitcode/msmodeling/test.py,line=1,endLine=2::test.py:1:12: invalid-syntax: Expected an expression
::error title=Ruff (invalid-syntax),file=/root/workspace/gitcode/msmodeling/test.py,line=2,col=1,endLine=2,endColumn=5::test.py:2:1: invalid-syntax: Unexpected indentation
::error title=Ruff (invalid-syntax),file=/root/workspace/gitcode/msmodeling/test.py,line=3,col=1,endLine=3,endColumn=1::test.py:3:1: invalid-syntax: Expected a statement
ruff format..............................................................Failed
- hook id: ruff-format
- exit code: 2
error: Failed to parse test.py:1:8: Expected `:`, found `=`
codespell................................................................Passed
pylint (Python code quality check).......................................Failed
- hook id: pylint
- exit code: 2
*********** Module msmodeling.test
test.py:1:4: E0001: Parsing failed: 'cannot assign to literal here. Maybe you meant '==' instead of '='? (msmodeling.test, line 1)' (syntax-error)
bandit (Python security checks)..........................................Passed
typos....................................................................Passed
[WARNING] Stashed changes conflicted with hook auto-fixes... Rolling back fixes...
[INFO] Restored changes from /root/.cache/pre-commit/patch1778825692-1277080.
```
After:
```
end-of-file-fixer - test.py
> (fixed)
ruff-check - msmodeling/test.py
> 1:8 invalid-syntax: test.py:1:8: invalid-syntax: Expected `:`, found `=`
> 1:10 invalid-syntax: test.py:1:10: invalid-syntax: Invalid annotated assignment target
> 2:1 invalid-syntax: test.py:2:1: invalid-syntax: Unexpected indentation
> 3:1 invalid-syntax: test.py:3:1: invalid-syntax: Expected a statement
ruff-format - test.py
> 1:8 Expected `:`, found `=`
pylint - test.py
> 1:4 E0001: Parsing failed: 'cannot assign to literal here. Maybe you meant '==' instead of '='? (msmodeling.test, line 1)' (syntax-error)
```
See merge request: Ascend/msmodeling!197	1 month ago
scripts	[REFACTOR] CI Gate union execution and Nightly pipeline hardening
Co-authored-by: liujiawang<anonymousdev@163.com>
# message auto-generated for no-merge-commit merge: !342 merge fix into develop
Created-by: AvadaKedavrua
Commit-by: liujiawang
Merged-by: ascend-robot
Description:
## Reason for change
After the union refactor, the CI Gate / Nightly helpers had several logic and usability problems:
- Fully exempted tests still ran Phase 2 (#112)
- Nightly audit used a stale test_map (#113)
- Coverage fallback broke after the union refactor (#119)
- Roots had two sources: `PRODUCT_SOURCE_PREFIXES` and `gate_policy.yaml`
- Repeated collect/git diff, logs containing internal terminology, no terminal summary for nightly

This PR unifies the execution model and fills in regression tests and documentation.

---
## Changes
CI Gate (`scripts/helpers/ci_gate/`)
- Pre-run hard blocking is kept only for deleted tests and deleted source; coverage mapping becomes a post-run soft policy
- `compute_execution_plan` unions and deduplicates pytest waves; `--cov` is appended when product or test files change
- `fetch_diff()` runs a single git diff; `load_gate_policy` is cached by yaml mtime
- Single source for `roots`: `tests/.ci/gate_policy.yaml`
- User-facing English logs and a success summary via `print`

Nightly (`scripts/helpers/nightly/`)
- Stale/redundant audits use a fresh test_map (#113)
- Terminal summary and English phase labels; drift TODO removed
- `allowed_node_ids` is reused; weak-coverage symbol detection is passed the mapping

Common / Policy
- `coverage_config.py` lazily loads `product_roots()`
- `gate_policy.yaml`: excludes `tests/helpers/`
- Documentation synchronized: `tests/README.md`, `docs/design/ut_refactor.md`, `tests/SKILL.md`

Related issues: #112 #113 #114 #115 #116 #117 #119 #120 #121 #122 #123 #124

---
## Feature comparison (Before / After)
### Feature 1: Union-deduplicated pytest execution (core optimization)
Scenario: the same PR changes both the product code `tensor_cast/foo.py` and the test file `tests/regression/cli/test_shared.py`, and `test_shared.py::test_x` also happens to be the regression case mapped to `foo.py` in `test_map`.
#### Before (phased, may run tests repeatedly)
```text
# 1) Pre-run: new source has no test_map mapping → hard block immediately, before pytest has run
BLOCK: tensor_cast/foo.py has no test_map entry for symbol Foo.bar
# 2) Phase 0: a separate pytest round with --cov for coverage fallback
pytest tests/regression/cli/test_shared.py -m not npu --cov ...
# 3) Phase 1: run the changed tests
pytest tests/regression/cli/test_shared.py::test_x -m not npu
# 4) Phase 2: run the mapped regression again (overlaps with Phase 1)
pytest tests/regression/cli/test_shared.py::test_x \
  tests/regression/cli/test_other.py::test_y \
  -m "not npu and not nightly and not network"
```
→ test_shared.py::test_x is executed 2~3 times; git diff may be fetched repeatedly; coverage and mapping validation are disconnected
#### After (run first, validate afterwards, deduplicate)
```text
# 1) Pre-run hard block: only deleted tests / deleted source
Validating hard-blocking policy ...
(no block: foo.py is modified, not deleted)
# 2) Plan: union scheduling, node ids deduplicated (changed-test takes priority)
Scheduling 2 test node(s): new or changed test file
Scheduling 1 test node(s): changed product file mapped regression
Sample node(s): tests/regression/cli/test_shared.py::test_x, ...
Execution uses 2 pytest wave(s) after deduplication
# 3) Single union pytest run (with --cov --cov-context=test)
Wave 1 (-m not npu): tests/regression/cli/test_shared.py::test_x   # changed-test, runs only once
Wave 2 (-m not npu and not nightly and not network): tests/regression/cli/test_other.py::test_y   # pure regression, test_x is not repeated
# 4) Post-run soft policy: use the same .coverage for the mapping fallback
Checking new/modified source coverage mapping against collected data ...
CI gate passed: 2 test node(s) (new or changed test file; changed product file mapped regression)
```
→ Each node is executed only once; coverage fallback completes in the same round as pytest; CI takes less time and behaves more predictably
### Feature 2: Config changes trigger the full suite and skip the useless collect
Scenario: the PR only changes `pyproject.toml` (dependency/test configuration change).
#### Before
```text
config change detected → full suite
still collect changed test files in Phase 1 ...   # redundant collect
pytest tests/ -m not npu
```
#### After
```text
Config path(s): pyproject.toml
Selected full test suite: dependency or test configuration changed
pytest tests/ -m not npu   # changed_test_nodes is empty, gate_new_tests collect is skipped
CI gate passed: full test suite (pyproject.toml)
```
### Feature 3: Nightly terminal summary (user-readable)
#### Before
```text
Phase 2a done. elapsed=1832.4s
# Pipeline ends without a single overview line; internal phase numbering
```
#### After
```text
Nightly pipeline finished.
test_map: written (line 74.1%, branch 61.8%)
nightly-marked: 142 passed in 1832s
benchmark: 38 passed in 412s
network: 12 passed in 89s
weak coverage symbols: 3
report: /path/to/nightly_report.json
```

---
## Self-verification
### Helpers regression tests
Purpose: confirm that the full CI Gate / Nightly / Common helpers regression passes.
Steps:
1. Go to the repository root
2. Run:
```bash
uv run python -m pytest tests/regression/scripts/helpers/ -q
```
Result: `341 passed, 5 warnings in 1.02s`
### gate_policy cache invalidation
Purpose: confirm that the `load_gate_policy` cache is invalidated after the yaml mtime changes.
Steps:
1. Run the unit test:
```bash
uv run python -m pytest tests/regression/scripts/helpers/ci_gate/test_gate_policy.py::test_load_gate_policy_cached_until_yaml_mtime_changes -v
```
Result: PASSED (fixes the flakiness caused by the mtime not changing for writes within the same second)
### pre-commit
Purpose: all hooks pass before commit.
Steps:
1. `git commit` triggers pre-commit (ruff, pylint, bandit, typos, etc.)

Result: all Passed
See merge request: Ascend/msmodeling!342	14 days ago
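The union-deduplicated scheduling in Feature 1 (changed tests first, a mapped regression node skipped if it was already scheduled) can be illustrated with a small sketch. This is not the repository's `compute_execution_plan`: the function name `plan_waves` and the data shapes are assumptions for the example, and only the two marker expressions are taken from the log excerpt in the PR description.

```python
from typing import Iterable, List, Set, Tuple

# Marker expressions as they appear in the "After" log excerpt.
CHANGED_MARKER = "not npu"
REGRESSION_MARKER = "not npu and not nightly and not network"


def plan_waves(
    changed_test_nodes: Iterable[str],
    mapped_regression_nodes: Iterable[str],
) -> List[Tuple[str, List[str]]]:
    """Hypothetical planner: one wave per marker, each node id scheduled at most once."""
    seen: Set[str] = set()
    waves: List[Tuple[str, List[str]]] = []
    # Changed tests are processed first, so they win when a node appears in both groups.
    for marker, nodes in (
        (CHANGED_MARKER, changed_test_nodes),
        (REGRESSION_MARKER, mapped_regression_nodes),
    ):
        unique: List[str] = []
        for node in nodes:
            if node not in seen:
                seen.add(node)
                unique.append(node)
        # Skip empty waves so no pytest run is started without nodes.
        if unique:
            waves.append((marker, unique))
    return waves
```

For the scenario in the PR, passing `test_shared.py::test_x` as the changed test and both `test_x` and `test_y` as mapped regressions yields two waves, with `test_x` only in the first one.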
serving_cast	feat(tensor_cast): Organize serving_cast.main output structure
Co-authored-by: Elrond G<elrondgcn@gmail.com>
# message auto-generated for no-merge-commit merge: !226 merge feature/develop/serving_cast_output_file into develop
feat(tensor_cast): Organize serving_cast.main output structure
Created-by: elrond-g Commit-by: Elrond G Merged-by: ascend-robot
Description:
PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
Used for structured interaction between CLI tools and other services.
------
## 📝 Modification
Add a parameter "--output-json" to the CLI tool tensor_cast.scripts.text_generate that saves the run result to the specified file.
------
## 📐 Associated Test Results
`python -m serving_cast.main --instance_config_path serving_cast/example/instances.yaml --common_config_path serving_cast/example/common.yaml --output_json output_json.json`
------
## 🌟 Use cases (Optional)
`args: "--output_json", type: string (file_name), default: None, desc: If set, write the benchmark summary (per-metric table and overall summary) as a structured JSON to this file path.`
------
## ✅ Checklist
Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!226	16 days ago
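As a rough sketch of how an optional JSON output flag like the one described above might behave, the following uses only the Python standard library. The parser, the `write_summary` helper, and the summary keys are hypothetical and are not taken from `serving_cast.main`; only the flag name `--output_json` and its "default None, write when set" semantics come from the description.

```python
import argparse
import json
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the flag discussed in the PR."""
    parser = argparse.ArgumentParser(description="Illustrative benchmark CLI")
    parser.add_argument(
        "--output_json",
        type=str,
        default=None,
        help="If set, write the benchmark summary as structured JSON to this path.",
    )
    return parser


def write_summary(summary: dict, output_json: Optional[str]) -> bool:
    """Write the summary when a path was given; return whether a file was written."""
    if output_json is None:
        return False
    with open(output_json, "w", encoding="utf-8") as handle:
        json.dump(summary, handle, indent=2, ensure_ascii=False)
    return True
```

Keeping the default at `None` means existing invocations keep printing to the terminal only, while other services can opt in to a machine-readable file.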
tensor_cast	refactor(tensor_cast): unify word embedding tp config
Co-authored-by: Kudo__shinichi<liuning119@huawei.com>
# message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop
refactor(tensor_cast): unify word embedding tp config
Created-by: Kudo__shinichi Commit-by: Kudo__shinichi Merged-by: ascend-robot
Description:
PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [x] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other
## 🔍 Motivation
`word_embedding_tp` and `word_embedding_tp_mode` represented the same configuration concept in two fields: one field toggled word embedding TP, and the other selected the TP mode. This PR reduces the public and internal configuration shape to a single parameter so users only need to configure `word_embedding_tp` as disabled, `col`, or `row`.
------
## 📝 Modification
- Make `UserInputConfig.word_embedding_tp` the single nullable word embedding TP mode field.
- Remove `word_embedding_tp_mode` and `embedding_parallel_mode` from the config model.
- Pass the normalized `word_embedding_tp` mode directly into `ParallelConfig.embedding_parallel` and the embedding transformation.
- Keep legacy bool input normalization for compatibility: `True -> col`, `False/None -> disabled`.
- Remove redundant CLI-side bool/mode conversion and update related benchmark cases and user guide docs.
- Add regression coverage for single-field config, legacy bool normalization, and invalid `word_embedding_tp` values.
------
## 📐 Associated Test Results
- `python -m pytest tests/regression/tensor_cast/test_user_config.py -q`: 6 passed
- `python -m pytest tests/regression/tensor_cast/test_user_config.py tests/regression/web_ui/test_command_builder.py tests/regression/tensor_cast/test_adapter_automation.py -q`: 98 passed
- `python -m pytest tests/regression/tensor_cast/test_text_generate.py -k word_embedding_parallel -q`: 2 passed, 113 deselected
- `python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -o addopts= -m "nightly and not npu and not network" -q`: 2 passed
- `python -m pytest tests/benchmark/models/test_model_regression.py --collect-only -q`: 15 tests collected
- `python -m ruff check <changed python files>`: All checks passed
- `python -m pre_commit run --from-ref origin/develop --to-ref HEAD`: passed
- `git diff --check HEAD~1 HEAD`: passed
------
## 🌟 Use cases (Optional)
- Disable word embedding TP: `word_embedding_tp=None`
- Enable column mode: `word_embedding_tp="col"`
- Enable row mode: `word_embedding_tp="row"`
- CLI usage: `--word-embedding-tp col` or `--word-embedding-tp row`
------
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!344	14 days ago
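The normalization rule stated in the PR (`True -> col`, `False/None -> disabled`, otherwise `col` or `row`) can be sketched as a small pure function. This is an illustration only: the function name `normalize_word_embedding_tp` and the choice of `ValueError` for invalid input are assumptions, not the repository's actual validator.

```python
from typing import Optional, Union

VALID_MODES = ("col", "row")


def normalize_word_embedding_tp(value: Union[bool, str, None]) -> Optional[str]:
    """Map legacy bool input and mode strings onto a single nullable mode.

    Returns None when word embedding TP is disabled, otherwise "col" or "row".
    """
    if value is None or value is False:
        return None  # disabled
    if value is True:
        return "col"  # legacy bool compatibility: True meant column mode
    if isinstance(value, str) and value in VALID_MODES:
        return value
    raise ValueError(
        f"word_embedding_tp must be None, a bool, or one of {VALID_MODES}; got {value!r}"
    )
```

Collapsing the toggle and the mode into one nullable field means there is no longer a state where the mode is set but the toggle is off, which is the ambiguity the refactor removes.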
tests	refactor(tensor_cast): unify word embedding tp config Co-authored-by: Kudo__shinichi<liuning119@huawei.com> # message auto-generated for no-merge-commit merge: !344 merge codex/word-embedding-tp-normalize into develop See merge request: Ascend/msmodeling!344	14 days ago
tools	【FEAT】MindStudio CLI 统一 stderr Logo Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !307 merge feat-logo into develop 【FEAT】MindStudio CLI 统一 stderr Logo Created-by: AvadaKedavrua Commit-by: liujiawang Merged-by: ascend-robot Description: ## 修改原因 MindStudio Modeling 各 Python CLI 启动时缺少统一品牌标识，用户在仿真、吞吐寻优、适配与 profiling 工具间切换时难以从终端首屏确认产品归属。本 PR 在 `parse_args` 成功后向 stderr 输出固定四行 MindStudio Logo，并支持 TTY/TERM 降级与 Windows `colorama` 控制台初始化。 --- ## 修改内容 - 新增共享模块 `cli/logo.py`：`render_logo` / `print_logo`，65 列块 + 终端居中 + ANSI/纯文本降级 - 在 11 个 Python 入口（`cli/inference/`、`serving_cast/main.py`、`tools/perf_data_collection` 驱动脚本）于 `parse_args` 后调用 `print_logo()`；`--help` 路径不输出 Logo - 依赖： - 运行时* `colorama>=0.4.6` — 写入 `[project] dependencies` 与 `requirements.txt`，Windows 上调用 `just_fix_windows_console()` 启用控制台 VT/ANSI 输出 - CI 静态检查 `types-colorama>=0.4.15` — 写入 `[dependency-groups] ci`，见下方说明 - 详设文档：`docs/design/mindstudio-brand-logo-design.md`（本仓仅 Python 范围） - 测试：`tests/regression/cli/test_logo.py`（14 条模块 UT）+ `tests/regression/cli/test_logo_cli_hooks.py`（help 抑制与入口 hook 回归，in-process `run_module_main`） ### 为何需要 `types-colorama`（CI 组，非运行时） `cli/logo.py` 在 Windows 路径下会调用 `colorama.just_fix_windows_console()`。`colorama` 包本身未提供完整的 inline 类型注解，mypy / 仓库 `type_check` 在无 stub 时会报 Cannot find implementation or library stub for module named "colorama"，或将其视为 untyped 调用。 `types-colorama` 是社区维护的 PEP 561 stub 包（`.pyi`），仅用于开发态与 CI 的类型检查，不会随默认 `uv sync` 进入用户仿真运行时环境（位于 `dependency-groups.ci`，与 `pytest-cov` 等工具同属 CI 组）。加入该依赖的目的： 1. 让 `uv sync --group ci` + mypy 能正确解析 `colorama` API，满足本 PR 静态检查门禁，无需在业务代码中使用 `# type: ignore` 绕过规范 2. 与项目现有做法一致：第三方库缺类型时，在 `ci` 组补 `types-` stub，而非放宽 mypy 配置若仅安装运行时依赖（`uv sync` / `pip install -r requirements.txt`），不需要也不会*安装 `types-colorama`；Logo 功能仅依赖运行时 `colorama`。 --- ## 自验证 ### Logo 四行块渲染（纯文本 / 80 列居中）目的：确认固定四行布局、品牌行与 Slogan 居中、无前置空行。步骤： 1. 
在仓库根目录执行： `bash uv run python -c "from cli.logo import render_logo; print(render_logo(color=False, terminal_cols=80))"` 结果： ![Logo plain render](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/46913ba2-005e-402e-9b1c-4c70dc556ce1/pr307-logo-plain.png) ### Logo 模块 + CLI hook 回归测试目的：满足 CI Gate 对新增 `print_logo` 路径的覆盖；确认 `--help` 不泄漏 Logo，正常 `parse_args` 后 stderr 含品牌块。步骤： 1. 在仓库根目录执行： `bash uv run pytest tests/regression/cli/test_logo.py tests/regression/cli/test_logo_cli_hooks.py -v --tb=no` 结果： ![pytest logo tests](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/b0062152-4981-4c64-a6d1-9f70f209a1a9/pr307-logo-pytest.png) ### `--help` 不输出 Logo 目的：确认 argparse 在 `print_logo` 之前退出，help 路径保持干净。步骤： 1. 执行： `bash uv run python -m cli.inference.text_generate --help 2>&1 \| head -5` 结果： ![help without logo](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/a04794d0-fffd-44e3-8a8e-632f7f2d7f5b/pr307-logo-help.png) ### 端到端 ![image.png](https://raw.atomgit.com/user-images/assets/8428112/d984e234-87c7-4e91-81d2-ceeb0120a22b/image.png 'image.png') See merge request: Ascend/msmodeling!307	17 天前
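The reason the `--help` path stays free of the logo is that `argparse` exits inside `parse_args` before any later statement runs. A minimal sketch of that ordering follows; the banner text, `render_banner`, and the `--num-queries` argument are placeholders chosen for illustration and do not reproduce `cli/logo.py` or its four-line layout.

```python
import argparse
import sys
from typing import List, Optional, TextIO

# Placeholder banner lines; the real logo block is not reproduced here.
BANNER_LINES = ["MindStudio", "placeholder slogan"]


def render_banner(terminal_cols: int = 80) -> str:
    """Center each banner line within the given terminal width."""
    return "\n".join(line.center(terminal_cols).rstrip() for line in BANNER_LINES)


def main(argv: List[str], stderr: Optional[TextIO] = None) -> int:
    """Parse arguments first, then print the banner to stderr."""
    stream = sys.stderr if stderr is None else stderr
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--num-queries", type=int, default=1)
    # On --help, argparse prints usage to stdout and raises SystemExit here,
    # so the banner below is never reached.
    args = parser.parse_args(argv)
    print(render_banner(), file=stream)
    return args.num_queries
```

Writing the banner to stderr rather than stdout also keeps machine-readable stdout output unchanged for callers that pipe it elsewhere.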
web_ui	Bind the web_ui run address
Co-authored-by: zwt<zhuweite@huawei.com>
# message auto-generated for no-merge-commit merge: !335 merge develop into develop
Bind the web_ui run address
Created-by: zwt__ Commit-by: zwt Merged-by: ascend-robot
Description:
PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other
## 🔍 Motivation
1. Bind the Web UI address to `127.0.0.1` so that the service is reachable only locally, avoiding potential security risks.
2. Fix a Gradio incompatibility by removing the `show_copy_button=True` setting (this parameter is not supported by the current Gradio version).
------
## 📝 Modification
### Code changes
1. web_ui/web_ui_start.py
- Remove `--host` parameter support and always use `127.0.0.1` as the server address
- Remove the `--host` command-line argument
- Add a comment explaining that the host is fixed to `127.0.0.1`
2. web_ui/components.py
- Remove the `show_copy_button=True` setting
- Fix the Gradio incompatibility (this parameter is not supported by the current Gradio version)
3. tests/regression/web_ui/test_web_ui_start.py
- Update the test cases to match the new behavior
- `test_main_custom_args`: remove the `--host` argument
- `test_main_with_env_vars`: change the expected `server_name` to `"127.0.0.1"` and update the documentation to state that the `GRADIO_SERVER_NAME` environment variable is ignored
------
## 📐 Associated Test Results
![54041280-3e63-402b-8f32-61bee4cf4bf4.png](https://raw.gitcode.com/user-images/assets/8428112/8dbc75c9-cd4c-4627-9c4a-855a65b16711/54041280-3e63-402b-8f32-61bee4cf4bf4.png '54041280-3e63-402b-8f32-61bee4cf4bf4.png')
![f92ac57b-6033-46fc-8e9e-750ac3c82ab2.png](https://raw.gitcode.com/user-images/assets/8428112/1294a7db-9cc8-4a68-98e4-0f91315d51e6/f92ac57b-6033-46fc-8e9e-750ac3c82ab2.png 'f92ac57b-6033-46fc-8e9e-750ac3c82ab2.png')
------
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!335	15 days ago
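A minimal sketch of the "fixed loopback host" behavior described in this PR is shown below. It does not reproduce `web_ui/web_ui_start.py`: the helper name `build_launch_kwargs` and the `--port` option are assumptions for illustration. The returned keys match the `server_name` and `server_port` keyword arguments that Gradio's `launch()` accepts.

```python
import argparse
from typing import List

# The host is intentionally not configurable: the UI should only listen locally.
FIXED_HOST = "127.0.0.1"


def build_launch_kwargs(argv: List[str]) -> dict:
    """Build launch keyword arguments without honoring any host override."""
    parser = argparse.ArgumentParser(prog="web_ui_demo")
    parser.add_argument("--port", type=int, default=7860)  # hypothetical option
    args = parser.parse_args(argv)
    # GRADIO_SERVER_NAME is deliberately not read here; passing server_name
    # explicitly keeps the bind address on the loopback interface.
    return {"server_name": FIXED_HOST, "server_port": args.port}
```

Because no `--host` argument is registered, passing one makes `argparse` exit with a usage error instead of silently binding to another interface.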
.gitattributes	feat: profiling-based empirical performance model with CSV data source Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !123 merge pr/perf-db-a into develop feat: profiling-based empirical performance model with CSV data source Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description: PR Type / PR类型 - [x] Feature（功能新增） - [ ] Bugfix（Bug 修复） - [ ] Docs（文档更新） - [ ] CI/CD（持续集成/持续部署） - [x] Refactor（代码重构） - [ ] Perf（性能优化） - [x] Test-Cases（测试用例更新） - [ ] Other（其他） ## 🔍 Motivation / 变更动机 TensorCast 现有的 Roofline 解析模型（`AnalyticPerformanceModel`）对昇腾 NPU 的性能预测精度有限：融合算子（SwiGlu、AddRmsNorm、DispatchFFNCombine）无法建模，HCCL 集合通信与理论带宽差距显著，FRACTAL_NZ 格式等硬件特性无法通过 Roofline 捕获。本 PR 实现了基于真实 NPU Profiling 数据的实测算子性能估算系统，将 kernel 实测耗时接入 TensorCast 仿真框架。与 PR#96 的关系：PR#96 已合入 develop，定义了 `DataSourcePerformanceModel` 接口骨架（stub）和 CLI 集成。本 PR 提供完整的功能实现：CSV 查询引擎（9 种 TC-vs-NPU shape matching 规则）、op_mapping 映射（60+ 算子）、插值、M1-M6 指标体系、以及 DFC/FlashComm 编译 Pass。接口完全兼容。 > 📌 配套的离线数据采集工具链将在后续 PR 中提交（tools/perf_data_collection/，与本 PR 无代码依赖）。 ------ ## 📝 Modification / 修改内容 ### 1. Profiling Data Source 核心实现（替换 PR#96 stub） \| 文件 \| 说明 \| \|------\|------\| \| `profiling_database/profiling_data_source.py` (+1,885) \| `ProfilingDataSource`：op_mapping.yaml 驱动的 CSV 查询引擎，支持 9 种 TC-vs-NPU shape 差异处理（batch dim stripping、seq padding、FRACTAL_NZ、ND transpose、SwiGlu concat、RoPE layout/kernel、composite 分解、flatten batch） \| \| `profiling_database/interpolating_data_source.py` (+702) \| `InterpolatingDataSource`：nearest-neighbor + 线性插值包装器 \| \| `profiling_database/data_source.py` (修改) \| `DataSourcePerformanceModel` ABC 扩展（新增 `EXTRAPOLATED` enum、`details` 字段） \| ### 2. EmpiricalPerformanceModel 增强 (+436) 在 PR#96 基础上增加 M1-M6 指标追踪： - M1-M4：覆盖率指标（raw count → fused → compute-only → per-shape） - M5：延迟加权覆盖率 - M6 input：empirical hit total（用于离线 E2E ratio 计算） - `log_stats()`：结构化 HIT/MISS 日志 - `export_hit_miss_report()`：JSON 格式指标导出 ### 3. 
编译 Passes (+875) \| Pass \| 说明 \| \|------\|------\| \| `dispatch_ffn_combine_pass.py` \| DispatchFFNCombine 超级融合（init_routing_v2 + GroupedMatmul + unpermute_tokens → 单 op），支持 5 种量化变体 \| \| `flashcomm_v1_pass.py` \| FlashComm V1 图重写（matmul_all_reduce → 通信隐藏），对标 vLLM-ascend `ENABLE_FLASHCOMM1=1` \| ### 4. op_mapping.yaml（3 个版本，共 ~3,600 行） \| 版本 \| 算子数 \| \|------\|:------:\| \| `vllm0.13.0_torch2.8.0_cann8.3` \| ~45 \| \| `vllm0.15.0_torch2.9.0_cann8.5` \| ~55 \| \| `vllm0.18.0_torch2.9.0_cann8.5` \| ~60 \| ### 5. CSV Profiling Data（~250 files，Git LFS） ATLAS 800 A3 752T 128G 设备数据：HCCL 通信基准 + 3 个 vLLM 版本的 kernel 数据 + 微基准补充数据。 ### 6. 集成改动 \| 文件 \| 改动 \| \|------\|------\| \| `model_runner.py` \| profiling 模式集成（`perf_models[]` + `log_stats` + `ProfilingDataSource` 创建） \| \| `user_config.py` \| `--profiling-database` 参数 \| \| `scripts/text_generate.py` \| `--export-metrics` CLI + FlashComm 配置 \| \| `ops/fused_moe.py` \| 新增 `dispatch_ffn_combine` op \| \| `compile_backend.py` \| 注册 DFC + FlashComm passes \| ------ ## 📐 Associated Test Results / 关联测试结果 ### 单元测试 `$ pytest tests/perf_database/ -q 266 passed, 3 warnings in 1.94s $ pytest tests/test_tensor_cast/test_empirical.py tests/test_tensor_cast/test_dfc_pass.py -q 8 passed, 1 skipped in 120.75s $ lintrunner -a ok No lint issues.` ### 功能验证 bash # Analytic 模式（行为不变） $ python -m tensor_cast.scripts.text_generate Qwen/Qwen3-32B \ --num-queries 2 --query-length 3500 --device TEST_DEVICE → [analytic] Execution time: 1.744s, TPS/Device: 4013 token/s ✅ # Profiling 模式（新功能） $ python -m tensor_cast.scripts.text_generate Qwen/Qwen3-32B \ --num-queries 1 --query-length 4112 --word-embedding-tp row \ --device ATLAS_800_A3_752T_128G_DIE --world-size 16 --tp-size 16 \ --quantize-linear-action DISABLED \ --performance-model profiling --compile \ --profiling-database tensor_cast/performance_model/profiling_database/data/ATLAS_800_A3_752T_128G_DIE/vllm_ascend/vllm0.18.0_torch2.9.0_cann8.5 → [empirical] Execution time: 0.156s, TPS/Device: 
1651 token/s ✅ ### M1-M5 指标 \| 场景 \| M3 (计算算子 HR) \| M5 (延迟覆盖) \| \|------\|:---------------:\|:------------:\| \| Qwen3-32B Prefill (BF16) \| 61.5% ✅ (>50%) \| 89.0% ✅ (>80%) \| \| Qwen3-32B Decode (BF16) \| 38.5% \| 80.1% ✅ (>80%) \| \| DeepSeek-V3 Prefill (W8A8) \| 52.6% ✅ (>50%) \| 68.9% \| \| DeepSeek-V3 Decode (W8A8) \| 15.8% \| 54.3% \| ------ ## 🌟 Use cases (Optional) / 使用案例（可选） bash # 1. 使用实测数据替代 Roofline 估算 python -m tensor_cast.scripts.text_generate <model_id> \ --performance-model profiling --compile \ --profiling-database <path_to_data_dir> # 2. 导出 M1-M5 指标 JSON（用于离线 M6 计算） python -m tensor_cast.scripts.text_generate <model_id> \ --performance-model profiling --compile \ --profiling-database <path_to_data_dir> \ --export-metrics results/metrics.json # 3. 同时运行 analytic + profiling 对比 python -m tensor_cast.scripts.text_generate <model_id> \ --performance-model analytic --performance-model profiling --compile \ --profiling-database <path_to_data_dir> ------ ## ✅ Checklist / 检查列表 Before PR: - [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ``` See merge request: Ascend/msmodeling!123	1 个月前
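Two ideas from this PR description lend themselves to a small worked sketch: linear interpolation between profiled shapes, and the difference between a count-based hit rate and a latency-weighted coverage. The functions below are illustrative assumptions, not the code of `InterpolatingDataSource` or `EmpiricalPerformanceModel`; in particular, the exact definitions of M3 and M5 are not given in the text, so the formulas here are one plausible reading.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def interpolate_latency(samples: Dict[int, float], size: int) -> float:
    """Estimate latency for `size` from measured (size -> latency) samples.

    Uses linear interpolation between the two nearest measured sizes and
    falls back to the nearest sample outside the measured range.
    """
    keys = sorted(samples)
    if size <= keys[0]:
        return samples[keys[0]]
    if size >= keys[-1]:
        return samples[keys[-1]]
    idx = bisect_left(keys, size)
    if keys[idx] == size:
        return samples[size]
    low, high = keys[idx - 1], keys[idx]
    weight = (size - low) / (high - low)
    return samples[low] + weight * (samples[high] - samples[low])


def coverage(ops: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (count-based hit rate, latency-weighted coverage).

    Each entry is (latency, hit), where hit means measured data was found.
    """
    hit_count = sum(1 for _, hit in ops if hit)
    total_latency = sum(latency for latency, _ in ops)
    hit_latency = sum(latency for latency, hit in ops if hit)
    return hit_count / len(ops), hit_latency / total_latency
```

With one slow operator covered and two fast ones missed, the count-based rate is about 33% while the latency-weighted coverage is 80%. That is the kind of gap visible in the table above, where the latency-coverage column is consistently higher than the hit-rate column.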
.gitignore	chore(test): add unit test runner script with coverage enforcement
Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com>
# message auto-generated for no-merge-commit merge: !146 merge feature/pd-ratio-throughput-optimization into develop
chore(test): add unit test runner script with coverage enforcement
Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot
Description:
PR Type
- [ ] Feature
- [ ] Bugfix
- [ ] Docs
- [x] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
Add unit-test coverage enforcement with a threshold of 80%.
------
## 📝 Modification
Exclude builtin_model from the coverage statistics.
------
## 📐 Associated Test Results
`bash ./tests/run_ut.sh serving_cast`
![image.png](https://raw.gitcode.com/user-images/assets/8428112/61bb77cf-f8c6-4750-baf2-6730fbe54596/image.png 'image.png')
`bash ./tests/run_ut.sh tensor_cast`
![image.png](https://raw.gitcode.com/user-images/assets/8428112/8e772017-95fd-4092-9540-b52a68964bef/image.png 'image.png')
------
## ✅ Checklist
Before PR:
- [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!146	2 months ago
.pre-commit-config.yaml	Add model adapter onboarding automation Co-authored-by: jhon-117<fangkai15@huawei.com> # message auto-generated for no-merge-commit merge: !282 merge codex/model-adaptation-efficiency-v2 into develop Add model adapter onboarding automation Created-by: jhon-117 Commit-by: jhon-117 Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献，我们非常重视。以下说明将使您的拉取请求更健康，更易于获得反馈。如果您不理解某些项目，请不要担心，只需提交拉取请求并从维护人员那里寻求帮助即可。 PR Type / PR类型 - [x] Feature（功能新增） - [ ] Bugfix（Bug 修复） - [ ] Docs（文档更新） - [ ] CI/CD（持续集成/持续部署） - [ ] Refactor（代码重构） - [ ] Perf（性能优化） - [ ] Test-Cases（测试用例更新） - [ ] Other（其他） ## 🔍 Motivation / 变更动机 Please describe the motivation of this PR and the goal you want to achieve through this PR. 请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。 ------ ## 📝 Modification / 修改内容本 PR 实现 TensorCast 新模型接入效率提升流程，围绕“用户只必须提供 raw Insight profiling 导出文件 + 对应仿真命令”的适配方式，补齐 doctor、evidence、patch discovery、profile draft、ST case 生成和 qwen3-vl replay 验证能力。主要改动：新增 tensor_cast.adapter 自动化模块：仿真命令解析与 AdaptationContext raw MindStudio Insight profiling 解析用户 hints 读取、冲突检测和 provenance profile candidate 生成与 review/validation evidence draft 生成与 verifier mismatch 分类 PatchReport、patch discovery、profile draft 渲染 ST guardrail case 生成新增 CLI： python -m cli.inference.model_doctor python -m cli.inference.verify_model_profile model_doctor 支持： --from-command-file --raw-insight-file --hints-file --patch-failure-file --ignore-existing-profile --profile-draft-output 增强 qwen3-vl replay：新增 tiny config-only fixture：tests/assets/model_config/qwen3_vl_tiny/config.json 支持在 --ignore-existing-profile qwen3_vl 下通过 installed transformers 源码发现 VL profile 字段 patch discovery 可基于 qwen3-vl placeholder/mask meta failure 生成 patch_method_for_qwen3_vl 草案新增文档： 
docs/design/model_adaptation_efficiency_design.md docs/en/tensor_cast_new_model_adaptation.md 增强 runtime/transformations：暴露 runtime summary 所需信息记录 patch reports 支持 profile registry replay/audit ignore ------ ## 📐 Associated Test Results / 关联测试结果 pytest tests/test_tensor_cast/test_adapter_automation.py -q # 29 passed pytest tests/test_tensor_cast -k "adapter or doctor or evidence" -q # 29 passed python -m compileall -q tensor_cast/adapter cli/inference/model_doctor.py cli/inference/verify_model_profile.py cli/inference/adapter_cli.py tests/test_tensor_cast/test_adapter_automation.py # passed python -m cli.inference.model_doctor --help # passed python -m cli.inference.verify_model_profile --help # passed 额外 smoke： qwen3-vl tiny CLI replay smoke：通过 qwen3-vl patch code draft CLI smoke：通过 deepseek fixture doctor/replay smoke：通过，仅出现 fixture 自带 rope 参数 warning，不影响结果。 ------ ## 🌟 Use cases (Optional) / 使用案例（可选） If this PR introduces a new feature, it is better to list some use cases here and update the documentation. 如果此拉取请求引入了新功能，最好在此处列出一些用例并更新文档。 ------ ## ✅ Checklist / 检查列表 Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖，导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是，请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档（API 文档、文档字符串、示例教程）已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!282	22 天前
AGENTS.md	[feat] Add SKILL: configure a device through natural language
Co-authored-by: wendellX<wenxiaoping6@huawei.com>
# message auto-generated for no-merge-commit merge: !243 merge skill_device_cfg into develop
[feat] Add SKILL: configure a device through natural language
Created-by: wendellX Commit-by: wendellX Merged-by: ascend-robot
Description:
PR Type
- [X] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
------
## 📝 Modification
Add a SKILL file: configure a device through natural language.
------
## 📐 Associated Test Results
---
User: I want to import a new device topology.
Claude: We don't need to fill in complex parameters up front; I'll walk you through it step by step. Just tell me the following 3 things first, and write "don't know" for anything you are unsure of:
1. What hardware is this? For example, the vendor and model.
2. At what granularity do you want to model this topology: single card, single die / chiplet, or some other scheduling unit?
3. Do you have any part of this raw information at hand: memory capacity, memory bandwidth, FP16/BF16/INT8/FP32 compute, and a description of "how many cards / dies there are and how they are connected"? If so, paste it as-is.
You can reply like this:
- Model: XXX
- Granularity: single card / single die / not sure
- Known specs: 96GB memory, 3.2TB/s, BF16 800T, 8 cards, 400GB/s between cards
Once I have that, I'll first sort it into a "confirmed / still uncertain / can default for now" list and then move on to the next step.
User: Huawei's ATLAS_800_A3_560T_128G_DIE. Single die, memory capacity 64GiB, memory bandwidth 1.6TiB/s.
Claude: Let me first note the facts confirmed so far.
- Confirmed:
  - Vendor: HUAWEI
  - Candidate profile name: `ATLAS_800_A3_560T_128G_DIE`
  - Granularity: single die
  - Memory per profile unit: 64GiB
  - Memory bandwidth per profile unit: 1.6TiB/s
- Note:
  - The existing `ATLAS_800_A3_752T_128G_DIE` in the repository [tensor_cast/device.py:299-320] also follows the "128G in the name, but 64GiB of memory per die" style, so your naming is consistent with it.
Next I'll ask only the 3 most important questions; if you don't know, just write "don't know" or "default for now":
1. Which kind of compute does `560T` refer to?
  - For example: `FP16 560T`, `BF16 560T`, `INT8 560T`
  - If your material also lists other peaks for `FP32 / BF16 / FP16 / INT8`, please paste them as-is.
2. Does your material list a separate "general-purpose / vector compute" figure?
  - This mainly affects estimates for non-matrix operators such as softmax, norm, and activation functions.
  - You can simply reply:
    - `None, leave it empty for now`
    - or `None, use a temporary estimate for now`
3. How do you want to model the interconnect topology for this single-die profile?
  - You can pick one directly:
    - `Reuse the existing A3 die topology`
    - `Model only the 8 cards within a node`
    - `I only know 2 dies/card and 8 cards/node; default the rest`
  - If you know the interconnect bandwidth, also tell me whether it is "unidirectional" or "bidirectional".
Once you answer these 3 items, I can compile the fields to be written, the default values, and the items pending calibration, and then ask you for a final confirmation.
User: FP32 75T, BF16 245.8T, FP16 280T, INT8 560T. The communication grid shape is [48, 8, 2].
Claude: OK, here is what I have as "confirmed" so far:
- `name`: `ATLAS_800_A3_560T_128G_DIE`
- `vendor`: `HUAWEI`
- Granularity: single die
- Memory per die: 64 GiB
- Memory bandwidth per die: 1.6 TiB/s
- `mma_ops`:
  - FP32 = 75T
  - BF16 = 245.8T
  - FP16 = 280T
  - INT8 = 560T
- `comm_grid` shape: `[48, 8, 2]`
This essentially combines the existing A3 die template [tensor_cast/device.py:169-182] with the compute figures of the 280T tier. There are 3 more points that will be written directly into the code; just pick an option for each:
1. How should `gp_ops` be handled?
  - `Leave it empty for now`
  - `Use a temporary estimate for now: FP32 8T, BF16/FP16 16T`
  - `I have official values: ...`
2. For the per-level interconnect parameters of `[48, 8, 2]`, should I reuse the existing A3 die topology [tensor_cast/device.py:169-182] as-is?
  - Outer level: 196 GB/s, 5.5 us
  - Middle level: 196 GB/s, 0.5 us
  - Inner level: 224 GB/s, 0.2 us
  - `comm_efficiency = 0.7`
  - The type follows the current code default, `CLOS`
  - These bandwidths will be written as unidirectional bandwidth
  You can simply reply: `Reuse the existing A3 die topology`
3. If no more precise data is available, do you accept reusing the current default efficiencies and static costs for now, marked as `needs calibration`?
  - `compute_efficiency = 0.7`
  - `memory_efficiency = 0.6`
  - `StaticCost(5us, 2us, 10us)`
Once you answer these 3 items, I'll edit [tensor_cast/device.py] directly, verify the import and registration, and give you a usable `--device ATLAS_800_A3_560T_128G_DIE`.
User: 1 use the temporary values; 2 reuse A3; 3 compute_efficiency=0.7, memory_efficiency=0.6.
Claude: One small point that goes into the code is still open: for `StaticCost`, do you also want to reuse the existing A3 defaults `mma=5us, gp=2us, comm=10us` for now? If so, just reply "OK" and I'll edit the code and verify it.
User: mma_op_cost_s=5e-06, gp_op_cost_s=2e-06, comm_op_cost_s=1e-05. Use these.
Claude: Added at [tensor_cast/device.py:299-320], reusing the existing A3 die topology [tensor_cast/device.py:169-182].
- Registered `DeviceProfile.name`
  - `ATLAS_800_A3_560T_128G_DIE`
- Facts you provided that were used directly
  - `vendor="HUAWEI"`
  - Single-die granularity
  - `memory_size_bytes=64 * (1024**3)`
  - `memory_bandwidth_bytes_ps=1.6 * (1024**4)`
  - `mma_ops`
    - FP32 = `75 * 1e12`
    - BF16 = `245.8 * 1e12`
    - FP16 = `280 * 1e12`
    - INT8 = `560 * 1e12`
  - `comm_grid` shape `[48, 8, 2]`
  - `compute_efficiency=0.7`
  - `memory_efficiency=0.6`
  - `StaticCost(mma=5e-06, gp=2e-06, comm=1e-05)`
- Estimates confirmed by the user
  - `gp_ops`
    - FP32 = `8 * 1e12`
    - BF16 = `16 * 1e12`
    - FP16 = `16 * 1e12`
- `needs calibration`
  - `gp_ops` is currently a temporary estimate, not an official vector/general compute figure
  - The interconnect parameters currently reuse the existing A3 die topology:
    - `start_dim=0`: 196 GB/s, 5.5 us, `comm_efficiency=0.7`
    - `start_dim=1`: 196 GB/s, 0.5 us, `comm_efficiency=0.7`
    - `start_dim=2`: 224 GB/s, 0.2 us, `comm_efficiency=0.7`
- Verification command
  - `python - <<'PY' from tensor_cast.device import DeviceProfile print(DeviceProfile.all_device_profiles["ATLAS_800_A3_560T_128G_DIE"]) PY`
  - Result: the `DeviceProfile` was printed successfully, so registration works
- Ready-to-use command
  - `python -m cli.inference.text_generate Qwen/Qwen3-32B --num-queries 2 --query-length 3500 --device ATLAS_800_A3_560T_128G_DIE`
It is ready to use. If you like, the most worthwhile next calibration step is to check whether official values exist for `gp_ops`.
------
## ✅ Checklist
Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!243	24 days ago
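The unit arithmetic in the transcript (GiB, TiB/s, and "T" for tera-operations) can be checked with a short sketch. The `DeviceSketch` dataclass below is a stand-in written for illustration and is not the real `tensor_cast.device.DeviceProfile`; only the field names and numeric values quoted in the transcript are taken from the source. The roofline-style estimate at the end is likewise an assumption about how such efficiencies are typically applied, not the tool's actual formula.

```python
from dataclasses import dataclass
from typing import Dict

GIB = 1024 ** 3   # bytes per GiB
TIB = 1024 ** 4   # bytes per TiB
TERA = 1e12       # operations per "T"


@dataclass(frozen=True)
class DeviceSketch:
    """Illustrative stand-in for a device profile (not the real class)."""
    name: str
    memory_size_bytes: float
    memory_bandwidth_bytes_ps: float
    mma_ops: Dict[str, float]
    compute_efficiency: float = 0.7
    memory_efficiency: float = 0.6


PROFILE = DeviceSketch(
    name="ATLAS_800_A3_560T_128G_DIE",
    memory_size_bytes=64 * GIB,
    memory_bandwidth_bytes_ps=1.6 * TIB,
    mma_ops={
        "FP32": 75 * TERA,
        "BF16": 245.8 * TERA,
        "FP16": 280 * TERA,
        "INT8": 560 * TERA,
    },
)


def roofline_time_s(profile: DeviceSketch, ops: float, bytes_moved: float, dtype: str) -> float:
    """Assumed roofline-style estimate: the slower of compute and memory time."""
    compute_s = ops / (profile.mma_ops[dtype] * profile.compute_efficiency)
    memory_s = bytes_moved / (profile.memory_bandwidth_bytes_ps * profile.memory_efficiency)
    return max(compute_s, memory_s)
```

Note that the capacity uses a power of three (`1024 ** 3`) and the bandwidth a power of four (`1024 ** 4`); writing these as plain multiplications would understate both values by many orders of magnitude.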
CLAUDE.md	[feat] Add SKILL: configure a device through natural language. Co-authored-by: wendellX<wenxiaoping6@huawei.com>. Merge request !243 (skill_device_cfg into develop), created and committed by wendellX, merged by ascend-robot. PR type: Feature. Modification: add a SKILL file for configuring a device through natural language. Associated test results (conversation transcript):
User: I want to import a new device topology.
Claude: We don't need to fill in complex parameters up front; I'll walk you through the configuration step by step. For now, just tell me the following 3 things, and write "don't know" for anything you are unsure of. 1. What hardware is this? For example, vendor and model. 2. At what granularity do you want to model this topology: a single card, a single die / chiplet, or some other scheduling unit? 3. Do you have any of the following raw information: memory capacity, memory bandwidth, FP16/BF16/INT8/FP32 compute, and a description of how many cards or dies there are and how they are connected? If so, paste it as-is. You can reply like this: Model: XXX; Granularity: single card / single die / not sure; Known specs: 96GB memory, 3.2TB/s, BF16 800T, 8 cards, 400GB/s between cards. Once I have it, I'll sort it into a "confirmed / still uncertain / can default for now" list and then move on to the next step.
User: Huawei's ATLAS_800_A3_560T_128G_DIE. Single die, memory capacity 64GiB, memory bandwidth 1.6TiB/s.
Claude: Let me first record the facts confirmed so far. Confirmed: vendor: HUAWEI; candidate profile name: `ATLAS_800_A3_560T_128G_DIE`; granularity: single die; memory per profile unit: 64GiB; memory bandwidth per profile unit: 1.6TiB/s. Note: the existing `ATLAS_800_A3_752T_128G_DIE` in the repository [tensor_cast/device.py:299-320] also follows the "name says 128G, but per-die memory is written as 64GiB" style, so your naming is consistent. Next I'll ask only the 3 most important questions; if you don't know, just write "don't know" or "default for now". 1. Which compute figure does `560T` refer to? For example: `FP16 560T`, `BF16 560T`, `INT8 560T`. If your material lists other peak values for `FP32 / BF16 / FP16 / INT8`, paste them as-is too. 2. Does your material list a separate "general-purpose / vector compute" figure? This mainly affects estimates for non-matrix operators such as softmax, norm, and activation functions. You can reply: `None, leave it empty for now` or `None, use a temporary estimate for now`. 3. How do you want to model the interconnect topology for this single-die profile? You can pick one: `Reuse the existing A3 die topology`, `Model only the 8-card node`, or `I only know 2 dies/card and 8 cards/node; default the rest`. If you know the interconnect bandwidth, also tell me whether it is unidirectional or bidirectional. After you answer these 3 items, I can list the fields to be written, the defaults, and the items pending calibration, and then ask for your final confirmation.
User: FP32 75T, BF16 245.8T, FP16 280T, INT8 560T. The communication grid shape is [48, 8, 2].
Claude: OK, the "confirmed" items so far are: `name`: `ATLAS_800_A3_560T_128G_DIE`; `vendor`: `HUAWEI`; granularity: single die; memory per die: 64 GiB; memory bandwidth per die: 1.6 TiB/s; `mma_ops`: FP32 = 75T, BF16 = 245.8T, FP16 = 280T, INT8 = 560T; `comm_grid` shape: `[48, 8, 2]`. This essentially combines the existing A3 die template [tensor_cast/device.py:169-182] with the compute figures of the 280T tier. Three points that will be written directly into the code remain; just pick an option for each. 1. How should `gp_ops` be handled? `Leave empty for now`, `Use a temporary estimate: FP32 8T, BF16/FP16 16T`, or `I have official values: ...`. 2. For the per-layer interconnect parameters of `[48, 8, 2]`, should I reuse the existing A3 die topology [tensor_cast/device.py:169-182] directly? Outer layer: 196 GB/s, 5.5 us; middle layer: 196 GB/s, 0.5 us; inner layer: 224 GB/s, 0.2 us; `comm_efficiency = 0.7`; type: `CLOS`, the current code default; these bandwidths will be written as unidirectional bandwidth. You can reply: `Reuse the existing A3 die topology`. 3. If no more precise data is available, do you accept reusing the current default efficiencies and static costs, marked `needs calibration`? `compute_efficiency = 0.7`, `memory_efficiency = 0.6`, `StaticCost(5us, 2us, 10us)`. After you answer these 3 items, I'll edit [tensor_cast/device.py] directly, verify import and registration, and give you a usable `--device ATLAS_800_A3_560T_128G_DIE`.
User: 1: use the temporary values; 2: reuse A3; 3: compute_efficiency=0.7, memory_efficiency=0.6.
Claude: One small point that will be written into the code remains: for `StaticCost`, do you also want to reuse the existing A3 defaults `mma=5us, gp=2us, comm=10us`? If so, just reply "OK" and I'll change the code and verify it.
User: mma_op_cost_s=5e-06, gp_op_cost_s=2e-06, comm_op_cost_s=1e-05. Use these.
Claude: Added at [tensor_cast/device.py:299-320], reusing the existing A3 die topology [tensor_cast/device.py:169-182]. Registered `DeviceProfile.name`: `ATLAS_800_A3_560T_128G_DIE`. Facts you provided, used directly: `vendor="HUAWEI"`; single-die granularity; `memory_size_bytes=64 * (1024**3)`; `memory_bandwidth_bytes_ps=1.6 * (1024**4)`; `mma_ops`: FP32 = `75 * 1e12`, BF16 = `245.8 * 1e12`, FP16 = `280 * 1e12`, INT8 = `560 * 1e12`; `comm_grid` shape `[48, 8, 2]`; `compute_efficiency=0.7`; `memory_efficiency=0.6`; `StaticCost(mma=5e-06, gp=2e-06, comm=1e-05)`. Estimates confirmed by the user: `gp_ops`: FP32 = `8 * 1e12`, BF16 = `16 * 1e12`, FP16 = `16 * 1e12`. `needs calibration`: `gp_ops` is currently a temporary estimate, not an official vector/general compute metric. Interconnect parameters currently reuse the existing A3 die topology: `start_dim=0`: 196 GB/s, 5.5 us, `comm_efficiency=0.7`; `start_dim=1`: 196 GB/s, 0.5 us, `comm_efficiency=0.7`; `start_dim=2`: 224 GB/s, 0.2 us, `comm_efficiency=0.7`. Verification command: `python -c 'from tensor_cast.device import DeviceProfile; print(DeviceProfile.all_device_profiles["ATLAS_800_A3_560T_128G_DIE"])'`. Result: the `DeviceProfile` printed successfully, so registration works. Ready-to-use command: `python -m cli.inference.text_generate Qwen/Qwen3-32B --num-queries 2 --query-length 3500 --device ATLAS_800_A3_560T_128G_DIE` is now usable; if you like, the most valuable next calibration step is checking whether official values exist for `gp_ops`.
See merge request: Ascend/msmodeling!243	24 days ago
LICENSE	Remove redundant directories. Co-authored-by: tt0cool<xujintao8@h-partners.com>. Merge request !22 (master into develop), created by jsez-li-bin, committed by tt0cool, merged by ascend-robot. Description: remove the documentation-revision directory. See merge request: Ascend/msmodeling!22	5 months ago
README.md	[fix] Fix documentation links and command examples. Co-authored-by: eveyin1<qianyin2022@hotmail.com>. Merge request !339 (doc_aidd_fix into develop), created and committed by eveyin1, merged by ascend-robot. PR type: Docs. See merge request: Ascend/msmodeling!339	14 days ago
__init__.py	Refactor transformers folder layout	9 months ago
pyproject.toml	[FIX] Harden the CI Gate incremental gate: fix `test_map` missed detections, pytest policy, and node-level exemptions. Co-authored-by: liujiawang<anonymousdev@163.com>. Merge request !334 (fix into develop), created by AvadaKedavrua, committed by liujiawang, merged by ascend-robot. Background: fixes several missed detections and false blocks in the CI incremental gate (including #107): the Phase 0 `test_map` merge overwriting entries for the same symbol, unmapped symbols being blocked by mistake, uncontrolled pytest behavior caused by inherited `addopts`, and `exemptions.tests` wrongly exempting whole files. Closes #107. Change summary. Policy and coverage: `_merge_test_maps` now unions the test ids of the same symbol instead of replacing them (fixes cases such as `test_main_extra` being skipped); coverage fallback: a symbol absent from both maps is looked up in the Phase 0 `.coverage` data (including imports with an empty context) and then allowed through; omit SSOT: read `[tool.coverage.run].omit` from `pyproject.toml`, add `roots` to `gate_policy.yaml`, and wire `exemptions.tests` into Phase 0/2. Pytest execution (`pytest_runner.py`): gate subprocesses uniformly pass `-o addopts=` plus an explicit `-m`; collect first, then use `min(cpu, collected)` with `--dist worksteal`; Phase 0: `not npu`; Phase 1/2 incremental: `not npu and not nightly and not network`; config full run: `tests/` + `not npu`; full runs are triggered by `conftest.py`, `requirements.txt`, `uv.lock`, and similar files, but not by `gate_policy.yaml`. `exemptions.tests` (node level): entries must be pytest nodes (`tests/...py::func` or `...::Class::method`); `[` is forbidden, class-only entries are forbidden, and falling back to a whole-file pathspec is forbidden; Phase 0 collects nodes by changed file, filters out exemptions, and runs pytest+cov on the explicit node list, logging and skipping when everything is exempt (avoids exit 5); Phase 2: `incremental_tests` filters out exempt nodes; on Phase 0 failure the output includes a copyable `exemptions.tests` YAML template. Observability: logs for Phase 0/1/2 and for policy blocks explain each phase's purpose, what a failure means, and why something was skipped; `run_*.sh` uses `-vv`; the nested subprocess+cov+xdist regression test is replaced with a stub. Self-verification: `uv run pytest tests/regression/scripts/helpers/ci_gate/ tests/regression/scripts/helpers/common/test_pytest_runner.py -q`, 141 passed. See merge request: Ascend/msmodeling!334	14 days ago
requirements.txt	Add the psutil module. Co-authored-by: pengzhipin<pengzhipin1@h-partners.com>. Merge request !323 (msModeling_requirements into develop), created by weixin_43113933, committed by pengzhipin, merged by ascend-robot. PR type: Docs. Motivation: fix a pipeline unit-test failure caused by the missing psutil module. Modification: add psutil to requirements.txt. See merge request: Ascend/msmodeling!323	14 days ago
uv.lock	[FIX] Harden the CI Gate incremental gate: fix `test_map` missed detections, pytest policy, and node-level exemptions. Co-authored-by: liujiawang<anonymousdev@163.com>. Merge request !334 (fix into develop), created by AvadaKedavrua, committed by liujiawang, merged by ascend-robot. Same commit and description as the pyproject.toml row. See merge request: Ascend/msmodeling!334	14 days ago
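
The unit conversions quoted in the CLAUDE.md commit description above (64 GiB, 1.6 TiB/s, a `[48, 8, 2]` communication grid) can be checked with a few lines of Python. The sketch below is only an illustration: `ProfileSketch` and `roofline_matmul_seconds` are hypothetical names, not the `tensor_cast` API, and the roofline formula is the generic textbook form, assumed here to show how such fields are typically combined rather than how msModeling actually computes latency.

```python
from dataclasses import dataclass
from math import prod

GIB = 1024 ** 3  # bytes per GiB
TIB = 1024 ** 4  # bytes per TiB


@dataclass(frozen=True)
class ProfileSketch:
    """Hypothetical stand-in for a device profile; NOT the tensor_cast class."""
    memory_size_bytes: float
    memory_bandwidth_bytes_ps: float
    bf16_mma_ops: float
    compute_efficiency: float
    memory_efficiency: float
    mma_op_cost_s: float
    comm_grid: tuple


# Values copied from the commit description for ATLAS_800_A3_560T_128G_DIE.
profile = ProfileSketch(
    memory_size_bytes=64 * GIB,
    memory_bandwidth_bytes_ps=1.6 * TIB,
    bf16_mma_ops=245.8 * 1e12,
    compute_efficiency=0.7,
    memory_efficiency=0.6,
    mma_op_cost_s=5e-06,
    comm_grid=(48, 8, 2),
)


def roofline_matmul_seconds(p, m, n, k, bytes_per_element=2):
    """Generic roofline estimate for an (m x k) @ (k x n) BF16 matmul.

    Assumption: time = max(compute time, memory time) + a fixed per-op cost.
    """
    flops = 2 * m * n * k
    traffic_bytes = bytes_per_element * (m * k + k * n + m * n)
    compute_s = flops / (p.bf16_mma_ops * p.compute_efficiency)
    memory_s = traffic_bytes / (p.memory_bandwidth_bytes_ps * p.memory_efficiency)
    return max(compute_s, memory_s) + p.mma_op_cost_s


total_dies = prod(profile.comm_grid)  # number of dies addressed by the grid
print(profile.memory_size_bytes)
print(total_dies)
print(roofline_matmul_seconds(profile, 4096, 4096, 4096))
```

The grid product gives the number of profile units the topology addresses, which is a quick sanity check on a newly entered `comm_grid`.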

MindStudio Modeling

Performance modeling and simulation tool for Ascend AI models

✨ Latest News

🔹 [2026.06.10]: msModeling adds support for the DeepSeek-V4 model
🔹 [2026.04.02]: msModeling adds support for the GLM5 model

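ℹ️ Introduction

MindStudio Modeling (msModeling) is a neural-network inference performance simulation and analysis framework built for Ascend AI processors. It provides single-model performance simulation, service-level throughput optimization, automatic tuning of serving parameters, and visual analysis, helping developers predict model performance, identify bottlenecks, and optimize configurations when no physical hardware is available or early in deployment.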

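⚙️ Features

msModeling provides the TensorCast, Throughput Optimizer, ServingCast, Web UI, and OptiX modules, covering single-model performance simulation, throughput optimization, service-level simulation, visual interaction, and automatic tuning of serving parameters. For the supported models and features, see the "Model Support and Feature Support Matrix".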

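Feature	Description
TensorCast	Operator simulation module. It intercepts the PyTorch computation graph, simulates inference on a specified DeviceProfile, and outputs an operator-level performance breakdown, memory usage, operator shapes, and a Chrome Trace.
Throughput Optimizer	Throughput optimization module. Under SLO constraints, it automatically searches for the best parallel strategy and batch configuration, and supports three modes: PD co-location, PD disaggregation, and PD ratio.
ServingCast	Service-level inference simulation module. Based on a YAML configuration, it simulates end-to-end serving scenarios with multiple instances and multiple requests, and outputs system-level metrics such as throughput, TTFT, and TPOT.
Web UI	Visual interactive interface. It lets you configure model, chip, parallelism, quantization, and workload parameters on a page, then view curves and tables and export results.
OptiX	Automatic serving-parameter tuning tool. It uses the particle swarm optimization (PSO) algorithm to tune and validate parameters for serving frameworks such as vLLM and MindIE.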
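
To make the PSO idea behind OptiX concrete, here is a minimal, generic particle swarm optimizer in plain Python. It is a sketch of the textbook algorithm only: `pso_minimize`, `toy_cost`, and all parameter values are assumptions made for illustration and do not reflect OptiX's actual implementation, search space, or objective.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cost(x):
    """Hypothetical stand-in for a measured serving cost; minimum at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


best, best_val = pso_minimize(toy_cost, bounds=[(-10.0, 10.0), (-10.0, 10.0)])
print(best, best_val)
```

In a real tuning loop the objective would be an expensive benchmark run rather than a closed-form function, so the particle count and iteration budget would usually be far smaller than in this toy setting.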

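🚀 Quick Start

To run through the core workflow quickly, using TensorCast single-model simulation and ServingCast serving simulation as examples, see the "TensorCast and ServingCast Quick Start".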

📦 Installation Guide

For the tool's environment dependencies and installation steps, see the "msModeling Installation Guide".

📘 User Guide

For detailed usage of each tool, see the README file in its source directory, or follow the links in the feature table above.

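💡 Typical Cases

To learn the tools through typical problem scenarios, see the examples in the "Throughput Optimization Guide" and the "Serving Simulation Guide".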

❓ FAQ

For common problems and solutions, please open an issue or see the user guide of each module.

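🌌 Smart Search

To make the documentation easier to consult, we provide several efficient search options:
🔹 AI Q&A (DeepWiki): natural-language Q&A for quickly grasping the project architecture and module relationships.
🔹 AI Q&A (ZRead): a better Q&A experience in Chinese for pinpointing feature usage and details.
🔹 Exact search (ReadTheDocs): full-text keyword search that leads directly to interfaces, parameters, error messages, and similar information.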

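🛠️ Contributing

Contributions are welcome. Before submitting code, use pre-commit to keep the code style consistent and make sure the relevant unit tests pass. If you have questions, please open an issue.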

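⚖️ Related Information

🔹 "Release Notes"
🔹 "License Statement"
🔹 "Security Statement"
🔹 Disclaimer: the simulation and optimization results of this tool are for performance-evaluation reference only; for final performance, rely on measurements in a real environment.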

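🤝 Suggestions and Communication

Everyone is welcome to contribute to the community. If you have any questions or suggestions, please open an issue and we will reply as soon as possible. Thank you for your support.

SIG meeting: the MindStudio Modeling Weekly Meeting is held every Wednesday from 10:00 to 12:00 (UTC+8). For meeting minutes and agenda items, see sig-msit-modeling; you can also use a time zone converter to check your local time.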

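Instant interaction (WeChat group)	Official news (official account)	In-depth support (assistant/forum)
_{Scan to join the technical discussion group}	_{Scan to follow the official account}	Scan to join the group and follow the official account for the fastest way to reach MindStudio users and developers. Quick questions: discuss technical issues with community members in real time. Stay informed: be the first to receive version-release and feature-update notifications. Share experience: exchange best practices and hands-on lessons with other developers. More support channels: 👉 Ascend Assistant: 👉 Ascend Forum: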