ascend-robot 【FEAT】MindStudio CLI unified stderr logo

File	Last commit	Last updated
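comm_bench	【FEAT】MindStudio CLI unified stderr logo

Co-authored-by: liujiawang<anonymousdev@163.com>

# message auto-generated for no-merge-commit merge: !307 merge feat-logo into develop

【FEAT】MindStudio CLI unified stderr logo

Created-by: AvadaKedavrua
Commit-by: liujiawang
Merged-by: ascend-robot

Description:

## Reason for change

The MindStudio Modeling Python CLIs had no shared brand mark at startup, so users switching between the simulation, throughput optimization, adaptation, and profiling tools could not tell from the first screen of terminal output which product they were running. This PR prints a fixed four-line MindStudio logo to stderr after `parse_args` succeeds, with TTY/TERM degradation and Windows console initialization through `colorama`.

---

## Changes

- Add the shared module `cli/logo.py` with `render_logo` / `print_logo`: a 65-column block, centered in the terminal, with ANSI output and a plain-text fallback.
- Call `print_logo()` after `parse_args` in 11 Python entry points (`cli/inference/`, `serving_cast/main.py`, and the `tools/perf_data_collection` driver scripts); the `--help` path prints no logo.
- Dependencies:
  - Runtime: `colorama>=0.4.6`, added to `[project] dependencies` and `requirements.txt`; on Windows, `just_fix_windows_console()` is called to enable VT/ANSI console output.
  - CI static checks: `types-colorama>=0.4.15`, added to `[dependency-groups] ci`; see the explanation below.
- Detailed design document: `docs/design/mindstudio-brand-logo-design.md` (this repository covers the Python scope only).
- Tests: `tests/regression/cli/test_logo.py` (14 module unit tests) plus `tests/regression/cli/test_logo_cli_hooks.py` (help suppression and entry-point hook regression, using in-process `run_module_main`).

### Why `types-colorama` is needed (CI group, not runtime)

On the Windows path, `cli/logo.py` calls `colorama.just_fix_windows_console()`. The `colorama` package does not ship complete inline type annotations, so without a stub, mypy / the repository `type_check` reports Cannot find implementation or library stub for module named "colorama", or treats the call as untyped.

`types-colorama` is a community-maintained PEP 561 stub package (`.pyi`). It is used only for type checking during development and in CI, and a default `uv sync` does not install it into the user's simulation runtime environment (it lives in `dependency-groups.ci`, alongside tools such as `pytest-cov`). The dependency is added to:

1. Let `uv sync --group ci` + mypy resolve the `colorama` API correctly, so this PR passes the static-check gate without `# type: ignore` workarounds in application code.
2. Stay consistent with existing project practice: when a third-party library lacks types, add a `types-` stub to the `ci` group rather than relaxing the mypy configuration.

If only the runtime dependencies are installed (`uv sync` / `pip install -r requirements.txt`), `types-colorama` is neither needed nor installed; the logo feature depends only on the runtime `colorama` package.

---

## Self-verification

### Four-line logo block rendering (plain text / centered in 80 columns)

Purpose: confirm the fixed four-line layout, that the brand line and slogan are centered, and that there is no leading blank line.

Steps:

1. From the repository root, run: `uv run python -c "from cli.logo import render_logo; print(render_logo(color=False, terminal_cols=80))"`

Result:

![Logo plain render](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/46913ba2-005e-402e-9b1c-4c70dc556ce1/pr307-logo-plain.png)

### Logo module + CLI hook regression tests

Purpose: satisfy the CI gate's coverage requirement for the new `print_logo` path; confirm that `--help` does not leak the logo and that stderr contains the brand block after a normal `parse_args`.

Steps:

1. From the repository root, run: `uv run pytest tests/regression/cli/test_logo.py tests/regression/cli/test_logo_cli_hooks.py -v --tb=no`

Result:

![pytest logo tests](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/b0062152-4981-4c64-a6d1-9f70f209a1a9/pr307-logo-pytest.png)

### `--help` prints no logo

Purpose: confirm that argparse exits before `print_logo` is reached, so the help path stays clean.

Steps:

1. Run: `uv run python -m cli.inference.text_generate --help 2>&1 | head -5`

Result:

![help without logo](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/a04794d0-fffd-44e3-8a8e-632f7f2d7f5b/pr307-logo-help.png)

### End to end

![image.png](https://raw.atomgit.com/user-images/assets/8428112/d984e234-87c7-4e91-81d2-ceeb0120a22b/image.png 'image.png')

See merge request: Ascend/msmodeling!307	25 days ago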
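The rendering behavior described in this commit (a 65-column block, centered in the terminal, with ANSI color or a plain-text fallback, written to stderr) can be sketched roughly as follows. This is a minimal illustration, not the contents of `cli/logo.py`: only the keyword arguments `color` and `terminal_cols` come from the command quoted in the commit message. The logo art, the color choice, the `TERM` check, and the `_sketch` function names are assumptions.

```python
import os
import shutil
import sys

# Hypothetical stand-in art; the real four-line logo is not shown in the commit message.
LOGO_LINES = [
    "+-------------+",
    "| MindStudio  |",
    "|  Modeling   |",
    "+-------------+",
]
BLOCK_COLS = 65  # block width named in the commit message
CYAN, RESET = "\x1b[36m", "\x1b[0m"  # assumed color; the real palette is unknown


def render_logo_sketch(color: bool, terminal_cols: int) -> str:
    """Center each line in a 65-column block, then center the block in the terminal."""
    left = max((terminal_cols - BLOCK_COLS) // 2, 0)
    rows = []
    for line in LOGO_LINES:
        body = line.center(BLOCK_COLS).rstrip()
        text = f"{CYAN}{body}{RESET}" if color else body
        rows.append(" " * left + text)
    return "\n".join(rows)


def print_logo_sketch(stream=sys.stderr) -> None:
    """Write the logo to stderr, dropping ANSI codes when the stream is not a capable TTY."""
    use_color = stream.isatty() and os.environ.get("TERM", "") != "dumb"
    cols = shutil.get_terminal_size(fallback=(80, 24)).columns
    print(render_logo_sketch(use_color, cols), file=stream)
```

Calling such a function only after `parse_args` returns is what keeps the `--help` path clean, because argparse prints help and exits inside `parse_args`.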
grid_generator	Improve GLM5 shape grid generation and microbench backfill support

Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com>

# message auto-generated for no-merge-commit merge: !252 merge codex/debug-shape-grid-generation into develop

Improve GLM5 shape grid generation and microbench backfill support

Created-by: Secluded_Ocean
Commit-by: Secluded_Ocean
Merged-by: ascend-robot

Description:

## Summary

- improve GLM5 shape grid generation and EP32 DFC replay coverage
- normalize shape matching and dedupe behavior
- extend microbench update/replay tooling and related tests

## Validation

- Not run locally because pytest is not installed for the available Python interpreter.

See merge request: Ascend/msmodeling!252	30 days ago
op_replay	【FEAT】MindStudio CLI unified stderr logo. See merge request: Ascend/msmodeling!307	25 days ago
parsers	feat: profiling data collection toolchain

Co-authored-by: Horacehxw<horacehxw@gmail.com>

# message auto-generated for no-merge-commit merge: !124 merge pr/perf-db-b into develop

feat: profiling data collection toolchain

Created-by: Horacehxw
Commit-by: Horacehxw
Merged-by: ascend-robot

Description:

PR Type

- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

The measured operator performance estimation system (see [PR-A: feat: profiling-based empirical performance model with CSV data source](https://gitcode.com/Ascend/msmodeling/pull/123)) depends on NPU profiling data (per-kernel CSVs plus HCCL communication benchmarks). This data has to be collected on NPU devices, parsed, and validated before it can be used. This PR provides a complete offline data collection toolchain that covers the whole flow: parsing raw profiling data, microbenchmarks, shape mutation, communication benchmarks, and M6 E2E accuracy computation.

> 📌 This PR has no code dependency on [PR-A](https://gitcode.com/Ascend/msmodeling/pull/123) (the core feature): tools/ does not import tensor_cast, so it can be reviewed and merged independently.

------

## 📝 Modification

All new files are under `tools/perf_data_collection/` and `tests/tools/`.

### 1. Data parsing and conversion

| Tool | Lines | Description |
|------|:----:|------|
| `parse_kernel_details.py` | 698 | Parses the NPU `kernel_details.csv` and splits it by kernel type into separate CSVs (MatMulV2.csv, SwiGlu.csv, etc.); supports FRACTAL_NZ format conversion and shape normalization |
| `build_comm_csv.py` | - | Builds the communication CSV from HCCL benchmark results |
| `fia_common.py` + `fill_fia_runtime_metadata.py` | 1,544 | Infers and backfills FusedInferAttentionScore runtime metadata |

### 2. Microbenchmarks

| Tool | Lines | Description |
|------|:----:|------|
| `start_microbench.py` | 771 | Automated microbenchmark entry point: reads op_mapping.yaml, selects the replay script by kernel type, collects with msprof, then parses the results and writes them back to the CSV |
| `op_replay/*.py` (25+ scripts) | 5,258 | One replay script per NPU kernel (MatMulV2, SwiGlu, FusedInferAttentionScore, QuantBatchMatmulV3, DispatchFFNCombine, etc.); each builds inputs with the torch_npu API and invokes the kernel |
| `generate_shape_grid.py` | 2,075 | Starts from existing CSV data and generates more shape combinations through shape mutation (dimension scaling, block padding variants, quantization variants, etc.) to widen coverage |

### 3. Communication benchmarks

| Tool | Lines | Description |
|------|:----:|------|
| `generate_comm_microbench.py` | - | Generates HCCL communication benchmark scripts (allReduce, allGather, allToAll, reduceScatter) |
| `validate_comm_alignment.py` | 345 | Validates alignment between the HCCL microbenchmark CSV and CommAnalyticModel |
| `run_comm_bench.sh` | 204 | Script that runs the HCCL benchmark |

### 4. Accuracy evaluation

| Tool | Lines | Description |
|------|:----:|------|
| `compute_m6.py` | - | Offline computation of M6 (E2E Ratio): total time predicted by TC / real per-forward time, using ArgMaxV2 as the anchor kernel to split forward passes |

------

## 📐 Associated Test Results

    $ pytest tests/tools/ -q --ignore=tests/tools/test_fia_parser_backfill.py
    63 passed, 2 failed in 0.11s

The 2 failures are expected (`profile_and_update_db.py` is not implemented yet; the tests were written ahead of the code).

### Import isolation check

    # Confirm that tools/ does not depend on tensor_cast
    import ast, sys, pathlib
    for f in pathlib.Path('tools/perf_data_collection').rglob('*.py'):
        tree = ast.parse(f.read_text())
        for node in ast.walk(tree):
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                mod = getattr(node, 'module', '') or ''
                if 'tensor_cast' in mod:
                    print(f'ERROR: {f}:{node.lineno}')
                    sys.exit(1)
    # → OK: no tensor_cast imports in tools/

------

## 🌟 Use cases (Optional)

    # 1. Generate per-kernel CSVs from raw NPU profiling data
    python tools/perf_data_collection/parse_kernel_details.py \
        --input <kernel_details.csv> --output-dir <data_dir>

    # 2. Run the microbenchmarks (requires an NPU device)
    python tools/perf_data_collection/start_microbench.py \
        --data-dir <data_dir> --device ATLAS_800_A3_752T_128G_DIE

    # 3. Generate the shape mutation matrix to widen coverage
    python tools/perf_data_collection/generate_shape_grid.py \
        --csv-dir <data_dir> --output-dir <output_dir>

    # 4. Compute M6 E2E accuracy
    python tools/perf_data_collection/compute_m6.py \
        --tc-report results/metrics.json \
        --profiler-output <profiling_trace_dir>

    # 5. HCCL communication benchmark
    bash tools/perf_data_collection/run_comm_bench.sh

------

## ✅ Checklist

Before PR:

- [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!124	1 month ago
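The import isolation check quoted in this commit message reads `node.module`, which exists only on `from ... import ...` statements, so a plain `import tensor_cast` would presumably slip through. A variant that inspects both statement forms might look like the sketch below. The function name `forbidden_imports` is hypothetical, and it matches whole dotted-path components where the quoted script matches substrings.

```python
import ast


def forbidden_imports(source: str, banned: str = "tensor_cast") -> list[int]:
    """Return line numbers of import statements whose module path contains `banned`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b, c` keeps its module names in node.names
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `from a.b import c` keeps the module in node.module (None for `from . import c`)
            modules = [node.module or ""]
        else:
            continue
        if any(banned in name.split(".") for name in modules):
            hits.append(node.lineno)
    return sorted(hits)
```

To scan a directory tree, the function could be applied to `path.read_text(encoding="utf-8")` for each file yielded by `pathlib.Path(...).rglob("*.py")`.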
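README.md	【REFACTOR】【TESTS】Restructure the tests directory and add smoke tests

Co-authored-by: liujiawang<anonymousdev@163.com>

# message auto-generated for no-merge-commit merge: !266 merge refactor-tests into develop

【REFACTOR】【TESTS】Restructure the tests directory and add smoke tests

Created-by: AvadaKedavrua
Commit-by: liujiawang;AvadaKedavrua
Merged-by: ascend-robot

Description:

# PR Template

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

- [x] Feature
- [x] Bugfix
- [x] Docs
- [x] CI/CD
- [x] Refactor
- [x] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

The existing UT directory mixed smoke, regression, and benchmark tests with model configuration assets and script tests. As a result, local runs had no single entry point, CI could not reliably select tests incrementally, and nightly lacked a unified test_map refresh and reporting chain. The old `tests/run_ut.sh` entry point could not express the distinct scenarios "quick smoke / full regression / benchmark / PR gate / nightly", and there was no test_map-based coverage check when source files were added or deleted.

This PR splits the test system into clear layers and adds the CI gate, nightly, coverage gate, test_map build, and documentation, so that developers and pipelines run tests through the same directories and scripts. It also fixes a few model configuration, serving output text, and documentation issues that surfaced during the cleanup.

------

## 📝 Modification

The changes were regrouped by hunk/topic into 17 commits. The main changes are:

- Restructure the test directories: split the old `tests/test_tensor_cast`, `serving_cast/tests`, `web_ui/tests`, `tests/tools`, `tests/perf_database`, and `tests/st` into `tests/smoke/`, `tests/regression/`, and `tests/benchmark/`.
- Unify test assets: move model configurations to `tests/assets/model_config/` and update the pre-commit exclude list, cache directories, and documentation accordingly.
- Add shared test helpers: assertions, config factories, fake subprocess, model construction, op registry, and other common test utilities, to reduce duplicated test code.
- Add CI gate helpers: diff classification, test_map loading, AST symbol mapping, coverage gate, incremental test selection, and gate policy configuration.
- Add nightly helpers: pytest result parsing, report models, report building, Feishu webhook notification, test_map refresh, and a benchmark scheduling entry point.
- Add unified script entry points: `scripts/run_smoke.sh`, `scripts/run_regression.sh`, `scripts/run_benchmark.sh`, `scripts/run_ci_gate.sh`, and `scripts/run_nightly.sh`, replacing the old `tests/run_ut.sh`.
- Update configuration and documentation: add pytest marker/testpaths/filterwarnings settings to `pyproject.toml`, and update `README.md`, `tests/README.md`, `docs/en/web_ui.md`, `web_ui/README.md`, and `tools/perf_data_collection/README.md`.
- Fix model/output details: register `config_class` for `DeepseekV32DecoderLayer`; normalize the colons in the serving_cast optimizer summary and the wording of runner log messages.

### Where new UTs go

- New quick safety-net cases go in `tests/smoke/`: they cover imports, the basic compile path, the lightweight config resolver, and the lightweight serving/tensor_cast main paths. They must need no NPU and no large model weights and must give fast feedback, so they are suitable to run first on every PR.
- New feature regression cases go in `tests/regression/<domain>/`: they are filed by domain (`tensor_cast`, `serving_cast`, `cli`, `web_ui`, `scripts/helpers`, etc.) and cover specific bug fixes, boundary conditions, behavioral contracts, and tool script logic.
- New long-running or performance-related cases go in `tests/benchmark/`: model benchmarks, perf_database, trace/CSV performance data processing, and other tests that should not block the normal PR gate belong to the benchmark layer.
- New model configurations, fixtures, and sample data should go in `tests/assets/` or a nearby `fixtures/` directory rather than being scattered inside test packages; large files and generated caches go in `.msmodeling_cache/` or `tests/assets/cache/` and do not enter the source tree directly.
- New tests use the shared builders and assertion utilities in `tests/helpers/` by default; when a fake subprocess, model configuration, or op registry is needed, reuse the existing helpers instead of repeating mocks and hand-written configurations in every test.
- Cases that need an NPU must be marked `@pytest.mark.npu`; large-model or long-running compile cases that should only run nightly are marked `@pytest.mark.nightly`, so they stay out of the default local and PR fast paths.

### How UTs are attached by semantics

- Tests are no longer sorted mechanically by file name. They are attached to source symbols by the semantics under test: functions, classes, methods, and top-level behavior in product source must be mapped in test_map to the test nodeids that verify them.
- When product source is added and it contains executable logic, add a matching smoke or regression case and make sure test_map can find that source file/symbol; symbols that genuinely need no test must be listed in the exemptions with a reason.
- When existing source is modified, the CI gate uses the AST to locate which top-level definition or class/method span the changed lines fall in, then selects the associated tests through test_map. If a symbol has no mapping, the gate blocks or widens the test scope, which prevents "logic changed but the semantically related UTs never ran".
- When source is deleted, the CI gate checks whether test_map still has tests that reference it, to prevent stale mappings; when tests are deleted, it also checks whether existing source coverage relationships are broken.
- Cross-layer dependency changes select the owning regression layer by semantics first; when ownership is unclear or configuration changes, the gate escalates to a more complete suite so that incremental selection does not miss tests.
- test_map is refreshed by nightly after the full test run passes, and the PR gate consumes that stable version. This avoids generating an untrustworthy mapping ad hoc for each PR while letting the semantic mapping evolve with the tests on the main branch.

### Pipeline adjustments

- Local runs and CI share the same entry points: `run_smoke.sh` runs quick smoke, `run_regression.sh` runs full regression, `run_benchmark.sh` runs benchmarks, `run_ci_gate.sh` runs the incremental PR gate, and `run_nightly.sh` runs the full nightly flow.
- The PR gate changes from "always run a fixed batch of UTs" to "diff -> classify changes -> load test_map -> apply gate rules -> run selected pytest -> coverage gate". Configuration changes, source additions/deletions, test additions/deletions, and source modifications follow different gate rules.
- The coverage gate reads `MSMODELING_TEST_LINE_THRESHOLD` and `MSMODELING_TEST_BRANCH_THRESHOLD`, defaulting to line 70 and branch 50; pytest excludes `npu` by default, and the PR gate additionally excludes `nightly`.
- Nightly runs in two stages: first the non-NPU, non-nightly smoke/regression tests, refreshing test_map once they pass; then the nightly-marked cases and benchmarks, followed by building a structured report.
- The pipelines uniformly support environment variables such as `MSMODELING_OFFLINE`, `MSMODELING_TEST_WEIGHTS_PRUNE`, and `MSMODELING_TEST_MAP_PATH`, reducing the differences in how individual scripts handle caching, offline mode, and weight cleanup.
- Benchmarks are not part of the normal coverage gate, so performance and long-running cases do not slow down the PR gate; when needed they can be verified through a separate benchmark pipeline or nightly.
- The pre-commit exclude list is synced to the new directories, so model configuration assets and fixtures are no longer needlessly reformatted or falsely flagged.

Issues addressed:

- The old UT entry point was a single script that could not distinguish smoke, regression, benchmark, nightly, and PR gate.
- Model configuration assets were scattered across test case directories, so pre-commit and test reference paths drifted easily.
- Source additions/deletions had no test_map coverage check, so CI could not precisely block changes that lacked tests.
- CI parameters such as coverage thresholds, pytest markers, offline mode, and weight cache cleanup had no unified entry point.
- Nightly lacked structured reports, failure summaries, and a Feishu notification chain.
- The web UI tests still lived inside the module and were not part of the unified regression layer.
- `DeepseekV32DecoderLayer` lacked `config_class`, which affected consistent config class recognition.
- Some serving_cast log/summary messages had extra spaces or inconsistent wording.

------

## 📐 Associated Test Results

Every commit triggered the pre-commit hook during submission, and the checked files passed the trailing whitespace, EOF, YAML/JSON, large file, merge conflict, private key, ruff, ruff-format, codespell, pylint, bandit, and typos checks. The full smoke/regression/benchmark suites were not additionally run locally; the GitCode CI results after the push are authoritative. Suggested focus:

- `bash ./scripts/run_smoke.sh`
- `bash ./scripts/run_regression.sh`
- `bash ./scripts/run_ci_gate.sh` (requires `MSMODELING_TEST_MAP_PATH` to be set)
- `bash ./scripts/run_nightly.sh` (requires `MSMODELING_TEST_MAP_PATH` to be set)

------

## 🌟 Use cases (Optional)

- Quick local verification: developers run `bash ./scripts/run_smoke.sh` for fast feedback.
- Full local regression: developers run `bash ./scripts/run_regression.sh` to cover the main regression cases.
- Incremental PR gate: CI sets `MSMODELING_TEST_MAP_PATH` and runs `bash ./scripts/run_ci_gate.sh`, which selects cases by diff and test_map and then applies the coverage gate.
- Nightly job: nightly first runs the non-nightly smoke/regression tests and refreshes test_map, then runs nightly/benchmark and generates the report, with optional Feishu notification.

------

## ✅ Checklist

Before PR:

- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!266	1 month ago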
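The gate step that maps changed lines to a top-level definition or class/method span, and then to tests through test_map, could be sketched along these lines. This is an illustrative sketch only: the function names, the `Class.method` symbol format, and the dict-shaped test_map are assumptions, not the repository's actual helper API or file format.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _covers(node: ast.AST, changed_lines: set[int]) -> bool:
    """True if any changed line falls inside the node's line span."""
    return bool(changed_lines.intersection(range(node.lineno, node.end_lineno + 1)))


def symbols_for_lines(source: str, changed_lines: set[int]) -> set[str]:
    """Map changed line numbers to the enclosing top-level function, class, or method."""
    touched = set()
    for node in ast.parse(source).body:
        if not isinstance(node, FUNC_TYPES + (ast.ClassDef,)):
            continue
        if not _covers(node, changed_lines):
            continue
        hit_methods = []
        if isinstance(node, ast.ClassDef):
            hit_methods = [
                child for child in node.body
                if isinstance(child, FUNC_TYPES) and _covers(child, changed_lines)
            ]
        if hit_methods:
            touched.update(f"{node.name}.{m.name}" for m in hit_methods)
        else:
            touched.add(node.name)
    return touched


def select_tests(symbols: set[str], test_map: dict[str, list[str]]):
    """Split symbols into selected test nodeids and symbols with no mapping."""
    selected, unmapped = set(), set()
    for symbol in symbols:
        if symbol in test_map:
            selected.update(test_map[symbol])
        else:
            unmapped.add(symbol)
    return selected, unmapped
```

A non-empty `unmapped` set is the point at which a gate like the one described would block the PR or widen the test scope.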
__init__.py	chore(ci): adopt pre-commit and retire legacy lintrunner adapters

Co-authored-by: liujiawang<anonymousdev@163.com>

# message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop

chore(ci): adopt pre-commit and retire legacy lintrunner adapters

Created-by: AvadaKedavrua
Commit-by: liujiawang;AvadaKedavrua
Merged-by: ascend-robot

Description:

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [x] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

------

## Motivation

Continue the pre-commit migration: tighten Pylint so only high-signal messages run (`disable=all` + explicit `enable` list), fix real issues that remained under that profile, and translate hook/config comments to English.

------

## Configuration changes (tooling & comments only)

| Path | What changed |
|------|----------------|
| `pre-commit/pyproject.toml` | Pylint: `[tool.pylint."messages control"]` with `disable = ["all"]` and a short allowlist of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). Ruff: unchanged behavior; comments translated to English. Bandit: comments translated; rule allowlist/skip lists unchanged. |
| `.pre-commit-config.yaml` | Comments translated to English; Bandit hook display name set to bandit (Python security checks). Hook versions and args unchanged except for comment text. |

------

## Source code changes (application code)

| Area | Files | Purpose |
|------|--------|---------|
| `serving_cast` | `communication.py`, `engine.py`, `instance.py`, `kv_cache_manager.py`, `load_gen.py`, `main.py`, `model_runner.py`, `request.py`, `serving.py`, `utils.py` | Replace `from . import stime` with `import serving_cast.stime as stime` so Pylint resolves imports (fixes E0611). |
| `serving_cast` | `stime.py` | Singleton salabim `Environment` via `_get_sim_env()` so type checkers/Pylint see `sim.Environment` (fixes E1101 on `SimulationEnv`). |
| `serving_cast/service` | `base_throughput_optimizer.py` | `__init__` defaults + `assert runner is not None` before `run_inference` (fixes E1101 on base class). |
| `tensor_cast` | `diffusers/diffusers_model.py`, `diffusers/diffusers_utils.py`, `runtime.py` | Add `encoding="utf-8"` to `open()` / trace export (fixes W1514). |
| `web_ui` | `callbacks.py` | `refresh_optimizer_detail`: call `_optimizer_detail_view(rows, None, device)` and unpack five return values (fixes E1120). |

------

## Recent commits on `pre-commit` branch

- `ci(pre-commit): fix pylint message selection with disable=all`
- `fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui`
- `docs(pre-commit): translate comments to English and add all-files run log`

------

![](https://raw.atomgit.com/Ascend/msmodeling/attachment/uploads/b22b18aa-4c84-4dc0-85f5-1e7e0715350e/pre-commit-all-files-run.svg)

------

## Checklist

- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!176	1 month ago
fia_common.py	feat: profiling data collection toolchain. See merge request: Ascend/msmodeling!124	1 month ago
fill_fia_runtime_metadata.py	feat: profiling data collection toolchain. See merge request: Ascend/msmodeling!124	1 month ago
generate_shape_grid.py	【FEAT】MindStudio CLI unified stderr logo. See merge request: Ascend/msmodeling!307	25 days ago

`tools/perf_data_collection`

Utilities for parsing profiling outputs, back-filling runtime metadata, generating theory-driven shape grids, replaying operators with msprof, and validating the resulting performance database.

Directory Layout

tools/perf_data_collection/
  comm_bench/
  grid_generator/
  op_replay/
  parsers/
  fia_common.py
  fill_fia_runtime_metadata.py
  generate_shape_grid.py
  memory_estimator.py
  start_microbench.py
  README.md

Offline comparison scripts now live in the sibling directory tools/perf_data_analysis/.

Typical Workflow

1. Parse raw profiling output with parsers/parse_kernel_details.py.
2. If FusedInferAttentionScore.csv needs richer runtime fields, fill them with fill_fia_runtime_metadata.py.
3. Expand sparse operator coverage with generate_shape_grid.py.
4. Replay operators and write microbench results back with start_microbench.py.
5. Use tools/perf_data_analysis/ and comm_bench/ to compare traces, inspect gaps, and validate communication data.
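The collection steps of this workflow can be chained from a small driver. The sketch below only builds the argument lists and uses flags documented in this README; the helper name `build_workflow` is hypothetical, and it assumes that `FusedInferAttentionScore.csv` sits directly in the database directory.

```python
import sys
from pathlib import Path
from typing import List, Optional

TOOLS = Path("tools/perf_data_collection")


def build_workflow(profiling_dir: str, database_dir: str,
                   fia_jsonl: Optional[str] = None) -> List[List[str]]:
    """Return one argv list per collection step, in workflow order."""
    py = sys.executable
    steps = [
        # Step 1: parse raw profiling output into per-operator CSV files.
        [py, str(TOOLS / "parsers" / "parse_kernel_details.py"),
         "--profiling-path", profiling_dir, "--database-path", database_dir],
    ]
    if fia_jsonl is not None:
        # Step 2 (optional): back-fill FIA runtime metadata.
        # Assumption: the FIA CSV lives directly in the database directory.
        fia_csv = str(Path(database_dir) / "FusedInferAttentionScore.csv")
        steps.append([py, str(TOOLS / "fill_fia_runtime_metadata.py"),
                      "--csv-path", fia_csv, "--jsonl-path", fia_jsonl])
    # Step 3: append theory-generated shape rows.
    steps.append([py, str(TOOLS / "generate_shape_grid.py"),
                  "--data-dir", database_dir])
    # Step 4: replay operators and fill only rows without valid durations.
    steps.append([py, str(TOOLS / "start_microbench.py"),
                  "--database-path", database_dir,
                  "--update-mode", "missing-only"])
    return steps
```

Each returned list can be passed to `subprocess.run(argv, check=True)` from the repository root.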

Top-level Files

File	Purpose
`fia_common.py`	Shared helpers for FIA shapes and metadata parsing.
`fill_fia_runtime_metadata.py`	Merges FIA runtime JSONL metadata into `FusedInferAttentionScore.csv`.
`generate_shape_grid.py`	Appends theory-generated shape rows into database CSV files.
`memory_estimator.py`	Estimates tensor memory usage for theory-mode shape generation.
`start_microbench.py`	Runs replay under `msprof`, aggregates `op_summary_*.csv`, and writes results back to the database.

Subdirectories and Related Tools

Related: `tools/perf_data_analysis/`

Offline analysis and reporting helpers in the sibling package tools/perf_data_analysis/.

File	Purpose
`compute_m6.py`	Compares end-to-end timing between TC trace data and profiling trace data.
`generate_op_comparison.py`	Aggregates TC vs profiling comparisons by operator.
`generate_per_shape_comparison.py`	Produces per-`(kernel_type, shape)` comparison CSV output.

`comm_bench/`

Communication microbench generation and validation tools.

File	Purpose
`generate_comm_microbench.py`	Generates or directly runs HCCL microbench workloads.
`run_comm_bench.sh`	Batch collection entry script for communication benchmark data.
`validate_comm_alignment.py`	Checks whether measured communication results align with the parser model.

`grid_generator/`

Core engine for theory-mode shape expansion.

File	Purpose
`config.py`	Loads and validates shape-grid configuration.
`config.yaml`	Routing and generation rules for theory-mode shape expansion.
`evaluator.py`	Safe expression evaluator used by theory-mode dimensions.
`model_configs.py`	Model architecture presets used to prune or derive shape candidates.
`runner.py`	Main theory-mode generation pipeline.
`shape_grids.py`	Shared grid definitions.
`theory_router.py`	Routes operators to the appropriate theory generator.
`utils.py`	Shared CSV, row, and shape helpers.
`generators/base.py`	Base interfaces and helpers for generator implementations.
`generators/fused_attention.py`	Theory generator for fused attention operators.
`generators/moe.py`	Theory generator for MoE-related operators.
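To illustrate what a safe expression evaluator for shape dimensions might look like, here is a minimal sketch built on the standard `ast` module. It is not the code in `evaluator.py`: the function name `safe_eval`, the supported operators, and the variable names in the usage example are all assumptions.

```python
import ast
import operator

# Whitelisted binary operators; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}


def safe_eval(expr, variables):
    """Evaluate an integer arithmetic expression over named dimensions."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise ValueError(f"unknown dimension: {node.id}")
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Calls, attribute access, subscripts, etc. never reach evaluation.
        raise ValueError(f"unsupported expression node: {type(node).__name__}")

    return _eval(ast.parse(expr, mode="eval"))
```

For example, `safe_eval("hidden * 3 // tp", {"hidden": 4096, "tp": 8})` evaluates the expression without ever calling `eval`, while an input such as `__import__('os')` raises `ValueError`.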

`op_replay/`

Operator replay framework and per-operator replay scripts.

File	Purpose
`common.py`	Shared CLI, path, tensor construction, and CSV utilities for replay.
`probe_dfc_constraints.py`	Probes shape constraints for `DispatchFFNCombine`.
`replay_framework.py`	Common replay framework used by individual operators.
`run_all_op.py`	Discovers and runs all `*_run.py` scripts.
`*_run.py`	Per-operator replay entry points.

`parsers/`

Profiling and trace parsing helpers.

File	Purpose
`parse_kernel_details.py`	Converts `kernel_details*.csv` or profiling directories into per-operator CSV files.
`trace_to_csv.py`	Flattens TC Chrome trace JSON into CSV.

Main CLI Scripts

`parsers/parse_kernel_details.py`

Parses one kernel_details*.csv file or a profiling directory and writes one CSV per operator into the target database directory.

py -3 tools/perf_data_collection/parsers/parse_kernel_details.py `
  --profiling-path G:\path\to\profiling_dir `
  --database-path tensor_cast/performance_model/profiling_database/data/.../dev_0331

Argument	Required	Description
`--profiling-path`	Yes	A single `kernel_details*.csv` file or a profiling directory scanned recursively.
`--database-path`	No	Explicit output directory for generated operator CSV files.
`--device`	No	Device name used when deriving the output path.
`--vllm-version`	No	vLLM version or full version-directory name used when deriving the output path.
`--torch-version`	No	PyTorch version used when deriving the output path.
`--cann-version`	No	CANN version used when deriving the output path.

`fill_fia_runtime_metadata.py`

Back-fills runtime JSONL metadata into FusedInferAttentionScore.csv.

Argument	Required	Description
`--csv-path`	Yes	Path to `FusedInferAttentionScore.csv`.
`--jsonl-path`	Yes	Path to FIA runtime JSONL input.
`--output-path`	No	Output CSV path. Defaults to in-place overwrite.
`--metadata-tag`	No	Completeness tag written into matched rows.
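The back-fill step can be pictured as a keyed merge. The following is a rough sketch only: the join key `Input Shapes`, the tag column `Metadata Tag`, and the example field `actual_seq_lengths` are hypothetical names, and the real script may match rows differently.

```python
import json


def backfill(rows, jsonl_lines, key="Input Shapes", tag="complete",
             tag_column="Metadata Tag"):
    """Merge JSONL records into CSV row dicts that share the same key value.

    Returns the number of rows that were matched and updated in place.
    """
    metadata = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL input
        record = json.loads(line)
        metadata[record[key]] = record

    matched = 0
    for row in rows:
        record = metadata.get(row.get(key))
        if record is None:
            continue  # unmatched rows are left untouched
        for field, value in record.items():
            if field != key:
                row[field] = value
        row[tag_column] = tag  # mirrors the idea behind --metadata-tag
        matched += 1
    return matched
```

The `rows` argument can come from `csv.DictReader`, and the updated rows can be written back with `csv.DictWriter`.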

`generate_shape_grid.py`

Appends theory-generated shape rows into existing database CSV files.

Argument	Required	Description
`--target-models`	No	Comma-separated model names used to prune GEMM `(N, K)` candidates.
`--data-dir`	No	Explicit CSV root directory.
`--device`	No	Device name used when deriving the database path.
`--vllm-version`	No	vLLM version used when deriving the database path.
`--torch-version`	No	PyTorch version used when deriving the database path.
`--cann-version`	No	CANN version used when deriving the database path.
`--rows`	No	Maximum appended rows per CSV. `0` means no cap.
`--seed`	No	Random seed for reproducible sampling.
`--max-hbm-gb`	No	Per-row HBM budget in GiB. `0` disables memory filtering.
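The interaction of `--rows`, `--seed`, and `--max-hbm-gb` can be sketched as a memory filter followed by seeded sampling. This is an illustration under assumptions, not the generator's actual pipeline: the helper names, the dtype table, and the representation of a candidate as a list of tensor shapes are all hypothetical.

```python
import math
import random

# Assumed element sizes in bytes.
DTYPE_BYTES = {"float16": 2, "bfloat16": 2, "float32": 4, "int8": 1}


def estimate_gib(shapes, dtype="float16"):
    """Total size in GiB of all tensors in one candidate row."""
    total_bytes = sum(math.prod(shape) for shape in shapes) * DTYPE_BYTES[dtype]
    return total_bytes / 2**30


def select_rows(candidates, rows=0, seed=None, max_hbm_gb=0.0):
    """Apply the per-row memory budget, then cap the result size."""
    if max_hbm_gb > 0:  # 0 disables memory filtering
        candidates = [c for c in candidates if estimate_gib(c) <= max_hbm_gb]
    else:
        candidates = list(candidates)
    if rows > 0 and len(candidates) > rows:  # 0 means no cap
        # A dedicated Random instance keeps sampling reproducible per seed.
        candidates = random.Random(seed).sample(candidates, rows)
    return candidates
```

With a fixed seed, repeated calls on the same candidate list return the same sample, which is the property a reproducible `--seed` is meant to provide.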

`start_microbench.py`

Runs replay scripts under msprof, aggregates profiling metrics, and writes them back into matching operator CSV files.

Argument	Required	Description
`--database-path`	No	Explicit database directory to read and update.
`--device`	No	Device directory name used when deriving the target path.
`--vllm-version`	No	vLLM version used when deriving the target path.
`--torch-version`	No	PyTorch version used when deriving the target path.
`--cann-version`	No	CANN version used when deriving the target path.
`--prof-path`	No	Existing `PROF_*` directory to parse directly without launching `msprof`.
`--op`	No	Restrict updates to specific operator names.
`--dispatch-ffn-combine-ep-size`	No	EP size used for `DispatchFFNCombine` replay and row matching.
`--repeat-count`	No	Replay repeat count passed through to operator scripts.
`--update-mode`	No	`all` or `missing-only`.
`--fail-fast`	No	Stop immediately when one replay script fails.
`--prune-empty-duration-rows`	No	Delete rows whose replay and profiling durations are both invalid after writeback.

`parsers/trace_to_csv.py`

Converts a TC Chrome trace JSON file into flat CSV output.

Argument	Required	Description
`--trace`	Yes	Input TC Chrome trace JSON path.
`--output`	No	Output CSV path. Defaults to `stdout`.
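As an illustration of the flattening step, the sketch below converts complete (`"ph": "X"`) events of a Chrome trace into CSV text. The column set and the decision to drop other event phases and the nested `args` field are assumptions; the real `trace_to_csv.py` may emit different columns.

```python
import csv
import io
import json

# Assumed output columns, taken from standard Chrome trace event fields.
COLUMNS = ["name", "ph", "ts", "dur", "pid", "tid"]


def trace_to_csv(trace_text):
    """Flatten complete ("X") events of a Chrome trace into CSV text."""
    trace = json.loads(trace_text)
    # The Chrome trace format allows a bare event list or an object
    # holding the list under "traceEvents".
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for event in events:
        if event.get("ph") == "X":
            writer.writerow(event)  # keys outside COLUMNS are ignored
    return out.getvalue()
```

Writing to an in-memory buffer keeps the function easy to test; a CLI wrapper could print the result or write it to the `--output` path.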

Related Analysis Script Arguments

`tools/perf_data_analysis/compute_m6.py`

Argument	Required	Description
`--tc-trace`	Yes	TC Chrome trace JSON file.
`--prof-trace`	Yes	Forward-pass profiling trace CSV file.
`--source-filter`	No	Comma-separated source filters.
`--json-output`	No	Output JSON path.
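One way to picture the end-to-end comparison is as a ratio of predicted to measured time. The sketch below assumes the metric is the TC-predicted total time divided by the measured time per forward pass, and that an `ArgMaxV2` kernel marks the end of each pass; both assumptions and all function names are illustrative rather than taken from the script.

```python
def split_forward_passes(kernels, anchor="ArgMaxV2"):
    """Group (name, duration_us) records into passes that end at the anchor."""
    passes, current = [], []
    for name, duration in kernels:
        current.append(duration)
        if name == anchor:
            passes.append(current)
            current = []
    # Kernels after the last anchor form an incomplete pass and are dropped.
    return passes


def e2e_ratio(tc_total_us, kernels, anchor="ArgMaxV2"):
    """Predicted total time divided by the mean measured per-pass time."""
    passes = split_forward_passes(kernels, anchor)
    if not passes:
        raise ValueError("no complete forward pass found")
    per_forward_us = sum(sum(p) for p in passes) / len(passes)
    return tc_total_us / per_forward_us
```

A ratio near 1.0 would indicate that the prediction tracks the measurement closely; values above or below 1.0 indicate over- or under-estimation.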

`tools/perf_data_analysis/generate_op_comparison.py`

Argument	Required	Description
`--trace-dir`	No	Directory containing forward-pass trace CSV files.
`--data-dir`	No	Database directory containing `op_mapping.yaml`.
`--output`	No	Output JSON path.

`tools/perf_data_analysis/generate_per_shape_comparison.py`

Argument	Required	Description
`--tc-trace`	Yes	TC Chrome trace JSON file.
`--prof-trace`	Yes	Profiling trace CSV file.
`--output`	No	Output CSV path. Defaults to `stdout`.

`comm_bench/` Script Arguments

`comm_bench/generate_comm_microbench.py`

Argument	Required	Description
`--output-dir`	No	Directory to write per-op CSV files (requires `--do-run`).
`--ops`	No	Communication operators to include.
`--num-devices`	No	Number of devices in each communication group.
`--topology-tier`	No	Topology tier `0`, `1`, or `2`.
`--grid-shape`	No	Hardware grid shape.
`--dtype`	No	Tensor dtype.
`--bytes-grid`	No	Custom `message_bytes` grid.
`--do-run`	Yes	Run the benchmark directly (requires `torchrun`).
`--output-csv`	No	Single output CSV path, only valid for one operator.
`--bench-mode`	No	`kernel` or `event`.

`comm_bench/run_comm_bench.sh`

bash tools/perf_data_collection/comm_bench/run_comm_bench.sh [OUTPUT_DIR]

Argument	Required	Description
`OUTPUT_DIR`	No	Output directory for generated communication CSV files.

`comm_bench/validate_comm_alignment.py`

Argument	Required	Description
`--csv-dir`	Yes	Directory containing `hcom_*.csv`.
`--tolerance`	No	Acceptable ratio tolerance. Default is `2.0`.
`--verbose`	No	Print all checked rows.
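A ratio tolerance is commonly applied symmetrically. The sketch below assumes that `--tolerance 2.0` means the measured time may be at most twice or at least half the modeled time; the script's actual rule is not stated here, so treat this as one plausible reading.

```python
def within_tolerance(measured_us, predicted_us, tolerance=2.0):
    """Return True when measured/predicted lies in [1/tolerance, tolerance]."""
    if measured_us <= 0 or predicted_us <= 0:
        return False  # non-positive timings cannot be compared as a ratio
    ratio = measured_us / predicted_us
    return 1.0 / tolerance <= ratio <= tolerance
```

Under this reading, a measurement of 150 us against a prediction of 100 us passes, while 250 us or 40 us against the same prediction fails.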

Shared `op_replay/` Arguments

Most op_replay/*_run.py scripts share the following arguments:

Argument	Required	Description
`--database-path`	No	Explicit database directory.
`--device`	No	Device directory name.
`--vllm-version`	No	vLLM version or full version-directory name.
`--torch-version`	No	PyTorch version.
`--cann-version`	No	CANN version.
`--repeat-count`	No	Replay count per row.
`--update-mode`	No	`all` or `missing-only`.

Environment variable:

`MSMODELING_OP_REPLAY_REPEAT_COUNT` sets the default replay count when `--repeat-count` is not passed on the command line. If the variable is also unset, the code falls back to 30.
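The precedence described here (CLI flag, then environment variable, then the code default of 30) can be sketched as follows. The function name `resolve_repeat_count` is hypothetical, and the replay scripts may wire this up differently.

```python
import argparse
import os

ENV_VAR = "MSMODELING_OP_REPLAY_REPEAT_COUNT"
CODE_DEFAULT = 30


def resolve_repeat_count(argv, environ=None):
    """Resolve the replay count: CLI flag > environment variable > 30."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from any real value.
    parser.add_argument("--repeat-count", type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.repeat_count is not None:
        return args.repeat_count
    return int(environ.get(ENV_VAR, CODE_DEFAULT))
```

Passing `environ` explicitly keeps the function testable without mutating the process environment.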

Environment variables (`tools/perf_data_collection/`)

Variable	Default	Description
`MSMODELING_OP_REPLAY_REPEAT_COUNT`	`30`	Default `--repeat-count` for `op_replay/*_run.py`
`VLLM_ASCEND_PATH`	sibling `../vllm-ascend`	vllm-ascend repo root for custom Triton kernels
`ASCEND_CUSTOM_OPP_PATH`	—	Required for custom OPP operators; see `start_microbench.py` module doc
`LD_LIBRARY_PATH`	—	Custom OPP `op_api/lib`; required with `ASCEND_CUSTOM_OPP_PATH` for some ops
`ASCEND_HOME_PATH` / `ASCEND_TOOLKIT_HOME` / `ASCEND_TOOLKIT_HOME_PATH` / `ASCEND_INSTALL_PATH`	filesystem probes	CANN install root for version detection
`MASTER_ADDR` / `MASTER_PORT` / `RANK` / `WORLD_SIZE` / `LOCAL_RANK`	torchrun defaults	Distributed launch for comm bench / DFC replay

Full cross-module list: Environment Variables.

`op_replay/run_all_op.py`

Argument	Required	Description
`--database-path`	No	Explicit database directory.
`--device`	No	Device directory name.
`--vllm-version`	No	vLLM version.
`--torch-version`	No	PyTorch version.
`--cann-version`	No	CANN version.
`--repeat-count`	No	Replay count passed to each operator script.
`--update-mode`	No	Replay update mode forwarded to each operator script.
`--execution-mode`	No	`inprocess` or `subprocess`.
`--op`	No	Restrict execution to selected operators.
`--dispatch-ffn-combine-ep-size`	No	Forwarded only to `DispatchFFNCombine_run.py`.
`--continue-on-error`	No	Continue running after individual operator failures.

`op_replay/probe_dfc_constraints.py`

This script has no CLI arguments. Run it directly.

`op_replay/*_run.py` Overview

File	Kernel Type	Extra Arguments	Purpose
`AddRmsNormBias_run.py`	`AddRmsNormBias`	None	Replays the fused AddRmsNormBias operator.
`Add_run.py`	`Add`	None	Replays `torch.add`.
`ArgMaxV2_run.py`	`ArgMaxV2`	None	Replays `torch.argmax`.
`AscendQuantV2_run.py`	`AscendQuantV2`	None	Replays `torch_npu.npu_quantize`.
`DispatchFFNCombine_run.py`	`DispatchFFNCombine`	`--ep-size`, `--balanced`, `--no-balanced`	Replays the fused DFC operator.
`DynamicQuant_run.py`	`DynamicQuant`	None	Replays `torch_npu.npu_dynamic_quant`.
`FusedInferAttentionScore_run.py`	`FusedInferAttentionScore`	None	Replays FIA.
`GatherV2_run.py`	`GatherV2`	None	Replays embedding and gather workloads.
`Index_run.py`	`Index`	None	Replays index-based tensor access.
`InterleaveRope_run.py`	`InterleaveRope`	None	Replays interleaved rope.
`KvRmsNormRopeCache_run.py`	`KvRmsNormRopeCache`	None	Replays KV rope cache.
`MaskedFill_run.py`	`MaskedFill`	None	Replays `masked_fill_`.
`MatMulCommon_run.py`	`MatMulCommon`	None	Replays the generic matmul path.
`MatMulV2_run.py`	`MatMulV2`	None	Replays `MatMulV2`.
`MatMulV3_run.py`	`MatMulV3`	None	Replays `MatMulV3`.
`PadV3_run.py`	`PadV3`	None	Replays pad based on input and output shapes.
`QuantBatchMatmulV3_run.py`	`QuantBatchMatmulV3`	None	Replays quantized batch matmul.
`ReshapeAndCacheNdKernel_run.py`	`ReshapeAndCacheNdKernel`	None	Replays reshape-and-cache.
`RINGMLAPrefillBF16Kernel_run.py`	`RINGMLAPrefillBF16Kernel`	None	Replays MLA prefill kernel.
`RmsNorm_run.py`	`RmsNorm`	None	Replays RMSNorm.
`Slice_run.py`	`Slice`	None	Replays Slice.
`SoftmaxV2_run.py`	`SoftmaxV2`	None	Replays Softmax.
`Sort_run.py`	`Sort`	None	Replays Sort.
`split_qkv_rmsnorm_rope_kernel_run.py`	`split_qkv_rmsnorm_rope_kernel`	None	Replays the custom Triton QKV kernel.
`SwiGlu_run.py`	`SwiGlu`	None	Replays SwiGlu.
`TensorMove_run.py`	`TensorMove`	None	Replays tensor copy.
`Transpose_run.py`	`Transpose`	None	Replays Transpose.
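In the table above, each kernel type equals the script name with the `_run.py` suffix removed. A discovery step along the lines of what `run_all_op.py` is described as doing could therefore look like the sketch below; the function name and the `selected_ops` filter are assumptions, not the script's actual interface.

```python
from pathlib import Path

SUFFIX = "_run.py"


def discover_replay_scripts(op_replay_dir, selected_ops=None):
    """Map kernel type -> script path for every `*_run.py` in a directory."""
    scripts = {}
    for path in sorted(Path(op_replay_dir).glob("*" + SUFFIX)):
        kernel_type = path.name[: -len(SUFFIX)]
        # selected_ops plays the role of an --op style restriction.
        if selected_ops is None or kernel_type in selected_ops:
            scripts[kernel_type] = path
    return scripts
```

Helper modules such as `common.py` and `run_all_op.py` do not end in `_run.py`, so the glob pattern skips them without an explicit exclusion list.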

Common Commands

Parse profiling output:

py -3 tools/perf_data_collection/parsers/parse_kernel_details.py `
  --profiling-path G:\path\to\profiling_dir `
  --database-path tensor_cast/performance_model/profiling_database/data/.../dev_0331

Generate theory-driven shape rows:

py -3 tools/perf_data_collection/generate_shape_grid.py `
  --target-models dsv3,qwen3-32b `
  --data-dir tensor_cast/performance_model/profiling_database/data/.../dev_0331 `
  --rows 2000 `
  --seed 20260409

Replay operators and write back results:

py -3 tools/perf_data_collection/start_microbench.py `
  --database-path tensor_cast/performance_model/profiling_database/data/.../dev_0331 `
  --repeat-count 1 `
  --update-mode missing-only

`start_microbench.py` Update Modes

all: update every matched row and append unmatched profiling samples into the target CSV.
missing-only: replay and fill only rows whose Average Duration(us) and Profiling Average Duration(us) are both invalid (0 or empty); rows that already contain at least one valid duration are skipped, and unmatched profiling samples are reported but not appended.

Empty-row Pruning

Default behavior: rows with both Average Duration(us) and Profiling Average Duration(us) invalid are kept so missing-only mode can retry them later.
--prune-empty-duration-rows: opt-in cleanup that removes those rows after writeback. Use it only when you intentionally want to delete unrecoverable empty rows from the database.
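The update-mode and pruning rules above reduce to one predicate over the two duration columns. The sketch below is an illustration with hypothetical helper names; treating non-numeric text as invalid is an added assumption beyond the documented "0 or empty" rule.

```python
DURATION_COLUMNS = ("Average Duration(us)", "Profiling Average Duration(us)")


def is_invalid(value):
    """A duration is invalid when it is empty or zero.

    Assumption: non-numeric text is also treated as invalid.
    """
    try:
        return float(value) == 0.0
    except (TypeError, ValueError):
        return True


def both_invalid(row):
    return all(is_invalid(row.get(col)) for col in DURATION_COLUMNS)


def needs_replay(row, update_mode):
    """Decide whether a row is replayed under the given update mode."""
    if update_mode == "all":
        return True
    if update_mode == "missing-only":
        return both_invalid(row)  # one valid duration is enough to skip
    raise ValueError(f"unknown update mode: {update_mode}")


def prune_empty_rows(rows, prune=False):
    """Keep every row by default; drop both-invalid rows only on opt-in."""
    if not prune:
        return list(rows)
    return [row for row in rows if not both_invalid(row)]
```

Keeping empty rows by default means a later `missing-only` run still sees them as candidates, which matches the retry behavior described above.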
