msmodeling/docs/perf_database/tutorial · Ascend/MindStudio-Modeling - AtomGit

ascend-robot: [FIX][TEST] Fix broken links in README/docs and run the full benchmark suite by default

File	Last commit	Last updated
MICROBENCH_RUN_SCRIPT_TUTORIAL_zh.md	feat: profiling data collection toolchain	1 month ago

Co-authored-by: Horacehxw<horacehxw@gmail.com>
# message auto-generated for no-merge-commit
merge: !124 merge pr/perf-db-b into develop
feat: profiling data collection toolchain
Created-by: Horacehxw
Commit-by: Horacehxw
Merged-by: ascend-robot
Description:

PR Type:

- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

The measurement-based operator performance estimation system depends on NPU profiling data. That system is described in [PR-A: feat: profiling-based empirical performance model with CSV data source](https://gitcode.com/Ascend/msmodeling/pull/123). The profiling data consists of per-kernel CSVs plus HCCL communication benchmarks.

This data must be collected on NPU devices, then parsed and validated before use. This PR provides a complete offline data collection toolchain that covers the whole pipeline:

- parsing raw profiling data,
- microbenchmarking,
- shape mutation,
- communication benchmarking,
- M6 end-to-end (E2E) accuracy computation.

> 📌 This PR has no code dependency on [PR-A](https://gitcode.com/Ascend/msmodeling/pull/123) (core functionality). `tools/` does not import `tensor_cast`, so it can be reviewed and merged independently.

------

## 📝 Modification

All new files are under `tools/perf_data_collection/` and `tests/tools/`.

### 1. Data parsing and conversion

| Tool | Lines | Description |
|------|:----:|------|
| `parse_kernel_details.py` | 698 | Parses NPU `kernel_details.csv` → splits it into one CSV per kernel type (MatMulV2.csv, SwiGlu.csv, etc.); supports FRACTAL_NZ format conversion and shape normalization |
| `build_comm_csv.py` | – | Builds communication CSVs from HCCL benchmark results |
| `fia_common.py` + `fill_fia_runtime_metadata.py` | 1,544 | Infers and backfills FusedInferAttentionScore runtime metadata |

### 2. Microbenchmarks

| Tool | Lines | Description |
|------|:----:|------|
| `start_microbench.py` | 771 | Automated microbenchmark entry point: reads op_mapping.yaml → selects a replay script by kernel type → collects with msprof → parses the results and writes them back to CSV |
| `op_replay/*.py` (25+ scripts) | 5,258 | One replay script per NPU kernel (MatMulV2, SwiGlu, FusedInferAttentionScore, QuantBatchMatmulV3, DispatchFFNCombine, etc.); each builds inputs and invokes the kernel through the torch_npu API |
| `generate_shape_grid.py` | 2,075 | Starts from existing CSV data and generates more shape combinations through shape mutation (dimension scaling, block-padding variants, quantization variants, etc.) to broaden coverage |

### 3. Communication benchmarks

| Tool | Lines | Description |
|------|:----:|------|
| `generate_comm_microbench.py` | – | Generates HCCL communication benchmark scripts (allReduce, allGather, allToAll, reduceScatter) |
| `validate_comm_alignment.py` | 345 | Validates that HCCL microbenchmark CSVs align with CommAnalyticModel |
| `run_comm_bench.sh` | 204 | HCCL benchmark runner script |

### 4. Accuracy evaluation

| Tool | Lines | Description |
|------|:----:|------|
| `compute_m6.py` | – | Offline computation of M6 (E2E ratio): TC-predicted total time / measured per-forward time, using ArgMaxV2 as the anchor kernel to split forward passes |

------

## 📐 Associated Test Results

```
$ pytest tests/tools/ -q --ignore=tests/tools/test_fia_parser_backfill.py
63 passed, 2 failed in 0.11s
```

The 2 failures are expected: `profile_and_update_db.py` is not implemented yet (the tests were written ahead of the code).

### Import isolation check

```python
# Confirm that tools/ does not depend on tensor_cast
import ast
import pathlib
import sys

for f in pathlib.Path('tools/perf_data_collection').rglob('*.py'):
    tree = ast.parse(f.read_text(encoding='utf-8'))
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import x` keeps module names in node.names; ast.Import has no .module
            mods = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            mods = [node.module or '']
        else:
            continue
        if any('tensor_cast' in m for m in mods):
            print(f'ERROR: {f}:{node.lineno}')
            sys.exit(1)
print('OK: no tensor_cast imports in tools/')
```

------

## 🌟 Use cases (Optional)

```bash
# 1. Generate per-kernel CSVs from raw NPU profiling data
python tools/perf_data_collection/parse_kernel_details.py \
    --input <kernel_details.csv> --output-dir <data_dir>

# 2. Run microbenchmarks (requires an NPU device)
python tools/perf_data_collection/start_microbench.py \
    --data-dir <data_dir> --device ATLAS_800_A3_752T_128G_DIE

# 3. Generate a shape-mutation matrix to broaden coverage
python tools/perf_data_collection/generate_shape_grid.py \
    --csv-dir <data_dir> --output-dir <output_dir>

# 4. Compute M6 E2E accuracy
python tools/perf_data_collection/compute_m6.py \
    --tc-report results/metrics.json \
    --profiler-output <profiling_trace_dir>

# 5. HCCL communication benchmark
bash tools/perf_data_collection/run_comm_bench.sh
```

------

## ✅ Checklist

Before PR:

- [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix potential lint issues.
- [x] Bug fixes are fully covered by unit tests; the case that triggers the bug should be added to the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!124
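The splitting step described for `parse_kernel_details.py` writes one CSV per kernel type, such as `MatMulV2.csv` and `SwiGlu.csv`. It can be sketched with the standard `csv` module. This is a minimal illustration under stated assumptions, not the tool's actual code:

- The kernel-type column is assumed to be named `Type`.
- The helper `split_by_kernel_type` is hypothetical.
- FRACTAL_NZ format conversion and shape normalization, which the real tool also performs, are omitted.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def split_by_kernel_type(input_csv: Path, output_dir: Path,
                         type_column: str = "Type") -> Dict[str, int]:
    """Write one CSV per kernel type and return the row count per type.

    Assumption: `type_column` names the kernel-type column of kernel_details.csv.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    with open(input_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        for row in reader:
            groups[row[type_column]].append(row)

    output_dir.mkdir(parents=True, exist_ok=True)
    for kernel_type, rows in groups.items():
        # Keep file names filesystem-safe, e.g. "MatMulV2" -> "MatMulV2.csv"
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", kernel_type)
        with open(output_dir / f"{safe_name}.csv", "w", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
    return {kernel_type: len(rows) for kernel_type, rows in groups.items()}

# Usage (hypothetical paths):
# split_by_kernel_type(Path("kernel_details.csv"), Path("data_dir"))
```

Grouping in memory keeps the sketch short. For very large profiling exports, streaming each row to a per-type writer would avoid holding the whole file in memory.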
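The M6 definition given for `compute_m6.py` is the TC-predicted total time divided by the measured per-forward time, with `ArgMaxV2` as the anchor kernel that splits forward passes. A small sketch can illustrate it. It rests on these hypothetical assumptions:

- The trace is an already time-ordered list of `(kernel_name, duration_us)` pairs.
- Each `ArgMaxV2` closes one forward pass.
- A pass's time is the sum of its kernel durations.
- The trace values are made up.

The real tool reads profiler output and may measure wall-clock spans instead.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Kernel = Tuple[str, float]  # (kernel name, duration in microseconds)


def split_forward_passes(trace: Sequence[Kernel], anchor: str = "ArgMaxV2") -> List[List[Kernel]]:
    """Cut a time-ordered trace into forward passes, closing a pass at each anchor kernel."""
    passes: List[List[Kernel]] = []
    current: List[Kernel] = []
    for name, duration_us in trace:
        current.append((name, duration_us))
        if name == anchor:
            passes.append(current)
            current = []
    # Kernels after the last anchor form an incomplete pass and are dropped.
    return passes


def m6_ratio(predicted_total_us: float, trace: Sequence[Kernel], anchor: str = "ArgMaxV2") -> float:
    """M6 = predicted total time / mean measured per-forward time (assumed definition of 'per-forward')."""
    per_forward = [sum(d for _, d in p) for p in split_forward_passes(trace, anchor)]
    if not per_forward:
        raise ValueError(f"no complete forward pass: anchor kernel {anchor!r} not found")
    return predicted_total_us / mean(per_forward)


# Hypothetical trace: two complete forward passes of 55 us each, plus a truncated tail.
trace = [
    ("MatMulV2", 40.0), ("SwiGlu", 10.0), ("ArgMaxV2", 5.0),
    ("MatMulV2", 42.0), ("SwiGlu", 8.0), ("ArgMaxV2", 5.0),
    ("MatMulV2", 41.0),
]
print(m6_ratio(49.5, trace))  # prints 0.9
```

A value of 1.0 would mean the predicted and measured per-forward times agree. Dropping the trailing incomplete pass keeps a truncated trace from skewing the mean.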
OP_PLUGIN_MAPPING_TUTORIAL.md	[FIX][TEST] Fix broken links in README/docs and run the full benchmark suite by default	20 days ago

Co-authored-by: liujiawang<anonymousdev@163.com>
# message auto-generated for no-merge-commit
merge: !331 merge fix into develop
[FIX][TEST] Fix broken links in README/docs and run the full benchmark suite by default
Created-by: AvadaKedavrua
Commit-by: liujiawang
Merged-by: ascend-robot
Description:

## Reason for change

1. In the Community section of `README.md`, the WeChat Official Account QR code points to an old path in the `msinsight` repository. That resource now returns 404, so users cannot scan or preview it.
2. The relative path to the Op Mapping skill in `OP_PLUGIN_MAPPING_TUTORIAL.md` is wrong, so the in-document link is broken.
3. By default, the benchmark entry point only runs `tests/benchmark/ops/`. The model regression tests in `tests/benchmark/models/` are silently skipped, which leaves CI/nightly coverage incomplete.
4. With the full benchmark suite enabled, the `qwen3-30b-a3b` decode/prefill baselines no longer match the current compile output and must be refreshed.

---

## Changes

| Category | Files | Change |
|------|------|------|
| Doc links | `README.md` | Replace the Official Account image URL with a working `user-images` asset; complete the TOC anchors for sections such as Contributions / Community |
| Doc links | `docs/perf_database/tutorial/OP_PLUGIN_MAPPING_TUTORIAL.md` | Skill path `../skills/...` → `../../../.agents/skills/op-mapping/SKILL.md` |
| Benchmark default behavior | `scripts/run_benchmark.sh`, `scripts/helpers/nightly/main.py` | Remove the `MSMODELING_BENCHMARK_MODELS` switch; always run the whole `tests/benchmark/` directory |
| Design doc | `docs/design/ut_refactor.md` | Sync the benchmark phase description |
| Baseline | `tests/benchmark/models/cases/qwen3-30b-a3b-{decode,prefill}.json` | Refresh `baseline_time_s` and the operator top-N |
| Lint | `experimental/optix/`, `scripts/`, `tensor_cast/`, `tests/`, etc. | Add `pylint: disable` comments for false positives on `inspect.*` |

---

## Self-verification

### README Official Account image link

Purpose: confirm that the old link returns 404 and the new link is reachable.

Steps:

1. Check the HTTP status of the old URL.
2. Check the HTTP status of the new URL.

```bash
curl -sI "https://raw.gitcode.com/Ascend/msinsight/raw/master/docs/zh/user_guide/figures/readme/officialAccount.jpg" | head -1
curl -sI "https://raw.gitcode.com/user-images/assets/8428112/2a22a707-de26-4bb3-b312-4952035e021b/30be980e7fd65b2486d251b48a7999f3.jpg" | head -1
```

Result:

```text
HTTP/1.1 404 Not Found
HTTP/1.1 200 OK
```

### Op Mapping skill doc path

Purpose: confirm that the link in the tutorial points to an existing file.

Steps:

1. From the repository root, check that the skill file exists.

```bash
test -f .agents/skills/op-mapping/SKILL.md && echo OK
```

Result:

```text
OK
```

### Benchmark entry point runs the full suite by default

Purpose: confirm that `run_benchmark.sh` no longer depends on `MSMODELING_BENCHMARK_MODELS` and covers the `models` subdirectory by default.

Steps:

1. Inspect the benchmark target configuration in the script.

```bash
grep -n "TESTS_BENCHMARK" scripts/run_benchmark.sh
```

Result:

```text
run_pytest "${TESTS_BENCHMARK}/" \
```

### CI pipeline

Purpose: confirm that the changes do not break the existing CI or docs CI.

Steps:

1. Check the CI labels on PR #331.

Result: the PR is labeled `ci-pipeline-passed` and `docs-ci-pipeline-success`.

See merge request: Ascend/msmodeling!331
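The skill-link fix in `OP_PLUGIN_MAPPING_TUTORIAL.md` depends on how a relative link resolves from the document's own directory. The tutorial sits three levels below the repository root, in `docs/perf_database/tutorial/`. So `../../../` climbs back to the root before descending into `.agents/`.

A small sketch using `posixpath` makes that resolution explicit. The helper name `resolve_repo_link` is hypothetical, and the sketch is not part of the repository's tooling.

```python
import posixpath


def resolve_repo_link(doc_path: str, link: str) -> str:
    """Resolve a relative Markdown link against the directory of the document containing it.

    Returns a repository-relative path; raises ValueError if the link climbs above the root.
    """
    doc_dir = posixpath.dirname(doc_path)
    resolved = posixpath.normpath(posixpath.join(doc_dir, link))
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"{link!r} escapes the repository root from {doc_path!r}")
    return resolved


doc = "docs/perf_database/tutorial/OP_PLUGIN_MAPPING_TUTORIAL.md"
print(resolve_repo_link(doc, "../../../.agents/skills/op-mapping/SKILL.md"))
# prints .agents/skills/op-mapping/SKILL.md
```

The resolved path matches the file checked with `test -f .agents/skills/op-mapping/SKILL.md` during self-verification. One more `../` would raise, because the path would escape the repository root.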