msmodeling/tools/perf_data_collection/grid_generator · Ascend/MindStudio-Modeling - AtomGit

ascend-robot: Improve GLM5 shape grid generation and microbench backfill support

File	Last commit	Last updated
generators	Improve GLM5 shape grid generation and microbench backfill support. Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com>. Merge !252: codex/debug-shape-grid-generation into develop (message auto-generated for no-merge-commit merge). Created-by: Secluded_Ocean. Commit-by: Secluded_Ocean. Merged-by: ascend-robot. Summary: improve GLM5 shape grid generation and EP32 DFC replay coverage; normalize shape matching and dedupe behavior; extend microbench update/replay tooling and related tests. Validation: not run locally because pytest is not installed for the available Python interpreter. See merge request: Ascend/msmodeling!252	1 month ago
__init__.py	feat: profiling data collection toolchain (full commit message below)	1 month ago

feat: profiling data collection toolchain
Co-authored-by: Horacehxw<horacehxw@gmail.com>
Merge !124: pr/perf-db-b into develop (message auto-generated for no-merge-commit merge)
Created-by: Horacehxw
Commit-by: Horacehxw
Merged-by: ascend-robot
Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

The measurement-based operator performance estimation system (see [PR-A: feat: profiling-based empirical performance model with CSV data source](https://gitcode.com/Ascend/msmodeling/pull/123)) depends on NPU profiling data (per-kernel CSVs plus HCCL communication benchmarks). This data must be collected on NPU devices, parsed, and validated before it can be used. This PR provides a complete offline data collection toolchain that covers the full flow: parsing raw profiling data, microbenchmarks, shape mutation, communication benchmarks, and M6 E2E accuracy computation.

> 📌 This PR has no code dependency on [PR-A](https://gitcode.com/Ascend/msmodeling/pull/123) (the core feature): tools/ does not import tensor_cast, so it can be reviewed and merged independently.

------

## 📝 Modification

All new files are located under `tools/perf_data_collection/` and `tests/tools/`.

### 1. Data parsing and conversion

| Tool | Lines | Description |
|------|:----:|------|
| `parse_kernel_details.py` | 698 | Parses the NPU `kernel_details.csv` and splits it into one CSV per kernel type (MatMulV2.csv, SwiGlu.csv, etc.); supports FRACTAL_NZ format conversion and shape normalization |
| `build_comm_csv.py` | - | Builds the communication CSV from HCCL benchmark results |
| `fia_common.py` + `fill_fia_runtime_metadata.py` | 1,544 | Infers and backfills FusedInferAttentionScore runtime metadata |

### 2. Microbenchmarks

| Tool | Lines | Description |
|------|:----:|------|
| `start_microbench.py` | 771 | Automated microbenchmark entry point: reads op_mapping.yaml → selects the replay script by kernel type → collects with msprof → parses the results and writes them back to the CSV |
| `op_replay/*.py` (25+ scripts) | 5,258 | One replay script per NPU kernel (MatMulV2, SwiGlu, FusedInferAttentionScore, QuantBatchMatmulV3, DispatchFFNCombine, etc.); each builds its inputs and invokes the kernel through the torch_npu API |
| `generate_shape_grid.py` | 2,075 | Starting from existing CSV data, generates additional shape combinations through shape mutation (dimension scaling, block padding variants, quantization variants, etc.) to broaden coverage |

### 3. Communication benchmarks

| Tool | Lines | Description |
|------|:----:|------|
| `generate_comm_microbench.py` | - | Generates HCCL communication benchmark scripts (allReduce, allGather, allToAll, reduceScatter) |
| `validate_comm_alignment.py` | 345 | Validates alignment between the HCCL microbenchmark CSV and CommAnalyticModel |
| `run_comm_bench.sh` | 204 | HCCL benchmark run script |

### 4. Accuracy evaluation

| Tool | Lines | Description |
|------|:----:|------|
| `compute_m6.py` | - | Offline M6 (E2E Ratio) computation: TC predicted total time / measured per-forward time, using ArgMaxV2 as the anchor kernel to split forward passes |

------

## 📐 Associated Test Results

```
$ pytest tests/tools/ -q --ignore=tests/tools/test_fia_parser_backfill.py
63 passed, 2 failed in 0.11s
```

The 2 failures are expected (`profile_and_update_db.py` is not implemented yet; the tests were written ahead of the code).

### Import isolation check

```python
# Confirm that tools/ does not depend on tensor_cast
import ast
import pathlib
import sys

for f in pathlib.Path('tools/perf_data_collection').rglob('*.py'):
    tree = ast.parse(f.read_text())
    for node in ast.walk(tree):
        # `import x` keeps module names in node.names; `from x import y` in node.module
        if isinstance(node, ast.Import):
            mods = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            mods = [node.module or '']
        else:
            continue
        if any('tensor_cast' in mod for mod in mods):
            print(f'ERROR: {f}:{node.lineno}')
            sys.exit(1)
# → OK: no tensor_cast imports in tools/
```

------

## 🌟 Use cases (Optional)

```bash
# 1. Generate per-kernel CSVs from raw NPU profiling data
python tools/perf_data_collection/parse_kernel_details.py \
    --input <kernel_details.csv> --output-dir <data_dir>

# 2. Run the microbenchmarks (requires an NPU device)
python tools/perf_data_collection/start_microbench.py \
    --data-dir <data_dir> --device ATLAS_800_A3_752T_128G_DIE

# 3. Generate the shape mutation grid to broaden coverage
python tools/perf_data_collection/generate_shape_grid.py \
    --csv-dir <data_dir> --output-dir <output_dir>

# 4. Compute M6 E2E accuracy
python tools/perf_data_collection/compute_m6.py \
    --tc-report results/metrics.json \
    --profiler-output <profiling_trace_dir>

# 5. HCCL communication benchmark
bash tools/perf_data_collection/run_comm_bench.sh
```

------

## ✅ Checklist

Before PR:
- [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues.
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!124
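
The M6 (E2E Ratio) metric in the description above is defined as the TC predicted total time divided by the measured per-forward time, with ArgMaxV2 used as the anchor kernel that separates forward passes. The following is a minimal sketch of that idea, not the code in `compute_m6.py`. The function names, the `(kernel_name, duration_us)` input layout, the choice to sum kernel durations within a pass, and the choice to average across passes are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Kernel = Tuple[str, float]  # (kernel name, duration in microseconds); hypothetical layout

ANCHOR_KERNEL = "ArgMaxV2"  # anchor kernel named in the PR description


def split_forward_passes(kernels: Sequence[Kernel],
                         anchor: str = ANCHOR_KERNEL) -> List[List[Kernel]]:
    """Split a time-ordered kernel list into passes, each ending at an anchor kernel.

    Kernels after the last anchor belong to an incomplete pass and are dropped
    (an assumption of this sketch).
    """
    passes: List[List[Kernel]] = []
    current: List[Kernel] = []
    for name, duration in kernels:
        current.append((name, duration))
        if name == anchor:
            passes.append(current)
            current = []
    return passes


def m6_ratio(predicted_total_us: float,
             kernels: Sequence[Kernel],
             anchor: str = ANCHOR_KERNEL) -> float:
    """Return predicted time divided by the mean measured per-forward time."""
    passes = split_forward_passes(kernels, anchor)
    if not passes:
        raise ValueError(f"no complete forward pass found (anchor {anchor!r} missing)")
    # Assumption: per-forward time is the sum of kernel durations (idle gaps ignored).
    per_forward = [sum(duration for _, duration in p) for p in passes]
    measured = sum(per_forward) / len(per_forward)
    return predicted_total_us / measured


# Illustrative input: two complete passes of 20 us each, plus one trailing kernel.
sample = [
    ("MatMulV2", 10.0), ("SwiGlu", 5.0), ("ArgMaxV2", 5.0),
    ("MatMulV2", 12.0), ("SwiGlu", 6.0), ("ArgMaxV2", 2.0),
    ("MatMulV2", 3.0),
]
print(f"M6 = {m6_ratio(18.0, sample):.3f}")
```

Under these assumptions a ratio near 1.0 would mean the predicted time matches the measured per-forward time; values below 1.0 would indicate under-prediction.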
config.py	feat: profiling data collection toolchain (merge request Ascend/msmodeling!124)	1 month ago
config.yaml	Improve GLM5 shape grid generation and microbench backfill support (merge request Ascend/msmodeling!252)	1 month ago
evaluator.py	feat: profiling data collection toolchain (merge request Ascend/msmodeling!124)	1 month ago
model_configs.py	Improve GLM5 shape grid generation and microbench backfill support (merge request Ascend/msmodeling!252)	1 month ago
runner.py	feat: profiling data collection toolchain (merge request Ascend/msmodeling!124)	1 month ago
shape_grids.py	feat: profiling data collection toolchain (merge request Ascend/msmodeling!124)	1 month ago
theory_router.py	Improve GLM5 shape grid generation and microbench backfill support (merge request Ascend/msmodeling!252)	1 month ago
utils.py	Improve GLM5 shape grid generation and microbench backfill support (merge request Ascend/msmodeling!252)	1 month ago