ascend-robot: Optimize the msModeling README and the Chinese/English documentation structure

File	Last commit	Last updated
device_config	[feat] Add SKILL: configure device through natural language

Co-authored-by: wendellX<wenxiaoping6@huawei.com>

# message auto-generated for no-merge-commit merge:
!243 merge skill_device_cfg into develop

[feat] Add SKILL: configure device through natural language

Created-by: wendellX
Commit-by: wendellX
Merged-by: ascend-robot
Description:

# PR Template

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type
- [X] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

------

## 📝 Modification

Add a SKILL file: configure device through natural language.

------

## 📐 Associated Test Results

---

User: I want to import a new device topology.

Claude: We don't need to fill in complex parameters yet; I'll walk you through the configuration step by step. Just tell me the following 3 things, and write "unknown" for anything you don't know:

1. What hardware is this? For example, vendor and model.
2. At what granularity do you want to model this topology: single card, single die / chiplet, or some other scheduling unit?
3. Do you have any of the following raw information: memory capacity, memory bandwidth, FP16/BF16/INT8/FP32 compute, and a description of "how many cards / dies there are and how they are connected"? If so, paste it as-is.

You can reply like this:

- Model: XXX
- Granularity: single card / single die / not sure
- Known specs: 96GB memory, 3.2TB/s, BF16 800T, 8 cards, 400GB/s between cards

Once I have it, I'll sort it into a "confirmed / still uncertain / can default for now" list and then continue.

User: Huawei's ATLAS_800_A3_560T_128G_DIE. Single die, memory capacity 64GiB, memory bandwidth 1.6TiB/s.

Claude: Let me first record the facts confirmed so far.

- Confirmed:
  - Vendor: HUAWEI
  - Candidate profile name: `ATLAS_800_A3_560T_128G_DIE`
  - Granularity: single die
  - Memory per profile unit: 64GiB
  - Memory bandwidth per profile unit: 1.6TiB/s
- Note:
  - The existing `ATLAS_800_A3_752T_128G_DIE` in the repository [tensor_cast/device.py:299-320] follows the same style ("128G" in the name, but 64GiB written as the per-die memory), so your naming is consistent.

Next I'll only ask about the 3 most important things. If you don't know, just write "unknown" or "default for now":

1. Which compute figure does `560T` refer to?
   - For example: `FP16 560T`, `BF16 560T`, `INT8 560T`
   - If your material also lists other peak values for `FP32 / BF16 / FP16 / INT8`, please paste them as-is.
2. Does your material list a separate "general-purpose / vector compute" figure?
   - This mainly affects estimates for non-matrix operators such as softmax, norm, and activation functions.
   - You can simply reply:
     - `None, leave it empty for now`
     - or `None, use a temporary estimate for now`
3. How do you want to build the interconnect topology for this single-die profile?
   - You can pick one:
     - `Reuse the existing A3 die topology`
     - `Only model within the 8-card node`
     - `I only know 2 dies/card and 8 cards/node; default the rest`
   - If you know the interconnect bandwidth, also tell me whether it is unidirectional or bidirectional.

Once you answer these 3 items, I can list the fields to be written, the defaults, and the items pending calibration, and then ask you for final confirmation.

User: FP32 75T, BF16 245.8T, FP16 280T, INT8 560T. The communication grid shape is [48, 8, 2].

Claude: OK, here is what I have as "confirmed" so far:

- `name`: `ATLAS_800_A3_560T_128G_DIE`
- `vendor`: `HUAWEI`
- Granularity: single die
- Memory per die: 64 GiB
- Memory bandwidth per die: 1.6 TiB/s
- `mma_ops`:
  - FP32 = 75T
  - BF16 = 245.8T
  - FP16 = 280T
  - INT8 = 560T
- `comm_grid` shape: `[48, 8, 2]`

This essentially combines the existing A3 die template [tensor_cast/device.py:169-182] with the 280T-tier compute style. Three points that will be written directly into the code are still open; just pick an option for each:

1. How should `gp_ops` be handled?
   - `Leave it empty for now`
   - `Use a temporary estimate for now: FP32 8T, BF16/FP16 16T`
   - `I have official values: ...`
2. Should the per-level interconnect parameters for `[48, 8, 2]` reuse the existing A3 die topology [tensor_cast/device.py:169-182]?
   - Outer level: 196 GB/s, 5.5 us
   - Middle level: 196 GB/s, 0.5 us
   - Inner level: 224 GB/s, 0.2 us
   - `comm_efficiency = 0.7`
   - Type follows the current code default, `CLOS`
   - These bandwidths will be written as unidirectional bandwidth

   You can simply reply: `Reuse the existing A3 die topology`
3. If no more precise data is available, do you accept reusing the current default efficiencies and static cost, marked as `needs calibration`?
   - `compute_efficiency = 0.7`
   - `memory_efficiency = 0.6`
   - `StaticCost(5us, 2us, 10us)`

Once you answer these 3 items, I'll edit [tensor_cast/device.py] directly, verify that the profile imports and registers, and give you a working `--device ATLAS_800_A3_560T_128G_DIE`.

User: 1 use the temporary values; 2 reuse A3; 3 compute_efficiency=0.7, memory_efficiency=0.6.

Claude: One small point that will be written into the code is still open: for `StaticCost`, do you also want to reuse the existing A3 defaults `mma=5us, gp=2us, comm=10us`? If so, just reply "OK" and I'll change the code and verify it.

User: mma_op_cost_s=5e-06, gp_op_cost_s=2e-06, comm_op_cost_s=1e-05. Use these.

Claude: Added at [tensor_cast/device.py:299-320], reusing the existing A3 die topology [tensor_cast/device.py:169-182].

- Registered `DeviceProfile.name`
  - `ATLAS_800_A3_560T_128G_DIE`
- Facts you provided, used directly
  - `vendor="HUAWEI"`
  - Single-die granularity
  - `memory_size_bytes=64 * (1024**3)`
  - `memory_bandwidth_bytes_ps=1.6 * (1024**4)`
  - `mma_ops`
    - FP32 = `75 * 1e12`
    - BF16 = `245.8 * 1e12`
    - FP16 = `280 * 1e12`
    - INT8 = `560 * 1e12`
  - `comm_grid` shape `[48, 8, 2]`
  - `compute_efficiency=0.7`
  - `memory_efficiency=0.6`
  - `StaticCost(mma=5e-06, gp=2e-06, comm=1e-05)`
- Estimates confirmed by the user
  - `gp_ops`
    - FP32 = `8 * 1e12`
    - BF16 = `16 * 1e12`
    - FP16 = `16 * 1e12`
- `needs calibration`
  - `gp_ops` is currently a temporary estimate, not an official vector/general compute figure
  - The interconnect parameters currently reuse the existing A3 die topology:
    - `start_dim=0`: 196 GB/s, 5.5 us, `comm_efficiency=0.7`
    - `start_dim=1`: 196 GB/s, 0.5 us, `comm_efficiency=0.7`
    - `start_dim=2`: 224 GB/s, 0.2 us, `comm_efficiency=0.7`
- Verification command
  - `python - <<'PY' from tensor_cast.device import DeviceProfile print(DeviceProfile.all_device_profiles["ATLAS_800_A3_560T_128G_DIE"]) PY`
  - Result: the `DeviceProfile` printed successfully; registration works
- Ready-to-use command
  - `python -m cli.inference.text_generate Qwen/Qwen3-32B --num-queries 2 --query-length 3500 --device ATLAS_800_A3_560T_128G_DIE`

It is usable now. If you like, the most valuable thing to calibrate next is whether `gp_ops` has an official value.

------

## 🌟 Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

------

## ✅ Checklist

Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!243	1 month ago
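The unit arithmetic behind the fields listed in the device_config transcript can be sketched as follows. This is only an illustration: `ProfileSketch` is a hypothetical stand-in and not the real `DeviceProfile` class in `tensor_cast/device.py`, whose constructor signature is not shown in this listing. The numbers are the ones given in the transcript.

```python
import math
from dataclasses import dataclass, field

GIB = 1024**3   # bytes per GiB
TIB = 1024**4   # bytes per TiB
TERA = 1e12     # operations per second in 1 "T"


@dataclass
class ProfileSketch:
    """Hypothetical container; NOT the real tensor_cast DeviceProfile."""
    name: str
    vendor: str
    memory_size_bytes: float
    memory_bandwidth_bytes_ps: float
    mma_ops: dict = field(default_factory=dict)
    gp_ops: dict = field(default_factory=dict)
    comm_grid_shape: tuple = ()
    compute_efficiency: float = 0.7
    memory_efficiency: float = 0.6


profile = ProfileSketch(
    name="ATLAS_800_A3_560T_128G_DIE",
    vendor="HUAWEI",
    memory_size_bytes=64 * GIB,            # 64 GiB per die
    memory_bandwidth_bytes_ps=1.6 * TIB,   # 1.6 TiB/s per die
    mma_ops={
        "FP32": 75 * TERA,
        "BF16": 245.8 * TERA,
        "FP16": 280 * TERA,
        "INT8": 560 * TERA,
    },
    # Temporary estimates from the transcript, flagged "needs calibration".
    gp_ops={"FP32": 8 * TERA, "BF16": 16 * TERA, "FP16": 16 * TERA},
    comm_grid_shape=(48, 8, 2),
)

# Number of profile units (dies) addressed by the communication grid.
total_units = math.prod(profile.comm_grid_shape)

print(profile.memory_size_bytes)  # 68719476736
print(total_units)                # 768
```

Writing the sizes as `64 * GIB` rather than as literal byte counts keeps the binary-versus-decimal unit choice visible, which is where profiles of this kind most easily go wrong.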
microbench	[feat] Add SKILL: configure device through natural language (same commit and description as the device_config row) See merge request: Ascend/msmodeling!243	1 month ago
model-adaptation	Add model adapter onboarding automation

Co-authored-by: jhon-117<fangkai15@huawei.com>

# message auto-generated for no-merge-commit merge:
!282 merge codex/model-adaptation-efficiency-v2 into develop

Add model adapter onboarding automation

Created-by: jhon-117
Commit-by: jhon-117
Merged-by: ascend-robot
Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

------

## 📝 Modification

This PR implements a workflow that makes onboarding new models into TensorCast more efficient. It is built around an adaptation approach in which the user only has to provide the raw Insight profiling export file and the corresponding simulation command, and it fills in the doctor, evidence, patch discovery, profile draft, ST case generation, and qwen3-vl replay verification capabilities.

Main changes:

New tensor_cast.adapter automation module:
- Simulation command parsing and AdaptationContext
- Raw MindStudio Insight profiling parsing
- User hints loading, conflict detection, and provenance
- Profile candidate generation with review/validation
- Evidence draft generation and verifier mismatch classification
- PatchReport, patch discovery, and profile draft rendering
- ST guardrail case generation

New CLIs:
- python -m cli.inference.model_doctor
- python -m cli.inference.verify_model_profile

model_doctor supports:
- --from-command-file
- --raw-insight-file
- --hints-file
- --patch-failure-file
- --ignore-existing-profile
- --profile-draft-output

Enhanced qwen3-vl replay:
- New tiny config-only fixture: tests/assets/model_config/qwen3_vl_tiny/config.json
- With --ignore-existing-profile qwen3_vl, VL profile fields can be discovered from the installed transformers source
- Patch discovery can generate a patch_method_for_qwen3_vl draft from a qwen3-vl placeholder/mask meta failure

New documentation:
- docs/design/model_adaptation_efficiency_design.md
- docs/en/tensor_cast_new_model_adaptation.md

Enhanced runtime/transformations:
- Expose the information needed by the runtime summary
- Record patch reports
- Support profile registry replay/audit ignore

------

## 📐 Associated Test Results

- `pytest tests/test_tensor_cast/test_adapter_automation.py -q` # 29 passed
- `pytest tests/test_tensor_cast -k "adapter or doctor or evidence" -q` # 29 passed
- `python -m compileall -q tensor_cast/adapter cli/inference/model_doctor.py cli/inference/verify_model_profile.py cli/inference/adapter_cli.py tests/test_tensor_cast/test_adapter_automation.py` # passed
- `python -m cli.inference.model_doctor --help` # passed
- `python -m cli.inference.verify_model_profile --help` # passed

Additional smoke tests:
- qwen3-vl tiny CLI replay smoke: passed
- qwen3-vl patch code draft CLI smoke: passed
- deepseek fixture doctor/replay smoke: passed, with only the rope parameter warning that comes with the fixture, which does not affect the result.

------

See merge request: Ascend/msmodeling!282	29 days ago
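The "simulation command parsing" step named in the model-adaptation description might look roughly like the sketch below. This is not the `tensor_cast.adapter` implementation, which is not shown here; `parse_sim_command` and the returned dictionary layout are hypothetical, and the sample command is the `text_generate` invocation quoted in the device_config row.

```python
import shlex


def parse_sim_command(command: str) -> dict:
    """Split a `python -m <module> ...` command into module, positionals, options.

    Hypothetical helper for illustration only. It assumes every `--flag`
    followed by a non-flag token takes that token as its value, so a boolean
    flag placed directly before a positional argument would be misread.
    """
    tokens = shlex.split(command)
    ctx = {"module": None, "positionals": [], "options": {}}
    i = 0
    if tokens and tokens[0].startswith("python"):
        i = 1  # skip the interpreter name
    if i + 1 < len(tokens) and tokens[i] == "-m":
        ctx["module"] = tokens[i + 1]
        i += 2
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            key = tok[2:]
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            ctx["options"][key] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            ctx["positionals"].append(tok)
            i += 1
    return ctx


parsed = parse_sim_command(
    "python -m cli.inference.text_generate Qwen/Qwen3-32B "
    "--num-queries 2 --query-length 3500 --device ATLAS_800_A3_560T_128G_DIE"
)
print(parsed["module"])       # cli.inference.text_generate
print(parsed["positionals"])  # ['Qwen/Qwen3-32B']
print(parsed["options"]["device"])
```

A real parser would more likely reuse the CLI's own `argparse` definition so that flag arity is never guessed; the token walk above only shows the shape of the data such a step produces.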
msmodeling-env-installer	feat(skills): add msmodeling-env-installer

Co-authored-by: lutean<lutean1@huawei.com>

# message auto-generated for no-merge-commit merge:
!279 merge develop into develop

feat(skills): add msmodeling-env-installer

Created-by: lutean
Commit-by: lutean
Merged-by: ascend-robot
Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

Setting up the msmodeling environment involves the Python version, virtual environment creation, dependency installation, `PYTHONPATH` configuration, and an optional Hugging Face mirror setting. Today these steps depend mainly on reading the documentation and running commands by hand, so platform differences, missed commands, or inconsistent environment variables can easily make installation fail, which raises the cost for new contributors and AI Agents to get started with the project.

This PR introduces the `msmodeling-env-installer` skill, which captures a standardized environment installation workflow and provides a Windows automated installation script, so that an Agent can complete dependency installation, environment verification, and problem diagnosis under explicit rules. This improves the consistency of first-time environment setup, reduces repeated communication and manual troubleshooting, and provides a more stable base environment for later TensorCast and ServingCast development and testing.

------

## 📝 Modification

| File | Description |
|:---|:---|
| `.agents\skills\msmodeling-env-installer\SKILL.md` | Main skill description, trigger conditions, and execution rules |
| `.agents\skills\msmodeling-env-installer\scripts\install-current-project-deps.ps1` | Windows PowerShell automated installation script |
| `.agents\skills\msmodeling-env-installer\scripts\install-current-project-deps.sh` | Linux/macOS/WSL/Git Bash automated installation script |
| `.agents\skills\README.md` | Skills index and quick start |
| `docs/RFC/rfc_msmodeling_env_installer_skill_zh.md` | This RFC document |

------

See merge request: Ascend/msmodeling!279	29 days ago
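The environment checks that the msmodeling-env-installer motivation lists (Python version, virtual environment, `PYTHONPATH`, optional Hugging Face mirror) could be reported with a small script like the one below. It is a sketch and not the skill's actual install scripts: `environment_report` is a hypothetical name, and treating `HF_ENDPOINT` as the mirror setting is an assumption, since the description does not name the variable.

```python
import os
import sys
from pathlib import Path


def environment_report(project_root: str) -> dict:
    """Collect the facts an installer would want to verify (hypothetical helper)."""
    root = Path(project_root).resolve()
    raw = os.environ.get("PYTHONPATH", "")
    entries = [Path(p).resolve() for p in raw.split(os.pathsep) if p]
    return {
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        # Inside a venv, sys.prefix points at the venv, not the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "project_on_pythonpath": root in entries,
        # Assumption: HF_ENDPOINT is the variable used for the optional mirror.
        "hf_endpoint": os.environ.get("HF_ENDPOINT"),
    }


if __name__ == "__main__":
    for key, value in environment_report(".").items():
        print(f"{key}: {value}")
```

Reporting the state instead of silently fixing it matches the goal stated in the motivation: an Agent can compare the report against explicit rules and decide what to install or configure.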
op-mapping	Update profiling op mapping skill docs

Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com>

# message auto-generated for no-merge-commit merge:
!212 merge pr/glm5-op-mapping-skill-docs into develop

Update profiling op mapping skill docs

Created-by: Secluded_Ocean
Commit-by: Secluded_Ocean
Merged-by: ascend-robot
Description:

PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

This PR updates the profiling database op-mapping skill documentation.

During GLM5 profiling database expansion, several recurring issues were identified:

- Some TensorCast operators do not map to profiling CSV rows by direct tensor-shape matching.
- Semantic operators such as grouped MoE and LightningIndexer require explicit query-mode handling.
- Generated placeholder rows with empty or zero latency must not be treated as valid profiling data.
- Future op-mapping work needs clearer worker/verifier instructions to avoid incorrect mappings.

The goal of this PR is to document these lessons in the op-mapping skill so future profiling database updates follow a clearer and safer workflow.

------

## 📝 Modification

This PR updates the op-mapping skill documents:

- `docs/perf_database/skills/op-mapping/SKILL.md`
- `docs/perf_database/skills/op-mapping/single-op-worker-prompt.md`
- `docs/perf_database/skills/op-mapping/verifier-prompt.md`

Main changes:

- Clarify when an operator needs a dedicated `query_mode`.
- Clarify that placeholder latency rows should not be used as measured profiling data.
- Strengthen the worker instructions for checking TensorCast op semantics, NPU kernel names, CSV shapes, and replay feasibility.
- Strengthen the verifier instructions for reviewing operator mapping quality and shape matching assumptions.

------

## 📐 Associated Test Results

This PR only updates documentation/prompt files. No runtime test is required.

Manual check: `Reviewed the updated skill and prompt files for profiling database op-mapping workflow consistency.`

------

## 🌟 Use cases (Optional)

Future profiling database contributors can use this skill to:

- Add or verify op mappings for new models.
- Decide whether a default compute lookup is enough or whether a dedicated query mode is required.
- Avoid treating shape-generated placeholder rows as real latency data.
- Review replay feasibility before adding generated CSV shapes.

------

## ✅ Checklist

Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!212	28 days ago
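The rule that placeholder rows with empty or zero latency must not count as measured data can be illustrated with a short filter. The CSV layout here is assumed for the example: the column names `op_name`, `shape`, and `latency_us` are hypothetical and not taken from the actual profiling database, and `measured_rows` is not part of the skill.

```python
import csv
import io
import math


def measured_rows(csv_text: str, latency_column: str = "latency_us") -> list:
    """Keep only rows whose latency is a finite, strictly positive number."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(latency_column) or "").strip()
        try:
            latency = float(raw)
        except ValueError:
            continue  # empty or non-numeric latency: generated placeholder
        if math.isfinite(latency) and latency > 0:
            kept.append(row)
    return kept


SAMPLE = """op_name,shape,latency_us
matmul,"[128, 4096]",12.5
grouped_moe,"[8, 2048]",
lightning_indexer,"[1, 512]",0
"""

rows = measured_rows(SAMPLE)
print([r["op_name"] for r in rows])  # ['matmul']
```

Filtering at load time means a lookup miss stays visible as a miss, instead of a zero-latency placeholder quietly making an operator look free.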
optix-config	Optimize the msModeling README and the Chinese/English documentation structure

Co-authored-by: eveyin1<qianyin2022@hotmail.com>

# message auto-generated for no-merge-commit merge:
!329 merge doc_fix1 into develop

Optimize the msModeling README and the Chinese/English documentation structure

Created-by: eveyin1
Commit-by: eveyin1
Merged-by: ascend-robot
Description:

PR Type
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

------

## 📝 Modification

1. Revise the README against the latest template.
2. Fill in the installation and quick-start content.
3. Tidy up the English documentation under docs.
4. Score the documentation for usability.
5. Pass the docs-tool aidd check.

------

## 📐 Associated Test Results

![20260611-101033.jpg](https://raw.gitcode.com/user-images/assets/8428112/db108d98-9c1e-4eec-9508-2aacccb1d507/20260611-101033.jpg '20260611-101033.jpg')

------

See merge request: Ascend/msmodeling!329	22 days ago
optix-deploy	[feat] Add serving optimization deployment & parameter recommendation skill

Co-authored-by: wendellX<wenxiaoping6@huawei.com>

# message auto-generated for no-merge-commit merge:
!289 merge skill_param_optimizer into develop

[feat] Add serving optimization deployment & parameter recommendation skill

Created-by: wendellX
Commit-by: wendellX
Merged-by: ascend-robot
Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

[feat] Add serving optimization deployment & parameter recommendation skill

------

## 📝 Modification

[feat] Add serving optimization deployment & parameter recommendation skill

------

## 📐 Associated Test Results

### 1 Deployment:

![image.png](https://raw.gitcode.com/user-images/assets/8428112/cd164f53-ed16-43cf-94e1-a30c5b7091e1/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/5ada2b45-662b-4763-842a-185cb232afdf/image.png 'image.png')

### 2 Parameter recommendation:

![image.png](https://raw.gitcode.com/user-images/assets/8428112/1dceb238-27a5-472a-9103-a8491abd4f2a/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/baaded7f-db52-45ff-b2f0-bf9e6abcd873/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/f556c71d-1914-46ab-84b5-1e3194cdd321/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/5d34821a-4020-4995-ae50-4e664feac129/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/06c6ffde-f332-434b-8826-cc7fe2563f45/image.png 'image.png')

### 3 Writing parameters to config.toml

![image.png](https://raw.gitcode.com/user-images/assets/8428112/6c24e5fc-28e7-495f-b638-0fbac628a505/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/40b308f3-949a-42b4-80e4-0c685d30401c/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8428112/46632b05-941b-4e4d-b163-a1ff769c90e3/image.png 'image.png')

#### 3.1 Verification in config.toml

![image.png](https://raw.gitcode.com/user-images/assets/8428112/3aa3f7bf-29da-401e-94d3-dbd6d501290a/image.png 'image.png')

#### 3.2 Run:

![image.png](https://raw.gitcode.com/user-images/assets/8428112/365a86c6-ab71-442e-9f2a-8ef2c8dd9bc0/image.png 'image.png')

------

See merge request: Ascend/msmodeling!289	29 days ago
optix-param-recommend	[feat] Add serving optimization deployment & parameter recommendation skill (same commit and description as the optix-deploy row) See merge request: Ascend/msmodeling!289	29 days ago
text-generate-executor	feat(skills): add text generate and throughput optimizer executors

Co-authored-by: lutean<lutean1@huawei.com>

# message auto-generated for no-merge-commit merge:
!278 merge develop into develop

feat(skills): add text generate and throughput optimizer executors

Created-by: lutean
Commit-by: lutean
Merged-by: ascend-robot
Description:

PR Type
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

The `text_generate` and `throughput_optimizer` CLIs currently have many parameter combinations, covering the model, hardware profile, parallelism strategy, quantization, PD disaggregation, SLO constraints, and other kinds of configuration. When doing performance validation or throughput planning, users can easily miss key parameters, confuse deployment modes, or fail to quickly extract comparable conclusions from the raw output.

This PR adds two skills, `text-generate-executor` and `throughput-optimizer-executor`, which turn natural-language performance validation and throughput planning requests into confirmable, executable CLI commands and, after execution, summarize the key metrics and results. This lowers the barrier to common modeling tasks and improves consistency in parameter collection, command construction, and result interpretation.

------

## 📝 Modification

| File | Description |
|:---|:---|
| `skills/text-generate-executor/SKILL.md` | Main skill description and execution rules |
| `skills/text-generate-executor/references/dialog-flow.md` | Progressive parameter-elicitation flow |
| `skills/text-generate-executor/references/text-generate-params.md` | text_generate parameter quick reference |
| `skills/throughput-optimizer-executor/SKILL.md` | Main skill description and execution rules |
| `skills/throughput-optimizer-executor/references/dialog-flow.md` | Parameter-elicitation flow and mode branches |
| `skills/throughput-optimizer-executor/references/throughput-optimizer-params.md` | Parameter descriptions and default rules |

------

See merge request: Ascend/msmodeling!278	29 days ago
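Turning collected parameters into a confirmable `text_generate` command, as the text-generate-executor motivation describes, might be sketched like this. The function name `build_text_generate_command` is hypothetical and is not the skill's implementation; only the flags that appear in the device_config row's sample command (`--num-queries`, `--query-length`, `--device`) are used, and no other `text_generate` options are assumed.

```python
import shlex


def build_text_generate_command(model_id, device, num_queries, query_length) -> str:
    """Assemble a shell-safe command string, refusing to guess missing values."""
    params = {
        "model_id": model_id,
        "device": device,
        "num_queries": num_queries,
        "query_length": query_length,
    }
    missing = [name for name, value in params.items() if value is None]
    if missing:
        # Surface gaps so they can be asked about instead of silently defaulted.
        raise ValueError("missing parameters: " + ", ".join(missing))
    args = [
        "python", "-m", "cli.inference.text_generate", model_id,
        "--num-queries", str(num_queries),
        "--query-length", str(query_length),
        "--device", device,
    ]
    return shlex.join(args)


command = build_text_generate_command(
    model_id="Qwen/Qwen3-32B",
    device="ATLAS_800_A3_560T_128G_DIE",
    num_queries=2,
    query_length=3500,
)
print(command)
```

Building the argument list first and quoting it with `shlex.join` keeps the string shown to the user for confirmation identical to what would be executed.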
throughput-optimizer-executor	feat(skills): add text generate and throughput optimizer executors Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !278 merge develop into develop feat(skills): add text generate and throughput optimizer executors Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献，我们非常重视。以下说明将使您的拉取请求更健康，更易于获得反馈。如果您不理解某些项目，请不要担心，只需提交拉取请求并从维护人员那里寻求帮助即可。 PR Type / PR类型 - [x] Feature（功能新增） - [ ] Bugfix（Bug 修复） - [ ] Docs（文档更新） - [ ] CI/CD（持续集成/持续部署） - [ ] Refactor（代码重构） - [ ] Perf（性能优化） - [ ] Test-Cases（测试用例更新） - [ ] Other（其他） ## 🔍 Motivation / 变更动机 Please describe the motivation of this PR and the goal you want to achieve through this PR. 请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。当前 `text_generate` 和 `throughput_optimizer` CLI 的参数组合较多，涉及模型、硬件 profile、并行策略、量化、PD 分离、SLO 约束等多类配置。用户在进行性能验证或吞吐规划时，容易遗漏关键参数、混淆部署模式，或无法快速从原始输出中提取可对比的结论。本 PR 新增 `text-generate-executor` 和 `throughput-optimizer-executor` 两个 skills，用于将自然语言性能验证/吞吐规划需求转换为可确认、可执行的 CLI 命令，并在执行后整理关键指标与结果摘要。这样可以降低常见建模任务的使用门槛，提高参数收集、命令构造和结果解读的一致性。 ------ ## 📝 Modification / 修改内容 Please briefly describe what modification is made in this PR. 
请简要描述此拉取请求中进行的修改。 \| 文件 \| 说明 \| \|:---\|:---\| \| `skills/text-generate-executor/SKILL.md` \| Skill 主说明和执行规则 \| \| `skills/text-generate-executor/references/dialog-flow.md` \| 渐进式问参流程 \| \| `skills/text-generate-executor/references/text-generate-params.md` \| text_generate 参数速查 \| \| `skills/throughput-optimizer-executor/SKILL.md` \| Skill 主说明和执行规则 \| \| `skills/throughput-optimizer-executor/references/dialog-flow.md` \| 问参流程和模式分支 \| \| `skills/throughput-optimizer-executor/references/throughput-optimizer-params.md` \| 参数说明和默认规则 \| ------ ## 📐 Associated Test Results / 关联测试结果 Please provide the related test results, such as test reports, etc. 请提供相关测试结果，例如测试报告等。 ------ ## 🌟 Use cases (Optional) / 使用案例（可选） If this PR introduces a new feature, it is better to list some use cases here and update the documentation. 如果此拉取请求引入了新功能，最好在此处列出一些用例并更新文档。 ------ ## ✅ Checklist / 检查列表 Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖，导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是，请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档（API 文档、文档字符串、示例教程）已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!278	29 天前
README.md	feat(skills): add text generate and throughput optimizer executors Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !278 merge develop into develop feat(skills): add text generate and throughput optimizer executors Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献，我们非常重视。以下说明将使您的拉取请求更健康，更易于获得反馈。如果您不理解某些项目，请不要担心，只需提交拉取请求并从维护人员那里寻求帮助即可。 PR Type / PR类型 - [x] Feature（功能新增） - [ ] Bugfix（Bug 修复） - [ ] Docs（文档更新） - [ ] CI/CD（持续集成/持续部署） - [ ] Refactor（代码重构） - [ ] Perf（性能优化） - [ ] Test-Cases（测试用例更新） - [ ] Other（其他） ## 🔍 Motivation / 变更动机 Please describe the motivation of this PR and the goal you want to achieve through this PR. 请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。当前 `text_generate` 和 `throughput_optimizer` CLI 的参数组合较多，涉及模型、硬件 profile、并行策略、量化、PD 分离、SLO 约束等多类配置。用户在进行性能验证或吞吐规划时，容易遗漏关键参数、混淆部署模式，或无法快速从原始输出中提取可对比的结论。本 PR 新增 `text-generate-executor` 和 `throughput-optimizer-executor` 两个 skills，用于将自然语言性能验证/吞吐规划需求转换为可确认、可执行的 CLI 命令，并在执行后整理关键指标与结果摘要。这样可以降低常见建模任务的使用门槛，提高参数收集、命令构造和结果解读的一致性。 ------ ## 📝 Modification / 修改内容 Please briefly describe what modification is made in this PR. 
请简要描述此拉取请求中进行的修改。 \| 文件 \| 说明 \| \|:---\|:---\| \| `skills/text-generate-executor/SKILL.md` \| Skill 主说明和执行规则 \| \| `skills/text-generate-executor/references/dialog-flow.md` \| 渐进式问参流程 \| \| `skills/text-generate-executor/references/text-generate-params.md` \| text_generate 参数速查 \| \| `skills/throughput-optimizer-executor/SKILL.md` \| Skill 主说明和执行规则 \| \| `skills/throughput-optimizer-executor/references/dialog-flow.md` \| 问参流程和模式分支 \| \| `skills/throughput-optimizer-executor/references/throughput-optimizer-params.md` \| 参数说明和默认规则 \| ------ ## 📐 Associated Test Results / 关联测试结果 Please provide the related test results, such as test reports, etc. 请提供相关测试结果，例如测试报告等。 ------ ## 🌟 Use cases (Optional) / 使用案例（可选） If this PR introduces a new feature, it is better to list some use cases here and update the documentation. 如果此拉取请求引入了新功能，最好在此处列出一些用例并更新文档。 ------ ## ✅ Checklist / 检查列表 Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖，导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是，请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档（API 文档、文档字符串、示例教程）已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!278	29 天前

msmodeling skills

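This directory holds Claude Code skills specific to the msmodeling project. They capture common performance-modeling, device-modeling, and profiling-assistance tasks as reusable execution workflows.

Usage tip: to enable these skills in Claude Code, copy this entire .agents/skills directory to .claude/skills.

Required reading for AI agents: first read AGENTS.md in the project root to learn the project conventions and the skill system.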

msmodeling skills
msmodeling-env-installer
model-adaptation
device_config
op_mapping
microbench
text-generate-executor
throughput-optimizer-executor

msmodeling-env-installer

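msmodeling environment installer: turns requests that clearly target msmodeling, such as "install the msmodeling environment dependencies", "create myenv", "install this repository's requirements.txt", or "configure PYTHONPATH / HF_ENDPOINT", into an executable, verifiable, and traceable environment installation workflow. When the user says only "install the environment" or "install dependencies", the agent must first confirm whether the request refers to the environment dependencies of the current msmodeling repository.

What it does

Guides the AI agent through development environment initialization, following the workflow defined in the RFC:

Repository root check: confirm that the current directory contains README.md and requirements.txt.
Python and uv check: require Python 3.10+; if uv is missing, install it from a mirror and resolve the real executable path.
Installation path selection: by default, create a new myenv with uv; before falling back to an existing environment, check for torch_npu, torch-npu, and cudatoolkit.
Dependency installation and verification: after installing requirements.txt, run uv pip check --python <venv-python> or python -m pip check.
Environment variable configuration: set PYTHONPATH and HF_ENDPOINT=https://hf-mirror.com for the current session as needed.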
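
The first two checks in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preflight` and its messages are hypothetical and are not taken from the skill's scripts; only the file names, the Python 3.10+ requirement, and the uv lookup come from the text.

```python
import shutil
import sys
from pathlib import Path


def preflight(repo_root: Path, min_python: tuple = (3, 10)) -> list:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration, not part of the skill's scripts.
    """
    problems = []
    # Repository root check: both files must exist in the given directory.
    for name in ("README.md", "requirements.txt"):
        if not (repo_root / name).is_file():
            problems.append(f"missing {name} in {repo_root}")
    # Python check: the skill requires Python 3.10 or newer.
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"Python 3.10+ required, found {found}")
    # uv check: shutil.which returns the resolved executable path, or None.
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path.cwd()):
        print("preflight:", problem)
```

Returning a list of problems instead of raising on the first failure lets an agent report everything that needs fixing in one round.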

File layout

File	Purpose
`msmodeling-env-installer/SKILL.md`	Skill definition, trigger scenarios, installation workflow, and safety rules
`msmodeling-env-installer/scripts/install-current-project-deps.ps1`	Automated installation script for Windows PowerShell
`msmodeling-env-installer/scripts/install-current-project-deps.sh`	Automated installation script for Linux/macOS/WSL/Git Bash

Quick start

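State a clear request in the conversation, for example "Please install the msmodeling environment dependencies" or "Configure the msmodeling environment following the README". If the request is only "install the environment", the agent must first confirm whether it refers to the environment dependencies of the current msmodeling repository.

On Windows PowerShell, run directly from the repository root:

.\.agents\skills\msmodeling-env-installer\scripts\install-current-project-deps.ps1

On Linux/macOS/WSL/Git Bash, run directly from the repository root: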

bash ./.agents/skills/msmodeling-env-installer/scripts/install-current-project-deps.sh

Key constraints

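Does not modify requirements.txt, the README, or project source code.
Does not overwrite an existing myenv by default, and does not persist system-level environment variables by default.
Network installation requires user confirmation and tool permission authorization.
scripts/install-current-project-deps.ps1 applies only to Windows PowerShell; on Linux/macOS, use scripts/install-current-project-deps.sh or the generic commands in the README.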

model-adaptation

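TensorCast new-model onboarding workflow skill: starting from a simulation command and MindStudio Insight raw profiling, it guides the agent to run model_adapter doctor, review the ModelProfile, handle patch/bug AI tasks, export evidence.yaml, and run verify.

What it does

Splits new-model onboarding into deterministic tools and human checkpoints:

Collect the two required inputs: the simulation command and the matching raw profiling.
Run doctor and review the candidate profile, evidence draft, human questions, and AI tasks.
Generate precise questions for fields that need human confirmation, and write the confirmed answers to hints.yaml or evidence.yaml.
For patch/bug scenarios, use ai_tasks[].prompt_text to drive the user or the user's AI assistant to generate code, and require human review.
Export evidence.yaml with export-evidence, then run verify.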

File layout

File	Purpose
`model-adaptation/SKILL.md`	Core workflow, human checkpoints, and verification requirements for new-model onboarding

Quick start

Use this skill when the user says "onboard a new model", "generate a ModelProfile", "continue adaptation from the doctor report", "handle the patch AI task", or "export evidence from the doctor report".

Key constraints

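Do not guess profile fields from the model name.
doctor does not generate model-specific patch code; it only generates AI tasks and prompts.
Export evidence.yaml from doctor_after_profile.json.evidence_draft, then review it manually.
Do not commit raw profiling, local walkthroughs, private paths, or temporary materials.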

device_config

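Natural-language device profile importer: uses a progressive dialog to guide the user in converting a natural-language hardware description into a TensorCast DeviceProfile.

What it does

Guides the AI agent through a progressive dialog workflow:

Progressive information gathering: the first round asks only for the hardware name, the source of the specifications, and the preferred granularity; each round asks at most 2-3 questions.
Internal fact table: track every item as confirmed / ambiguous / missing / needs calibration.
Runnable profile generation: write user-confirmed values, provisional estimates, and fallback defaults all into tensor_cast/device.py.
Verification and CLI command output: run the import check and output an executable command that uses --device <PROFILE_NAME>.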
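
The internal fact table mentioned above could be modeled roughly as follows. This is a sketch, not the skill's actual data structure: the `Status` and `FactTable` names are hypothetical, and the recorded values are illustrative placeholders only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """The four states named in the workflow description."""
    CONFIRMED = "confirmed"
    AMBIGUOUS = "ambiguous"
    MISSING = "missing"
    NEEDS_CALIBRATION = "needs calibration"


@dataclass
class FactTable:
    """Hypothetical container mapping a field name to (value, status)."""
    facts: dict = field(default_factory=dict)

    def record(self, key, value, status):
        # A later answer for the same key replaces the earlier one.
        self.facts[key] = (value, status)

    def by_status(self, status):
        # Sorted so the list shown to the user is stable between rounds.
        return sorted(k for k, (_, s) in self.facts.items() if s is status)


GIB = 1024 ** 3  # bytes per GiB

table = FactTable()
table.record("memory_size_bytes", 64 * GIB, Status.CONFIRMED)      # user-supplied value
table.record("gp_ops", None, Status.MISSING)                       # not yet provided
table.record("compute_efficiency", 0.7, Status.NEEDS_CALIBRATION)  # fallback default

print("still missing:", table.by_status(Status.MISSING))
print("needs calibration:", table.by_status(Status.NEEDS_CALIBRATION))
```

Keeping the status next to each value makes it straightforward to surface every default and estimate to the user before anything is written to a profile.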

File layout

File	Purpose
`device_config/SKILL.md`	Skill definition, constraints, and execution workflow

Quick start

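State the request directly in a Claude Code conversation, for example "I want to import a new device topology", then follow the agent's progressive questions and provide the hardware specifications step by step.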

Key constraints

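DeviceProfile.__post_init__ registers the profile automatically, so name must be unique.
Writes to tensor_cast/device.py by default; writes to tensor_cast/device_profiles/ only when the user explicitly asks for it.
All defaults, estimates, and assumptions must be visible to the user and listed under needs calibration.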

op_mapping

op_mapping.yaml generator: maps TensorCast simulation operators to NPU profiling kernel types.

What it does

Uses a team of parallel sub-agents (one agent per operator) to trace the full vLLM→CANN call chain and generate op_mapping.yaml.

File layout

File	Purpose
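`op-mapping/SKILL.md`	Core execution flow and the six-phase workflow
`op-mapping/op-mapping-template.yaml`	YAML template snippets
`op-mapping/single-op-worker-prompt.md`	Instructions for the single-operator worker agent
`op-mapping/verifier-prompt.md`	Instructions for the verification phase
`op-mapping/ref/shape_matching_catalog.md`	The 10 kinds of differences between TC tensor shapes and NPU profiling shapes
`op-mapping/ref/tc_input_count_rules.md`	Rules for using `tc_input_count` safely
`op-mapping/ref/zero_cost_classification.md`	Classification rules for zero-cost operators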

Quick start

Once all inputs are collected (model, device, profiling CSV, repo versions), the agent automatically runs the six-phase workflow: GATHER → FORWARD MAPPING → REVERSE MAPPING → VERIFY → WRITE → COMMIT.

Key constraints

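kernel_type must match the CSV file name exactly (without the .csv suffix).
Three mapping paths: aten→op-plugin→aclnn, torch_npu.npu_*→op-plugin→aclnn, and vllm-ascend custom/Triton.
alternate_kernel_types must sit at the same abstraction level; using a large fused op as the alternate for a sub-op is forbidden.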
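
The first constraint is easy to check mechanically. The sketch below assumes the comparison is case-sensitive and uses a hypothetical helper name and a made-up kernel name; neither comes from the skill itself.

```python
from pathlib import Path


def kernel_type_matches_csv(kernel_type: str, csv_path: str) -> bool:
    """True when kernel_type equals the CSV file name with .csv removed.

    Hypothetical helper; the comparison is exact and case-sensitive.
    """
    path = Path(csv_path)
    # Path.stem drops only the final suffix, so "ExampleKernel.csv" -> "ExampleKernel".
    return path.suffix == ".csv" and path.stem == kernel_type


# "ExampleKernel" is a placeholder name used only for illustration.
print(kernel_type_matches_csv("ExampleKernel", "profiling/ExampleKernel.csv"))
print(kernel_type_matches_csv("examplekernel", "profiling/ExampleKernel.csv"))
print(kernel_type_matches_csv("ExampleKernel.csv", "profiling/ExampleKernel.csv"))
```

The third call shows the common mistake the constraint warns about: leaving the .csv suffix in kernel_type.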

microbench

Microbench run script generator: generates a <KernelType>_run.py from a profiling CSV that can be replayed on an NPU.

What it does

Generates a runnable tools/perf_data_collection/op_replay/<KernelType>_run.py for a profiling kernel CSV, used for measured replay on an NPU.

File layout

File	Purpose
`microbench/SKILL.md`	Skill definition and repo search order

Quick start

After the user provides the kernel_type, device profile, vllm_ascend version, and CSV path, the agent generates a replayable run script.

Key constraints

Prefer locally cloned repos, searched along the specified paths.
If a repo is missing, fetch it with the clone command provided in SKILL.md.
The generated run script is invoked by run_all_op.py / profile_and_update_db.py.
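
The output location of a generated run script follows directly from the naming pattern tools/perf_data_collection/op_replay/<KernelType>_run.py. The sketch below assumes the kernel type is the CSV file name without its suffix; the helper name and the example file are hypothetical.

```python
from pathlib import Path

# Directory taken from the path pattern quoted in this section.
OP_REPLAY_DIR = Path("tools/perf_data_collection/op_replay")


def run_script_path(csv_path: str) -> Path:
    """Derive the <KernelType>_run.py location for a profiling CSV.

    Assumption: the kernel type equals the CSV file stem.
    """
    kernel_type = Path(csv_path).stem
    return OP_REPLAY_DIR / f"{kernel_type}_run.py"


# "ExampleKernel.csv" is a placeholder file name.
print(run_script_path("profiling/ExampleKernel.csv").as_posix())
```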

text-generate-executor

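text_generate single-point verification executor. It turns a user's verification request for python -m cli.inference.text_generate into a confirmable, executable CLI command, then runs it after confirmation and summarizes the results.

What it does

Targets scenarios with a known model, hardware, batch/query length, prefill or decode mode, and fixed TP/DP/EP/MOE strategy, as well as profiling-database runs, trace/debug, or re-verification of the throughput optimizer's best row, and generates a single-point simulation command.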

File layout

File	Purpose
`text-generate-executor/SKILL.md`	Main skill description, default policies, validation rules, and handoff rules
`text-generate-executor/references/dialog-flow.md`	Progressive parameter-gathering flow
`text-generate-executor/references/text-generate-params.md`	`text_generate` parameter quick reference

Quick start

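For requests such as "run a text_generate verification for me", "convert the best throughput_optimizer row to text_generate and run it", or "export a chrome trace", the agent fills in missing parameters, shows the command and its assumptions, and executes after the user confirms.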

Key constraints

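Before execution, the full command and key assumptions must be shown and explicit confirmation obtained.
Decode mode must confirm --context-length; profiling mode must provide --profiling-database.
text_generate only verifies a fixed candidate; it does not perform TP/EP/MOE-DP search.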
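
The two flag requirements above can be checked before the command is shown to the user. In this sketch the helper name is hypothetical, and because the text does not say how decode or profiling mode is expressed on the command line, the caller passes both as booleans; only --context-length and --profiling-database are taken from the constraints.

```python
import shlex


def check_text_generate_command(command: str, decode: bool, profiling: bool) -> list:
    """Return problems found in a candidate command; empty means it passes.

    Hypothetical pre-execution check; decode/profiling are supplied by the
    caller because the CLI's mode flags are not specified in this section.
    """
    tokens = shlex.split(command)
    # Accept both "--flag value" and "--flag=value" spellings.
    flags = {tok.split("=", 1)[0] for tok in tokens if tok.startswith("--")}
    problems = []
    if decode and "--context-length" not in flags:
        problems.append("decode mode requires --context-length")
    if profiling and "--profiling-database" not in flags:
        problems.append("profiling mode requires --profiling-database")
    return problems


cmd = "python -m cli.inference.text_generate --context-length 4096"
print(check_text_generate_command(cmd, decode=True, profiling=True))
```

Here the decode requirement is satisfied, so only the missing --profiling-database is reported.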

throughput-optimizer-executor

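throughput_optimizer deployment planning executor. It turns natural-language requests for throughput planning, hardware comparison, parallelism search, and PD aggregation/disaggregation/ratio optimization into python -m cli.inference.throughput_optimizer commands.

What it does

Targets search and planning scenarios: it collects the model, hardware, device count, input/output lengths, SLO, deployment mode, and search space, generates the optimizer command, runs it after confirmation, and summarizes the best parallel strategy, batch, concurrency, throughput, TTFT, TPOT, and PD ratio.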

File layout

File	Purpose
`throughput-optimizer-executor/SKILL.md`	Main skill description, default policies, validation rules, and handoff rules
`throughput-optimizer-executor/references/dialog-flow.md`	Deployment mode identification and progressive parameter-gathering flow
`throughput-optimizer-executor/references/throughput-optimizer-params.md`	`throughput_optimizer` parameter quick reference
`throughput-optimizer-executor/scripts/extract_throughput_optimizer_result.py`	Script that produces a structured summary of optimizer stdout

Quick start

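For requests such as "compare two kinds of hardware", "search for the best TP for Qwen 32B", "evaluate PD disaggregation capability", or "compute the P/D instance ratio", the agent identifies the aggregation, disagg, or PD ratio mode, fills in the SLO and search space, and executes after confirmation.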

Key constraints

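--enable-optimize-prefill-decode-ratio cannot be used together with --disagg.
Multi-hardware comparisons share a single --num-devices; this must be stated before execution.
Before execution, explicitly confirm whether prefix cache and MTP are enabled; when enabled, fill in the hit rate for prefix cache, and the MTP token count and acceptance rate assumptions for MTP.
This skill performs candidate search and planning; single-point re-verification should be handed off to text-generate-executor.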
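
The mutual-exclusion rule in the first constraint lends itself to a simple check on the argument list before execution. The helper below is a hypothetical sketch that inspects flag names only; it does not reflect how throughput_optimizer itself validates its arguments.

```python
def check_optimizer_flags(argv: list) -> list:
    """Return problems found in a candidate argument list; empty means it passes.

    Hypothetical pre-execution check based on the constraint stated above.
    """
    # Accept both "--flag value" and "--flag=value" spellings.
    flags = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    exclusive = {"--enable-optimize-prefill-decode-ratio", "--disagg"}
    problems = []
    if exclusive <= flags:
        problems.append(
            "--enable-optimize-prefill-decode-ratio cannot be combined with --disagg"
        )
    return problems


print(check_optimizer_flags(["--disagg", "--enable-optimize-prefill-decode-ratio"]))
print(check_optimizer_flags(["--disagg"]))
```

Running the check before asking for confirmation means the user only ever sees commands that satisfy the rule.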
