| [feat] Add SKILL: configure device via natural language

Co-authored-by: wendellX<wenxiaoping6@huawei.com>

# message auto-generated for no-merge-commit merge:
!243 merge skill_device_cfg into develop

[feat] Add SKILL: configure device via natural language

Created-by: wendellX
Commit-by: wendellX
Merged-by: ascend-robot
Description:

# PR Template

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

**PR Type**
- [X] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

------

## 📝 Modification

Add a SKILL file: configure device via natural language.

------

## 📐 Associated Test Results

---

User: I want to import a new device topology.

Claude: We don't need to fill in complicated parameters yet; I'll walk you through the configuration step by step. Just tell me the following 3 things; for anything you don't know, simply write "unknown":

1. What hardware is this? For example, vendor and model.
2. At what granularity do you want to build this topology: single card, single die / chiplet, or some other scheduling unit?
3. Do you have any of this raw information at hand: device memory capacity, memory bandwidth, FP16/BF16/INT8/FP32 compute, and a description of "how many cards / dies there are and how they are connected"? If so, paste it as-is.

You can reply in this form:
- Model: XXX
- Granularity: single card / single die / not sure
- Known specs: 96GB device memory, 3.2TB/s, BF16 800T, 8 cards, 400GB/s between cards

Once I receive it, I'll first organize it into a "confirmed / still uncertain / can default for now" list, then continue to the next step.

User: Huawei's ATLAS_800_A3_560T_128G_DIE. Single die, device memory capacity 64GiB, memory bandwidth 1.6TiB/s.

Claude: Let me record the facts confirmed so far.
- Confirmed:
  - Vendor: HUAWEI
  - Candidate profile name: ATLAS_800_A3_560T_128G_DIE
  - Granularity: single die
  - Device memory per profile unit: 64GiB
  - Memory bandwidth per profile unit: 1.6TiB/s
- Note:
  - The existing ATLAS_800_A3_752T_128G_DIE in the repository [tensor_cast/device.py:299-320] follows the same style ("the name contains 128G, but the single-die memory is written as 64GiB"), so your naming is consistent with it.

Next I'll only follow up on the 3 most critical things; if you don't know, just write "unknown" or "default for now":

1. Which kind of compute does 560T refer to?
   - For example: FP16 560T, BF16 560T, INT8 560T
   - If the material also lists other peak values for FP32 / BF16 / FP16 / INT8, please paste them as-is.
2. Does the material list a separate "general-purpose / vector compute" figure?
   - This mainly affects estimates for non-matrix operators such as softmax, norm, and activation functions.
   - You can simply reply:
     - No, leave it empty for now
     - or: No, use a temporary estimate for now
3. How do you want to model the interconnect topology for this single-die profile?
   - You can pick one directly:
     - Reuse the existing A3 die topology
     - Model only within an 8-card node
     - I only know 2 dies/card and 8 cards/node; default the rest for now
   - If you know the interconnect bandwidth, also tell me whether it is "unidirectional" or "bidirectional".

Once you answer these 3 items, I can compile the fields to be written, the defaults, and the items pending calibration, and then ask you for final confirmation.

User: FP32 75T, BF16 245.8T, FP16 280T, INT8 560T. The communication grid shape is [48, 8, 2].

Claude: OK, here is what I have as "confirmed" so far:
- name: ATLAS_800_A3_560T_128G_DIE
- vendor: HUAWEI
- Granularity: single die
- Device memory per die: 64 GiB
- Memory bandwidth per die: 1.6 TiB/s
- mma_ops:
  - FP32 = 75T
  - BF16 = 245.8T
  - FP16 = 280T
  - INT8 = 560T
- comm_grid shape: [48, 8, 2]

This essentially combines the existing A3 die template [tensor_cast/device.py:169-182] with the compute style of the 280T tier.

There are 3 remaining points that will be written directly into the code; just choose:

1. How should gp_ops be handled?
   - Leave it empty for now
   - Use temporary estimates for now: FP32 8T, BF16/FP16 16T
   - I have official values: ...
2. For the per-level interconnect parameters of [48, 8, 2], should I directly reuse the existing A3 die topology [tensor_cast/device.py:169-182]?
   - Outer level: 196 GB/s, 5.5 us
   - Middle level: 196 GB/s, 0.5 us
   - Inner level: 224 GB/s, 0.2 us
   - comm_efficiency = 0.7
   - Type: CLOS, the current code default
   - These bandwidths will be written as "unidirectional bandwidth"
   - You can simply reply: reuse the existing A3 die topology
3. If no more precise data is available, do you accept reusing the current default efficiencies and static costs for now, marked as needs calibration?
   - compute_efficiency = 0.7
   - memory_efficiency = 0.6
   - StaticCost(5us, 2us, 10us)

Once you answer these 3 items, I'll edit [tensor_cast/device.py] directly, then verify import and registration, and give you a usable --device ATLAS_800_A3_560T_128G_DIE.

User: 1: use the temporary values; 2: reuse A3; 3: compute_efficiency=0.7, memory_efficiency=0.6.

Claude: One more small point that will be written into the code: for StaticCost, do you also want to reuse the existing A3 defaults mma=5us, gp=2us, comm=10us for now? If so, just reply "OK" and I'll edit the code and verify.

User: mma_op_cost_s=5e-06, gp_op_cost_s=2e-06, comm_op_cost_s=1e-05. Use these.

Claude: Added at [tensor_cast/device.py:299-320], reusing the existing A3 die topology [tensor_cast/device.py:169-182].
- Registered DeviceProfile.name
  - ATLAS_800_A3_560T_128G_DIE
- Facts you provided, used directly
  - vendor="HUAWEI"
  - Single-die granularity
  - memory_size_bytes=64 * (1024**3)
  - memory_bandwidth_bytes_ps=1.6 * (1024**4)
  - mma_ops
    - FP32 = 75 * 1e12
    - BF16 = 245.8 * 1e12
    - FP16 = 280 * 1e12
    - INT8 = 560 * 1e12
  - comm_grid shape [48, 8, 2]
  - compute_efficiency=0.7
  - memory_efficiency=0.6
  - StaticCost(mma=5e-06, gp=2e-06, comm=1e-05)
- Estimates confirmed by the user
  - gp_ops
    - FP32 = 8 * 1e12
    - BF16 = 16 * 1e12
    - FP16 = 16 * 1e12
- needs calibration
  - gp_ops is currently a temporary estimate, not an official vector/general compute figure
  - The interconnect parameters currently reuse the existing A3 die topology:
    - start_dim=0: 196 GB/s, 5.5 us, comm_efficiency=0.7
    - start_dim=1: 196 GB/s, 0.5 us, comm_efficiency=0.7
    - start_dim=2: 224 GB/s, 0.2 us, comm_efficiency=0.7
- Verification command
  - python - <<'PY'
    from tensor_cast.device import DeviceProfile
    print(DeviceProfile.all_device_profiles["ATLAS_800_A3_560T_128G_DIE"])
    PY
  - Result: the DeviceProfile printed successfully; registration works
- Ready-to-use command
  - python -m cli.inference.text_generate Qwen/Qwen3-32B --num-queries 2 --query-length 3500 --device ATLAS_800_A3_560T_128G_DIE

It is ready to use. If you like, the most worthwhile thing to calibrate next is whether gp_ops has official values.

------

## 🌟 Use cases (Optional)

**If this PR introduces a new feature, it is better to list some use cases here and update the documentation.**

------

## ✅ Checklist

**Before PR**:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!243 | 1 month ago |
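The values confirmed in the transcript above can be collected in a small sketch to sanity-check the unit conversions. This is only an illustration: `DeviceProfileSketch` is a hypothetical stand-in, not the real `tensor_cast.device.DeviceProfile`, and two modeling choices here are assumptions rather than documented TensorCast behavior: that effective compute is peak compute scaled by `compute_efficiency`, and that the product of the `comm_grid` shape gives the number of dies.

```python
from dataclasses import dataclass
from math import prod

GIB = 1024**3   # bytes per GiB
TIB = 1024**4   # bytes per TiB
TERA = 1e12     # operations per "T"


@dataclass
class DeviceProfileSketch:
    """Hypothetical stand-in; NOT the real tensor_cast.device.DeviceProfile."""
    name: str
    vendor: str
    memory_size_bytes: int
    memory_bandwidth_bytes_ps: float
    mma_ops: dict
    gp_ops: dict
    comm_grid_shape: tuple
    compute_efficiency: float = 0.7
    memory_efficiency: float = 0.6

    def effective_mma_ops(self, dtype: str) -> float:
        # Assumption: effective throughput = peak * compute_efficiency.
        return self.mma_ops[dtype] * self.compute_efficiency

    def total_units(self) -> int:
        # Assumption: the grid shape multiplies out to the number of dies.
        return prod(self.comm_grid_shape)


profile = DeviceProfileSketch(
    name="ATLAS_800_A3_560T_128G_DIE",
    vendor="HUAWEI",
    memory_size_bytes=64 * GIB,
    memory_bandwidth_bytes_ps=1.6 * TIB,
    mma_ops={"FP32": 75 * TERA, "BF16": 245.8 * TERA,
             "FP16": 280 * TERA, "INT8": 560 * TERA},
    # Temporary estimates from the transcript, flagged "needs calibration".
    gp_ops={"FP32": 8 * TERA, "BF16": 16 * TERA, "FP16": 16 * TERA},
    comm_grid_shape=(48, 8, 2),
)

print(profile.memory_size_bytes, profile.total_units())
```

Keeping the temporary `gp_ops` estimates in their own field with a comment mirrors the transcript's split between confirmed facts and values that still need calibration.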
| Add model adapter onboarding automation

Co-authored-by: jhon-117<fangkai15@huawei.com>

# message auto-generated for no-merge-commit merge:
!282 merge codex/model-adaptation-efficiency-v2 into develop

Add model adapter onboarding automation

Created-by: jhon-117
Commit-by: jhon-117
Merged-by: ascend-robot
Description:

# PR Template

**PR Type**
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

**Please describe the motivation of this PR and the goal you want to achieve through this PR.**

------

## 📝 Modification

This PR implements a workflow that makes onboarding new models to TensorCast more efficient. It is built around an adaptation approach in which "the user only has to provide the raw Insight profiling export file plus the corresponding simulation command", and fills in the doctor, evidence, patch discovery, profile draft, ST case generation, and qwen3-vl replay verification capabilities.

Main changes:

Add the tensor_cast.adapter automation module:
- Simulation command parsing and AdaptationContext
- Raw MindStudio Insight profiling parsing
- User hints loading, conflict detection, and provenance
- Profile candidate generation and review/validation
- Evidence draft generation and verifier mismatch classification
- PatchReport, patch discovery, and profile draft rendering
- ST guardrail case generation

Add CLIs:
- python -m cli.inference.model_doctor
- python -m cli.inference.verify_model_profile

model_doctor supports:
- --from-command-file
- --raw-insight-file
- --hints-file
- --patch-failure-file
- --ignore-existing-profile
- --profile-draft-output

Enhance qwen3-vl replay:
- Add a tiny config-only fixture: tests/assets/model_config/qwen3_vl_tiny/config.json
- Support discovering VL profile fields from the installed transformers source when running with --ignore-existing-profile qwen3_vl
- Patch discovery can generate a patch_method_for_qwen3_vl draft from the qwen3-vl placeholder/mask meta failure

Add documentation:
- docs/design/model_adaptation_efficiency_design.md
- docs/en/tensor_cast_new_model_adaptation.md

Enhance runtime/transformations:
- Expose the information required by the runtime summary
- Record patch reports
- Support profile registry replay/audit ignore

------

## 📐 Associated Test Results

pytest tests/test_tensor_cast/test_adapter_automation.py -q # 29 passed
pytest tests/test_tensor_cast -k "adapter or doctor or evidence" -q # 29 passed
python -m compileall -q tensor_cast/adapter cli/inference/model_doctor.py cli/inference/verify_model_profile.py cli/inference/adapter_cli.py tests/test_tensor_cast/test_adapter_automation.py # passed
python -m cli.inference.model_doctor --help # passed
python -m cli.inference.verify_model_profile --help # passed

Additional smoke tests:
- qwen3-vl tiny CLI replay smoke: passed
- qwen3-vl patch code draft CLI smoke: passed
- deepseek fixture doctor/replay smoke: passed; only the rope parameter warning that ships with the fixture appeared, which does not affect the result.

------

## 🌟 Use cases (Optional)

**If this PR introduces a new feature, it is better to list some use cases here and update the documentation.**

------

## ✅ Checklist

**Before PR**:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!282 | 29 days ago |
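As a rough illustration of how the model_doctor flags listed in this PR might be combined, the sketch below assembles an argument list in Python. The flag names come from the PR description; everything else is an assumption: the helper `build_model_doctor_argv` is hypothetical, the file names are placeholders, and it is assumed (not confirmed by the PR text) that each flag takes exactly one value. Only `--ignore-existing-profile qwen3_vl` is shown with a value in the source.

```python
import shlex


def build_model_doctor_argv(command_file, raw_insight_file, hints_file=None,
                            ignore_existing_profile=None,
                            profile_draft_output=None):
    """Hypothetical helper: assemble an argv list from the flags named in the PR."""
    argv = [
        "python", "-m", "cli.inference.model_doctor",
        "--from-command-file", command_file,
        "--raw-insight-file", raw_insight_file,
    ]
    # Assumption: each optional flag takes a single value.
    optional = {
        "--hints-file": hints_file,
        "--ignore-existing-profile": ignore_existing_profile,
        "--profile-draft-output": profile_draft_output,
    }
    for flag, value in optional.items():
        if value is not None:
            argv += [flag, value]
    return argv


argv = build_model_doctor_argv(
    "sim_command.txt",       # placeholder path, not from the PR
    "insight_export.json",   # placeholder path, not from the PR
    ignore_existing_profile="qwen3_vl",
)
print(shlex.join(argv))
```

Building the command as a list and only joining it for display avoids quoting mistakes if a path contains spaces.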
| feat(skills): add msmodeling-env-installer

Co-authored-by: lutean<lutean1@huawei.com>

# message auto-generated for no-merge-commit merge:
!279 merge develop into develop

feat(skills): add msmodeling-env-installer

Created-by: lutean
Commit-by: lutean
Merged-by: ascend-robot
Description:

# PR Template

**PR Type**
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

**Please describe the motivation of this PR and the goal you want to achieve through this PR.**

Setting up the msmodeling environment involves several steps: choosing the Python version, creating a virtual environment, installing dependencies, configuring PYTHONPATH, and optionally setting a Hugging Face mirror. Today these steps mostly depend on reading the documentation and running commands by hand, so installation easily fails because of platform differences, missed commands, or inconsistent environment variables, which raises the cost for new contributors and AI agents to get started with the project.

This PR introduces the msmodeling-env-installer skill, which captures a standardized environment installation workflow and provides a Windows automated installation script, so that an agent can complete dependency installation, environment validation, and troubleshooting under explicit rules. This improves the consistency of first-time environment setup, reduces repeated communication and manual troubleshooting, and provides a more stable base environment for subsequent TensorCast and ServingCast development and testing.

------

## 📝 Modification

**Please briefly describe what modification is made in this PR.**

| File | Description |
|:---|:---|
| .agents\skills\msmodeling-env-installer\SKILL.md | Main skill description, trigger conditions, and execution rules |
| .agents\skills\msmodeling-env-installer\scripts\install-current-project-deps.ps1 | Windows PowerShell automated installation script |
| .agents\skills\msmodeling-env-installer\scripts\install-current-project-deps.sh | Linux/macOS/WSL/Git Bash automated installation script |
| .agents\skills\README.md | Skills index and quick start |
| docs/RFC/rfc_msmodeling_env_installer_skill_zh.md | This RFC document |

------

## 📐 Associated Test Results

**Please provide the related test results, such as test reports, etc.**

------

## 🌟 Use cases (Optional)

**If this PR introduces a new feature, it is better to list some use cases here and update the documentation.**

------

## ✅ Checklist

**Before PR**:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!279 | 29 days ago |
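The environment checks named in the motivation above (Python version, virtual environment, PYTHONPATH, optional Hugging Face mirror) could be validated with something like the following sketch. It is not taken from the skill's scripts: `check_environment` is a hypothetical helper, the minimum Python version `(3, 10)` is a placeholder because the PR does not state one, and reading the mirror from the `HF_ENDPOINT` environment variable is an assumption about how the mirror is configured.

```python
import os
import sys
from pathlib import Path


def check_environment(project_root, min_python=(3, 10), environ=None):
    """Hypothetical post-install check; min_python is a placeholder value."""
    environ = os.environ if environ is None else environ
    root = Path(project_root).resolve()
    # Split PYTHONPATH on the platform separator and normalize each entry.
    entries = [Path(p).resolve()
               for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "pythonpath_has_project": root in entries,
        # Assumption: the mirror, if any, is set through HF_ENDPOINT.
        "hf_mirror": environ.get("HF_ENDPOINT"),
    }


report = check_environment(".", environ={"PYTHONPATH": os.getcwd()})
for key, value in report.items():
    print(f"{key}: {value}")
```

Passing `environ` explicitly keeps the check testable without mutating the real process environment.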
| Update profiling op mapping skill docs

Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com>

# message auto-generated for no-merge-commit merge:
!212 merge pr/glm5-op-mapping-skill-docs into develop

Update profiling op mapping skill docs

Created-by: Secluded_Ocean
Commit-by: Secluded_Ocean
Merged-by: ascend-robot
Description:

**PR Type**
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

This PR updates the profiling database op-mapping skill documentation.

During GLM5 profiling database expansion, several recurring issues were identified:
- Some TensorCast operators do not map to profiling CSV rows by direct tensor-shape matching.
- Semantic operators such as grouped MoE and LightningIndexer require explicit query-mode handling.
- Generated placeholder rows with empty or zero latency must not be treated as valid profiling data.
- Future op-mapping work needs clearer worker/verifier instructions to avoid incorrect mappings.

The goal of this PR is to document these lessons in the op-mapping skill so future profiling database updates follow a clearer and safer workflow.

------

## 📝 Modification

This PR updates the op-mapping skill documents:
- docs/perf_database/skills/op-mapping/SKILL.md
- docs/perf_database/skills/op-mapping/single-op-worker-prompt.md
- docs/perf_database/skills/op-mapping/verifier-prompt.md

Main changes:
- Clarify when an operator needs a dedicated query_mode.
- Clarify that placeholder latency rows should not be used as measured profiling data.
- Strengthen the worker instructions for checking TensorCast op semantics, NPU kernel names, CSV shapes, and replay feasibility.
- Strengthen the verifier instructions for reviewing operator mapping quality and shape matching assumptions.

------

## 📐 Associated Test Results

This PR only updates documentation/prompt files. No runtime test is required.

Manual check: Reviewed the updated skill and prompt files for profiling database op-mapping workflow consistency.

------

## 🌟 Use cases (Optional)

Future profiling database contributors can use this skill to:
- Add or verify op mappings for new models.
- Decide whether a default compute lookup is enough or whether a dedicated query mode is required.
- Avoid treating shape-generated placeholder rows as real latency data.
- Review replay feasibility before adding generated CSV shapes.

------

## ✅ Checklist

**Before PR**:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

See merge request: Ascend/msmodeling!212 | 28 days ago |
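The rule that placeholder rows with empty or zero latency must not count as measured data can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the column names (`op_name`, `shape`, `latency_us`) and the sample rows are hypothetical and do not reflect the actual profiling CSV schema, and the helper names are not part of the skill.

```python
import csv
import io
import math


def is_measured(row, latency_column="latency_us"):
    """Return True only for rows that carry a real, positive, finite latency."""
    raw = (row.get(latency_column) or "").strip()
    if not raw:
        return False  # empty latency: generated placeholder
    try:
        value = float(raw)
    except ValueError:
        return False  # non-numeric latency: not usable as a measurement
    return math.isfinite(value) and value > 0.0


def measured_rows(csv_text, latency_column="latency_us"):
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_measured(row, latency_column)]


# Hypothetical sample data; column names are assumptions.
sample = """op_name,shape,latency_us
MatMul,"[128, 4096]",12.5
GroupedMatmul,"[8, 128, 4096]",
LightningIndexer,"[1, 2048]",0
"""

rows = measured_rows(sample)
print([row["op_name"] for row in rows])
```

Filtering before any lookup means a missing measurement surfaces as "no data" instead of silently becoming a zero-latency operator.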
| Improve the msModeling README and the Chinese/English documentation structure

Co-authored-by: eveyin1<qianyin2022@hotmail.com>

# message auto-generated for no-merge-commit merge:
!329 merge doc_fix1 into develop

Improve the msModeling README and the Chinese/English documentation structure

Created-by: eveyin1
Commit-by: eveyin1
Merged-by: ascend-robot
Description:

# PR Template

**PR Type**
- [ ] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

**Please describe the motivation of this PR and the goal you want to achieve through this PR.**

------

## 📝 Modification

**Please briefly describe what modification is made in this PR.**

1. Reworked the README content to follow the latest template
2. Filled in the installation and quick-start content
3. Reorganized the English documentation under docs
4. Scored the documentation for usability
5. Passed the docs-tool aidd check

------

## 📐 Associated Test Results

**Please provide the related test results, such as test reports, etc.**

------

## 🌟 Use cases (Optional)

**If this PR introduces a new feature, it is better to list some use cases here and update the documentation.**

------

## ✅ Checklist

**Before PR**:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!329 | 22 days ago |
| [feat] Add serving optimization deployment & parameter recommendation skill

Co-authored-by: wendellX<wenxiaoping6@huawei.com>

# message auto-generated for no-merge-commit merge:
!289 merge skill_param_optimizer into develop

[feat] Add serving optimization deployment & parameter recommendation skill

Created-by: wendellX
Commit-by: wendellX
Merged-by: ascend-robot
Description:

# PR Template

**PR Type**
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

**[feat] Add serving optimization deployment & parameter recommendation skill**

------

## 📝 Modification

**[feat] Add serving optimization deployment & parameter recommendation skill**

------

## 📐 Associated Test Results

### 1 Deployment:

### 2 Parameter recommendation:

### 3 Writing parameters to config.toml

#### 3.1 Verification in config.toml

#### 3.2 Run:

------

## 🌟 Use cases (Optional)

**If this PR introduces a new feature, it is better to list some use cases here and update the documentation.**

------

## ✅ Checklist

**Before PR**:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!289 | 29 days ago |
| feat(skills): add text generate and throughput optimizer executors

Co-authored-by: lutean<lutean1@huawei.com>

# message auto-generated for no-merge-commit merge:
!278 merge develop into develop

feat(skills): add text generate and throughput optimizer executors

Created-by: lutean
Commit-by: lutean
Merged-by: ascend-robot
Description:

# PR Template

**PR Type**
- [x] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

**Please describe the motivation of this PR and the goal you want to achieve through this PR.**

The text_generate and throughput_optimizer CLIs currently have many parameter combinations, covering model, hardware profile, parallelism strategy, quantization, PD disaggregation, SLO constraints, and other kinds of configuration. When doing performance validation or throughput planning, users easily miss key parameters, confuse deployment modes, or cannot quickly extract comparable conclusions from the raw output.

This PR adds two skills, text-generate-executor and throughput-optimizer-executor, which convert natural-language performance validation and throughput planning requests into CLI commands that can be confirmed and executed, and which summarize the key metrics and results after execution. This lowers the barrier to common modeling tasks and improves consistency in parameter collection, command construction, and result interpretation.

------

## 📝 Modification

**Please briefly describe what modification is made in this PR.**

| File | Description |
|:---|:---|
| skills/text-generate-executor/SKILL.md | Main skill description and execution rules |
| skills/text-generate-executor/references/dialog-flow.md | Progressive parameter-gathering flow |
| skills/text-generate-executor/references/text-generate-params.md | text_generate parameter quick reference |
| skills/throughput-optimizer-executor/SKILL.md | Main skill description and execution rules |
| skills/throughput-optimizer-executor/references/dialog-flow.md | Parameter-gathering flow and mode branches |
| skills/throughput-optimizer-executor/references/throughput-optimizer-params.md | Parameter descriptions and default rules |

------

## 📐 Associated Test Results

**Please provide the related test results, such as test reports, etc.**

------

## 🌟 Use cases (Optional)

**If this PR introduces a new feature, it is better to list some use cases here and update the documentation.**

------

## ✅ Checklist

**Before PR**:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [ ] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!278 | 29 days ago |