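MindStudio-Modeling (msmodeling) is MindStudio's modeling and optimization-search tool. It estimates the theoretical performance of models, serving deployments, and similar scenarios, and on that basis searches for better-performing parameters such as deployment strategies.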

File	Last commit	Last updated
.agents	Fix crash when `target_filed` is configured for ais_bench. Before the fix, setting `target_filed` in the ais_bench configuration raised an error; after the fix it runs normally (before and after screenshots are attached to the MR). MR Ascend/msmodeling!452, created by liu977803265, merged by ascend-robot.	1 hour ago
.gitcode	[Sync] [Non-development code] Sync code from develop to master. Development continues on master from here on, and packaging is supported. MR Ascend/msmodeling!330, created by AvadaKedavrua, merged by ascend-robot. Co-authors: yydyzr, gcw_61YBRfIt, 孔炳翔, zhengxinqian, hw_whx, jgong5. Committers: liujiawang, ascend-robot, AvadaKedavrua, lutean, Horacehxw, eveyin1, minghang_c, zwt__, tt0cool, elrond-g, jia_ya_nan, zhenyu_zhang, ChenHuiwen, wangshen001, Hudingyi, wendellX, Secluded_Ocean, jhon-117, yaohan404, jiangruitao, zhenghaojie, stormchasingg, panyj1993, cmh1056291129, yuyinkai1, sunguozhong, genius52, liu_jiaxu, HongMaoShuiGuai, zhengxinqian, weixin_43368449, jsez-li-bin, jgong5, wqh17101, w00609794, yydyzr, JieZhang679, sppedforcy, gcw_61YBRfIt, Jiong Gong, hw_whx, gongjiong, 孔炳翔.	14 days ago
cli	Change the default to vllm. The optimization-search tool now uses vllm by default. Type: Other. MR Ascend/msmodeling!461, created by tt0cool, merged by ascend-robot.	1 hour ago
contrib	Dual-node plugin for the vLLM-based automatic serving-parameter optimization tool. Adds support for automatic dual-node parameter search under the vLLM framework: the plugin implementation lives in the contrib directory and starts and stops the service mainly over SSH. Type: Feature. MR Ascend/msmodeling!403, created by yikangLin, merged by ascend-robot.	1 day ago
docs	Change the default to vllm. The optimization-search tool now uses vllm by default. Type: Other. MR Ascend/msmodeling!461, created by tt0cool, merged by ascend-robot.	1 hour ago
optix	Change the default to vllm. The optimization-search tool now uses vllm by default. Type: Other. MR Ascend/msmodeling!461, created by tt0cool, merged by ascend-robot.	1 hour ago
pre-commit	[Sync] [Non-development code] Sync code from develop to master. Development continues on master from here on, and packaging is supported. MR Ascend/msmodeling!330, created by AvadaKedavrua, merged by ascend-robot.	14 days ago
scripts	fix(security): add model source safety checks. Security hardening: adds permission checks for local paths, adds risk warnings to the logs, and removes old interfaces that are no longer maintained. Type: Bugfix, Other. MR Ascend/msmodeling!385 (branch fix/trust-remote-code-safety), created by jia_ya_nan, merged by ascend-robot.	3 days ago
serving_cast	optimize memory peak for servingcast & support model_config from tensorcast. Type: Feature, Test-Cases. MR Ascend/msmodeling!360 (branch enhance-servingcast), created by stormchasingg, merged by ascend-robot.	2 days ago

Motivation: This PR aligns TensorCast/ServingCast throughput simulation with vLLM-Ascend MoE optimization behavior and keeps the ServingCast throughput-simulation configuration consistent with TensorCast, especially for shared expert tensor parallelism, sequence parallel configuration, and fused MoE communication paths.

Modification:

- Add throughput optimizer options for shared expert TP, sequence parallel, word embedding TP mode, and chrome trace output.
- Propagate optimizer CLI options into `UserInputConfig` and per-parallel-search model runner configs.
- Apply sequence-parallel compilation configuration inside each parallel runner task.
- Add TP/DP suffixes to chrome trace filenames to avoid overwriting trace files across parallel search candidates.
- Adjust MoE shared expert TP execution to decrease memory peak in servingcast.
- Enable dispatch-FFN-combine fusion by default in compilation config.

Associated test results: omitted. Test coverage included: None.

Use cases: This change is useful when evaluating MoE models with vLLM-style shared expert TP and sequence parallel optimizations, and when collecting chrome traces for multiple TP/DP candidates in one throughput search.

- Dense model: `python3 -m cli.inference.throughput_optimizer $dense_model_path --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 4096 --output-length 1 --compile --tp-sizes 8 16 --batch-range 16 16 --enable-sequence-parallel --word-embedding-tp row --quantize-linear-action DISABLED --ttft-limits 2000 --log-level info 2>&1 | tee ./run_sc_1.log`
- MoE model: `python3 -m cli.inference.throughput_optimizer $moe_model_path --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 4096 --output-length 1 --compile --quantize-linear-action W8A8_STATIC --disagg --ttft-limits 2000 --tp-sizes 8 16 --batch-range 4 4 --reserved-memory-gb 10 --enable-shared-expert-tp --word-embedding-tp row --chrome-trace trace_decode.json --log-level info 2>&1 | tee ./run_sc3_2.log`
tensor_cast	perf(tensor_cast): refine sparse attention roofline. Model sparse MLA and dsa_indexer paged-cache traffic with calibrated data-movement efficiency so operator and end-to-end estimates align with GLM-5.1 profiling targets. Type: Perf, Test-Cases. MR Ascend/msmodeling!421 (branch develop-on-upstream-master), created by minghang_c, merged by ascend-robot.	1 hour ago

Motivation: Refine TensorCast roofline modeling for sparse MLA, `dsa_indexer`, and GLM-5-series W4A8 MLA preprocessing so sparse-attention estimates better match operator profiling and end-to-end latency targets while keeping the model based on explicit data-movement and compute-efficiency assumptions.

The main modeling gap is that sparse MLA KV reads and `dsa_indexer` historical-cache reads are dominated by random/paged memory access. Treating those bytes as ideal contiguous bandwidth traffic makes the analytic roofline too optimistic, especially for long-context GLM-5.1 prefill/decode scenarios.

The latest GLM-5.1 W4A8 validation also showed that `mlapo_quant` needs to model packed W4 weights carefully: the tensor storage dtype is `torch.uint8`, but the logical MMA throughput should follow the INT8 compute path used by existing grouped quant matmul modeling. Otherwise the trace can report `mlapo_quant` MMA time as zero even though the op has nonzero projection MMA work.

Modification:

- Add sparse/paged KV traffic accounting for MLA with separate decode and prefill data-movement efficiency.
- Add `dsa_indexer` historical cache read efficiency modeling and separate append cache/scale write traffic.
- Keep `dsa_indexer` block-table traffic covered by generic input memory accounting instead of a separate operator-specific model.
- Use decode-only sparse page count for mixed prefill/decode sparse MLA batches.
- Use raw sparse-index bytes in the quant/physical MLA path so physical KV/block-table/sparse-index accounting is consistent.
- Tighten `dsa_indexer` helper signatures so `request_total_seq_lens` is required where the model depends on it.
- Keep generic `tensor_cast.attention.default` accounting unchanged, so non-MLA attention models do not inherit sparse-attention calibration.
- Extend GLM-5-series compile handling to cover both `GLM-5` and `GLM-5.1`, while excluding `GLM-5.2` because its config has meaningful indexer/long-context differences.
- Refine `mlapo_quant` W4A8 modeling so packed `torch.uint8` weights use the logical INT8 MMA throughput path instead of losing MMA time in trace/statistics.
- Add `mlapo`/`mlapo_quant` intermediate memory and static-cost accounting for the fused MLA preprocessing path.
- Update related performance-model tests for sparse memory breakdowns and `mlapo`/`mlapo_quant` modeling behavior.

Associated test results:

- `uvx --python .venv/bin/python pre-commit run --files tensor_cast/performance_model/__init__.py tests/regression/tensor_cast/test_runtime.py`
  - Passed after auto-format rerun.
- `uv run --group ci --with socksio python -m unittest tests.benchmark.models.test_model_regression`
  - Log: `/tmp/msmodeling_model_regression_develop_after_pick.log`
  - `Ran 15 tests in 42.029s`
  - `OK`
  - `Total Cases: 15 | Passed: 15 | Failed: 0 | No Baseline: 0`
  - `* All Operator Checks Passed`
- GLM-5.1 e2e validation across 10 query/context scenarios from 3.5k to 128k after the latest `mlapo_quant` W4A8 modeling update:
  - Log: `/tmp/msmodeling_glm51_e2e_after_user_change_rerun3.log`
  - `e2e_count=10`
  - `mean_e2e_err=28.717478%`, meeting the `≤30%` target.
- Earlier GLM-5.1 sparse-attention e2e validation across the same 10 scenarios:
  - Log: `/tmp/msmodeling_glm51_e2e_26_1_0_latest.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, meeting the `≤30%` target.
- GLM-5 e2e validation after applying the GLM-5-series compile override:
  - Log: `/tmp/msmodeling_glm5_e2e_with_glm5_override.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, matching the GLM-5.1 run with the same parameters.
- Operator-level validation from the sparse MLA / `dsa_indexer` profiling set:
  - `mean_operator_err = 6.487008%`
  - `max_operator_err = 18.658699%`
  - Meets the `≤20%` target.
- Issue #103 2.5K GLM-5.1 scenario:
  - Prefill analytic result: old roofline `182.377 ms` → new roofline `631.874 ms`; real wall `1225.849 ms`; new roofline/wall `51.55%`.
  - Decode analytic result: old roofline `48.685 ms` → new roofline `103.071 ms`; real wall `82.528 ms`; new roofline/wall `124.89%`.
  - Decode compared with kernel sum: new roofline `103.071 ms` vs kernel sum `117.158 ms`, ratio `87.97%`.

Use cases: GLM-5.1 sparse attention inference latency estimation for prefill and decode scenarios from 3.5k to 128k context length. The latest e2e analytic results were validated with:

`.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --tp-size 16 --dp-size 1 --ep-size 16 --num-queries 1 --num-mtp-tokens 3 --compile --quantize-linear-action W4A8_STATIC --dump-input-shapes --context-length <context> --query-length <query>`

| Scenario | Query length | Context length | Target latency | Analytic latency | Relative error |
|---|---:|---:|---:|---:|---:|
| 3.5k-prefill | 3500 | 0 | `1553.21 ms` | `1010.00 ms` | `34.9734%` |
| 3.5k-decode | 4 | 3500 | `69.90 ms` | `44.79 ms` | `35.9270%` |
| 16k-prefill | 4096 | 12000 | `1867.68 ms` | `1449.00 ms` | `22.4171%` |
| 16k-decode | 4 | 16000 | `68.10 ms` | `47.22 ms` | `30.6637%` |
| 32k-prefill | 4096 | 28000 | `2295.99 ms` | `1807.00 ms` | `21.2976%` |
| 32k-decode | 4 | 32000 | `68.70 ms` | `47.76 ms` | `30.4862%` |
| 64k-prefill | 4096 | 60000 | `3256.48 ms` | `2522.00 ms` | `22.5544%` |
| 64k-decode | 4 | 64000 | `71.70 ms` | `49.63 ms` | `30.7768%` |
| 128k-prefill | 4096 | 124000 | `5341.23 ms` | `3952.00 ms` | `26.0096%` |
| 128k-decode | 4 | 128000 | `78.30 ms` | `53.19 ms` | `32.0690%` |

`mean_e2e_err=28.717478%`
tests	perf(tensor_cast): refine sparse attention roofline Model sparse MLA and dsa_indexer paged-cache traffic with calibrated data-movement efficiency so operator and end-to-end estimates align with GLM-5.1 profiling targets. Signed-off-by: minghang_c <chiminghang@h-partners.com> Co-authored-by: minghang_c<chiminghang@h-partners.com> # message auto-generated for no-merge-commit merge: !421 merge develop-on-upstream-master into master perf(tensor_cast): refine sparse attention roofline Created-by: minghang_c Commit-by: minghang_c Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献，我们非常重视。以下说明将使您的拉取请求更健康，更易于获得反馈。如果您不理解某些项目，请不要担心，只需提交拉取请求并从维护人员那里寻求帮助即可。 PR Type / PR类型 - [ ] Feature（功能新增） - [ ] Bugfix（Bug 修复） - [ ] Docs（文档更新） - [ ] CI/CD（持续集成/持续部署） - [ ] Refactor（代码重构） - [x] Perf（性能优化） - [x] Test-Cases（测试用例更新） - [ ] Other（其他） ## 🔍 Motivation / 变更动机 Refine TensorCast roofline modeling for sparse MLA, `dsa_indexer`, and GLM-5-series W4A8 MLA preprocessing so sparse-attention estimates better match operator profiling and end-to-end latency targets while keeping the model based on explicit data-movement and compute-efficiency assumptions. The main modeling gap is that sparse MLA KV reads and `dsa_indexer` historical-cache reads are dominated by random/paged memory access. Treating those bytes as ideal contiguous bandwidth traffic makes the analytic roofline too optimistic, especially for long-context GLM-5.1 prefill/decode scenarios. The latest GLM-5.1 W4A8 validation also showed that `mlapo_quant` needs to model packed W4 weights carefully: the tensor storage dtype is `torch.uint8`, but the logical MMA throughput should follow the INT8 compute path used by existing grouped quant matmul modeling. 
Otherwise the trace can report `mlapo_quant` MMA time as zero even though the op has nonzero projection MMA work. ------ ## 📝 Modification / 修改内容 - Add sparse/paged KV traffic accounting for MLA with separate decode and prefill data-movement efficiency. - Add `dsa_indexer` historical cache read efficiency modeling and separate append cache/scale write traffic. - Keep `dsa_indexer` block-table traffic covered by generic input memory accounting instead of a separate operator-specific model. - Use decode-only sparse page count for mixed prefill/decode sparse MLA batches. - Use raw sparse-index bytes in the quant/physical MLA path so physical KV/block-table/sparse-index accounting is consistent. - Tighten `dsa_indexer` helper signatures so `request_total_seq_lens` is required where the model depends on it. - Keep generic `tensor_cast.attention.default` accounting unchanged, so non-MLA attention models do not inherit sparse-attention calibration. - Extend GLM-5-series compile handling to cover both `GLM-5` and `GLM-5.1`, while excluding `GLM-5.2` because its config has meaningful indexer/long-context differences. - Refine `mlapo_quant` W4A8 modeling so packed `torch.uint8` weights use the logical INT8 MMA throughput path instead of losing MMA time in trace/statistics. - Add `mlapo`/`mlapo_quant` intermediate memory and static-cost accounting for the fused MLA preprocessing path. - Update related performance-model tests for sparse memory breakdowns and `mlapo`/`mlapo_quant` modeling behavior. ------ ## 📐 Associated Test Results / 关联测试结果 - `uvx --python .venv/bin/python pre-commit run --files tensor_cast/performance_model/__init__.py tests/regression/tensor_cast/test_runtime.py` - Passed after auto-format rerun. 
- `uv run --group ci --with socksio python -m unittest tests.benchmark.models.test_model_regression` - Log: `/tmp/msmodeling_model_regression_develop_after_pick.log` - `Ran 15 tests in 42.029s` - `OK` - `Total Cases: 15 \| Passed: 15 \| Failed: 0 \| No Baseline: 0` - `* All Operator Checks Passed ` - GLM-5.1 e2e validation across 10 query/context scenarios from 3.5k to 128k after the latest `mlapo_quant` W4A8 modeling update: - Log: `/tmp/msmodeling_glm51_e2e_after_user_change_rerun3.log` - `e2e_count=10` - `mean_e2e_err=28.717478%`, meeting the `≤30%` target. - Earlier GLM-5.1 sparse-attention e2e validation across the same 10 scenarios: - Log: `/tmp/msmodeling_glm51_e2e_26_1_0_latest.log` - `e2e_count=10` - `mean_e2e_err=27.678365%`, meeting the `≤30%` target. - GLM-5 e2e validation after applying the GLM-5-series compile override: - Log: `/tmp/msmodeling_glm5_e2e_with_glm5_override.log` - `e2e_count=10` - `mean_e2e_err=27.678365%`, matching the GLM-5.1 run with the same parameters. - Operator-level validation from the sparse MLA / `dsa_indexer` profiling set: - `mean_operator_err = 6.487008%` - `max_operator_err = 18.658699%` - Meets the `≤20%` target. - Issue #103 2.5K GLM-5.1 scenario: - Prefill analytic result: old roofline `182.377 ms` → new roofline `631.874 ms`; real wall `1225.849 ms`; new roofline/wall `51.55%`. - Decode analytic result: old roofline `48.685 ms` → new roofline `103.071 ms`; real wall `82.528 ms`; new roofline/wall `124.89%`. - Decode compared with kernel sum: new roofline `103.071 ms` vs kernel sum `117.158 ms`, ratio `87.97%`. ------ ## 🌟 Use cases (Optional) / 使用案例（可选） GLM-5.1 sparse attention inference latency estimation for prefill and decode scenarios from 3.5k to 128k context length. 
The latest e2e analytic results were validated with: `.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --tp-size 16 --dp-size 1 --ep-size 16 --num-queries 1 --num-mtp-tokens 3 --compile --quantize-linear-action W4A8_STATIC --dump-input-shapes --context-length <context> --query-length <query>`

| Scenario | Query length | Context length | Target latency | Analytic latency | Relative error |
|---|---:|---:|---:|---:|---:|
| 3.5k-prefill | 3500 | 0 | `1553.21 ms` | `1010.00 ms` | `34.9734%` |
| 3.5k-decode | 4 | 3500 | `69.90 ms` | `44.79 ms` | `35.9270%` |
| 16k-prefill | 4096 | 12000 | `1867.68 ms` | `1449.00 ms` | `22.4171%` |
| 16k-decode | 4 | 16000 | `68.10 ms` | `47.22 ms` | `30.6637%` |
| 32k-prefill | 4096 | 28000 | `2295.99 ms` | `1807.00 ms` | `21.2976%` |
| 32k-decode | 4 | 32000 | `68.70 ms` | `47.76 ms` | `30.4862%` |
| 64k-prefill | 4096 | 60000 | `3256.48 ms` | `2522.00 ms` | `22.5544%` |
| 64k-decode | 4 | 64000 | `71.70 ms` | `49.63 ms` | `30.7768%` |
| 128k-prefill | 4096 | 124000 | `5341.23 ms` | `3952.00 ms` | `26.0096%` |
| 128k-decode | 4 | 128000 | `78.30 ms` | `53.19 ms` | `32.0690%` |

`mean_e2e_err=28.717478%` ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by validation runs and targeted regression coverage. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!421	1 hour ago
tools	Fix Qwen3 FIA shape grid coverage Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com> # message auto-generated for no-merge-commit merge: !429 merge codex/qwen3-fia-shape-coverage into master Fix Qwen3 FIA shape grid coverage Created-by: Secluded_Ocean Commit-by: Secluded_Ocean Merged-by: ascend-robot Description: ## Summary - enumerate dense FIA rows by model TP local head variants - add Qwen3 3.5k prefill/decode grid points so TP4 attention shape is generated - add regression coverage for Qwen3 TP4 3593 decode FIA and TP head variants ## Verification - `py -3.10 -m pytest tests/regression/cli/test_generate_shape_grid.py tests/regression/cli/test_perf_tooling_ci_map.py -q` - parsed 3.5k_10_data.zip, generated shape grid, backfilled FusedInferAttentionScore missing-only on 2-card A3, final Qwen3-32B 3593 decode run hits attention duration 27.760us with no shape miss See merge request: Ascend/msmodeling!429	2 days ago
web_ui	Update web_ui documentation (launch command and file descriptions) Co-authored-by: zwt<zhuweite@huawei.com> # message auto-generated for no-merge-commit merge: !395 merge dev_readme into master Update web_ui documentation (launch command and file descriptions) Created-by: zwt__ Commit-by: zwt Merged-by: ascend-robot Description: PR Type: Docs ## 🔍 Motivation This documentation update has four motivations: 1. Simplify the launch command: remove the `--host` argument and use the default configuration, lowering the barrier to entry. 2. Update the core file list: add the new `web_ui` module files (`time_tracker.py`, `web_ui_start.py`, `__init__.py`, etc.) so the documentation stays in sync with the code structure. 3. Trim the documentation: remove the LAN access and Gradio share sections and focus on local use. 4. Fix the development checklist: make sure the syntax check covers all required files. ------ ## 📝 Modification ### Modified files - `docs/en/user_guide/msmodeling_web_ui_user_guide.md` - `docs/zh/user_guide/msmodeling_web_ui_user_guide.md` - `web_ui/README.md` ### Details #### 1. Simplified launch command - Before: `python -m web_ui.web_ui_start --host 127.0.0.1 --port 2345` - After: `python -m web_ui.web_ui_start --port 2345` - Removed the `--host` argument description; the Gradio default configuration is used - Removed the `GRADIO_SERVER_NAME` entry from the environment variable table #### 2. Trimmed sections - Removed section "3.2 LAN access" - Removed section "3.3 Gradio share" - Renumbered the following sections (3.4 → 3.2, 3.5 → 3.3) #### 3. Updated core file relationship list, adding descriptions for: - `web_ui/__init__.py` - package entry point, lazily exposes launch_app - `web_ui/styles.py` - shared CSS, theme helpers, and header styles - `web_ui/schemas.py` - dataclasses shared among builders, runners, parsers, and storage - `web_ui/utils.py` - shared parsing, hashing, and normalization helpers - `web_ui/time_tracker.py` - tracks and displays simulation time information - `web_ui/web_ui_start.py` - Web UI server launch entry #### 4. Updated development checklist - Added `web_ui/styles.py` to the syntax check command - Ensured all core files are covered by the syntax check #### 5. Simplified troubleshooting section - Removed items related to remote access and firewalls ------ ## 📐 Associated Test Results This change is documentation-only, so no additional tests are needed. It was verified as follows: - [x] Launch command: `python -m web_ui.web_ui_start --port 2345` starts normally - [x] Cross-check: the Chinese and English documents are changed consistently - [x] File existence: all newly listed files exist in the repository ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [ ] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!395	4 days ago
.gitattributes	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	14 days ago
.gitignore	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	14 days ago
.pre-commit-config.yaml	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	14 days ago
AGENTS.md	feat(skills): add throughput-optimizer-explainer Co-authored-by: lutean<lutean1@huawei.com> Co-authored-by: gitcode-bot<noreply@gitcode.com> # message auto-generated for no-merge-commit merge: !413 merge master into master feat(skills): add throughput-optimizer-explainer Created-by: lutean Commit-by: lutean;gitcode-bot Merged-by: ascend-robot Description: PR Type: Feature ## 🔍 Motivation Add the throughput-optimizer-explainer skill, which analyzes and explains throughput-optimizer results. ------ ## 📝 Modification Triggers: this skill explains the output of `python -m cli.inference.throughput_optimizer`. Typical triggers include: · The user asks whether throughput, TTFT, TPOT, or PD ratio is reasonable. · The user wants to compare different hardware, parallel strategies, or best rows. · The user wants to analyze Cube/Vec/Comm/Mem bottlenecks. · The user provides --dump-original-results, text_generate, --dump-op-bound-results, or a profiler trace. · The user wants to map the throughput_optimizer best row to a `python -m cli.inference.text_generate` validation command. Use cases: the core scenario is "explain the optimizer results without going beyond the evidence": · Explain the best strategy in aggregation / disaggregation / PD ratio mode. · Grade the result: basically reasonable, partly explainable, suspicious, insufficient evidence. · Make macro-level judgments from TTFT, TPOT, throughput, concurrency, batch, and parallel configuration. · Analyze the Cube, Vec, Comm, and Mem shares of Prefill / Decode from the phase breakdown. · Do simulated operator-level attribution from text_generate --dump-op-bound-results. · Make stronger operator/kernel-level judgments from a real profiler or chrome trace. · When evidence is insufficient, generate the minimal necessary validation commands. Workflow: 1. Identify the optimizer mode: aggregation, disaggregation, or PD ratio. 2. Extract the comparable conditions: model, device, device count, input/output lengths, SLO, quantization, compile, prefix cache, MTP, search space, etc. 3. Extract the best row / top candidates: throughput, TTFT, TPOT, concurrency, batch size, parallel strategy, PD ratio, QPS, breakdown. 4. Determine the evidence level first: macro_only, optimizer_phase_breakdown, text_generate_phase_breakdown, text_generate_op_bound, profiler_trace. 5. Aggregation mode must be decomposed into the formula Prefill forward + Decode forward + scheduling; it must not be treated as a single forward pass. 6. Disaggregation mode maps directly to the Prefill or Decode phase. 7. If the breakdown is missing and bottleneck analysis is needed, generate a text_generate validation command; add --dump-op-bound-results when operator-level attribution is needed. 8. If op-bound output is available, look first at the top total-time operators, the dominant bound, and the memory/comm/mma/gp percentages. 9. When comparing hardware or strategies, the priority order is phase breakdown, op-bound, macro metrics; hardware spec ratios are auxiliary only. 10. Give the reasonableness grade and the main judgments. 11. End with the minimal validation action. Key evidence rules: do not assert a specific operator or Cube/Vec/Comm/Mem bottleneck from macro output alone. text_generate --dump-op-bound-results counts only as TensorCast simulated operator attribution, not as real profiler/kernel evidence. Conclusions about real runtime must be backed by a profiler or actual measurements. Scripts used: parse_optimizer_output.py · Takes raw optimizer output, dump tables, text_generate output, or op-bound output as input. · Outputs structured JSON. · Parses mode, Best Throughput, TTFT, TPOT, PD Ratio, and Prefill/Decode QPS. · Extracts pretty tables, percentage_breakdowns dump rows, Stats breakdowns, and the op-bound operator table. build_text_generate_commands.py · Generates text_generate validation commands from a normalized best row JSON. · Supports --mode aggregation and --mode disaggregation. · aggregation generates two commands, Prefill and Decode, and computes effective_input_length, prefill_batch_size, and the partial Prefill wave. · disaggregation requires phase=prefill|decode and generates the corresponding single-phase command. · --include-op-bound appends --dump-op-bound-results. compare_phase_breakdowns.py · Compares the Cube/Vec/Comm/Mem breakdowns in two JSON files. · Outputs the left and right values, the difference delta_right_minus_left, and the ratio ratio_right_over_left. · With --op-bound, compares two op-bound tables: bound distribution, differences in top operators, total time, and changes in the memory/comm/mma/gp percentages. ------ See merge request: Ascend/msmodeling!413	1 day ago
CLAUDE.md	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	14 days ago
CONTRIBUTING.md	docs: update PR linking requirements Co-authored-by: Kudo__shinichi<liuning119@huawei.com> # message auto-generated for no-merge-commit merge: !435 merge docs/update-contributing-pr-link-policy into master docs: update PR linking requirements Created-by: Kudo__shinichi Commit-by: Kudo__shinichi Merged-by: ascend-robot Description: PR Type: Docs ## 🔍 Motivation Update the PR guidelines on the merge requirement to link a milestone or an Issue, so that contributors understand the repository's current PR merge process, target-branch requirements, and how to request permissions. ------ ## 📝 Modification - Add PR linking requirements to the PR guidelines section of CONTRIBUTING.md. - State that before merging, a PR must be linked to either an Issue or a milestone. - Explain that PRs linked to a milestone must be merged into the corresponding commercial release branch. - Add the link for requesting permission to link an Issue or milestone. ------ ## 📐 Associated Test Results Documentation update; no unit tests need to be run. Executed: - `git diff --check` ------ ## 🌟 Use cases (Optional) N/A, not a new feature. ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. (N/A, not a bugfix) - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. (N/A, documentation only) - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. (No code files modified) ------ See merge request: Ascend/msmodeling!435	2 days ago
LICENSE	Modify directory Co-authored-by: ttcool<xujintao8@h-partners.com> # message auto-generated for no-merge-commit merge: !20 merge master into master Modify directory Created-by: tt0cool Commit-by: ttcool Merged-by: ascend-robot Description: Modify directory See merge request: Ascend/msmodeling!20	5 months ago
README.md	[fix_doc] Improve documentation Co-authored-by: eveyin1<qianyin2022@hotmail.com> # message auto-generated for no-merge-commit merge: !406 merge fix_doc into master [fix_doc] Improve documentation Created-by: eveyin1 Commit-by: eveyin1 Merged-by: ascend-robot Description: ## 📝 Modification This change mainly updates the README and the model support matrix documents so that the project information and model support descriptions are more accurate. Update the DeepWiki entry link in README.md to https://deepwiki.com/Ascend/msmodeling. Adjust Markdown line breaks and blank lines in the README to fix paragraphs such as "Latest News", "Smart Search", "Notices", and "Acknowledgements" not breaking in preview. Refine the README acknowledgements list to use more formal company/department names. Update the supported model list in both the Chinese and English model support matrices, adding DeepSeek V4, Kimi-K2.6, Kimi-K2.5, GLM5.1, Qwen3 Dense, GLM-4V, and other models. Align the structure of the English model support document with the Chinese one, including the reading notes, the model support table, and the feature support table. ------ See merge request: Ascend/msmodeling!406	1 day ago
__init__.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development is based on master, and packaging is supported. See merge request: Ascend/msmodeling!330	14 days ago
pyproject.toml	Fix fastapi version issue Co-authored-by: tt0cool<xujintao8@h-partners.com> # message auto-generated for no-merge-commit merge: !451 merge master into master Fix fastapi version issue Created-by: tt0cool Commit-by: tt0cool Merged-by: ascend-robot Description: PR Type: Bugfix ## 🔍 Motivation fastapi 0.137 causes the vLLM service to fail, so the fastapi version needs to be constrained. ------ ## 📝 Modification Constrain the fastapi version to match the version in vLLM's official dependencies. ------ ## 📐 Associated Test Results ![image.png](https://raw.gitcode.com/user-images/assets/8428112/044472f6-460b-445f-b29e-73f9d6e250f9/image.png 'image.png') ------ ## ✅ Checklist Before PR: - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!451	23 hours ago
requirements.txt	support kimi k2.5 - upgrade torch version Co-authored-by: wangshen001<wangshen34@h-partners.com> # message auto-generated for no-merge-commit merge: !404 merge upgrade_torch_version into master support kimi k2.5 - upgrade torch version Created-by: wangshen001 Commit-by: wangshen001 Merged-by: ascend-robot Description: ## 🔍 Motivation The inductor in PyTorch 2.7 has a bug when handling torch.compile. When the MoE Gate of the Kimi-K2.5 model executes .view(-1, h) (merging batch_size * seq_len into a single dimension), inductor internally produces the symbolic expression s0s1. This expression is of type torch.SymInt, whereas the assertion in inductor's Layout.__init__ (ir.py:3275) accepts only sympy.Expr or int, which raises an AssertionError. Key call chain: empty_strided lowering → pointwise.realize() → FlexibleLayout(size=(s0s1, 1152)) → Layout.__init__ → assert all(isinstance(s, (Expr, int)) for s in size) ❌ fails. The bug is fixed in 2.8 and later, so PyTorch is upgraded to 2.8. ## 📝 Modification Change the torch version range in requirements.txt and pyproject.toml from >=2.7,<=2.10 to >=2.8,<=2.10; remove the torchvision>=0.25.0 constraint. ## 📐 Associated Test Results Simulation command: python -m cli.inference.text_generate moonshotai/Kimi-K2.5 --device ATLAS_800_A3_560T_128G_DIE --num-devices 16 --num-queries 4 --query-length 30 --compile --tp-size 4 --dp-size 4 --ep-size 16 --quantize-linear-action W4A8_DYNAMIC --image-batch-size 1 --image-height 1080 --image-width 1920 --dump-input-shapes Before the torch upgrade: ![image.png](https://raw.gitcode.com/user-images/assets/8428112/25a1cdfd-b87d-47ec-90c1-e39aa6025815/image.png 'image.png') After the torch upgrade: ![image.png](https://raw.gitcode.com/user-images/assets/8428112/08b6c1af-a26c-4d8c-b6ad-62f5135c9d84/image.png 'image.png') ------ See merge request: Ascend/msmodeling!404	2 days ago
uv.lock	support kimi k2.5 - upgrade torch version Co-authored-by: wangshen001<wangshen34@h-partners.com> # message auto-generated for no-merge-commit merge: !404 merge upgrade_torch_version into master support kimi k2.5 - upgrade torch version Created-by: wangshen001 Commit-by: wangshen001 Merged-by: ascend-robot Description: ## 🔍 Motivation The inductor in PyTorch 2.7 has a bug when handling torch.compile. When the MoE Gate of the Kimi-K2.5 model executes .view(-1, h) (merging batch_size * seq_len into a single dimension), inductor internally produces the symbolic expression s0s1. This expression is of type torch.SymInt, whereas the assertion in inductor's Layout.__init__ (ir.py:3275) accepts only sympy.Expr or int, which raises an AssertionError. Key call chain: empty_strided lowering → pointwise.realize() → FlexibleLayout(size=(s0s1, 1152)) → Layout.__init__ → assert all(isinstance(s, (Expr, int)) for s in size) ❌ fails. The bug is fixed in 2.8 and later, so PyTorch is upgraded to 2.8. ## 📝 Modification Change the torch version range in requirements.txt and pyproject.toml from >=2.7,<=2.10 to >=2.8,<=2.10; remove the torchvision>=0.25.0 constraint. ## 📐 Associated Test Results Simulation command: python -m cli.inference.text_generate moonshotai/Kimi-K2.5 --device ATLAS_800_A3_560T_128G_DIE --num-devices 16 --num-queries 4 --query-length 30 --compile --tp-size 4 --dp-size 4 --ep-size 16 --quantize-linear-action W4A8_DYNAMIC --image-batch-size 1 --image-height 1080 --image-width 1920 --dump-input-shapes Before the torch upgrade: ![image.png](https://raw.gitcode.com/user-images/assets/8428112/25a1cdfd-b87d-47ec-90c1-e39aa6025815/image.png 'image.png') After the torch upgrade: ![image.png](https://raw.gitcode.com/user-images/assets/8428112/08b6c1af-a26c-4d8c-b6ad-62f5135c9d84/image.png 'image.png') ------ See merge request: Ascend/msmodeling!404	2 days ago
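
The relative-error column of the GLM-5.1 validation table in the file listing above (merge request !421) appears consistent with `|target - analytic| / target`. That formula is an assumption inferred from the published numbers, not something the listing states. The Python sketch below recomputes the column and the mean under that assumption; the latencies are copied from the table, and the names `MEASUREMENTS` and `relative_error_percent` are illustrative only.

```python
# (scenario, target latency in ms, analytic latency in ms), copied from the table
MEASUREMENTS = [
    ("3.5k-prefill", 1553.21, 1010.00),
    ("3.5k-decode", 69.90, 44.79),
    ("16k-prefill", 1867.68, 1449.00),
    ("16k-decode", 68.10, 47.22),
    ("32k-prefill", 2295.99, 1807.00),
    ("32k-decode", 68.70, 47.76),
    ("64k-prefill", 3256.48, 2522.00),
    ("64k-decode", 71.70, 49.63),
    ("128k-prefill", 5341.23, 3952.00),
    ("128k-decode", 78.30, 53.19),
]


def relative_error_percent(target_ms: float, analytic_ms: float) -> float:
    """Assumed definition: absolute deviation relative to the target, in percent."""
    return abs(target_ms - analytic_ms) / target_ms * 100.0


errors = {name: relative_error_percent(t, a) for name, t, a in MEASUREMENTS}
mean_error = sum(errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name:>13}: {err:7.4f}%")
print(f"mean: {mean_error:.4f}%")
```

Because the table shows latencies rounded to two decimals, some recomputed decode rows differ from the published values in the third decimal place; the recomputed mean stays close to the reported `mean_e2e_err=28.717478%`.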

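MindStudio Modeling

Performance modeling and simulation tool for Ascend AI models

✨ Latest News

🔹 [2026.06.10]: msModeling adds support for the DeepSeek-V4 model
🔹 [2026.04.02]: msModeling adds support for the GLM5 model

ℹ️ Introduction

MindStudio Modeling (msModeling) is a neural network inference performance simulation and analysis framework built for Ascend AI processors. It provides single-model performance simulation, service-level throughput optimization, automatic tuning of serving parameters, and visual analysis, helping developers predict model performance, identify bottlenecks, and optimize configurations when no physical hardware is available or in the early stages of deployment.

⚙️ Features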

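msModeling provides the TensorCast, Throughput Optimizer, ServingCast, Web UI, and OptiX modules, covering single-model performance simulation, throughput optimization, service-level simulation, visual interaction, and automatic tuning of serving parameters. For model and feature coverage, see the "Model and Feature Support Matrix".

Module	Description
TensorCast	Operator simulation module. It intercepts the PyTorch computation graph, simulates inference on a specified DeviceProfile, and outputs an operator-level performance breakdown, memory usage, operator shapes, and a Chrome Trace.
Throughput Optimizer	Throughput optimization module. It automatically searches for the best parallel strategy and batch configuration under SLO constraints, and supports three modes: PD aggregation (co-located), PD disaggregation, and PD ratio.
ServingCast	Service-level inference simulation module. Based on a YAML configuration, it simulates end-to-end serving scenarios with multiple instances and multiple requests, and outputs system-level metrics such as throughput, TTFT, and TPOT.
Web UI	Visual interactive interface. Model, chip, parallelism, quantization, and workload parameters are configured on the page, and results can be viewed as curves and tables and exported.
OptiX	Automatic tuning tool for serving parameters. It uses the PSO (particle swarm optimization) algorithm to tune and validate parameters of serving frameworks such as vLLM and MindIE.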
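
The OptiX row above names PSO as its search algorithm. As a rough illustration of how particle swarm optimization works in general, the minimal Python sketch below minimizes a toy cost function. It is not OptiX's implementation: `pso_minimize`, `toy_cost`, the two tuned parameters, and all hyperparameter values are hypothetical and chosen only for the example.

```python
import random


def pso_minimize(objective, bounds, num_particles=20, iterations=60,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize objective(list_of_floats) inside per-dimension (low, high) bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(num_particles)]
    velocities = [[0.0] * dim for _ in range(num_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]
    best_index = min(range(num_particles), key=personal_best_score.__getitem__)
    global_best = personal_best[best_index][:]
    global_best_score = personal_best_score[best_index]

    for _ in range(iterations):
        for i in range(num_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward own best + pull toward swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move the particle and clamp it to the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score < global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score
    return global_best, global_best_score


def toy_cost(params):
    """Hypothetical stand-in for a measured serving cost; smallest at (32, 0.9)."""
    batch_size, memory_utilization = params
    return ((batch_size - 32.0) / 32.0) ** 2 + (memory_utilization - 0.9) ** 2


best_params, best_cost = pso_minimize(toy_cost, bounds=[(1.0, 256.0), (0.5, 0.99)])
print(best_params, best_cost)
```

In a real tuning loop the objective would presumably be a benchmark run against the serving framework rather than a closed-form function, which makes each evaluation expensive; that is one reason a population-based, gradient-free method such as PSO is a plausible fit for this kind of search.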

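🚀 Quick Start

To run through the core workflow quickly, using TensorCast single-model simulation and ServingCast serving simulation as examples, see the "TensorCast and ServingCast Quick Start".

📦 Installation Guide

For the tool's environment dependencies and installation steps, see the "msModeling Installation Guide".

📘 User Guide

For detailed usage of each tool, see the README file in its source directory, or follow the links in the feature table above.

💡 Typical Cases

Typical problem scenarios that help users understand and master the tools are given as examples in the "Throughput Optimization Guide" and the "Serving Simulation Guide".

❓ FAQ

For common problems and solutions, submit an Issue or see the user guide of each module.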

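🌌 Smart Search

To make the documentation easier to consult, several search options are provided:
🔹 AI Q&A (DeepWiki): natural-language Q&A for quickly grasping the project architecture and the relationships between modules.
🔹 AI Q&A (ZRead): a better Q&A experience in Chinese, pinpointing feature usage and details.
🔹 Exact search (ReadTheDocs): full-text keyword search that leads directly to interfaces, parameters, error messages, and similar information.

🛠️ Contributing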

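Contributions to the project are welcome. For the contribution process, coding conventions, commit conventions, testing requirements, and more, see "CONTRIBUTING.md". If you have questions, please submit an Issue.

⚖️ Notices

🔹 "Release Notes"
🔹 "License Statement"
🔹 "Security Statement"
🔹 Disclaimer: the simulation and optimization results of this tool are for performance evaluation reference only; final performance is subject to measurements in a real environment.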

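🤝 Feedback and Community

Everyone is welcome to contribute to the community. If you have any questions or suggestions, please submit an Issue and we will reply as soon as possible. Thank you for your support.

SIG weekly meeting: the MindStudio Modeling Weekly Meeting is held every Wednesday, 10:00-12:00 (UTC+8). For meeting minutes and agenda, see sig-msit-modeling; a time zone converter can be used to check your local time.

Instant interaction (WeChat group)	Official news (official account)	In-depth support (assistant/forum)
(QR code: scan to join the technical discussion group)	(QR code: scan to follow the official account)	Scan to join the group and follow the official account, the fastest channels for MindStudio users and developers. Quick questions: discuss technical issues with community members in real time. Stay up to date: be the first to receive release and feature update notices. Share experience: exchange best practices and hands-on lessons with other developers. More support channels: 👉 Ascend Assistant: 👉 Ascend Forum: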