msmodeling/cli/inference · Ascend/MindStudio-Modeling - AtomGit

ascend-robot: optimize memory peak for servingcast & support model_config from tensorcast

File	Last commit	Last updated
model_adapter.py	Fix the model_adapter entry-point crash caused by check_dependencies. Co-authored-by: pengzhipin<pengzhipin1@h-partners.com>. Merge request !392 (merge fix_model_adaptor into master). Created-by: weixin_43113933. Commit-by: pengzhipin. Merged-by: ascend-robot. PR type: none checked. Motivation: fix the model_adapter error `AttributeError: module 'tensor_cast.utils' has no attribute 'check_dependencies'`. Modification: remove the call to check_dependencies in main(). Test results: screenshot at https://raw.gitcode.com/user-images/assets/8428112/ec8ebb8e-dd1b-4dbf-840a-730f2dc88414/image.png. Checklist: no items checked. See merge request: Ascend/msmodeling!392	4 days ago
text_generate.py	[Bugfix] Fix issues in some deepseek-v4 quantization scenarios. Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com>. Merge request !373 (merge fix-ds-v4-quant into master). Created-by: ChenHuiwen. Commit-by: ChenHuiwen. Merged-by: ascend-robot. PR type: Bugfix, Test-Cases. Motivation: This PR fixes DeepSeek V4 quantization and tensor-parallel execution issues found in `text_generate` simulation. The old `backbone` quantization override naming was ambiguous, because the option configures non-routed-expert linear layers such as attention projections, dense MLP layers, and shared experts; this PR renames it to `non-expert` to match the actual behavior. In addition, the O projection path of DeepSeek V4 Flash/Pro has model-specific grouped projection behavior. Under W4A8 quantization or high TP configurations, TensorCast could previously hit shape mismatches in `wo_a` / `wo_b` modeling. This PR keeps the modeled path aligned with the real DeepSeek V4 structure while avoiding invalid reshape or double-sharding behavior. Modification: (1) Rename quantization override terminology: `--quantize-backbone-linear-action` -> `--quantize-non-expert-linear-action` and `quantize_backbone_linear_action` -> `quantize_non_expert_linear_action`; update CLI help text, `UserInputConfig`, quantization config creation, and related regression tests. (2) Update non-expert quantization config patterns: rename `_BACKBONE_LINEAR_PATTERNS` to `_NON_EXPERT_LINEAR_PATTERNS`; keep routed MoE experts controlled by the broad `--quantize-linear-action` setting; keep attention, dense MLP, and shared-expert layers covered by the non-expert override. (3) Fix DeepSeek V4 W4A8 `wo_a` grouped projection: when `wo_a` is quantized as W4A8, unpack the int4-packed `qweight` back to its logical weight shape before the grouped einsum reshape; avoid using stale `in_features/out_features` values after TP wrapping. (4) Fix DeepSeek V4 O projection TP sharding: remove the duplicate V4-specific `self_attn.o_proj` TP rule; let the generic `o_proj` RowParallel rule handle `wo_b/o_proj` once; prevent `o_proj` from being sharded twice, which caused a local input dim mismatch. (5) Add/update regression tests: cover the W4A8 `wo_a` logical weight shape before grouped einsum; update DeepSeek V4 TP plan expectations; update quantization config and user config tests for the new non-expert terminology. Test results: screenshots at https://raw.gitcode.com/user-images/assets/8428112/554930ff-0798-4331-8131-355e9d34c759/image.png and https://raw.gitcode.com/user-images/assets/8428112/de48f376-7feb-49b1-96ff-daa94228f25a/image.png. Checklist: no items checked. See merge request: Ascend/msmodeling!373	11 days ago
throughput_optimizer.py	optimize memory peak for servingcast & support model_config from tensorcast. Co-authored-by: stormchasingg<sh_ding@zju.edu.cn>. Merge request !360 (merge enhance-servingcast into master). Created-by: stormchasingg. Commit-by: stormchasingg. Merged-by: ascend-robot. PR type: Feature, Test-Cases. Motivation: This PR aligns TensorCast/ServingCast throughput simulation with vLLM-Ascend MoE optimization behavior, and keeps the ServingCast throughput simulation configuration consistent with TensorCast, especially for shared expert tensor parallelism, sequence parallel configuration, and fused MoE communication paths. Modification: (1) Add throughput optimizer options for shared expert TP, sequence parallel, word embedding TP mode, and chrome trace output. (2) Propagate optimizer CLI options into `UserInputConfig` and per-parallel-search model runner configs. (3) Apply sequence-parallel compilation configuration inside each parallel runner task. (4) Add TP/DP suffixes to chrome trace filenames to avoid overwriting trace files across parallel search candidates. (5) Adjust MoE shared expert TP execution to decrease the memory peak in servingcast. (6) Enable dispatch-FFN-combine fusion by default in the compilation config. Test results: omitted. Test coverage included: None. Use cases: This change is useful when evaluating MoE models with vLLM-style shared expert TP and sequence parallel optimizations, and when collecting chrome traces for multiple TP/DP candidates in one throughput search. Dense model example: `python3 -m cli.inference.throughput_optimizer $dense_model_path --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 4096 --output-length 1 --compile --tp-sizes 8 16 --batch-range 16 16 --enable-sequence-parallel --word-embedding-tp row --quantize-linear-action DISABLED --ttft-limits 2000 --log-level info 2>&1 | tee ./run_sc_1.log`. MoE model example: `python3 -m cli.inference.throughput_optimizer $moe_model_path --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 4096 --output-length 1 --compile --quantize-linear-action W8A8_STATIC --disagg --ttft-limits 2000 --tp-sizes 8 16 --batch-range 4 4 --reserved-memory-gb 10 --enable-shared-expert-tp --word-embedding-tp row --chrome-trace trace_decode.json --log-level info 2>&1 | tee ./run_sc3_2.log`. Checklist (checked): the modification is covered by complete unit tests; code files contain no Chinese comments. See merge request: Ascend/msmodeling!360	3 days ago
video_generate.py	fix(security): add model source safety checks. Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com>. Merge request !385 (merge fix/trust-remote-code-safety into master). Created-by: jia_ya_nan. Commit-by: jia_ya_nan. Merged-by: ascend-robot. PR type: Bugfix, Other. Motivation: security hardening. Modification: add permission checks for local paths; add risk warnings to the logs; remove the unmaintained legacy interface. Test results: screenshot at https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png. Checklist (checked): bug fixes are fully covered by unit tests; the modification is covered by complete unit tests; all relevant documentation has been updated; code files contain no Chinese comments. See merge request: Ascend/msmodeling!385	4 days ago
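
The throughput_optimizer.py commit mentions adding TP/DP suffixes to chrome trace filenames so that parallel search candidates do not overwrite each other's traces. The following is a minimal sketch of that idea, not the repository's implementation: the helper name `trace_path_for_candidate` and the `_tp{N}_dp{M}` suffix format are assumptions made for illustration, since the listing does not show the actual naming scheme.

```python
from pathlib import Path


def trace_path_for_candidate(base: str, tp: int, dp: int) -> str:
    """Return a per-candidate trace path derived from the user-supplied one.

    Hypothetical helper: the suffix format "_tp{tp}_dp{dp}" is an assumption,
    chosen only to show how distinct TP/DP candidates get distinct files.
    """
    path = Path(base)
    # Insert the suffix between the file stem and its extension,
    # e.g. "trace_decode.json" -> "trace_decode_tp8_dp2.json".
    return str(path.with_name(f"{path.stem}_tp{tp}_dp{dp}{path.suffix}"))


if __name__ == "__main__":
    # With 16 devices and TP sizes 8 and 16, DP is taken here as 16 // tp
    # (an illustrative assumption about how candidates are enumerated).
    for tp in (8, 16):
        print(trace_path_for_candidate("trace_decode.json", tp, 16 // tp))
```

With a single `--chrome-trace trace_decode.json` argument, each TP/DP candidate in one search would then write to its own file instead of replacing the previous candidate's trace.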