| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
feat:仿真建模支持deepseek-V4模型适配 Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !166 merge deepseek-v4 into develop feat:仿真建模支持deepseek-V4模型适配 Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 为 msmodeling/tensor_cast 增加对 DeepSeek V4 (Flash/Pro) 模型的端到端支持,使其性能建模流水线能够覆盖 V4 引入的稀疏注意力(NSA / Window / Compressed / Heavily-Compressed 多 layer-type 路由)、HC(Head Compression)混合、Sinkhorn 拆分以及 Hash Routing MoE 等新结构,并补齐对应的 fake-tensor 语义算子与代价模型,让 V4 模型可以直接走通现有 analytic / multistream tracing 流程。 ------ ## 📝 Modification / 修改内容 新增文件 / New files - tensor_cast/transformers/builtin_model/deepseek_v4.py:DeepSeek V4 builtin model profile,包含 DeepseekV4Config / DeepseekV4Model 注册、layer-type 校验({0, 4, 128} 对应 sliding_attention / compressed_sparse_attention / heavily_compressed_attention)、以及与 transformers AutoConfig / AutoModel 的安全注册逻辑。 - tests/test_tensor_cast/test_deepseek_v4.py 与 tests/test_tensor_cast/data/deepseek_v4/*.json:V4 模型对应的测试数据集与用例(含合法/非法/缺失/截短的 ratios 配置)。 注意力 / Attention(tensor_cast/layers/mla.py,tensor_cast/ops/mla.py,tensor_cast/ops/rotary_embedding.py) - 新增 DeepseekV4SparseAttention 与 MultiheadLatentAttentionTensorCast 适配(含 requires_legacy_kv_b_decomposition、KV-cache window 写入路径等)。 - 新增 get_window_topk_idxs / get_compress_topk_idxs 索引生成工具。 - 新增 HC 路径语义算子:hc_pre_inv_rms、hc_pre_sinkhorn,分别对应参考实现中的 inverse-RMS 缩放与 Sinkhorn 加权 reduction。 - 新增 scatter_nd_update_mla 等 KV 写入算子的代价模型,按参考实现仅计 source 行读 + 更新行写,不计 slot_mapping / 整 cache 张量。 MoE / Gate(tensor_cast/layers/moe_layer.py,tensor_cast/ops/fused_moe.py) - MoELayer 增加 V4 统一 gating 路径:识别 gate 上的 is_v4 / hash 标志位,按参考 Gate.forward 顺序发出 matmul + score func + indices + gather/normalize/route_scale 各算子,使每一步按其真实 dtype(gate matmul 走 fp32)单独计费。 - 新增 moe_gating_top_k(带可选 bias 的 V4 非 hash 层)与 moe_gating_top_k_hash(基于 tid2eid 表的 hash 路由层)两个语义算子。 性能模型 / Performance Model(tensor_cast/performance_model/__init__.py) - 引入 _safe_max_int 工具:在 fake / meta / functional tensor 上 tensor.max().item() 不可用时回退为 None,让 caller 走 shape-based 估算。 - 注册 V4 新算子(scatter_nd_update_mla、HC 系列、MoE 新 gating tail 等)的 PerformanceProperties,与参考实现的内存访问语义对齐。 其他 / Misc - tensor_cast/core/config_resolver.py、input_generator.py、model_runner.py、device.py、transformers/transformations.py、 transformers/custom_model_registry.py、layers/utils.py、model_config.py、compilation/passes/multistream_pass.py:补齐 V4 在 config 解析、输入构造、runner 调度、device profile、模型变换与算子注册各环节的接入。 ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。**   ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!166 | 26 天前 | |
feat:仿真建模支持deepseek-V4模型适配 Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !166 merge deepseek-v4 into develop feat:仿真建模支持deepseek-V4模型适配 Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 为 msmodeling/tensor_cast 增加对 DeepSeek V4 (Flash/Pro) 模型的端到端支持,使其性能建模流水线能够覆盖 V4 引入的稀疏注意力(NSA / Window / Compressed / Heavily-Compressed 多 layer-type 路由)、HC(Head Compression)混合、Sinkhorn 拆分以及 Hash Routing MoE 等新结构,并补齐对应的 fake-tensor 语义算子与代价模型,让 V4 模型可以直接走通现有 analytic / multistream tracing 流程。 ------ ## 📝 Modification / 修改内容 新增文件 / New files - tensor_cast/transformers/builtin_model/deepseek_v4.py:DeepSeek V4 builtin model profile,包含 DeepseekV4Config / DeepseekV4Model 注册、layer-type 校验({0, 4, 128} 对应 sliding_attention / compressed_sparse_attention / heavily_compressed_attention)、以及与 transformers AutoConfig / AutoModel 的安全注册逻辑。 - tests/test_tensor_cast/test_deepseek_v4.py 与 tests/test_tensor_cast/data/deepseek_v4/*.json:V4 模型对应的测试数据集与用例(含合法/非法/缺失/截短的 ratios 配置)。 注意力 / Attention(tensor_cast/layers/mla.py,tensor_cast/ops/mla.py,tensor_cast/ops/rotary_embedding.py) - 新增 DeepseekV4SparseAttention 与 MultiheadLatentAttentionTensorCast 适配(含 requires_legacy_kv_b_decomposition、KV-cache window 写入路径等)。 - 新增 get_window_topk_idxs / get_compress_topk_idxs 索引生成工具。 - 新增 HC 路径语义算子:hc_pre_inv_rms、hc_pre_sinkhorn,分别对应参考实现中的 inverse-RMS 缩放与 Sinkhorn 加权 reduction。 - 新增 scatter_nd_update_mla 等 KV 写入算子的代价模型,按参考实现仅计 source 行读 + 更新行写,不计 slot_mapping / 整 cache 张量。 MoE / Gate(tensor_cast/layers/moe_layer.py,tensor_cast/ops/fused_moe.py) - MoELayer 增加 V4 统一 gating 路径:识别 gate 上的 is_v4 / hash 标志位,按参考 Gate.forward 顺序发出 matmul + score func + indices + gather/normalize/route_scale 各算子,使每一步按其真实 dtype(gate matmul 走 fp32)单独计费。 - 新增 moe_gating_top_k(带可选 bias 的 V4 非 hash 层)与 moe_gating_top_k_hash(基于 tid2eid 表的 hash 路由层)两个语义算子。 性能模型 / Performance Model(tensor_cast/performance_model/__init__.py) - 引入 _safe_max_int 工具:在 fake / meta / functional tensor 上 tensor.max().item() 不可用时回退为 None,让 caller 走 shape-based 估算。 - 注册 V4 新算子(scatter_nd_update_mla、HC 系列、MoE 新 gating tail 等)的 PerformanceProperties,与参考实现的内存访问语义对齐。 其他 / Misc - tensor_cast/core/config_resolver.py、input_generator.py、model_runner.py、device.py、transformers/transformations.py、 transformers/custom_model_registry.py、layers/utils.py、model_config.py、compilation/passes/multistream_pass.py:补齐 V4 在 config 解析、输入构造、runner 调度、device profile、模型变换与算子注册各环节的接入。 ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。**   ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!166 | 26 天前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
feat(serving_cast): support chunked prefill modeling Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !250 merge feat/chunked-prefill-impl into develop feat(serving_cast): support chunked prefill modeling Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** 当前 throughput_optimizer 在混部模式下使用 max_prefill_tokens 作为 prefill token budget,并要求有效输入长度不超过该值。当长上下文请求的 effective_input_length 大于 token budget 时,工具会直接报错,无法模拟实际服务中常见的 chunked prefill 场景。 本 PR 旨在补齐 msmodeling 对 chunked prefill 的建模能力,使吞吐优化器可以在长 prompt 或较小 batch token budget 场景下,自动将 prefill 拆分为多个 chunk 进行估算,并更合理地建模 prefill 与 decode 混部执行对 TTFT、TPOT 和吞吐的影响。 ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** - 将 CLI 参数 --max-prefill-tokens 重命名为 --max-batched-tokens,用于表达单个 prefill / mixed step 的 token budget。 - 新增 prefill chunk plan 生成逻辑,当 effective_input_length > max_batched_tokens 时自动按 max_batched_tokens 切分 prefill。 - 新增默认调度策略 DecodeFirstWithSlack,支持 decode-first 调度,并允许 15% slack 以避免 decode token 占用导致 prefill chunk 无法调度。 - 聚合模式中新增 chunked prefill 轻量级时间模拟,支持已完成 prefill 的请求提前进入 decode,不再要求所有请求完成 prefill 后统一 decode。 - PD 分离模式中 prefill 阶段支持 chunked prefill,decode 阶段保持原有逻辑。 - 优化 latency cache key,使其区分不同的 query_len、seq_len 和并发形态。 - 输出结果新增 effective_input_length、max_batched_tokens、prefill_num_chunks,便于分析 chunked prefill 配置影响。 - 更新 Web UI 参数生成、表单校验、相关文档和单元测试。 ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** 以32条 32k请求为例,无chunk改动前: python -m cli.inference.throughput_optimizer Qwen/Qwen3-32B --device ATLAS_800_A2_280T_64G --quantize-linear-action DISABLED --input-length 32000 --output-length 1024 --tp-sizes 8 --compile --batch-range 32 32 --num-devices 8 --max-prefill-tokens 32000 --log-level info  无chunk改动后: python -m cli.inference.throughput_optimizer Qwen/Qwen3-32B --device ATLAS_800_A2_280T_64G --quantize-linear-action DISABLED --input-length 32000 --output-length 1024 --tp-sizes 8 --compile --batch-range 32 32 --num-devices 8 --max-batched-tokens 32000 --log-level info  结果不变,不影响之前的调度逻辑 chunk为2000: python -m cli.inference.throughput_optimizer Qwen/Qwen3-32B --device ATLAS_800_A2_280T_64G --quantize-linear-action DISABLED --input-length 32000 --output-length 1024 --tp-sizes 8 --compile --batch-range 32 32 --num-devices 8 --max-batched-tokens 2000 --log-level info  chunk为4000: python -m cli.inference.throughput_optimizer Qwen/Qwen3-32B --device ATLAS_800_A2_280T_64G --quantize-linear-action DISABLED --input-length 32000 --output-length 1024 --tp-sizes 8 --compile --batch-range 32 32 --num-devices 8 --max-batched-tokens 4000 --log-level info  chunk为8000: python -m cli.inference.throughput_optimizer Qwen/Qwen3-32B --device ATLAS_800_A2_280T_64G --quantize-linear-action DISABLED --input-length 32000 --output-length 1024 --tp-sizes 8 --compile --batch-range 32 32 --num-devices 8 --max-batched-tokens 4000 --log-level info  从趋势上看,chunk size越小,对tpot越友好;同时chunk 越小,prefill阶段调度越多,ttft会增加,符合预期; 另外,由于开启chunk prefill后,prefill会增加多次计算,导致耗时呈线性增长;可以考虑并行一次性跑完所有切分的prefill,但对资源消耗巨大,考虑在下个PR内提升性能 ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!250 | 26 天前 | |
[RFC] docs: add rfc for decode context parallel Co-authored-by: Elrond G<elrondgcn@gmail.com> # message auto-generated for no-merge-commit merge: !211 merge rfc/develop/dcp_rfc into develop [RFC] docs: add rfc for decode context parallel Created-by: elrond-g Commit-by: Elrond G Merged-by: ascend-robot Description: # PR Template Create RFC document for decode context parallel. **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 Create doc for support Context Parallel. ------ ## 📝 Modification / 修改内容 create doc : docs/RFC/rfc_context_parallel_dcp_cn.md ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!211 | 20 天前 | |
[RFC] docs: add rfc for decode context parallel Co-authored-by: Elrond G<elrondgcn@gmail.com> # message auto-generated for no-merge-commit merge: !211 merge rfc/develop/dcp_rfc into develop [RFC] docs: add rfc for decode context parallel Created-by: elrond-g Commit-by: Elrond G Merged-by: ascend-robot Description: # PR Template Create RFC document for decode context parallel. **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 Create doc for support Context Parallel. ------ ## 📝 Modification / 修改内容 create doc : docs/RFC/rfc_context_parallel_dcp_cn.md ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!211 | 20 天前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
[feat] 添加SKILL:通过自然语言配置 device Co-authored-by: wendellX<wenxiaoping6@huawei.com> # message auto-generated for no-merge-commit merge: !243 merge skill_device_cfg into develop [feat] 添加SKILL:通过自然语言配置 device Created-by: wendellX Commit-by: wendellX Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [X] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 ------ ## 📝 Modification / 修改内容 添加SKILL文件:通过自然语言配置 device ------ ## 📐 Associated Test Results / 关联测试结果 --- 用户:我要导入新的设备拓扑 Claude:我们先不用填复杂参数,我会一步步带你配。 先告诉我下面 3 件事就够了,不知道的直接写“不知道”: 1. 这是什么硬件?例如厂商和型号。 2. 你想按什么粒度建这个拓扑:单卡、单 die / chiplet,还是别的调度单元? 3. 你手头有没有这些原始信息里的任意一部分:显存容量、显存带宽、FP16/BF16/INT8/FP32 算力、以及“几卡/几 die、它们之间怎么连”的描述?有的话原样贴出来就行。 你可以直接像这样回我: - 型号:XXX - 粒度:单卡 / 单 die / 不确定 - 已知规格:96GB 显存,3.2TB/s,BF16 800T,8 卡,卡间 400GB/s 收到后我会先帮你整理成“已确认 / 还不确定 / 可以先默认”的清单,再继续下一步。 用户:华为的ATLAS_800_A3_560T_128G_DIE。单 die,显存容量 64GiB,显存带宽 1.6TiB/s Claude:我先记下目前已确认的事实。 - 已确认: - 厂商:HUAWEI - profile 候选名: ATLAS_800_A3_560T_128G_DIE - 粒度:单 die - 每个 profile 单元显存:64GiB - 每个 profile 单元显存带宽:1.6TiB/s - 说明: - 仓库里现有的 [tensor_cast/device.py:299-320] ATLAS_800_A3_752T_128G_DIE 也是“名字带 128G,但单 die 显存写 64GiB”的风格,所以你这个命名方式是对得上的。 接下来我只追问 3 件最关键的事,不知道可以直接写“不知道”或“先默认”: 1. 560T 指的是哪种算力? - 例如:FP16 560T、BF16 560T、INT8 560T - 如果资料里还有 FP32 / BF16 / FP16 / INT8 的其他峰值,也请原样贴出来。 2. 资料里有没有单独的“通用 / vector 算力”? - 这项主要影响 softmax、norm、激活函数这类非矩阵算子的估算。 - 你可以直接回: - 没有,先留空 - 或 没有,先用临时估值 3. 这个单 die profile 的互联拓扑你想怎么建? - 你可以直接选一种: - 沿用现有 A3 die 拓扑 - 只建 8 卡节点内 - 我只知道 2 die/卡、8 卡/节点,其它先默认 - 如果你知道互联带宽,顺手告诉我是“单向”还是“双向”。 你回这 3 项后,我就能整理出将写入的字段、默认值和待校准项,再给你最后确认。 用户:FP32 75T,BF16 245.8T,FP16 280T,INT8 560T。通信 grid 形状是 [48, 8, 2]。 Claude:好的,目前我整理出的“已确认”是: - name: ATLAS_800_A3_560T_128G_DIE - vendor: HUAWEI - 粒度:单 die - 每 die 显存:64 GiB - 每 die 显存带宽:1.6 TiB/s - mma_ops: - FP32 = 75T - BF16 = 245.8T - FP16 = 280T - INT8 = 560T - comm_grid 形状:[48, 8, 2] 这基本就是把现有 A3 die 模板 [tensor_cast/device.py:169-182] 和 280T 档位算力风格结合起来了。 还差 3 个会直接写进代码的点,你选一下即可: 1. gp_ops 怎么处理? - 先留空 - 先用临时估值:FP32 8T,BF16/FP16 16T - 我有官方值:... 2. [48, 8, 2] 的每层互联参数,是否直接沿用现有 A3 die 拓扑 [tensor_cast/device.py:169-182]? - 外层:196 GB/s,5.5 us - 中层:196 GB/s,0.5 us - 内层:224 GB/s,0.2 us - comm_efficiency = 0.7 - 类型按当前代码默认 CLOS - 这些带宽会按“单向带宽”写入 你可以直接回:沿用现有 A3 die 拓扑 3. 如果没有更精确数据,是否接受先沿用当前默认效率和静态开销,并标记 needs calibration? - compute_efficiency = 0.7 - memory_efficiency = 0.6 - StaticCost(5us, 2us, 10us) 你回这 3 项后,我就直接改 [tensor_cast/device.py],然后做导入注册验证,并给你可用的 --device ATLAS_800_A3_560T_128G_DIE。 用户:1 使用临时值;2 沿用A3 ; 3 compute_efficiency=0.7,memory_efficiency=0.6。 Claude:还差一个会写进代码的小点:StaticCost 你要不要也先沿用现有 A3 默认值 mma=5us, gp=2us, comm=10us?如果可以,我收到你一句“可以”就直接改代码并验证。 用户:mma_op_cost_s=5e-06、gp_op_cost_s=2e-06、comm_op_cost_s=1e-05。 用这个吧 Claude:已加到 [tensor_cast/device.py:299-320],复用了现有 A3 die 拓扑 [tensor_cast/device.py:169-182]。 - 已注册的 DeviceProfile.name - ATLAS_800_A3_560T_128G_DIE - 直接使用的你提供的事实 - vendor="HUAWEI" - 单 die 粒度 - memory_size_bytes=64 * (1024**3) - memory_bandwidth_bytes_ps=1.6 * (1024**4) - mma_ops - FP32 = 75 * 1e12 - BF16 = 245.8 * 1e12 - FP16 = 280 * 1e12 - INT8 = 560 * 1e12 - comm_grid 形状 [48, 8, 2] - compute_efficiency=0.7 - memory_efficiency=0.6 - StaticCost(mma=5e-06, gp=2e-06, comm=1e-05) - 用户确认的估值 - gp_ops - FP32 = 8 * 1e12 - BF16 = 16 * 1e12 - FP16 = 16 * 1e12 - needs calibration - gp_ops 目前是临时估值,不是官方 vector/general compute 指标 - 互联参数目前沿用现有 A3 die 拓扑: - start_dim=0: 196 GB/s, 5.5 us, comm_efficiency=0.7 - start_dim=1: 196 GB/s, 0.5 us, comm_efficiency=0.7 - start_dim=2: 224 GB/s, 0.2 us, comm_efficiency=0.7 - 验证命令 - python - <<'PY' from tensor_cast.device import DeviceProfile print(DeviceProfile.all_device_profiles["ATLAS_800_A3_560T_128G_DIE"]) PY - 结果:成功打印该 DeviceProfile,注册正常 - 可直接用的命令 - python -m cli.inference.text_generate Qwen/Qwen3-32B --num-queries 2 --query-length 3500 --device ATLAS_800_A3_560T_128G_DIE 已经可用了;如果你愿意,下一步最值得校准的是 gp_ops 是否有官方值。 ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!243 | 28 天前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
RFC: Diffusers Hugging Face repo id autoload Co-authored-by: minghang_c<chiminghang@h-partners.com> # message auto-generated for no-merge-commit merge: !202 merge diffusers-hf-autoload into develop RFC: Diffusers Hugging Face repo id autoload Created-by: minghang_c Commit-by: minghang_c Merged-by: ascend-robot Description: ## Summary - Add a formal RFC for allowing video_generate Diffusers model_id to accept either a local directory or a Hugging Face repo id. - Propose resolving non-local model_id values via huggingface_hub.snapshot_download with allow_patterns=["**/config.json"]. - Document goals, non-goals, DFX considerations, testing strategy, risks, and open questions for review. ## Test Plan - [x] Documentation-only change; verified branch status is clean after commit. - [x] Verified branch is based on latest upstream/develop before push. See merge request: Ascend/msmodeling!202 | 26 天前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
[RFC] mixed-batch variable tokens Co-authored-by: stormchasingg<sh_ding@zju.edu.cn> # message auto-generated for no-merge-commit merge: !196 merge bucket-token-rfc into develop [RFC] mixed-batch variable tokens Created-by: stormchasingg Commit-by: stormchasingg Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 This PR updates the throughput optimizer variable-token implementation from the original per-bin homogeneous approximation to the current mixed-batch modeling flow, and aligns the code structure, final summary output, and RFC documents with the current implementation. The goals are: - support built-in --length-distribution based mixed-batch prefill modeling in disaggregation mode - construct real heterogeneous request batches and reuse the existing varlen execution path - expose composition detail rows in the final report - simplify and refactor the summary/reporting path - refresh RFC documents so that they match the current codebase instead of the initial design version 本 PR 将 throughput optimizer 的变长 token 实现从最初的按桶同构近似方案整理为当前 mixed-batch 建模流程,并同步更新代码结构、最终汇总输出和 RFC 文档,使其与当前实现保持一致。 主要目标: - 支持基于内置 --length-distribution 的 disaggregation prefill mixed-batch 建模 - 构造真实 heterogeneous request batch,并复用现有 varlen 执行路径 - 在最终结果中展示 composition detail rows - 简化并重构 summary/reporting 路径 - 更新 RFC 文档,使其反映当前代码现状而不是最初设计稿 ------ ## 📝 Modification / 修改内容 This PR mainly includes: 1. Variable-token mixed-batch modeling - add built-in --length-distribution mode in throughput optimizer - load built-in serving_cast/example/length_distribution.yaml - add LengthBin / LengthDistribution - add representative-row generation and deterministic concurrency sample allocation in OptimizerData - support prefix-cache-aware query_len and normalized request_ratio 2. Mixed-batch execution path - add _get_batched_forward_info() in BaseThroughputOptimizer - build heterogeneous List[RequestInfo] - execute mixed batches through generate_inputs_varlen 3. Disaggregation integration - enable variable-token prefill path in DisaggThroughputOptimizer - compute throughput using true batch token count: - Σ(num_input_tokens * samples) - emit one aggregate row plus multiple composition detail rows 4. Summary and final table refactor - keep OptimizerSummary as the unified summary entry - add batched final-output path inside OptimizerSummary - expand composition rows in final report - add _get_disagg_table_buf_batched() for mixed-batch final table - simplify final displayed columns to focus on: - num_input_tokens - request_ratio - samples - concurrency - TTFT - Throughput - parallel - batch_size 5. Naming and cleanup - align distribution row fields with RequestInfo semantics: - input_tokens -> num_input_tokens - effective_input_tokens -> query_len - remove stale helpers / branches / tests tied to earlier summary and naming design 6. Documentation updates - update: - docs/RFC/rfc_mixed_batch_variable_tokens_en.md - docs/RFC/rfc_mixed_batch_variable_tokens_zh.md - align module interaction flow, scope, limitations, and current implementation details with the latest code 本 PR 主要包含: 1. 变长 token mixed-batch 建模 - 在 throughput optimizer 中增加内置 --length-distribution 模式 - 加载内置 serving_cast/example/length_distribution.yaml - 增加 LengthBin / LengthDistribution - 在 OptimizerData 中增加 representative row 生成和确定性 sample 分配 - 支持考虑 prefix cache 的 query_len 和归一化后的 request_ratio 2. Mixed-batch 执行路径 - 在 BaseThroughputOptimizer 中增加 _get_batched_forward_info() - 构造 heterogeneous List[RequestInfo] - 通过 generate_inputs_varlen 执行 mixed batch 3. Disaggregation 接入 - 在 DisaggThroughputOptimizer 中接入 variable-token prefill 路径 - 使用 batch 的真实 token 总量计算吞吐: - Σ(num_input_tokens * samples) - 产出 1 条 aggregate row 和多条 composition detail rows 4. Summary 与最终表格重构 - 统一保留 OptimizerSummary 作为 summary 入口 - 在 OptimizerSummary 中增加 batched final-output 路径 - 在最终报告中展开 composition rows - 增加 _get_disagg_table_buf_batched() 用于 mixed-batch 最终表格 - 精简最终展示列,聚焦: - num_input_tokens - request_ratio - samples - concurrency - TTFT - Throughput - parallel - batch_size 5. 命名与清理 - 将分布行字段与 RequestInfo 语义对齐: - input_tokens -> num_input_tokens - effective_input_tokens -> query_len - 清理旧的 helper、分支和测试 6. 文档更新 - 更新: - docs/RFC/rfc_mixed_batch_variable_tokens_en.md - docs/RFC/rfc_mixed_batch_variable_tokens_zh.md - 使模块交互流程、适用范围、局限性和当前实现细节与最新代码保持一致 ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!196 | 27 天前 | |
[RFC] mixed-batch variable tokens Co-authored-by: stormchasingg<sh_ding@zju.edu.cn> # message auto-generated for no-merge-commit merge: !196 merge bucket-token-rfc into develop [RFC] mixed-batch variable tokens Created-by: stormchasingg Commit-by: stormchasingg Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 This PR updates the throughput optimizer variable-token implementation from the original per-bin homogeneous approximation to the current mixed-batch modeling flow, and aligns the code structure, final summary output, and RFC documents with the current implementation. The goals are: - support built-in --length-distribution based mixed-batch prefill modeling in disaggregation mode - construct real heterogeneous request batches and reuse the existing varlen execution path - expose composition detail rows in the final report - simplify and refactor the summary/reporting path - refresh RFC documents so that they match the current codebase instead of the initial design version 本 PR 将 throughput optimizer 的变长 token 实现从最初的按桶同构近似方案整理为当前 mixed-batch 建模流程,并同步更新代码结构、最终汇总输出和 RFC 文档,使其与当前实现保持一致。 主要目标: - 支持基于内置 --length-distribution 的 disaggregation prefill mixed-batch 建模 - 构造真实 heterogeneous request batch,并复用现有 varlen 执行路径 - 在最终结果中展示 composition detail rows - 简化并重构 summary/reporting 路径 - 更新 RFC 文档,使其反映当前代码现状而不是最初设计稿 ------ ## 📝 Modification / 修改内容 This PR mainly includes: 1. Variable-token mixed-batch modeling - add built-in --length-distribution mode in throughput optimizer - load built-in serving_cast/example/length_distribution.yaml - add LengthBin / LengthDistribution - add representative-row generation and deterministic concurrency sample allocation in OptimizerData - support prefix-cache-aware query_len and normalized request_ratio 2. Mixed-batch execution path - add _get_batched_forward_info() in BaseThroughputOptimizer - build heterogeneous List[RequestInfo] - execute mixed batches through generate_inputs_varlen 3. Disaggregation integration - enable variable-token prefill path in DisaggThroughputOptimizer - compute throughput using true batch token count: - Σ(num_input_tokens * samples) - emit one aggregate row plus multiple composition detail rows 4. Summary and final table refactor - keep OptimizerSummary as the unified summary entry - add batched final-output path inside OptimizerSummary - expand composition rows in final report - add _get_disagg_table_buf_batched() for mixed-batch final table - simplify final displayed columns to focus on: - num_input_tokens - request_ratio - samples - concurrency - TTFT - Throughput - parallel - batch_size 5. Naming and cleanup - align distribution row fields with RequestInfo semantics: - input_tokens -> num_input_tokens - effective_input_tokens -> query_len - remove stale helpers / branches / tests tied to earlier summary and naming design 6. Documentation updates - update: - docs/RFC/rfc_mixed_batch_variable_tokens_en.md - docs/RFC/rfc_mixed_batch_variable_tokens_zh.md - align module interaction flow, scope, limitations, and current implementation details with the latest code 本 PR 主要包含: 1. 变长 token mixed-batch 建模 - 在 throughput optimizer 中增加内置 --length-distribution 模式 - 加载内置 serving_cast/example/length_distribution.yaml - 增加 LengthBin / LengthDistribution - 在 OptimizerData 中增加 representative row 生成和确定性 sample 分配 - 支持考虑 prefix cache 的 query_len 和归一化后的 request_ratio 2. Mixed-batch 执行路径 - 在 BaseThroughputOptimizer 中增加 _get_batched_forward_info() - 构造 heterogeneous List[RequestInfo] - 通过 generate_inputs_varlen 执行 mixed batch 3. Disaggregation 接入 - 在 DisaggThroughputOptimizer 中接入 variable-token prefill 路径 - 使用 batch 的真实 token 总量计算吞吐: - Σ(num_input_tokens * samples) - 产出 1 条 aggregate row 和多条 composition detail rows 4. Summary 与最终表格重构 - 统一保留 OptimizerSummary 作为 summary 入口 - 在 OptimizerSummary 中增加 batched final-output 路径 - 在最终报告中展开 composition rows - 增加 _get_disagg_table_buf_batched() 用于 mixed-batch 最终表格 - 精简最终展示列,聚焦: - num_input_tokens - request_ratio - samples - concurrency - TTFT - Throughput - parallel - batch_size 5. 命名与清理 - 将分布行字段与 RequestInfo 语义对齐: - input_tokens -> num_input_tokens - effective_input_tokens -> query_len - 清理旧的 helper、分支和测试 6. 文档更新 - 更新: - docs/RFC/rfc_mixed_batch_variable_tokens_en.md - docs/RFC/rfc_mixed_batch_variable_tokens_zh.md - 使模块交互流程、适用范围、局限性和当前实现细节与最新代码保持一致 ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!196 | 27 天前 | |
feat(skills): add msmodeling-env-installer Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !279 merge develop into develop feat(skills): add msmodeling-env-installer Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** msmodeling 项目的环境安装涉及 Python 版本、虚拟环境创建、依赖安装、 PYTHONPATH 配置以及可选的 Hugging Face 镜像源设置等步骤。当前这些操作主要依赖人工阅读文档并手动执行,容易因平台差异、命令遗漏或环境变量配置不一致导致安装失败,增加新贡献者和 AI Agent 进入项目的成本。 本 PR 引入 msmodeling-env-installer skill,沉淀标准化的环境安装流程,并提供 Windows 自动化安装脚本,使 Agent 能够在明确规则下完成依赖安装、环境校验和问题定位。该能力可以提升首次环境搭建的一致性,减少重复沟通和手工排障成本,为后续 TensorCast、ServingCast 相关开发和测试提供更稳定的基础环境。 ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** | 文件 | 说明 | |:---|:---| | .agents\skills\msmodeling-env-installer\SKILL.md | Skill 主说明、触发条件和执行规则 | | .agents\skills\msmodeling-env-installer\scripts\install-current-project-deps.ps1 | Windows PowerShell 自动化安装脚本 | | .agents\skills\msmodeling-env-installer\scripts\install-current-project-deps.sh | Linux/macOS/WSL/Git Bash 自动化安装脚本 | | .agents\skills\README.md | skills 索引与 quick start | | docs/RFC/rfc_msmodeling_env_installer_skill_zh.md | 本 RFC 文档 | ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!279 | 26 天前 | |
[feat]添加服务化寻优部署&参数推荐 skill Co-authored-by: wendellX<wenxiaoping6@huawei.com> # message auto-generated for no-merge-commit merge: !289 merge skill_param_optimizer into develop [feat]添加服务化寻优部署&参数推荐 skill Created-by: wendellX Commit-by: wendellX Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **[feat]添加服务化寻优部署&参数推荐 skill ** ------ ## 📝 Modification / 修改内容 **[feat]添加服务化寻优部署&参数推荐 skill ** ------ ## 📐 Associated Test Results / 关联测试结果 ### 1 部署:   ### 2 参数推荐:      ### 3 参数写入 config.toml    #### 3.1 在 config.toml 中校验  #### 3.2 运行:  ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!289 | 26 天前 | |
[feat]添加服务化寻优部署&参数推荐 skill Co-authored-by: wendellX<wenxiaoping6@huawei.com> # message auto-generated for no-merge-commit merge: !289 merge skill_param_optimizer into develop [feat]添加服务化寻优部署&参数推荐 skill Created-by: wendellX Commit-by: wendellX Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **[feat]添加服务化寻优部署&参数推荐 skill ** ------ ## 📝 Modification / 修改内容 **[feat]添加服务化寻优部署&参数推荐 skill ** ------ ## 📐 Associated Test Results / 关联测试结果 ### 1 部署:   ### 2 参数推荐:      ### 3 参数写入 config.toml    #### 3.1 在 config.toml 中校验  #### 3.2 运行:  ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!289 | 26 天前 | |
【rfc】add rfc for pd ratio Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !112 merge rfc/pd-ratio-throughput-optimization into develop 【rfc】add rfc for pd ratio Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** rfc:支持pd配比寻优 ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!112 | 2 个月前 | |
【rfc】add rfc for pd ratio Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !112 merge rfc/pd-ratio-throughput-optimization into develop 【rfc】add rfc for pd ratio Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** rfc:支持pd配比寻优 ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!112 | 2 个月前 | |
docs: add performance database collection tooling RFC Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !182 merge perf-db-collection-tooling-rfc into develop docs: add performance database collection tooling RFC Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description: **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] Refactor(代码重构) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 Add a standalone RFC for the performance database collection tooling so the database generation workflow is specified independently from runtime consumption. The RFC assumes the generation capability is introduced by this proposal and documents the file-based pipeline, data contracts, replay-script authoring workflow, validation, and human review requirements. ------ ## 📝 Modification / 修改内容 - Add the Chinese RFC docs/RFC/rfc_performance_database_collection_tooling_zh.md. - Add the English RFC docs/RFC/rfc_performance_database_collection_tooling_en.md. - Define the offline collection pipeline under tools/perf_data_collection/: profiling parsing, FIA metadata enrichment, theory shape-grid generation, per-kernel replay, msprof writeback, HCCL benchmark collection, validation reports, and data lifecycle. - Include the Microbench Run Script Generator workflow from docs/perf_database/skills/microbench/SKILL.md as the contributor process for creating or updating op_replay/<KernelType>_run.py scripts. ------ ## 📐 Associated Test Results / 关联测试结果 - [x] Ran RFC consistency checks for both zh/en files: balanced code fences, no v1.6/original-design references, standalone generation framing, required microbench skill references present. - [x] Ran git diff --cached --check before commit. ------ ## ✅ Checklist / 检查列表 - [ ] Linting tools used / 使用 lintrunner 工具 - [ ] Bug fixes covered by unit tests / 修复的 Bug 已由单元测试覆盖 - [ ] Modification covered by unit tests / 修改已由单元测试覆盖 - [x] Documentation updated / 文档已更新 - [x] No Chinese comments in code files / 代码文件中不含中文注释 See merge request: Ascend/msmodeling!182 | 19 天前 | |
docs: add performance database collection tooling RFC Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !182 merge perf-db-collection-tooling-rfc into develop docs: add performance database collection tooling RFC Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description: **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] Refactor(代码重构) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 Add a standalone RFC for the performance database collection tooling so the database generation workflow is specified independently from runtime consumption. The RFC assumes the generation capability is introduced by this proposal and documents the file-based pipeline, data contracts, replay-script authoring workflow, validation, and human review requirements. ------ ## 📝 Modification / 修改内容 - Add the Chinese RFC docs/RFC/rfc_performance_database_collection_tooling_zh.md. - Add the English RFC docs/RFC/rfc_performance_database_collection_tooling_en.md. - Define the offline collection pipeline under tools/perf_data_collection/: profiling parsing, FIA metadata enrichment, theory shape-grid generation, per-kernel replay, msprof writeback, HCCL benchmark collection, validation reports, and data lifecycle. - Include the Microbench Run Script Generator workflow from docs/perf_database/skills/microbench/SKILL.md as the contributor process for creating or updating op_replay/<KernelType>_run.py scripts. ------ ## 📐 Associated Test Results / 关联测试结果 - [x] Ran RFC consistency checks for both zh/en files: balanced code fences, no v1.6/original-design references, standalone generation framing, required microbench skill references present. - [x] Ran git diff --cached --check before commit. ------ ## ✅ Checklist / 检查列表 - [ ] Linting tools used / 使用 lintrunner 工具 - [ ] Bug fixes covered by unit tests / 修复的 Bug 已由单元测试覆盖 - [ ] Modification covered by unit tests / 修改已由单元测试覆盖 - [x] Documentation updated / 文档已更新 - [x] No Chinese comments in code files / 代码文件中不含中文注释 See merge request: Ascend/msmodeling!182 | 19 天前 | |
docs: add pipeline parallel RFC Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com> # message auto-generated for no-merge-commit merge: !187 merge rfc/pipeline-parallel-rfc-only into develop docs: add pipeline parallel RFC Created-by: Secluded_Ocean Commit-by: Secluded_Ocean Merged-by: ascend-robot Description: **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 This PR adds an RFC for Pipeline Parallel (PP) simulation support. The goal is to document the current PP support status, clarify existing gaps, and propose a phased technical plan for extending PP modeling in CLI, serving configuration, TensorCast stage partitioning, memory estimation, communication modeling, and throughput optimization. 本 PR 新增 Pipeline Parallel(PP)并行仿真支持 RFC。目标是梳理当前 PP 支持现状,明确现有缺口,并提出面向 CLI、serving 配置、TensorCast stage 切分、显存估算、通信建模和吞吐优化搜索的分阶段技术方案。 ------ ## 📝 Modification / 修改内容 Added two RFC documents under docs/RFC/: - docs/RFC/rfc_pipeline_parallel_support_zh.md - docs/RFC/rfc_pipeline_parallel_support_en.md The RFC covers: - Current PP-related fields and rank group support in the codebase - Missing end-to-end PP capabilities - Proposed PP modeling semantics and configuration entry points - Search space extension for throughput optimization - TensorCast pipeline stage partitioning plan - Inter-stage communication and pipeline latency approximation - Stage-level memory modeling - Test plan, risks, and phased implementation plan ------ ## 📐 Associated Test Results / 关联测试结果 No code changes are included in this PR. This PR only adds RFC documentation, so unit tests were not run. 本 PR 不包含代码修改,仅新增 RFC 文档,因此未运行单元测试。 ------ ## 🌟 Use cases (Optional) / 使用案例(可选) This RFC can be used as the design reference for future Pipeline Parallel simulation implementation, including: - Adding pp_size support in service YAML and TensorCast user configuration - Extending throughput optimizer search with --pp-sizes - Modeling stage-level memory and communication overhead - Evaluating PP impact on TTFT, TPOT, and throughput ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!187 | 1 个月前 | |
docs: add pipeline parallel RFC Co-authored-by: Secluded_Ocean<tangchuxiao0709@qq.com> # message auto-generated for no-merge-commit merge: !187 merge rfc/pipeline-parallel-rfc-only into develop docs: add pipeline parallel RFC Created-by: Secluded_Ocean Commit-by: Secluded_Ocean Merged-by: ascend-robot Description: **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 This PR adds an RFC for Pipeline Parallel (PP) simulation support. The goal is to document the current PP support status, clarify existing gaps, and propose a phased technical plan for extending PP modeling in CLI, serving configuration, TensorCast stage partitioning, memory estimation, communication modeling, and throughput optimization. 本 PR 新增 Pipeline Parallel(PP)并行仿真支持 RFC。目标是梳理当前 PP 支持现状,明确现有缺口,并提出面向 CLI、serving 配置、TensorCast stage 切分、显存估算、通信建模和吞吐优化搜索的分阶段技术方案。 ------ ## 📝 Modification / 修改内容 Added two RFC documents under docs/RFC/: - docs/RFC/rfc_pipeline_parallel_support_zh.md - docs/RFC/rfc_pipeline_parallel_support_en.md The RFC covers: - Current PP-related fields and rank group support in the codebase - Missing end-to-end PP capabilities - Proposed PP modeling semantics and configuration entry points - Search space extension for throughput optimization - TensorCast pipeline stage partitioning plan - Inter-stage communication and pipeline latency approximation - Stage-level memory modeling - Test plan, risks, and phased implementation plan ------ ## 📐 Associated Test Results / 关联测试结果 No code changes are included in this PR. This PR only adds RFC documentation, so unit tests were not run. 本 PR 不包含代码修改,仅新增 RFC 文档,因此未运行单元测试。 ------ ## 🌟 Use cases (Optional) / 使用案例(可选) This RFC can be used as the design reference for future Pipeline Parallel simulation implementation, including: - Adding pp_size support in service YAML and TensorCast user configuration - Extending throughput optimizer search with --pp-sizes - Modeling stage-level memory and communication overhead - Evaluating PP impact on TTFT, TPOT, and throughput ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!187 | 1 个月前 | |
[feature]msmodeling支持防护精度网 Co-authored-by: yuyinkai1<769293914@qq.com> # message auto-generated for no-merge-commit merge: !219 merge develop into develop [feature]msmodeling支持防护精度网 Created-by: yuyinkai1 Commit-by: yuyinkai1 Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!219 | 1 个月前 | |
[Feature] Add prefix cache hit rate support to text_generate and throughput_optimizer Co-authored-by: yaohan404<yaohan8@huawei.com> # message auto-generated for no-merge-commit merge: !119 merge prefix-cache into develop [Feature] Add prefix cache hit rate support to text_generate and throughput_optimizer Created-by: yaohan404 Commit-by: yaohan404 Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** We need a lightweight way to estimate the performance impact of prefix cache in existing CLI workflows. Today, text_generate and throughput_optimizer model prompt processing without accounting for reused prefix KV cache, which makes the estimated prefill latency and TTFT pessimistic for prefix-cache-heavy scenarios. This change adds a simple token-level prefix cache hit-rate model so users can quickly evaluate the expected benefit without introducing a full block-level cache simulation. ------ ## 📝 Motivation / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** - Added --prefix-cache-hit-rate to text_generate and throughput_optimizer. - Extended shared user/config and serving optimizer data paths with prefix cache hit rate. - Rewrote effective prefill lengths for prefix-cache-aware modeling while keeping total sequence length semantics unchanged in text_generate. - Applied prefix cache only to prefill-related paths in throughput_optimizer; decode paths keep their original logic. - Updated max_prefill_tokens checks and prefill wave capacity calculations to use effective input length. - Added and updated unit/CLI tests for valid cases, invalid inputs, decode-mode behavior, and regression coverage. - Added synchronized Chinese and English RFC documents for the feature. ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** 1. Total time of text_generate: --query-length 200 --prefix-cache-hit-rate 0.5 == --query-length 100 --context-length 100   2. TTFT of throughput_optimizer: --input-length 1000 --prefix-cache-hit-rate 0.5 < --input-length 1000   ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!119 | 3 个月前 | |
[Feature] Add prefix cache hit rate support to text_generate and throughput_optimizer Co-authored-by: yaohan404<yaohan8@huawei.com> # message auto-generated for no-merge-commit merge: !119 merge prefix-cache into develop [Feature] Add prefix cache hit rate support to text_generate and throughput_optimizer Created-by: yaohan404 Commit-by: yaohan404 Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** We need a lightweight way to estimate the performance impact of prefix cache in existing CLI workflows. Today, text_generate and throughput_optimizer model prompt processing without accounting for reused prefix KV cache, which makes the estimated prefill latency and TTFT pessimistic for prefix-cache-heavy scenarios. This change adds a simple token-level prefix cache hit-rate model so users can quickly evaluate the expected benefit without introducing a full block-level cache simulation. ------ ## 📝 Motivation / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** - Added --prefix-cache-hit-rate to text_generate and throughput_optimizer. - Extended shared user/config and serving optimizer data paths with prefix cache hit rate. - Rewrote effective prefill lengths for prefix-cache-aware modeling while keeping total sequence length semantics unchanged in text_generate. - Applied prefix cache only to prefill-related paths in throughput_optimizer; decode paths keep their original logic. - Updated max_prefill_tokens checks and prefill wave capacity calculations to use effective input length. - Added and updated unit/CLI tests for valid cases, invalid inputs, decode-mode behavior, and regression coverage. - Added synchronized Chinese and English RFC documents for the feature. ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** 1. Total time of text_generate: --query-length 200 --prefix-cache-hit-rate 0.5 == --query-length 100 --context-length 100   2. TTFT of throughput_optimizer: --input-length 1000 --prefix-cache-hit-rate 0.5 < --input-length 1000   ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!119 | 3 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
rfc更新 | 5 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
perf: optimize repetition representative layers Co-authored-by: yaohan404<yaohan8@huawei.com> # message auto-generated for no-merge-commit merge: !222 merge develop into develop perf: optimize repetition representative layers Created-by: yaohan404 Commit-by: yaohan404 Merged-by: ascend-robot Description: # perf: optimize repetition representative layers Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [x] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** Improve model construction performance for large Transformer models with many structurally repeated layers. Existing copy-region based layer reuse already reduces runtime and compile cost, but host-side transformations such as MoE patching, quantization, and sharding still traversed repeated layer internals, causing significant startup overhead for large MoE models like DeepSeek-V3.2. ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** Introduce representative-layer processing for structurally repeated Transformer layers. Repeated copy layers now behave as leaf modules during module, parameter, and buffer traversal, so subsequent model transformations only process representative layers. Add repeat-count based weight accounting to preserve full-model memory estimation, and extend repetition tests to verify representative layer counts, full layer count preservation, and modeling-result consistency. ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ### throughput_optimizer disagg case: cli.inference.throughput_optimizer deepseek-ai/DeepSeek-V3.2 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 3500 --output-length 1500 --disagg --tpot-limits 50 --compile before:  after:  ### throughput_optimizer agg case: cli.inference.throughput_optimizer deepseek-ai/DeepSeek-V3.2 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 3500 --output-length 1500 --ttft-limits 2000 --tpot-limits 100 --compile before:  aftre:  ### text_generate case: cli.inference.text_generate deepseek-ai/DeepSeek-V3.2 --num-queries 16 --query-length 1 --context-length 4096 --device ATLAS_800_A3_752T_128G_DIE --quantize-linear-action FP8 --num-devices 16 --tp-size 1 --ep-size 16 --compile before:  after:  ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!222 | 26 天前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
docs: update RFC and design templates Co-authored-by: jhon-117<fangkai15@huawei.com> # message auto-generated for no-merge-commit merge: !203 merge design_template into develop docs: update RFC and design templates Created-by: jhon-117 Commit-by: jhon-117 Merged-by: ascend-robot Description: Title: Update RFC and design document templates Body: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 This PR standardizes the repository's RFC and design document templates with an English-first bilingual format. The updated templates make the required sections clearer while preserving the original RFC semantics and keeping Chinese guidance available for contributors. ------ ## 📝 Modification / 修改内容 - Replace the old RFC template with a new English-first bilingual template under docs/RFC/rfc_template.md. - Remove the previous uppercase RFC template file docs/RFC/RFC_template.md. - Add a design document template under docs/design/docs_template.md. - Align the design document template style with the RFC template while preserving the original design-template content requirements. ------ ## 📐 Associated Test Results / 关联测试结果 No automated tests were run because this PR only updates Markdown documentation templates. Editor diagnostics were checked for the updated Markdown template files before commit, with no actionable issues found. ------ ## 🌟 Use cases (Optional) / 使用案例(可选) Contributors can use these templates when drafting new RFCs or detailed design documents, with English as the primary language and Chinese explanations retained for consistency and accessibility. ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!203 | 1 个月前 | |
feat(skills): add text generate and throughput optimizer executors Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !278 merge develop into develop feat(skills): add text generate and throughput optimizer executors Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** 当前 text_generate 和 throughput_optimizer CLI 的参数组合较多,涉及模型、硬件 profile、并行策略、量化、PD 分离、SLO 约束等多类配置。用户在进行性能验证或吞吐规划时,容易遗漏关键参数、混淆部署模式,或无法快速从原始输出中提取可对比的结论。 本 PR 新增 text-generate-executor 和 throughput-optimizer-executor 两个 skills,用于将自然语言性能验证/吞吐规划需求转换为可确认、可执行的 CLI 命令,并在执行后整理关键指标与结果摘要。这样可以降低常见建模任务的使用门槛,提高参数收集、命令构造和结果解读的一致性。 ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** | 文件 | 说明 | |:---|:---| | skills/text-generate-executor/SKILL.md | Skill 主说明和执行规则 | | skills/text-generate-executor/references/dialog-flow.md | 渐进式问参流程 | | skills/text-generate-executor/references/text-generate-params.md | text_generate 参数速查 | | skills/throughput-optimizer-executor/SKILL.md | Skill 主说明和执行规则 | | skills/throughput-optimizer-executor/references/dialog-flow.md | 问参流程和模式分支 | | skills/throughput-optimizer-executor/references/throughput-optimizer-params.md | 参数说明和默认规则 | ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!278 | 26 天前 | |
feat(skills): add text generate and throughput optimizer executors Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !278 merge develop into develop feat(skills): add text generate and throughput optimizer executors Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** 当前 text_generate 和 throughput_optimizer CLI 的参数组合较多,涉及模型、硬件 profile、并行策略、量化、PD 分离、SLO 约束等多类配置。用户在进行性能验证或吞吐规划时,容易遗漏关键参数、混淆部署模式,或无法快速从原始输出中提取可对比的结论。 本 PR 新增 text-generate-executor 和 throughput-optimizer-executor 两个 skills,用于将自然语言性能验证/吞吐规划需求转换为可确认、可执行的 CLI 命令,并在执行后整理关键指标与结果摘要。这样可以降低常见建模任务的使用门槛,提高参数收集、命令构造和结果解读的一致性。 ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** | 文件 | 说明 | |:---|:---| | skills/text-generate-executor/SKILL.md | Skill 主说明和执行规则 | | skills/text-generate-executor/references/dialog-flow.md | 渐进式问参流程 | | skills/text-generate-executor/references/text-generate-params.md | text_generate 参数速查 | | skills/throughput-optimizer-executor/SKILL.md | Skill 主说明和执行规则 | | skills/throughput-optimizer-executor/references/dialog-flow.md | 问参流程和模式分支 | | skills/throughput-optimizer-executor/references/throughput-optimizer-params.md | 参数说明和默认规则 | ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!278 | 26 天前 | |
feat(deps): adopt uv lockfile for reproducible dependency management Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !267 merge uv into develop feat(deps): adopt uv lockfile for reproducible dependency management Created-by: AvadaKedavrua Commit-by: liujiawang Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [x] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 **Please describe the motivation of this PR and the goal you want to achieve through this PR.** **请描述您的拉取请求的动机和您希望通过此拉取请求实现的目标。** msmodeling 是 **CPU 仿真**框架,但依赖 Python / PyTorch / Transformers 强耦合,现有 pip install -r requirements.txt 带来大量环境类问题: | 痛点 | 现状 | | --- | --- | | 版本不可复现 | 无 lockfile,每人/每次 CI 解析结果不同,issue 难复现 | | CUDA wheel 误装 | CPU 仿真却默认拉 multi-GB CUDA PyTorch,安装慢 | | Python 版本分裂 | 原 >=3.9 与 Transformers 5.x(需 ≥3.10)冲突 | | 依赖源漂移 | requirements.txt 与运行时行为不一致;check_dependencies() 静默 pip install 覆盖用户环境 | | **PyTorch 生态版本匹配** | 装好 torch 后,再装 torchvision 等组件需上网查兼容版本,费时易错 | | **手写 manifest** | 新加依赖需手改 pyproject.toml / requirements.txt,易 typo、漏约束 | **本 PR 目标**:引入 **uv + 纳入版本库的 uv.lock** 作为可复现依赖契约,同时保留 requirements.txt 作为 pip 兼容路径。 **uv / uv.lock 价值(本 PR 核心)** 1. **可复现** — uv.lock 锁定完整传递依赖树(含 hash);开发、CI、维护者对照同一份版本基线。 2. **CPU-only PyTorch** — pyproject.toml 通过 [tool.uv.sources] 走官方 CPU index(torch / torchvision)。 3. **受控升级** — manifest 声明 bounds;具体版本由 lock 固定;升级需 uv lock + PR 审查。 4. **CI 防漂移** — 所有 run_*.sh 经 common.sh 执行 uv sync --frozen --group ci。 5. **双路径兼容** — README 推荐 uv;pip + requirements.txt 仍可用(无 lock,有意接受漂移)。 **uv add 工作流价值** | 场景 | 以前 | 现在 | | --- | --- | --- | | 新增依赖 | 手改 manifest + 自行查 torch/torchvision 兼容版本 | uv add <pkg> 一次完成解析、写 manifest、更新 lock | | 本地包开发 | 手配 -e | uv add --editable ./path | | 临时试验 | 污染 venv 或忘记卸载 | uv run --with <pkg> … 不改 lockfile | 示例:本 PR 通过 uv add torchvision 加入依赖——工具层未直接调用,但部分模型底层会 import torchvision;resolver 自动匹配与 locked torch 兼容的 0.25.0。 **为何将 uv.lock 纳入版本库(而非仅自动解析)** msmodeling 是克隆即运行的**应用/工具仓库**,不是发布到 PyPI 供下游再锁版本的 library。若不上库 lock、每次 uv sync 自动重新解析: | | **上库 uv.lock(本 PR)** | **不上库、自动解析** | | --- | --- | --- | | 可复现性 | 全树 pinned,issue 可对照 lock 复现 | 随 PyPI/时间变化,「我这边能跑」类 issue 复发 | | CI | --frozen 保证 manifest/lock 同步 | 须去掉 --frozen,CI 结果非确定 | | 升级 | 改 bounds → uv lock → 审 lock diff | 隐式升级,难追溯 | | 与 pip 路径关系 | lock = 开发/CI 真相;requirements.txt = 无 lock 的兼容退路 | 两条路径都不可复现 | **结论**:uv.lock **必须上库**;否则 uv 只剩「安装快」,无法解决版本漂移根因,且与现有 --frozen CI 矛盾。 **Lockfile 维护约定**:依赖变更 PR 须同时提交 pyproject.toml + uv.lock;优先 uv add;冲突时 rebase 后 uv lock,勿手改 lock。 Related: [#69](https://gitcode.com/Ascend/msmodeling/issues/69), RFC [docs/RFC/rfc_uv_dependency_management_en.md](docs/RFC/rfc_uv_dependency_management_en.md) **Out of scope**: NPU torch-npu 依赖 — 后续通过华为云源统一绑定。 ------ ## 📝 Modification / 修改内容 **Please briefly describe what modification is made in this PR.** **请简要描述此拉取请求中进行的修改。** - **pyproject.toml + uv.lock(上库)** — 运行时依赖;CPU index 路由 torch / torchvision;lint / ci groups - **新增 torchvision>=0.25.0** — 模型 import 路径需要;经 uv add 解析并与 locked torch 对齐 - **.pre-commit-config.yaml** — uv.lock 排除 large-file 检查 - **移除 check_dependencies()** — 版本由 uv / lockfile 管理 - **scripts/lib/common.sh** — uv sync --frozen --group ci(所有 run_*.sh) - **requirements.txt** — 对齐 bounds,含 torchvision;pip 退路 - **RFC** — 补充 uv add / --editable / --with 及 PyTorch 生态匹配痛点 ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。** - 本地 uv sync --frozen --group ci 通过 - CodeArts pipeline 已触发验证(docs-ci-pipeline-success) ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** **推荐(uv + 上库 lockfile)** bash pip install uv uv venv --python 3.13 .venv && source .venv/bin/activate uv sync uv add torchvision # 新增依赖:自动匹配 torch 版本并更新 lock uv run python -m cli.inference.text_generate --help **临时试验(不改 lock)** bash uv run --with some-package python -c "import some_package" **CI / 本地测试入口** bash bash scripts/run_smoke.sh # common.sh → uv sync --frozen --group ci **pip 兼容(无 lock)** bash pip install "torch>=2.7,<=2.10" "torchvision>=0.25.0" --index-url https://download.pytorch.org/whl/cpu pip install -r requirements.txt ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!267 | 27 天前 |
| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
| 1 个月前 | ||
| 1 个月前 | ||
| 26 天前 | ||
| 26 天前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 26 天前 | ||
| 20 天前 | ||
| 20 天前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 28 天前 | ||
| 1 个月前 | ||
| 26 天前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 27 天前 | ||
| 27 天前 | ||
| 26 天前 | ||
| 26 天前 | ||
| 26 天前 | ||
| 2 个月前 | ||
| 2 个月前 | ||
| 19 天前 | ||
| 19 天前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 3 个月前 | ||
| 3 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 5 个月前 | ||
| 1 个月前 | ||
| 26 天前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 26 天前 | ||
| 26 天前 | ||
| 27 天前 |