| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
support rms_norm quant fusion; refactor graph compilation: add graph pass abstraction and made pattern match as passes; fix compilation under runtime mode | 9 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
feat:仿真建模支持deepseek-V4模型适配 Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !166 merge deepseek-v4 into develop feat:仿真建模支持deepseek-V4模型适配 Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [x] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 为 msmodeling/tensor_cast 增加对 DeepSeek V4 (Flash/Pro) 模型的端到端支持,使其性能建模流水线能够覆盖 V4 引入的稀疏注意力(NSA / Window / Compressed / Heavily-Compressed 多 layer-type 路由)、HC(Head Compression)混合、Sinkhorn 拆分以及 Hash Routing MoE 等新结构,并补齐对应的 fake-tensor 语义算子与代价模型,让 V4 模型可以直接走通现有 analytic / multistream tracing 流程。 ------ ## 📝 Modification / 修改内容 新增文件 / New files - tensor_cast/transformers/builtin_model/deepseek_v4.py:DeepSeek V4 builtin model profile,包含 DeepseekV4Config / DeepseekV4Model 注册、layer-type 校验({0, 4, 128} 对应 sliding_attention / compressed_sparse_attention / heavily_compressed_attention)、以及与 transformers AutoConfig / AutoModel 的安全注册逻辑。 - tests/test_tensor_cast/test_deepseek_v4.py 与 tests/test_tensor_cast/data/deepseek_v4/*.json:V4 模型对应的测试数据集与用例(含合法/非法/缺失/截短的 ratios 配置)。 注意力 / Attention(tensor_cast/layers/mla.py,tensor_cast/ops/mla.py,tensor_cast/ops/rotary_embedding.py) - 新增 DeepseekV4SparseAttention 与 MultiheadLatentAttentionTensorCast 适配(含 requires_legacy_kv_b_decomposition、KV-cache window 写入路径等)。 - 新增 get_window_topk_idxs / get_compress_topk_idxs 索引生成工具。 - 新增 HC 路径语义算子:hc_pre_inv_rms、hc_pre_sinkhorn,分别对应参考实现中的 inverse-RMS 缩放与 Sinkhorn 加权 reduction。 - 新增 scatter_nd_update_mla 等 KV 写入算子的代价模型,按参考实现仅计 source 行读 + 更新行写,不计 slot_mapping / 整 cache 张量。 MoE / Gate(tensor_cast/layers/moe_layer.py,tensor_cast/ops/fused_moe.py) - MoELayer 增加 V4 统一 gating 路径:识别 gate 上的 is_v4 / hash 标志位,按参考 Gate.forward 顺序发出 matmul + score func + indices + gather/normalize/route_scale 各算子,使每一步按其真实 dtype(gate matmul 走 fp32)单独计费。 - 新增 moe_gating_top_k(带可选 bias 的 V4 非 hash 层)与 moe_gating_top_k_hash(基于 tid2eid 表的 hash 路由层)两个语义算子。 性能模型 / Performance Model(tensor_cast/performance_model/__init__.py) - 引入 _safe_max_int 工具:在 fake / meta / functional tensor 上 tensor.max().item() 不可用时回退为 None,让 caller 走 shape-based 估算。 - 注册 V4 新算子(scatter_nd_update_mla、HC 系列、MoE 新 gating tail 等)的 PerformanceProperties,与参考实现的内存访问语义对齐。 其他 / Misc - tensor_cast/core/config_resolver.py、input_generator.py、model_runner.py、device.py、transformers/transformations.py、 transformers/custom_model_registry.py、layers/utils.py、model_config.py、compilation/passes/multistream_pass.py:补齐 V4 在 config 解析、输入构造、runner 调度、device profile、模型变换与算子注册各环节的接入。 ------ ## 📐 Associated Test Results / 关联测试结果 **Please provide the related test results, such as test reports, etc.** **请提供相关测试结果,例如测试报告等。**   ------ ## 🌟 Use cases (Optional) / 使用案例(可选) **If this PR introduces a new feature, it is better to list some use cases here and update the documentation.** **如果此拉取请求引入了新功能,最好在此处列出一些用例并更新文档。** ------ ## ✅ Checklist / 检查列表 **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. / 使用 [lintrunner 工具](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) 来修复潜在的 lint 问题。 - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. / 修复的 Bug 已完全由单元测试覆盖,导致 Bug 的情况应在单元测试中添加。 - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. / 此拉取请求中的修改已完全由单元测试覆盖。如果不是,请添加更多单元测试以确保正确性。 - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. / 所有相关文档(API 文档、文档字符串、示例教程)已更新以反映这些更改。 - [ ] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 ------ See merge request: Ascend/msmodeling!166 | 21 天前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop chore(ci): adopt pre-commit and retire legacy lintrunner adapters Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. 感谢您的贡献,我们非常重视。以下说明将使您的拉取请求更健康,更易于获得反馈。如果您不理解某些项目,请不要担心,只需提交拉取请求并从维护人员那里寻求帮助即可。 **PR Type / PR类型** - [ ] Feature(功能新增) - [ ] Bugfix(Bug 修复) - [x] Docs(文档更新) - [x] CI/CD(持续集成/持续部署) - [ ] Refactor(代码重构) - [ ] Perf(性能优化) - [ ] Test-Cases(测试用例更新) - [ ] Other(其他) ------ ## Motivation / 变更动机 Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run ( disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes(仅工具与注释 / tooling & comments only) | Path | What changed | |------|----------------| | pre-commit/pyproject.toml | **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. | | .pre-commit-config.yaml | Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. | ------ ## Source code changes(应用代码 / application code) | Area | Files | Purpose | |------|--------|---------| | serving_cast | communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py | Replace from . import stime with import serving_cast.stime as stime so Pylint resolves imports (fixes **E0611**). | | serving_cast | stime.py | Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). | | serving_cast/service | base_throughput_optimizer.py | __init__ defaults + assert runner is not None before run_inference (fixes **E1101** on base class). | | tensor_cast | diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py | Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). | | web_ui | callbacks.py | **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). | ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------  ------ ## Checklist / 检查列表 - [x] Please ensure code files contain no Chinese comments. / 请保证代码文件中不含中文注释。 See merge request: Ascend/msmodeling!176 | 1 个月前 | |
fix(tensor_cast): guard unsafe sequence parallel rewrites Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !326 merge issue-90-v2 into develop fix(tensor_cast): guard unsafe sequence parallel rewrites Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description: **PR Type / PR类型** - [ ] Feature(功能新增) - [x] Bugfix(Bug 修复) - [ ] Docs(文档更新) - [ ] Refactor(代码重构) - [x] Test-Cases(测试用例更新) - [ ] Other(其他) ## 🔍 Motivation / 变更动机 Fix GitCode issue #90 root cause in TensorCast Sequence Parallel. GLM5 TP>1 compile graphs can match all_reduce -> add_rms_norm2 while the residual side is still full-shape. Rewriting only the all-reduce side to reduce_scatter creates mixed full/local inputs and may later double-expand the sequence dimension through all_gather, causing compile-time reshape failures such as shape '[1, 128, 6144]' is invalid for input of size 1572864. The pass also lacked a shardability precheck, so non-divisible sequence lengths such as query_length=127, TP=2 reached the fake reduce_scatter exact-division assertion. This PR is intentionally scoped to the TensorCast SP pass root cause and focused regression tests. Throughput optimizer / ServingCast CLI exposure is not included in this PR. ------ ## 📝 Modification / 修改内容 Final diff only changes 2 files: - tensor_cast/compilation/passes/sequence_parallel_pass.py - tests/regression/tensor_cast/test_sp_pass_unit.py Main changes: - Add shape/provenance helpers for SP-local values and expected reduce_scatter output shape. - Guard P2 so add_rms_norm2 rewrites only happen when the non-communication input is proven SP-local and shape-compatible with the reduce_scatter result. - Mark successfully localized add_rms_norm2 nodes with tensor_cast_sp_local. - Allow P3 to consume only residuals from add_rms_norm2 nodes already localized by P2. - Add a shardability check before SP rewrites to skip non-divisible shard dimensions instead of reaching exact_division assertions. - Share reduce-scatter insertion logic across P1/P2/P3, including view repair for 2-D/3-D metadata mismatches. - Add focused regression coverage for unsafe GLM5-style P2 residuals, all-gathered full residuals, P3 tail skips, non-divisible shard dimensions, and existing local Qwen3-style SP paths. ------ ## 📐 Associated Test Results / 关联测试结果 Environment timestamp: 2026-06-11 12:52 +08, local worktree issue-90-v2, commit e29c4b49fd697d0472974ddc91aa0b40f0737a97. ### UT / Regression - [x] python -m pytest tests/regression/tensor_cast/test_sp_pass_unit.py -q - Result: 32 passed in 0.06s - [x] python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -q -m nightly - Result: 2 passed in 2.48s - [x] python -m pytest tests/regression/tensor_cast -q - Result: 605 passed, 125 deselected, 13 warnings, 150 subtests passed in 310.83s (0:05:10) - [x] python -m py_compile tensor_cast/compilation/passes/sequence_parallel_pass.py tests/regression/tensor_cast/test_sp_pass_unit.py - Result: passed - [x] python -m ruff check tensor_cast/compilation/passes/sequence_parallel_pass.py tests/regression/tensor_cast/test_sp_pass_unit.py - Result: All checks passed! - [x] git diff --check gitcode-ascend/develop...HEAD - Result: passed ### Issue #90 Direct Repro And Bad Cases All commands below exited with code 0. Log scan found no old failure signatures: no BackendCompilerFailed, no old invalid reshape, no AssertionError, no RuntimeError, no TypeError: cannot pickle. - [x] GLM5 q=128, TP=2, compile, SP, W8A8_DYNAMIC, 1 layer - Command: python -X utf8 -m tensor_cast.scripts.text_generate zai-org/GLM-5 --device ATLAS_800_A3_752T_128G_DIE --num-queries 16 --query-length 128 --context-length 0 --compile --world-size 32 --tp-size 2 --dp-size 16 --ep-size 32 --moe-tp-size 1 --moe-dp-size 1 --quantize-linear-action W8A8_DYNAMIC --quantize-attention-action DISABLED --enable-sequence-parallel --num-hidden-layers-override 1 --log-level debug - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.001284 s, TPS/Device: 4.984e+04 token/s - [x] GLM5 q=128, TP=2, compile, SP, linear quant disabled, 1 layer - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.001468 s, TPS/Device: 4.359e+04 token/s - [x] GLM5 q=127, TP=2, compile, SP, linear quant disabled, 1 layer - SP log: SP pass: skipping because shard dimension is not divisible by 2 - Result: [analytic] Execution time: 0.001468 s, TPS/Device: 4.326e+04 token/s - [x] GLM5 q=128, TP=2, compile, SP + DFC, W8A8_DYNAMIC, 1 layer - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.001284 s, TPS/Device: 4.984e+04 token/s - [x] GLM5 q=128, TP=2, compile, DFC on, SP off, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.001284 s, TPS/Device: 4.984e+04 token/s - [x] GLM5 q=5120, TP=2, compile, SP, W8A8_DYNAMIC, 1 layer - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.009156 s, TPS/Device: 2.796e+05 token/s - [x] GLM5 q=5120, TP=2, compile, SP off, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.009156 s, TPS/Device: 2.796e+05 token/s - [x] GLM5 q=5120, TP=1, compile, SP on, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.015624 s, TPS/Device: 1.639e+05 token/s - [x] GLM5 q=5120, TP=2, compile off, SP on, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.013055 s, TPS/Device: 1.961e+05 token/s - [x] Qwen3-32B q=128, TP=2, compile, SP, 1 layer - SP log: SP ordered rewrites: 1 P1, 1 P2 matches; SP ordered rewrites: 1 P3 matches - Result: [analytic] Execution time: 0.001414 s, TPS/Device: 4.527e+04 token/s ### Qwen3 Prefill Performance Non-Regression - [x] Qwen3-32B prefill, query-length=4112, TP=16, compile, profiling model, SP on - Command uses profiling database: tensor_cast/performance_model/profiling_database/data/ATLAS_800_A3_752T_128G_DIE/vllm_ascend/vllm0.18.0_torch2.9.0_cann8.5 - Mapping kept as expected: - tensor_cast.reduce_scatter.default->hcom_reduceScatter_ (x2) - tensor_cast.all_gather.default->hcom_allGather_ (x2) - Coverage: Simulated Latency Coverage: 99.4% (2.474ms / 2.490ms) - Metrics JSON: - M1 raw op count HR: 96.43% - M2 fused op HR: 92.00% - M3 fused op HR excluding zero-cost: 81.82% - M4 per-shape HR: 89.47% - M5 simulated latency coverage: 99.38% - Result: [empirical] Execution time: 0.154645 s ------ ## ✅ Checklist / 检查列表 - [x] Linting tools used / 使用 lintrunner 工具 - [x] Bug fixes covered by unit tests / 修复的 Bug 已由单元测试覆盖 - [x] Modification covered by unit tests / 修改已由单元测试覆盖 - [ ] Documentation updated / 文档已更新 - [x] No Chinese comments in code files / 代码文件中不含中文注释 See merge request: Ascend/msmodeling!326 | 13 天前 |
| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
| 9 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 21 天前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 13 天前 |