| File | Last commit | Last updated |
|---|---|---|
| | 【bugfix】Fix DFC quant fusion residuals by internalizing activation quant args Co-authored-by: lutean<lutean1@huawei.com> # message auto-generated for no-merge-commit merge: !191 merge develop into develop Created-by: lutean Commit-by: lutean Merged-by: ascend-robot Description: **PR Type** - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [ ] Other ## 🔍 Motivation Symptom: under dynamic quantization, after the DFC fusion pass takes effect, init_routing, all-to-all, and gmm_quant_swiglu nodes are left behind in the graph, which degrades performance and accuracy. Root cause: the quant case performs a structural replacement, but the parameter interface still depends on the original intermediate activation-quantization nodes. The grouped_matmul_quant_swiglu_default chain therefore keeps producing live inputs for the fused op, cannot be removed by eliminate_dead_code(), and remains in the graph as a live node. ------ ## 📝 Modification 1. Change the operator semantics of dispatch_ffn_combine_quant / dispatch_ffn_combine_quant_int4: gmm2_x_scale/gmm2_x_offset are no longer external inputs; the fused op handles them internally. 2. Change how the pass collects grouped-quant arguments: in the grouped case it no longer uses gmm_plain_node.args[1:] directly and instead extracts only the static weight-side parameters of gmm2, so gmm2_x_scale/gmm2_x_offset are no longer carried in from the graph. 3. Update the meta op / estimator signatures to match: the parameter lists of dispatch_ffn_combine_quant / quant_int4 in tensor_cast/ops/fused_moe.py and tensor_cast/performance_model/__init__.py. ------ ## 📐 Associated Test Results Before the fix: (image not preserved) After the fix: (image not preserved) ------ ## ✅ Checklist **Before PR**: - [ ] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [ ] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!191 | 1 month ago |
| | fix(tensor_cast): guard unsafe sequence parallel rewrites Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !326 merge issue-90-v2 into develop Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description: **PR Type** - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] Refactor - [x] Test-Cases - [ ] Other ## 🔍 Motivation Fix the root cause of GitCode issue #90 in TensorCast Sequence Parallel. GLM5 TP>1 compile graphs can match all_reduce -> add_rms_norm2 while the residual side is still full-shape. Rewriting only the all-reduce side to reduce_scatter creates mixed full/local inputs and may later double-expand the sequence dimension through all_gather, causing compile-time reshape failures such as `shape '[1, 128, 6144]' is invalid for input of size 1572864`. The pass also lacked a shardability precheck, so non-divisible sequence lengths such as query_length=127, TP=2 reached the fake reduce_scatter exact-division assertion. This PR is intentionally scoped to the TensorCast SP pass root cause and focused regression tests. Throughput optimizer / ServingCast CLI exposure is not included in this PR. ------ ## 📝 Modification The final diff changes only 2 files: - tensor_cast/compilation/passes/sequence_parallel_pass.py - tests/regression/tensor_cast/test_sp_pass_unit.py Main changes: - Add shape/provenance helpers for SP-local values and the expected reduce_scatter output shape. - Guard P2 so add_rms_norm2 rewrites only happen when the non-communication input is proven SP-local and shape-compatible with the reduce_scatter result. - Mark successfully localized add_rms_norm2 nodes with tensor_cast_sp_local. - Allow P3 to consume only residuals from add_rms_norm2 nodes already localized by P2. - Add a shardability check before SP rewrites to skip non-divisible shard dimensions instead of reaching exact_division assertions. - Share reduce-scatter insertion logic across P1/P2/P3, including view repair for 2-D/3-D metadata mismatches. - Add focused regression coverage for unsafe GLM5-style P2 residuals, all-gathered full residuals, P3 tail skips, non-divisible shard dimensions, and existing local Qwen3-style SP paths. ------ ## 📐 Associated Test Results Environment timestamp: 2026-06-11 12:52 +08, local worktree issue-90-v2, commit e29c4b49fd697d0472974ddc91aa0b40f0737a97. ### UT / Regression - [x] python -m pytest tests/regression/tensor_cast/test_sp_pass_unit.py -q - Result: 32 passed in 0.06s - [x] python -m pytest tests/regression/tensor_cast/test_sequence_parallel_pass.py -q -m nightly - Result: 2 passed in 2.48s - [x] python -m pytest tests/regression/tensor_cast -q - Result: 605 passed, 125 deselected, 13 warnings, 150 subtests passed in 310.83s (0:05:10) - [x] python -m py_compile tensor_cast/compilation/passes/sequence_parallel_pass.py tests/regression/tensor_cast/test_sp_pass_unit.py - Result: passed - [x] python -m ruff check tensor_cast/compilation/passes/sequence_parallel_pass.py tests/regression/tensor_cast/test_sp_pass_unit.py - Result: All checks passed! - [x] git diff --check gitcode-ascend/develop...HEAD - Result: passed ### Issue #90 Direct Repro And Bad Cases All commands below exited with code 0. A log scan found no old failure signatures: no BackendCompilerFailed, no old invalid reshape, no AssertionError, no RuntimeError, no TypeError: cannot pickle. - [x] GLM5 q=128, TP=2, compile, SP, W8A8_DYNAMIC, 1 layer - Command: python -X utf8 -m tensor_cast.scripts.text_generate zai-org/GLM-5 --device ATLAS_800_A3_752T_128G_DIE --num-queries 16 --query-length 128 --context-length 0 --compile --world-size 32 --tp-size 2 --dp-size 16 --ep-size 32 --moe-tp-size 1 --moe-dp-size 1 --quantize-linear-action W8A8_DYNAMIC --quantize-attention-action DISABLED --enable-sequence-parallel --num-hidden-layers-override 1 --log-level debug - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.001284 s, TPS/Device: 4.984e+04 token/s - [x] GLM5 q=128, TP=2, compile, SP, linear quant disabled, 1 layer - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.001468 s, TPS/Device: 4.359e+04 token/s - [x] GLM5 q=127, TP=2, compile, SP, linear quant disabled, 1 layer - SP log: SP pass: skipping because shard dimension is not divisible by 2 - Result: [analytic] Execution time: 0.001468 s, TPS/Device: 4.326e+04 token/s - [x] GLM5 q=128, TP=2, compile, SP + DFC, W8A8_DYNAMIC, 1 layer - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.001284 s, TPS/Device: 4.984e+04 token/s - [x] GLM5 q=128, TP=2, compile, DFC on, SP off, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.001284 s, TPS/Device: 4.984e+04 token/s - [x] GLM5 q=5120, TP=2, compile, SP, W8A8_DYNAMIC, 1 layer - SP log: SP ordered rewrites: 0 P1, 0 P2 matches; SP ordered rewrites: 0 P3 matches - Result: [analytic] Execution time: 0.009156 s, TPS/Device: 2.796e+05 token/s - [x] GLM5 q=5120, TP=2, compile, SP off, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.009156 s, TPS/Device: 2.796e+05 token/s - [x] GLM5 q=5120, TP=1, compile, SP on, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.015624 s, TPS/Device: 1.639e+05 token/s - [x] GLM5 q=5120, TP=2, compile off, SP on, W8A8_DYNAMIC, 1 layer - Result: [analytic] Execution time: 0.013055 s, TPS/Device: 1.961e+05 token/s - [x] Qwen3-32B q=128, TP=2, compile, SP, 1 layer - SP log: SP ordered rewrites: 1 P1, 1 P2 matches; SP ordered rewrites: 1 P3 matches - Result: [analytic] Execution time: 0.001414 s, TPS/Device: 4.527e+04 token/s ### Qwen3 Prefill Performance Non-Regression - [x] Qwen3-32B prefill, query-length=4112, TP=16, compile, profiling model, SP on - Command uses profiling database: tensor_cast/performance_model/profiling_database/data/ATLAS_800_A3_752T_128G_DIE/vllm_ascend/vllm0.18.0_torch2.9.0_cann8.5 - Mapping kept as expected: - tensor_cast.reduce_scatter.default->hcom_reduceScatter_ (x2) - tensor_cast.all_gather.default->hcom_allGather_ (x2) - Coverage: Simulated Latency Coverage: 99.4% (2.474ms / 2.490ms) - Metrics JSON: - M1 raw op count HR: 96.43% - M2 fused op HR: 92.00% - M3 fused op HR excluding zero-cost: 81.82% - M4 per-shape HR: 89.47% - M5 simulated latency coverage: 99.38% - Result: [empirical] Execution time: 0.154645 s ------ ## ✅ Checklist - [x] Linting tools used - [x] Bug fixes covered by unit tests - [x] Modification covered by unit tests - [ ] Documentation updated - [x] No Chinese comments in code files See merge request: Ascend/msmodeling!326 | 13 days ago |
| | chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !176 merge pre-commit into develop Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description: **PR Type** - [ ] Feature - [ ] Bugfix - [x] Docs - [x] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [ ] Other ------ ## Motivation Continue the **pre-commit** migration: tighten **Pylint** so only high-signal messages run (disable=all + explicit enable list), fix real issues that remained under that profile, and translate hook/config comments to **English**. ------ ## Configuration changes (tooling & comments only) - pre-commit/pyproject.toml: **Pylint:** [tool.pylint."messages control"] with disable = ["all"] and a short **allowlist** of message IDs (E0100, E0601–E0611, E0632, E1101, E1120, W0632, W1514). **Ruff:** unchanged behavior; comments translated to English. **Bandit:** comments translated; rule allowlist/skip lists unchanged. - .pre-commit-config.yaml: Comments translated to English; Bandit hook display name set to **bandit (Python security checks)**. Hook versions and args unchanged except for comment text. ------ ## Source code changes (application code) - serving_cast (communication.py, engine.py, instance.py, kv_cache_manager.py, load_gen.py, main.py, model_runner.py, request.py, serving.py, utils.py): Replace `from . import stime` with `import serving_cast.stime as stime` so Pylint resolves imports (fixes **E0611**). - serving_cast (stime.py): Singleton **salabim** Environment via _get_sim_env() so type checkers/Pylint see **sim.Environment** (fixes **E1101** on SimulationEnv). - serving_cast/service (base_throughput_optimizer.py): __init__ defaults + `assert runner is not None` before run_inference (fixes **E1101** on base class). - tensor_cast (diffusers/diffusers_model.py, diffusers/diffusers_utils.py, runtime.py): Add **encoding="utf-8"** to open() / trace export (fixes **W1514**). - web_ui (callbacks.py): **refresh_optimizer_detail:** call _optimizer_detail_view(rows, None, device) and unpack five return values (fixes **E1120**). ------ ## Recent commits on pre-commit branch - ci(pre-commit): fix pylint message selection with disable=all - fix: resolve pylint findings in serving_cast, tensor_cast, and web_ui - docs(pre-commit): translate comments to English and add all-files run log ------ ## Checklist - [x] Please ensure code files contain no Chinese comments. See merge request: Ascend/msmodeling!176 | 1 month ago |
| | feat(multistream): add compile-time multistream scheduling (core only) Co-authored-by: Kudo__shinichi<liuning119@huawei.com> # message auto-generated for no-merge-commit merge: !117 merge feat/multistream-design into develop Created-by: Kudo__shinichi Commit-by: Kudo__shinichi Merged-by: ascend-robot Description: **PR Type** - [x] Feature - [ ] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [x] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation The current torch.compile path lacks a general multistream scheduling capability. Overlap of communication and computation relies mainly on local modeling of a few existing fused operators, so ordinary compute / collective nodes in the FX graph cannot be scheduled uniformly at compile time. The goals of this PR are: 1. Introduce controllable multistream scheduling into the torch.compile path; 2. Shorten the critical path where communication and computation have an overlap window; 3. Fall back automatically through a benefit guard when no gain is predicted, keeping the original single-stream behavior unchanged; 4. Keep the implementation simple and reuse the existing compile / runtime / performance model infrastructure as much as possible; 5. Fix distorted activation memory statistics caused by multistream control anchors taking part in memory tracking. ## 📝 Modification This PR mainly contains the following changes: 1. tensor_cast/config.py - Add multistream configuration options; - Support role-based stream mapping; - Keep the old fields for compatibility; - Remove pass-local hard-coded bandwidth defaults; scheduling cost prefers the analytic performance model and device profile information. 2. tensor_cast/core/model_builder.py - Pass the current device information when building the compile backend; - This lets the multistream pass do cost estimation based on the current device profile. 3. tensor_cast/compilation/compile_backend.py - Hook the multistream pass into the compile rewrite flow; - Following the reviewer's suggestion, run the multistream pass before decompose_auto_functionalized_pass; - The reason is that the multistream pass calls DCE internally and needs to run on a pure-functional graph, so that the mutation-style graph produced by defunctionalization does not affect semantic correctness. 4. tensor_cast/compilation/passes/multistream_pass.py - Introduce the compile-time multistream schedule pass; - Classify nodes by execution resource into COMM_ONLY, HYBRID, and COMPUTE; - Model collective nodes such as all_reduce / all_gather / reduce_scatter / all_to_all as communication nodes; - Model fused nodes such as matmul_all_reduce / static_quant_linear_all_reduce as hybrid nodes; - Perform lowering through _internal_wait_and_bind / _internal_record; - Add a benefit guard that applies the rewrite only when the predicted multistream makespan beats the single-stream baseline; - Exclude non-OpOverload helper nodes from analytic cost estimation, so helpers such as operator.getitem are not wrongly modeled as device operators. 5. tensor_cast/runtime.py - Record stream / dependency token information in multistream runtime events; - The memory tracker replays events in a multistream dependency-aware order, reflecting the longer activation lifetime under multistream more accurately; - Multistream internal anchor ops are not counted as model activations in memory statistics, so control anchors do not inflate the memory result. 6. tests - Add basic coverage for the multistream pass; - Add coverage for the runtime critical path and anchor memory; - Cover key behaviors including the benefit guard, anchor lowering, helper node handling, and multistream memory accounting. ## 📐 Associated Test Results Single-stream example: `python -m tensor_cast.scripts.text_generate deepseek-ai/DeepSeek-V3.1 --device ATLAS_800_A3_560T_128G_DIE --num-queries 64 --query-length 1 --context-length 1024 --world-size 16 --tp-size 8 --dp-size 2 --moe-tp-size 4 --moe-dp-size 1 --ep-size 4 --decode --compile --compile-allow-graph-break --disable-repetition --num-hidden-layers-override 4 --quantize-attention-action INT8 --chrome-trace trace_ds_single_l4_q64_ctx1024.json --log-level info` Multistream example: `python -m tensor_cast.scripts.text_generate deepseek-ai/DeepSeek-V3.1 --device ATLAS_800_A3_560T_128G_DIE --num-queries 64 --query-length 1 --context-length 1024 --world-size 16 --tp-size 8 --dp-size 2 --moe-tp-size 4 --moe-dp-size 1 --ep-size 4 --decode --compile --compile-allow-graph-break --disable-repetition --num-hidden-layers-override 4 --quantize-attention-action INT8 --chrome-trace trace_ds_multi_l4_q64_ctx1024_current.json --log-level info` Key results (scenario: Total time for analytic, Execution time, TPS/Device, note): - Single stream: 20.729ms, 0.020729 s, 193 token/s, baseline - Multistream: 20.687ms, 0.019750 s, 202.5 token/s, multistream enabled Performance comparison: - With multistream, Execution time drops from 0.020729 s to 0.019750 s, a latency reduction of about 4.72%. - TPS/Device rises from 193 token/s to 202.5 token/s, an increase of about 4.92%. ------ ## 🌟 Use cases (Optional) Scenarios suited to validating multistream gains in the current version: 1. Decode scenarios with a high share of communication; 2. Scenarios with many TP/EP collectives and independent compute/comm overlap windows; 3. Scenarios that want a conservative scheduling attempt on the compile side with automatic fallback when there is no gain. Known limits of the current version: 1. In dense / memory-bound scenarios, multistream may be skipped outright by the benefit guard; 2. HYBRID fused operators are still modeled as black-box nodes on the main stream, leaving room for further refinement. ------ ## ✅ Checklist **Before PR**: - [x] Linting tools are used to fix the potential lint issues. - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. See merge request: Ascend/msmodeling!117 | 1 month ago |
| | chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> (same commit message as the earlier !176 entry in this table) See merge request: Ascend/msmodeling!176 | 1 month ago |
| | Integrate measured operators Co-authored-by: ttcool<xujintao8@h-partners.com> # message auto-generated for no-merge-commit merge: !96 merge develop into develop Created-by: tt0cool Commit-by: tt0cool;ttcool Merged-by: ascend-robot Description: **PR Type** - [x] Feature - [ ] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [ ] Other ## 🔍 Motivation Add support for integrating measured operators. ------ ## 📝 Modification Extend the empirical model, add a datasource interface, and complete the related CLI configuration. ------ ## 📐 Associated Test Results (image not preserved) ------ ## ✅ Checklist **Before PR**: - [x] [Linting tools](https://gitcode.com/Ascend/msmodeling/blob/develop/tensor_cast/README.md#coding-style) are used to fix the potential lint issues. - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!96 | 3 months ago |
| | 【FIX】【TEST】Fix broken README/documentation links and run the full benchmark suite by default Co-authored-by: liujiawang<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !331 merge fix into develop Created-by: AvadaKedavrua Commit-by: liujiawang Merged-by: ascend-robot Description: ## Reasons for the change 1. The official-account QR code in the community section of README.md points to an old path in the msinsight repository; the resource now returns 404, so scanning or previewing it fails. 2. The relative path to the Op Mapping skill in OP_PLUGIN_MAPPING_TUTORIAL.md is wrong, so the in-document link fails. 3. The benchmark entry point runs only tests/benchmark/ops/ by default; the model regressions in tests/benchmark/models/ are silently skipped, leaving CI/nightly coverage insufficient. 4. With the full benchmark enabled, the qwen3-30b-a3b decode/prefill baselines no longer match the current compile output and need to be refreshed. --- ## Changes - Documentation links, README.md: replace the official-account image URL with a working user-images resource; complete the TOC anchors for Contributions / Community and other sections. - Documentation links, docs/perf_database/tutorial/OP_PLUGIN_MAPPING_TUTORIAL.md: skill path ../skills/... → ../../../.agents/skills/op-mapping/SKILL.md. - Benchmark default behavior, scripts/run_benchmark.sh and scripts/helpers/nightly/main.py: remove the MSMODELING_BENCHMARK_MODELS switch and always run the whole tests/benchmark/ directory. - Design document, docs/design/ut_refactor.md: sync the benchmark phase description. - Baseline, tests/benchmark/models/cases/qwen3-30b-a3b-{decode,prefill}.json: refresh baseline_time_s and operator top-N. - Lint, experimental/optix/, scripts/, tensor_cast/, tests/ and others: add pylint: disable comments for inspect.* false positives. --- ## Self-verification ### README official-account image link Purpose: confirm that the old link returns 404 and the new link is reachable. Steps: 1. Check the HTTP status of the old URL 2. Check the HTTP status of the new URL. Commands: `curl -sI "https://raw.gitcode.com/Ascend/msinsight/raw/master/docs/zh/user_guide/figures/readme/officialAccount.jpg" \| head -1` and `curl -sI "https://raw.gitcode.com/user-images/assets/8428112/2a22a707-de26-4bb3-b312-4952035e021b/30be980e7fd65b2486d251b48a7999f3.jpg" \| head -1` Result: HTTP/1.1 404 Not Found, then HTTP/1.1 200 OK ### Op Mapping skill documentation path Purpose: confirm that the link in the tutorial points to a real file. Steps: 1. From the repository root, check that the skill file exists. Command: `test -f .agents/skills/op-mapping/SKILL.md && echo OK` Result: OK ### Benchmark entry point runs the full suite by default Purpose: confirm that run_benchmark.sh no longer depends on MSMODELING_BENCHMARK_MODELS and covers the models subdirectory by default. Steps: 1. Inspect the benchmark target configuration in the script. Command: `grep -n "TESTS_BENCHMARK" scripts/run_benchmark.sh` Result: `run_pytest "${TESTS_BENCHMARK}/" \` ### CI pipeline Purpose: confirm that the change does not break the existing CI or docs CI. Steps: 1. Check the CI label status of PR #331. Result: the PR is labeled ci-pipeline-passed and docs-ci-pipeline-success. See merge request: Ascend/msmodeling!331 | 14 days ago |
| | Use _ for names of the ops and compute properties functions. Always return the graph module for all graph passes. Move stable_topo_sort to its own file. Move sink_split_pass to freezing passes since it depends on graph freezing. Co-authored-by: Jiong Gong<steven.gong@gmail.com> | 6 months ago |
| | chore(ci): adopt pre-commit and retire legacy lintrunner adapters Co-authored-by: liujiawang<anonymousdev@163.com> (same commit message as the earlier !176 entry in this table) See merge request: Ascend/msmodeling!176 | 1 month ago |
| | Use _ for names of the ops and compute properties functions. Always return the graph module for all graph passes. Move stable_topo_sort to its own file. Move sink_split_pass to freezing passes since it depends on graph freezing. Co-authored-by: Jiong Gong<steven.gong@gmail.com> | 6 months ago |
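
The shardability precheck described in the !326 entry of the table can be illustrated with a small sketch. This is not the code in sequence_parallel_pass.py: the function name `expected_reduce_scatter_shape` and its signature are hypothetical, and it assumes the shard dimension is the sequence dimension. The last lines check the arithmetic behind the reported reshape error: 1572864 elements is exactly twice 1 × 128 × 6144, which is consistent with the sequence dimension having been expanded twice.

```python
from math import prod
from typing import Optional, Sequence, Tuple


def expected_reduce_scatter_shape(
    shape: Sequence[int], shard_dim: int, world_size: int
) -> Optional[Tuple[int, ...]]:
    """Per-rank shape after a reduce_scatter along shard_dim.

    Returns None when the dimension is not evenly divisible, so the caller
    can skip the rewrite instead of hitting an exact-division assertion.
    (Hypothetical helper, for illustration only.)
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if shape[shard_dim] % world_size != 0:
        return None
    local = list(shape)
    local[shard_dim] //= world_size
    return tuple(local)


# query_length=128, TP=2: shardable, each rank holds half the sequence.
ok_shape = expected_reduce_scatter_shape((1, 128, 6144), shard_dim=1, world_size=2)

# query_length=127, TP=2: not shardable, so the rewrite would be skipped.
skipped = expected_reduce_scatter_shape((1, 127, 6144), shard_dim=1, world_size=2)

# Arithmetic behind "shape '[1, 128, 6144]' is invalid for input of size 1572864".
full_numel = prod((1, 128, 6144))
double_expanded_numel = prod((1, 256, 6144))
```

Returning `None` rather than raising keeps the check usable as a guard: a caller can test the result once and leave the graph untouched when the shape cannot be split evenly.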
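
The benefit guard and the percentages quoted in the !117 entry can be checked with a short sketch. The function names below are hypothetical and do not come from multistream_pass.py; the guard simply compares two predicted makespans, which is how the commit message describes it, and the numbers are the ones reported there.

```python
def should_apply_multistream(single_makespan_s: float, multi_makespan_s: float) -> bool:
    """Apply the rewrite only if the predicted multistream makespan is strictly better.

    Hypothetical stand-in for the benefit guard; the real pass derives both
    values from its own cost model.
    """
    return multi_makespan_s < single_makespan_s


def relative_change_percent(baseline: float, new: float) -> float:
    """Signed change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Values reported in the commit message.
single_exec_s, multi_exec_s = 0.020729, 0.019750
single_tps, multi_tps = 193.0, 202.5

apply_rewrite = should_apply_multistream(single_exec_s, multi_exec_s)
latency_change = relative_change_percent(single_exec_s, multi_exec_s)
tps_change = relative_change_percent(single_tps, multi_tps)

print(f"apply={apply_rewrite} latency={latency_change:.2f}% tps={tps_change:+.2f}%")
```

With a strict comparison, equal predictions fall back to the single-stream schedule, which matches the stated goal of leaving behavior unchanged when no gain is predicted.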