msmodeling/tests/regression/tensor_cast · Ascend/MindStudio-Modeling - AtomGit

ascend-robot fix(deepseek-v4): account for context in ratio128 attention topk

File	Last commit	Last updated
__init__.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
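conftest.py	[REFACTOR] Refactor the CI gate and test_map sync infrastructure Co-authored-by: AvadaKedavrua<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !394 merge fix into master [REFACTOR] Refactor the CI gate and test_map sync infrastructure Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description:

## Reason for change

The CI gate's diff classification, policy validation, PR comments, and test_map sync logic were coupled inside monolithic modules such as `gate_policy.py`, which made them hard to evolve and unit-test independently. `test_map` had no standalone sync entry point, and the `ast_utils` / `test_map_loader` / nightly report pipeline also needed to be aligned with the gate policy. This PR contains only commit `82757732b6b6beb79b7083f6046e9cd9c72005f3` (`refactor`) and does not touch wheel/CLI/OptiX.

---

## test_map schema refactor: background and benefits

### Background

The old `test_map` used product source files as top-level keys, with values of the form `symbol → [test_nodes]`:

`{ "tensor_cast/foo.py": { "Widget::run": ["tests/regression/.../test_x.py::test_foo"] } }`

The new `test_map` uses pytest node ids as top-level keys, with values of the form `source_file → [symbols]`, matching the collection direction of `coverage.py --cov-context=test`:

`{ "tests/regression/.../test_x.py::test_foo": { "tensor_cast/foo.py": ["Widget::run"] } }`

Both store the same bipartite graph "test node ↔ (source file, symbol)". Only the direction of the primary key is reversed; the edge count is the same and the semantics are unchanged.

### Why the line count grows while the size shrinks

| Metric | Old schema (source-oriented) | New schema (node-oriented) |
|------|------------------------------|----------------------------|
| JSON line count | ~70,000 lines | ~100,000 lines |
| File/memory footprint | 8,355,104 B (~8.0 MB) | 3,748,221 B (~3.6 MB, about -55%) |

Line count is not a reliable proxy for size. The old format stores the full pytest node id (`tests/regression/.../test_xxx.py::Class::test_yyy`, typically 60–80 characters) under every symbol, so when one test covers N symbols that long string appears N times. In the new format each test node id appears once, as a top-level key, and the arrays hold short canonical symbols (such as `Widget::run` or `%`). The edge count E is unchanged, but the number of long-string repetitions drops from O(E) to O(T), where T is the number of tests with coverage and T ≪ E. A typical case: many smoke/regression tests share, through imports, the module symbol `%` of modules such as `tensor_cast/` and `cli/`. The old format accumulates thousands of test ids under a single symbol, whereas the new format records one short symbol per test, so the total byte count drops significantly.

### Engineering benefits

1. Zero-pivot build: `build_test_map.collect_from_coverage` aggregates `by_test[nid][source].add(symbol)` directly by coverage context (test node), consistent with the nightly phase1 collection path.
2. More natural incremental sync: `sync.apply_incremental_test_map_update` replaces whole nodes for touched test files and merges by `(test_node, source_path)` for touched product files, with no conversion back and forth between two indexes.
3. More direct deleted-test and redundancy detection: `gate_deleted_tests` and `detect_redundant_cases` iterate by test footprint; `test_map_loader` enforces top-level keys of the form `tests/...::...` and can reject dirty data mistakenly keyed by source.
4. No regression in CI gate queries: the reverse `symbol → tests` lookup goes through `build_test_map_index`, which builds the index in a single O(N) pass and is equivalent at runtime to the old schema.
5. Lighter storage and transfer: the measured file size is roughly halved, and both OBS download time and `json.loads` peak memory are lower.

---

## Changes

### CI gate module split

- Slim down `gate_policy.py`: policy loading/validation moves to `policy.py`, diff classification to `classifier.py`, GitCode PR comments to `comments.py`, and test_map queries to `test_map_query.py`
- Add `sync.py` + `scripts/run_test_map_sync.sh`: maintain the authoritative `test_map` against the target branch HEAD (`--once` / `--watch`)
- Trim `diff.py` so that it keeps only git diff and branch checkout
- Adapt `main.py` / `rules.py` / `models.py` / `errors.py` to the new module boundaries
- Adjust the policy fields in `tests/.ci/gate_policy.yaml` accordingly

### Common helper enhancements

- `_config.py`: replace hand-written env parsing with `pydantic-settings`, unifying validation and error messages
- `ast_utils.py`: extend symbol extraction to support test_map granularity mapping
- `build_test_map.py` / `test_map_loader.py`: refactor the collection and loading logic (node-oriented schema)
- Add `test_map_report.py`: test_map coverage summary and expired-exemption detection
- Align `coverage_symbol_check.py` / `pytest_runner.py` / `test_map_config.py` with the new data structure
- `common/_logging.py`: add the `log_env_audit` environment audit log

### nightly report pipeline

- `report_builder.py` / `pytest_parser.py` / `main.py`: adapt to node-oriented test_map reports
- `report_models.py` / `feishu_notifier.py`: minor alignment

### symbol validation and exemption drift (new)

- last-wins canonical symbols: for duplicate `def foo`, only the last definition is gated; a shadowed def triggers a non-blocking GitCode comment
- `def _` disambiguation: no decorator → `_`; with decorator → `_@<suffix>`
- exemption validation: the whole `path::symbol` string must be an AST canonical name; coverage omit paths must not be written to `exemptions.sources`
- exemption drift is blocking: when a PR deletes or renames a product/test file without updating `gate_policy.yaml` → `[ED]` hard block + GitCode comment
- Expected/Got: a unified format for type and value errors in Config / policy / loader
- Docs: `scripts/README.md` adds `build.sh` and `MSMODELING_WHEEL_OUTPUT_DIR`; `tests/README.md` adds the symbol contract

### Other

- `scripts/run_ci_gate.sh`: align the entry-point arguments
- `scripts/prefetch_model_configs.py`: adapt to the new config loading
- `tests/README.md` / `tests/SKILL.md`: documentation sync
- Add/update regression tests: `test_classifier.py`, `test_comments.py`, `test_policy.py`, `test_sync.py`, `test_test_map_query.py`, `test_test_map_report.py`, and others

---

## Self-verification

### CI gate / test_map / nightly regression tests

Purpose: confirm that behavior is unchanged after the module split and that the new modules have unit-test coverage.

Steps:

1. From the repository root, run: `uv run pytest tests/regression/scripts/helpers/ci_gate/ tests/regression/scripts/helpers/common/test_ast_utils.py tests/regression/scripts/helpers/common/test_test_map_loader.py tests/regression/scripts/helpers/nightly/test_report_builder.py -q`
2. Check the exit code and the pass count.

Result: `206 passed, 7 warnings in 0.43s`

### symbol validation / exemption drift regression

`uv run pytest tests/regression/scripts/helpers/ci_gate/ tests/regression/scripts/helpers/common/test_ast_utils.py tests/regression/scripts/helpers/common/test_test_map_loader.py tests/regression/scripts/helpers/test_config.py -q`

Result: `239 passed, 8 warnings in 1.05s`

### test_map sync entry point

Purpose: confirm that `run_test_map_sync.sh` can launch `sync.py`.

Steps:

1. Set `MSMODELING_TEST_MAP_PATH` to point at a valid test_map JSON
2. Run `bash scripts/run_test_map_sync.sh --once`

Result: the script entry point and the `sync.py` CLI are committed with this PR; a full sync requires the CI/nightly environment to provide a valid test_map file.

See merge request: Ascend/msmodeling!394	6 days ago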
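The relationship between the two `test_map` schemas described in the conftest.py commit message can be sketched as follows. This is a minimal illustration, not the repository's code: `pivot_to_node_oriented` and `build_reverse_index` are hypothetical names (the real helpers mentioned above are `collect_from_coverage` and `build_test_map_index`, whose signatures are not shown in the source), and the paths in `old_map` are made-up examples.

```python
from collections import defaultdict


def pivot_to_node_oriented(old_map):
    """Convert {source: {symbol: [test_node]}} into {test_node: {source: [symbol]}}.

    Hypothetical helper: it only demonstrates that both schemas encode the
    same set of (test_node, source, symbol) edges.
    """
    by_test = defaultdict(lambda: defaultdict(set))
    for source, symbols in old_map.items():
        for symbol, nodes in symbols.items():
            for node in nodes:
                by_test[node][source].add(symbol)
    return {
        node: {source: sorted(syms) for source, syms in sources.items()}
        for node, sources in by_test.items()
    }


def build_reverse_index(new_map):
    """Build (source, symbol) -> {test_node} in one pass over all edges."""
    index = defaultdict(set)
    for node, sources in new_map.items():
        for source, symbols in sources.items():
            for symbol in symbols:
                index[(source, symbol)].add(node)
    return index


# Illustrative data only; these file and test names are assumptions.
old_map = {
    "tensor_cast/foo.py": {
        "Widget::run": [
            "tests/regression/test_x.py::test_foo",
            "tests/regression/test_x.py::test_bar",
        ],
        "%": ["tests/regression/test_x.py::test_foo"],
    }
}

new_map = pivot_to_node_oriented(old_map)
index = build_reverse_index(new_map)

old_edges = sum(len(nodes) for syms in old_map.values() for nodes in syms.values())
new_edges = sum(len(syms) for srcs in new_map.values() for syms in srcs.values())
```

In this sketch each long test node id is stored once as a key in `new_map`, while the reverse index recovers the old `symbol → tests` lookup, which is the trade-off the commit message describes.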
test_adapter_automation.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_auto_model_config.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_auto_model_config_loader.py	feat(diffusers): support remote repo config autoload

- add shared config-only Hugging Face and ModelScope snapshot helpers
- resolve remote Diffusers repo ids and explicit snapshot subfolders before loading configs
- expose video remote-source in CLI and Web UI
- update RFC and offline regression coverage

Signed-off-by: minghang_c <chiminghang@h-partners.com> Co-authored-by: minghang_c<chiminghang@h-partners.com> # message auto-generated for no-merge-commit merge: !356 merge diffusers-hf-autoload-master-impl into master feat(diffusers): support remote repo config autoload Created-by: minghang_c Commit-by: minghang_c Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

- [x] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

`video_generate` previously supported only local Diffusers model directories. To use a public Hugging Face / ModelScope Diffusers repo, users had to download the files and arrange the config directory by hand. This PR adds automatic loading of remote Diffusers repo ids, so that video_generate can take a remote repo id directly and download only the config files, not the weights. It also adds the ModelScope remote source, subfolder addressing for aggregate repos, and Web UI support, keeping the experience consistent with the remote source design of text_generate.

------

## 📝 Modification

- Add shared Hub helpers:
  - Hugging Face config-only snapshot download
  - ModelScope config-only snapshot download
  - ModelScope parameter compatibility for `allow_patterns` / `allow_file_pattern`
  - Suppress progress output and noisy logs during snapshot download
- Add a Diffusers model resolver:
  - Local directories keep the original behavior and do not access the network
  - Non-local inputs are resolved as Hugging Face / ModelScope repo ids according to `remote_source`
  - Support the `<namespace>/<repo>/<subfolder>` format, for example `tencent/HunyuanVideo-1.5/transformer/720p_i2v_distilled_sparse`
- The Diffusers builder hooks into the resolver and passes the resolved local path to the existing `load_config_from_file`
- Both `video_generate` CLI entry points add:
  - `--remote-source {huggingface,modelscope}`
  - Updated `model_id` help text
- The Web UI video_generate form adds a remote-source dropdown, which is included in the task params/hash
- Update the RFC to cover Hugging Face, ModelScope, subfolder addressing, logging behavior, and the test strategy
- Add/update offline regression tests covering the Hub helper, resolver, builder, CLI help, and the Web UI command builder/callback/frontend workflow

------

## 📐 Associated Test Results

Focused regression: `.venv/bin/python -m pytest tests/regression/tensor_cast/test_model_hub.py tests/regression/tensor_cast/test_diffusers_model_resolver.py tests/regression/tensor_cast/test_diffusers_remote_builder.py tests/regression/tensor_cast/test_auto_model_config_loader.py::test_modelscope_snapshot_config_only_uses_allowlist tests/regression/cli/test_video_generate_remote_source.py tests/regression/web_ui/test_command_builder.py tests/regression/web_ui/test_callbacks.py tests/regression/web_ui/test_frontend_workflows.py -q`

Result: `598 passed, 67 warnings`

Whitespace check: `git diff --check`

Result: no output.

------

## 🌟 Use cases (Optional)

Hugging Face as the default source: `python -m cli.inference.video_generate Wan-AI/Wan2.2-T2V-A14B-Diffusers --device TEST_DEVICE --batch-size 1 --seq-len 128 --frame-num 81 --sample-step 1`

ModelScope as the source: `python -m cli.inference.video_generate Wan-AI/Wan2.2-T2V-A14B-Diffusers --remote-source modelscope --device TEST_DEVICE --batch-size 1 --seq-len 128 --frame-num 81 --sample-step 1`

Subfolder of an aggregate repo: `python -m cli.inference.video_generate tencent/HunyuanVideo-1.5/transformer/720p_i2v_distilled_sparse --device TEST_DEVICE --batch-size 1 --seq-len 128 --frame-num 121 --sample-step 1`

------

## ✅ Checklist

Before PR:

- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

🤖 Generated with [Claude Code](https://claude.com/claude-code) See merge request: Ascend/msmodeling!356	13 days ago
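The `<namespace>/<repo>/<subfolder>` addressing described in this commit message can be illustrated with a small sketch. `split_repo_and_subfolder` is a hypothetical helper written for this example; the actual resolver in the repository is not shown in the source and may parse inputs differently (for instance, it first checks whether the input is a local directory, which is omitted here).

```python
from typing import Optional, Tuple


def split_repo_and_subfolder(model_id: str) -> Tuple[str, Optional[str]]:
    """Split '<namespace>/<repo>[/<subfolder>...]' into (repo_id, subfolder).

    Assumption: the first two path segments form the hub repo id and any
    remaining segments form the subfolder inside the snapshot.
    """
    parts = [p for p in model_id.strip("/").split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected '<namespace>/<repo>', got {model_id!r}")
    repo_id = "/".join(parts[:2])
    subfolder = "/".join(parts[2:]) or None
    return repo_id, subfolder


aggregate = split_repo_and_subfolder(
    "tencent/HunyuanVideo-1.5/transformer/720p_i2v_distilled_sparse"
)
plain = split_repo_and_subfolder("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
```

Under this assumption, a plain repo id yields no subfolder, while an aggregate repo id yields the nested path that would then be looked up inside the downloaded config-only snapshot.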
test_comm_analytic.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_common.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_config_resolver.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_custom_operator_modeling.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_deepseek_v32.py	perf(tensor_cast): refine sparse attention roofline Model sparse MLA and dsa_indexer paged-cache traffic with calibrated data-movement efficiency so operator and end-to-end estimates align with GLM-5.1 profiling targets. Signed-off-by: minghang_c <chiminghang@h-partners.com> Co-authored-by: minghang_c<chiminghang@h-partners.com> # message auto-generated for no-merge-commit merge: !421 merge develop-on-upstream-master into master perf(tensor_cast): refine sparse attention roofline Created-by: minghang_c Commit-by: minghang_c Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

- [ ] Feature
- [ ] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [x] Perf
- [x] Test-Cases
- [ ] Other

## 🔍 Motivation

Refine TensorCast roofline modeling for sparse MLA, `dsa_indexer`, and GLM-5-series W4A8 MLA preprocessing so sparse-attention estimates better match operator profiling and end-to-end latency targets while keeping the model based on explicit data-movement and compute-efficiency assumptions.

The main modeling gap is that sparse MLA KV reads and `dsa_indexer` historical-cache reads are dominated by random/paged memory access. Treating those bytes as ideal contiguous bandwidth traffic makes the analytic roofline too optimistic, especially for long-context GLM-5.1 prefill/decode scenarios.

The latest GLM-5.1 W4A8 validation also showed that `mlapo_quant` needs to model packed W4 weights carefully: the tensor storage dtype is `torch.uint8`, but the logical MMA throughput should follow the INT8 compute path used by existing grouped quant matmul modeling. Otherwise the trace can report `mlapo_quant` MMA time as zero even though the op has nonzero projection MMA work.

------

## 📝 Modification

- Add sparse/paged KV traffic accounting for MLA with separate decode and prefill data-movement efficiency.
- Add `dsa_indexer` historical cache read efficiency modeling and separate append cache/scale write traffic.
- Keep `dsa_indexer` block-table traffic covered by generic input memory accounting instead of a separate operator-specific model.
- Use decode-only sparse page count for mixed prefill/decode sparse MLA batches.
- Use raw sparse-index bytes in the quant/physical MLA path so physical KV/block-table/sparse-index accounting is consistent.
- Tighten `dsa_indexer` helper signatures so `request_total_seq_lens` is required where the model depends on it.
- Keep generic `tensor_cast.attention.default` accounting unchanged, so non-MLA attention models do not inherit sparse-attention calibration.
- Extend GLM-5-series compile handling to cover both `GLM-5` and `GLM-5.1`, while excluding `GLM-5.2` because its config has meaningful indexer/long-context differences.
- Refine `mlapo_quant` W4A8 modeling so packed `torch.uint8` weights use the logical INT8 MMA throughput path instead of losing MMA time in trace/statistics.
- Add `mlapo`/`mlapo_quant` intermediate memory and static-cost accounting for the fused MLA preprocessing path.
- Update related performance-model tests for sparse memory breakdowns and `mlapo`/`mlapo_quant` modeling behavior.

------

## 📐 Associated Test Results

- `uvx --python .venv/bin/python pre-commit run --files tensor_cast/performance_model/__init__.py tests/regression/tensor_cast/test_runtime.py`
  - Passed after auto-format rerun.
- `uv run --group ci --with socksio python -m unittest tests.benchmark.models.test_model_regression`
  - Log: `/tmp/msmodeling_model_regression_develop_after_pick.log`
  - `Ran 15 tests in 42.029s`
  - `OK`
  - `Total Cases: 15 | Passed: 15 | Failed: 0 | No Baseline: 0`
  - `* All Operator Checks Passed `
- GLM-5.1 e2e validation across 10 query/context scenarios from 3.5k to 128k after the latest `mlapo_quant` W4A8 modeling update:
  - Log: `/tmp/msmodeling_glm51_e2e_after_user_change_rerun3.log`
  - `e2e_count=10`
  - `mean_e2e_err=28.717478%`, meeting the `≤30%` target.
- Earlier GLM-5.1 sparse-attention e2e validation across the same 10 scenarios:
  - Log: `/tmp/msmodeling_glm51_e2e_26_1_0_latest.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, meeting the `≤30%` target.
- GLM-5 e2e validation after applying the GLM-5-series compile override:
  - Log: `/tmp/msmodeling_glm5_e2e_with_glm5_override.log`
  - `e2e_count=10`
  - `mean_e2e_err=27.678365%`, matching the GLM-5.1 run with the same parameters.
- Operator-level validation from the sparse MLA / `dsa_indexer` profiling set:
  - `mean_operator_err = 6.487008%`
  - `max_operator_err = 18.658699%`
  - Meets the `≤20%` target.
- Issue #103 2.5K GLM-5.1 scenario:
  - Prefill analytic result: old roofline `182.377 ms` → new roofline `631.874 ms`; real wall `1225.849 ms`; new roofline/wall `51.55%`.
  - Decode analytic result: old roofline `48.685 ms` → new roofline `103.071 ms`; real wall `82.528 ms`; new roofline/wall `124.89%`.
  - Decode compared with kernel sum: new roofline `103.071 ms` vs kernel sum `117.158 ms`, ratio `87.97%`.

------

## 🌟 Use cases (Optional)

GLM-5.1 sparse attention inference latency estimation for prefill and decode scenarios from 3.5k to 128k context length. The latest e2e analytic results were validated with:

`.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --tp-size 16 --dp-size 1 --ep-size 16 --num-queries 1 --num-mtp-tokens 3 --compile --quantize-linear-action W4A8_STATIC --dump-input-shapes --context-length <context> --query-length <query>`

| Scenario | Query length | Context length | Target latency | Analytic latency | Relative error |
|---|---:|---:|---:|---:|---:|
| 3.5k-prefill | 3500 | 0 | `1553.21 ms` | `1010.00 ms` | `34.9734%` |
| 3.5k-decode | 4 | 3500 | `69.90 ms` | `44.79 ms` | `35.9270%` |
| 16k-prefill | 4096 | 12000 | `1867.68 ms` | `1449.00 ms` | `22.4171%` |
| 16k-decode | 4 | 16000 | `68.10 ms` | `47.22 ms` | `30.6637%` |
| 32k-prefill | 4096 | 28000 | `2295.99 ms` | `1807.00 ms` | `21.2976%` |
| 32k-decode | 4 | 32000 | `68.70 ms` | `47.76 ms` | `30.4862%` |
| 64k-prefill | 4096 | 60000 | `3256.48 ms` | `2522.00 ms` | `22.5544%` |
| 64k-decode | 4 | 64000 | `71.70 ms` | `49.63 ms` | `30.7768%` |
| 128k-prefill | 4096 | 124000 | `5341.23 ms` | `3952.00 ms` | `26.0096%` |
| 128k-decode | 4 | 128000 | `78.30 ms` | `53.19 ms` | `32.0690%` |

`mean_e2e_err=28.717478%`

------

## ✅ Checklist

Before PR:

- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by validation runs and targeted regression coverage.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!421	2 days ago
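The relative-error column and `mean_e2e_err` in the GLM-5.1 table above appear consistent with `|target - analytic| / target`, averaged over the ten scenarios. The sketch below recomputes them under that assumption; the formula is inferred from the numbers, not stated in the PR, and because it uses the rounded latencies printed in the table, the last digits can differ slightly from the logged values.

```python
# (scenario, target latency in ms, analytic latency in ms), copied from the table.
rows = [
    ("3.5k-prefill", 1553.21, 1010.00),
    ("3.5k-decode", 69.90, 44.79),
    ("16k-prefill", 1867.68, 1449.00),
    ("16k-decode", 68.10, 47.22),
    ("32k-prefill", 2295.99, 1807.00),
    ("32k-decode", 68.70, 47.76),
    ("64k-prefill", 3256.48, 2522.00),
    ("64k-decode", 71.70, 49.63),
    ("128k-prefill", 5341.23, 3952.00),
    ("128k-decode", 78.30, 53.19),
]


def relative_error_pct(target_ms: float, analytic_ms: float) -> float:
    """Assumed metric: absolute deviation from the target, as a percentage."""
    return abs(target_ms - analytic_ms) / target_ms * 100.0


errors = {name: relative_error_pct(t, a) for name, t, a in rows}
mean_e2e_err = sum(errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name:>13}: {err:.4f}%")
print(f"mean_e2e_err={mean_e2e_err:.4f}%")
```

Note that the mean sits under the `≤30%` target even though several individual scenarios (for example both 3.5k cases) exceed 30%, so the target as reported applies to the average rather than to each row.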
test_deepseek_v4.py	fix(deepseek-v4): account for context in ratio128 attention topk Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !448 merge master into master fix(deepseek-v4): account for context in ratio128 attention topk Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description:

# PR Template

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other

## 🔍 Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

In DeepSeek-V4's compressed sparse attention with ratio=128, the width of the compressed topk should be computed from the request's total sequence length, that is, the historical context plus the current query tokens. The original implementation used only the current query length, so in long-context prefill scenarios the generated attention topk shape was too small, which in turn reduced the accuracy of the attention shape dump and of the performance modeling results. This PR fixes the compressed topk computation for DeepSeek-V4 ratio=128 attention in long-context scenarios so that it matches the true total sequence length.

------

## 📝 Modification

Please briefly describe what modification is made in this PR.

- Add a `max_seq_len` field to the attention metadata to record the maximum total sequence length in the current batch.
- Populate `max_seq_len` in both the fixed-batch and the varlen input generation logic.
- Update DeepSeek-V4 `get_compress_topk_idxs` to support computing the compressed topk width from the total sequence length.
- Pass `attention_meta.max_seq_len` in the DeepSeek-V4 ratio=128 attention path so that the compressed topk includes the historical context chunks.
- Add regression tests covering the ratio=128 long-context topk shape and the helper's computation based on total sequence length.

------

## 📐 Associated Test Results

Please provide the related test results, such as test reports, etc.

For the long-context prefill scenario: 29s -> 45s. The change mainly fixes the abnormal attention latency when the compressor ratio is 128.

`python -m cli.inference.text_generate deepseek-ai/DeepSeek-V4-Pro --device ATLAS_800_A3_560T_128G_DIE --num-devices 32 --num-queries 32 --query-length 102400 --context-length 921600 --compile --quantize-linear-action MXFP4 --quantize-non-expert-linear-action FP8 --quantize-attention-action FP8 --tp-size 1 --ep-size 32 --log-level info`

![image.png](https://raw.gitcode.com/user-images/assets/8428112/4a322439-8dc9-4c4d-8faa-b194834d0e4a/image.png 'image.png')

![image.png](https://raw.gitcode.com/user-images/assets/8428112/cf27fead-1385-46fb-8894-12f44fc28df9/image.png 'image.png')

------

## 🌟 Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

------

## ✅ Checklist

Before PR:

- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!448	1 hour ago
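The effect of the DeepSeek-V4 fix above can be illustrated with the query and context lengths from the test command. This is a rough sketch under stated assumptions: `compressed_topk_width` is a hypothetical stand-in, not the repository's `get_compress_topk_idxs` (whose signature is not shown in the source), and it assumes the width equals the number of `ratio`-sized chunks needed to cover the sequence, ignoring any cap a real top-k setting might impose.

```python
def compressed_topk_width(seq_len: int, ratio: int = 128) -> int:
    """Number of compressed chunks covering seq_len tokens (ceiling division).

    Hypothetical helper for illustration only.
    """
    if seq_len < 0 or ratio <= 0:
        raise ValueError("seq_len must be >= 0 and ratio must be > 0")
    return (seq_len + ratio - 1) // ratio


# Values taken from the command line in the PR description.
query_length = 102400
context_length = 921600

# Before the fix: only the current query tokens were counted.
width_query_only = compressed_topk_width(query_length)

# After the fix: historical context plus query tokens (the total sequence length).
width_total = compressed_topk_width(query_length + context_length)

print(width_query_only, width_total)
```

Under these assumptions the query-only width is a tenth of the total-length width for this scenario, which is in line with the PR's observation that the long-context prefill estimate was previously too low.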
test_device.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_dfc_pass.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; subsequent development builds on master, with packaging supported. See merge request: Ascend/msmodeling!330	17 days ago
test_diffusers_model_resolver.py	fix(security): add model source safety checks Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !385 merge fix/trust-remote-code-safety into master fix(security): add model source safety checks Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description:

# PR Template

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

PR Type

- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [x] Other

## 🔍 Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Security hardening.

------

## 📝 Modification

Please briefly describe what modification is made in this PR.

Add permission checks for local paths; add risk warnings to the logs; remove old interfaces that are no longer maintained.

------

## 📐 Associated Test Results

Please provide the related test results, such as test reports, etc.

![image.png](https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png 'image.png')

------

## 🌟 Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

------

## ✅ Checklist

Before PR:

- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.

------

See merge request: Ascend/msmodeling!385	5 days ago
test_diffusers_remote_builder.py	feat(diffusers): support remote repo config autoload - add shared config-only Hugging Face and ModelScope snapshot helpers - resolve remote Diffusers repo ids and explicit snapshot subfolders before loading configs - expose video remote-source in CLI and Web UI - update RFC and offline regression coverage Signed-off-by: minghang_c <chiminghang@h-partners.com> Co-authored-by: minghang_c<chiminghang@h-partners.com> # message auto-generated for no-merge-commit merge: !356 merge diffusers-hf-autoload-master-impl into master feat(diffusers): support remote repo config autoload Created-by: minghang_c Commit-by: minghang_c Merged-by: ascend-robot Description:
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
PR Type
- [x] Feature
- [ ] Bugfix
- [x] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [x] Test-Cases
- [ ] Other
## 🔍 Motivation
`video_generate` previously supported only local Diffusers model directories. To use a public Hugging Face / ModelScope Diffusers repo, users had to download it manually and arrange the config directory themselves. This PR adds automatic loading by remote Diffusers repo id, so that video_generate can take a remote repo id directly and download only the config files, not the weight files. It also adds the ModelScope remote source, subfolder addressing for aggregate repos, and Web UI support, keeping the experience consistent with the remote-source design of text_generate.
------
## 📝 Modification
- Add shared Hub helpers:
  - Hugging Face config-only snapshot download
  - ModelScope config-only snapshot download
  - ModelScope parameter compatibility for `allow_patterns` / `allow_file_pattern`
  - Progress output and noisy logs are hidden during snapshot download
- Add a Diffusers model resolver:
  - Local directories keep the original behavior and do not access the network
  - Non-local inputs are resolved as Hugging Face / ModelScope repo ids according to `remote_source`
  - The `<namespace>/<repo>/<subfolder>` format is supported, for example `tencent/HunyuanVideo-1.5/transformer/720p_i2v_distilled_sparse`
- The Diffusers builder now calls the resolver and passes the resolved local path to the existing `load_config_from_file`
- Both `video_generate` CLI entry points gain:
  - `--remote-source {huggingface,modelscope}`
  - Updated `model_id` help text
- The Web UI video_generate form gains a remote-source dropdown, which is included in task params/hash
- Update the RFC to cover Hugging Face, ModelScope, subfolder addressing, logging behavior, and test strategy
- Add/update offline regression tests covering the Hub helper, resolver, builder, CLI help, and Web UI command builder/callback/frontend workflow
------
## 📐 Associated Test Results
Focused regression: `.venv/bin/python -m pytest tests/regression/tensor_cast/test_model_hub.py tests/regression/tensor_cast/test_diffusers_model_resolver.py tests/regression/tensor_cast/test_diffusers_remote_builder.py tests/regression/tensor_cast/test_auto_model_config_loader.py::test_modelscope_snapshot_config_only_uses_allowlist tests/regression/cli/test_video_generate_remote_source.py tests/regression/web_ui/test_command_builder.py tests/regression/web_ui/test_callbacks.py tests/regression/web_ui/test_frontend_workflows.py -q`
Result: `598 passed, 67 warnings`
Whitespace check: `git diff --check`
Result: no output.
------
## 🌟 Use cases (Optional)
Hugging Face (default source): `python -m cli.inference.video_generate Wan-AI/Wan2.2-T2V-A14B-Diffusers --device TEST_DEVICE --batch-size 1 --seq-len 128 --frame-num 81 --sample-step 1`
ModelScope source: `python -m cli.inference.video_generate Wan-AI/Wan2.2-T2V-A14B-Diffusers --remote-source modelscope --device TEST_DEVICE --batch-size 1 --seq-len 128 --frame-num 81 --sample-step 1`
Aggregate repo subfolder: `python -m cli.inference.video_generate tencent/HunyuanVideo-1.5/transformer/720p_i2v_distilled_sparse --device TEST_DEVICE --batch-size 1 --seq-len 128 --frame-num 121 --sample-step 1`
------
## ✅ Checklist
Before PR:
- [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
------
🤖 Generated with [Claude Code](https://claude.com/claude-code) See merge request: Ascend/msmodeling!356	13 days ago
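The `<namespace>/<repo>/<subfolder>` addressing mentioned in the row above can be illustrated with a minimal sketch. `split_repo_id` is a hypothetical helper written for this note, not the project's resolver. It assumes the first two path segments form the repo id and that anything after them is the subfolder, and it touches neither the filesystem nor the network.

```python
from typing import Optional, Tuple


def split_repo_id(model_id: str) -> Tuple[str, Optional[str]]:
    """Split '<namespace>/<repo>[/<subfolder>...]' into (repo_id, subfolder).

    Hypothetical helper for illustration only; the real resolver may
    apply additional rules (for example, checking local paths first).
    """
    # Drop empty segments so leading/trailing slashes do not matter.
    parts = [p for p in model_id.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected '<namespace>/<repo>', got {model_id!r}")
    repo_id = "/".join(parts[:2])
    # Everything after the repo name is treated as a subfolder path.
    subfolder = "/".join(parts[2:]) or None
    return repo_id, subfolder
```

Under these assumptions, `tencent/HunyuanVideo-1.5/transformer/720p_i2v_distilled_sparse` splits into the repo id `tencent/HunyuanVideo-1.5` and the subfolder `transformer/720p_i2v_distilled_sparse`, while a plain `Wan-AI/Wan2.2-T2V-A14B-Diffusers` yields no subfolder.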
test_dtype.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_empirical.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_glm5.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_gmm_pass.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_helpers_usage.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_input_generator.py	fix(deepseek-v4): account for context in ratio128 attention topk Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !448 merge master into master fix(deepseek-v4): account for context in ratio128 attention topk Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description:
# PR Template
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [ ] Other
## 🔍 Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
In DeepSeek-V4 compressed sparse attention with ratio=128, the width of the compressed topk should be computed from the request's total sequence length, that is, the historical context plus the current query tokens. The original implementation used only the current query length, so in long-context prefill the generated attention topk shape was too small, which reduced the accuracy of the attention shape dump and of the performance-modeling results. This PR corrects the compressed topk computation for DeepSeek-V4 ratio=128 attention in long-context scenarios so that it matches the true total sequence length.
------
## 📝 Modification
Please briefly describe what modification is made in this PR.
- Add a `max_seq_len` field to the attention metadata to record the maximum total sequence length in the current batch.
- Populate `max_seq_len` in both the fixed-batch and the varlen input generation logic.
- Update DeepSeek-V4 `get_compress_topk_idxs` so that the compressed topk width can be computed from the total sequence length.
- Pass `attention_meta.max_seq_len` in the DeepSeek-V4 ratio=128 attention path so that the compressed topk includes historical context chunks.
- Add regression tests covering the ratio=128 long-context topk shape and the helper's behavior when computing from the total sequence length.
------
## 📐 Associated Test Results
Please provide the related test results, such as test reports, etc.
For the long-context prefill scenario: 29s->45s. This mainly fixes the abnormal attention time when the compressor ratio is 128.
`python -m cli.inference.text_generate deepseek-ai/DeepSeek-V4-Pro --device ATLAS_800_A3_560T_128G_DIE --num-devices 32 --num-queries 32 --query-length 102400 --context-length 921600 --compile --quantize-linear-action MXFP4 --quantize-non-expert-linear-action FP8 --quantize-attention-action FP8 --tp-size 1 --ep-size 32 --log-level info`
![image.png](https://raw.gitcode.com/user-images/assets/8428112/4a322439-8dc9-4c4d-8faa-b194834d0e4a/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/8428112/cf27fead-1385-46fb-8894-12f44fc28df9/image.png 'image.png')
------
## 🌟 Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
------
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!448	1 hour ago
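To make the effect of the ratio=128 fix concrete, here is a rough arithmetic sketch. It is not the project's `get_compress_topk_idxs`; `compressed_topk_width` is a hypothetical function, and it assumes the compressed topk width is simply the number of whole compressed chunks, `seq_len // ratio`. The actual rounding and any offsets in the real helper may differ.

```python
def compressed_topk_width(
    query_len: int,
    context_len: int,
    ratio: int = 128,
    include_context: bool = True,
) -> int:
    """Number of compressed chunks visible to the topk (illustrative only).

    Assumption: one compressed entry per `ratio` tokens, rounded down.
    `include_context=False` mimics the pre-fix behavior described above,
    where only the current query length was used.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if query_len < 0 or context_len < 0:
        raise ValueError("lengths must be non-negative")
    seq_len = query_len + context_len if include_context else query_len
    return seq_len // ratio


# Lengths taken from the command in the PR description above.
old_width = compressed_topk_width(102400, 921600, include_context=False)
new_width = compressed_topk_width(102400, 921600, include_context=True)
print(old_width, new_width)
```

Under this assumption, counting only the 102400 query tokens gives a width of 800, while counting the full 1024000 tokens (query plus context) gives 8000, which is consistent with the description that the old topk shape was too small in long-context prefill.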
test_kimi_k25.py	fix(tensor_cast): model MTP speculative decode shapes Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !362 merge resolve-issue-130 into master fix(tensor_cast): model MTP speculative decode shapes Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description:
PR Type
- [x] Bugfix
- [x] Refactor
- [x] Test-Cases
## 🔍 Motivation
In the TensorCast simulation of MTP (Multi-Token Prediction) speculative decode, the row selection logic of lm_head and the sampler has several problems:
1. Target and proposal rows are conflated: during MTP decode, lm_head should process only the target+bonus verification rows inside the spec window, not all packed rows. The old logic used `selected_token_indices` for prefill-level token trimming but could not tell target from proposal, so lm_head computed extra rows that do not take part in verification.
2. The Sampler does not support the spec decode output format: the old Sampler only took the last token with greedy argmax and could not return target+bonus tokens of shape `(num_requests, num_speculative_tokens + 1)`.
3. The Kimi K2.5 MTP path runs all rows through lm_head: Kimi's monkey-patch sends all hidden states into lm_head (163840 vocab) in the MTP text path, so at prefill the 12×7168×163840 matrix multiplication is amplified by ~3500×.
------
## 📝 Modification
### Core: unified row selection path
- Add a `SpecDecodeMetadata` dataclass that records `logits_indices`, `num_active_requests`, and `num_speculative_tokens` for each batch.
- Add `select_lm_head_hidden_states(hidden_states, sampling_metadata, mode)`:
  - `mode="target"`: select all rows of the verification window (for lm_head)
  - `mode="proposal"`: select only the last row of each request (for the MTP predictor)
- `CausalLmWrapper`, `VLModelWrapper`, and `MultiTokenPredictor` all call this function, replacing the `index_select` calls previously scattered across the code.
### Input Generator
- During MTP decode, `generate_inputs` / `generate_inputs_varlen` build a `SpecDecodeMetadata` that covers only the spec window rows at the tail of each request.
- Short windows (query_len < num_mtp_tokens + 1) automatically fall back to ordinary decode selection.
### Sampler
- When `spec_decode_metadata` is present, the verification logits are reshaped to `(num_requests, spec_window, vocab)`, greedy argmax is applied to target and bonus separately, and the result has shape `(num_requests, spec_window)`.
- Remains compatible with proposal rows (later MTP layers) and the old `selected_token_indices` prefill path.
### Kimi K2.5
- Split the MTP text path: first run the language model body to obtain the full hidden states (needed by rotary/proposal), then trim to target rows with `select_lm_head_hidden_states` before lm_head. This avoids the full 163840-vocab projection.
- Remove the old `MultiTokenPredictorLayer` tuple-unpack monkey patch (already fixed upstream in mtp.py).
------
## 📐 Associated Test Results
`pytest: 26 passed, 2 warnings in 3.72s`
Coverage:
- target/proposal row selection is not mixed
- the default `selected_token_indices=-1` sentinel is correctly ignored
- wrong logits_indices length → ValueError
- the spec-decode sampler returns target+bonus tokens
- CausalLmWrapper target row projection
- MTP wrapper prefill fallback / spec-decode bonus token forward
- MTP metadata generation in the fixed/varlen input generator
- Kimi text path target rows are trimmed before the internal lm_head
- the Kimi default sentinel does not trigger the fast path
------
## ✅ Checklist
- [x] Linting tools used (lintrunner)
- [x] Bug fixes covered by unit tests
- [x] Modification covered by unit tests
- [ ] Documentation updated
- [x] No Chinese comments in code files
See merge request: Ascend/msmodeling!362	3 days ago
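The target/proposal row selection described in this row can be sketched on plain index lists. This is an illustration under stated assumptions, not the project's `select_lm_head_hidden_states`: `select_rows` is a hypothetical name, requests are assumed to be packed back to back in one row dimension, and the short-window fallback is assumed to mean "keep only the last row of that request".

```python
from typing import List


def select_rows(query_lens: List[int], num_speculative_tokens: int, mode: str) -> List[int]:
    """Return packed row indices to keep for each request (illustrative only).

    mode="target":   the trailing verification window (target + bonus rows).
    mode="proposal": only the last row of each request.
    """
    if mode not in ("target", "proposal"):
        raise ValueError(f"unknown mode: {mode!r}")
    window = num_speculative_tokens + 1  # target rows plus one bonus row
    rows: List[int] = []
    start = 0  # offset of the current request in the packed layout
    for qlen in query_lens:
        if qlen < 1:
            raise ValueError("each request needs at least one row")
        if mode == "target" and qlen >= window:
            take = window
        else:
            # Proposal mode, or a query shorter than the window:
            # assumed fallback to plain decode selection (last row only).
            take = 1
        end = start + qlen
        rows.extend(range(end - take, end))
        start = end
    return rows


print(select_rows([4, 2], 2, "target"))
print(select_rows([4, 2], 2, "proposal"))
```

With two requests of 4 and 2 rows and two speculative tokens, the target selection keeps the last three rows of the first request and falls back to the last row of the second, while the proposal selection keeps one row per request. Feeding only these rows to lm_head is what avoids projecting every packed row onto the vocabulary.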
test_layers.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_matmul_allreduce_pass.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_memory_tracker.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_merge_linear_pass.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_minimax_m2.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_mla.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
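test_model_hub.py	[REFACTOR] Refactor the CI gate and test_map sync infrastructure Co-authored-by: AvadaKedavrua<anonymousdev@163.com> # message auto-generated for no-merge-commit merge: !394 merge fix into master [REFACTOR] Refactor the CI gate and test_map sync infrastructure Created-by: AvadaKedavrua Commit-by: liujiawang;AvadaKedavrua Merged-by: ascend-robot Description:
## Reason for change
The CI gate logic for diff classification, policy validation, PR comments, and test_map sync was coupled inside monolithic modules such as `gate_policy.py`, which made it hard to evolve and unit-test independently. `test_map` had no standalone sync entry point, and the `ast_utils` / `test_map_loader` / nightly report chain also needed to be aligned with the gate policy. This PR contains only commit `82757732b6b6beb79b7083f6046e9cd9c72005f3` (`refactor`) and does not involve wheel/CLI/OptiX changes.
---
## test_map schema refactor: background and benefits
### Background
The old `test_map` used product source files as top-level keys, with values of the form `symbol → [test_nodes]`: `{ "tensor_cast/foo.py": { "Widget::run": ["tests/regression/.../test_x.py::test_foo"] } }`
The new `test_map` uses pytest node ids as top-level keys, with values of the form `source_file → [symbols]`, which matches the collection direction of `coverage.py --cov-context=test`: `{ "tests/regression/.../test_x.py::test_foo": { "tensor_cast/foo.py": ["Widget::run"] } }`
Both store the same bipartite graph of test node ↔ (source file, symbol); only the primary key direction is reversed. The number of edges is the same and the semantics are unchanged.
### Why the line count grows while the size shrinks

| Metric | Old schema (source-oriented) | New schema (node-oriented) |
|------|------------------------------|----------------------------|
| JSON line count | ~70,000 lines | ~100,000 lines |
| File/memory footprint | 8,355,104 B (~8.0 MB) | 3,748,221 B (~3.6 MB, about -55%) |

Line count is not a reliable proxy for size. The old format stored the full pytest node id (`tests/regression/.../test_xxx.py::Class::test_yyy`, typically 60–80 characters) under every symbol, so when one test covered N symbols that long string appeared N times. In the new format each test node id appears only once, as a top-level key, and the arrays hold short canonical symbols (such as `Widget::run` or `%`). The edge count E is unchanged, but the number of long-string repetitions drops from O(E) to O(T) (T = number of tests with coverage, T ≪ E). A typical case: many smoke/regression cases share the module symbol `%` of modules such as `tensor_cast/` and `cli/` through imports. The old format aggregated thousands of test ids under a single symbol, whereas the new format records one short symbol per test, so the total byte count drops significantly.
### Engineering benefits
1. Zero-pivot build: `build_test_map.collect_from_coverage` aggregates `by_test[nid][source].add(symbol)` directly by coverage context (test node), consistent with the nightly phase1 collection path.
2. More natural incremental sync: `sync.apply_incremental_test_map_update` replaces whole nodes for touched test files and merges by `(test_node, source_path)` for touched product files, with no need to convert back and forth between two indexes.
3. More direct handling of deleted tests and redundancy detection: `gate_deleted_tests` and `detect_redundant_cases` iterate over test footprints; `test_map_loader` requires top-level keys of the form `tests/...::...` and can reject dirty data mistakenly written with source keys.
4. No regression in CI gate queries: the reverse `symbol → tests` lookup builds an index once in O(N) through `build_test_map_index`, equivalent at runtime to the old schema.
5. Lighter storage and transfer: the measured file size is roughly halved, and both OBS download and the `json.loads` memory peak are lower.
---
## Changes
### CI gate module split
- Slim down `gate_policy.py`: policy loading/validation moves to `policy.py`, diff classification to `classifier.py`, GitCode PR comments to `comments.py`, and test_map queries to `test_map_query.py`
- Add `sync.py` + `scripts/run_test_map_sync.sh`: maintain the authoritative `test_map` against the target branch HEAD (`--once` / `--watch`)
- Trim `diff.py` so that it keeps only git diff and branch checkout capabilities
- Adapt `main.py` / `rules.py` / `models.py` / `errors.py` to the new module boundaries
- Adjust the policy fields in `tests/.ci/gate_policy.yaml` accordingly
### Shared helper enhancements
- `_config.py`: replace hand-written env parsing with `pydantic-settings`, unifying validation and error messages
- `ast_utils.py`: extend symbol extraction to support test_map granularity mapping
- `build_test_map.py` / `test_map_loader.py`: refactor the collection and loading logic (node-oriented schema)
- Add `test_map_report.py`: test_map coverage summary and detection of expired exemptions
- Align `coverage_symbol_check.py` / `pytest_runner.py` / `test_map_config.py` with the new data structure
- `common/_logging.py`: add the `log_env_audit` environment audit log
### Nightly report chain
- `report_builder.py` / `pytest_parser.py` / `main.py`: adapt to the node-oriented test_map report
- `report_models.py` / `feishu_notifier.py`: minor alignment
### Symbol validation and exemption drift (new)
- last-wins canonical symbols: for a repeated `def foo`, only the last definition is gated; a shadowed def triggers a non-blocking GitCode comment
- `def _` disambiguation: without a decorator → `_`; with a decorator → `_@<suffix>`
- exemption validation: the full `path::symbol` string must be an AST canonical name; coverage omit paths must not be written into `exemptions.sources`
- exemption drift blocking: when a PR deletes or renames product/test files without updating `gate_policy.yaml` accordingly → `[ED]` hard block + GitCode comment
- Expected/Got: a unified format for type and value errors in Config / policy / loader
- Docs: `scripts/README.md` now covers `build.sh` and `MSMODELING_WHEEL_OUTPUT_DIR`; `tests/README.md` now covers the symbol contract
### Other
- `scripts/run_ci_gate.sh`: entry-point arguments aligned
- `scripts/prefetch_model_configs.py`: adapted to the new config loading
- `tests/README.md` / `tests/SKILL.md`: documentation synced
- Add/update regression tests: `test_classifier.py`, `test_comments.py`, `test_policy.py`, `test_sync.py`, `test_test_map_query.py`, `test_test_map_report.py`, and others
---
## Self-verification
### CI gate / test_map / nightly regression tests
Purpose: confirm that behavior is unchanged after the module split and that the new modules are covered by unit tests.
Steps:
1. Run from the repository root: `uv run pytest tests/regression/scripts/helpers/ci_gate/ tests/regression/scripts/helpers/common/test_ast_utils.py tests/regression/scripts/helpers/common/test_test_map_loader.py tests/regression/scripts/helpers/nightly/test_report_builder.py -q`
2. Check the exit code and the number of passed tests.
Result: `206 passed, 7 warnings in 0.43s`
### Symbol validation / exemption drift regression
`uv run pytest tests/regression/scripts/helpers/ci_gate/ tests/regression/scripts/helpers/common/test_ast_utils.py tests/regression/scripts/helpers/common/test_test_map_loader.py tests/regression/scripts/helpers/test_config.py -q`
Result: `239 passed, 8 warnings in 1.05s`
### test_map sync entry point
Purpose: confirm that `run_test_map_sync.sh` can launch `sync.py` correctly.
Steps:
1. Set `MSMODELING_TEST_MAP_PATH` to point to a valid test_map JSON
2. Run `bash scripts/run_test_map_sync.sh --once`
Result: the script entry point and the `sync.py` CLI are committed with this PR; a full sync requires a CI/nightly environment that provides a valid test_map file.
See merge request: Ascend/msmodeling!394	6 days ago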
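The claim that the node-oriented `test_map` still supports `symbol → tests` lookups after a single O(N) pass can be sketched as follows. `build_symbol_index` is a hypothetical function written for this note, not the repository's `build_test_map_index`. It assumes the map has exactly the shape shown in the description: pytest node id → source file → list of symbols.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# node id -> source file -> symbols (the node-oriented schema described above)
NodeMap = Dict[str, Dict[str, List[str]]]


def build_symbol_index(test_map: NodeMap) -> Dict[Tuple[str, str], List[str]]:
    """Invert a node-oriented map into (source, symbol) -> sorted test node ids.

    Each edge of the bipartite graph is visited once, so the cost is
    linear in the number of edges. Illustrative only.
    """
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for node_id, sources in test_map.items():
        for source, symbols in sources.items():
            for symbol in symbols:
                index[(source, symbol)].append(node_id)
    # Sort for deterministic output regardless of input ordering.
    return {key: sorted(nodes) for key, nodes in index.items()}


example: NodeMap = {
    "tests/regression/a/test_x.py::test_foo": {"tensor_cast/foo.py": ["Widget::run", "%"]},
    "tests/regression/a/test_y.py::test_bar": {"tensor_cast/foo.py": ["%"]},
}
index = build_symbol_index(example)
print(index[("tensor_cast/foo.py", "%")])
```

In this toy input the module symbol `%` is shared by both tests, so the inverted index lists two node ids under it, while each long node id is stored only once in the node-oriented map itself. The paths in `example` are made up for the sketch.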
test_model_load.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; future development proceeds on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_model_source_security.py	fix(security): add model source safety checks Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !385 merge fix/trust-remote-code-safety into master fix(security): add model source safety checks Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description:
# PR Template
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
PR Type
- [ ] Feature
- [x] Bugfix
- [ ] Docs
- [ ] CI/CD
- [ ] Refactor
- [ ] Perf
- [ ] Test-Cases
- [x] Other
## 🔍 Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Security hardening.
------
## 📝 Modification
Please briefly describe what modification is made in this PR.
Add permission checks for local paths, add risk warnings to the logs, and remove the old interface that is no longer maintained.
------
## 📐 Associated Test Results
Please provide the related test results, such as test reports, etc.
![image.png](https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png 'image.png')
------
## 🌟 Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
------
## ✅ Checklist
Before PR:
- [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
- [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.
- [x] Please ensure code files contain no Chinese comments.
------
See merge request: Ascend/msmodeling!385	5 days ago
test_mtp.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_mtp_ep.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_multistream_pass.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_ops.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_parallel_embedding.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_parallel_linear.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_parallel_moe.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_parameterized_pytest_param_compat.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_pattern_match.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_quant_attention.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_quant_config.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_quant_linear.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_quantization_config_create.py	[Bugfix] Fix issues in some DeepSeek V4 quantization scenarios Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !373 merge fix-ds-v4-quant into master [Bugfix] Fix issues in some DeepSeek V4 quantization scenarios Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description: # PR Template Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. PR Type - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation This PR fixes DeepSeek V4 quantization and tensor-parallel execution issues found in `text_generate` simulation. Specifically, the old `backbone` quantization override naming was ambiguous because the option is used to configure non-routed-expert linear layers such as attention projections, dense MLP layers, and shared experts. This PR renames it to `non-expert` to better match the actual behavior. In addition, the O projection path of DeepSeek V4 Flash/Pro has model-specific grouped projection behavior. Under W4A8 quantization or high TP configurations, TensorCast previously could hit shape mismatches in `wo_a` / `wo_b` modeling. This PR keeps the modeled path aligned with the real DeepSeek V4 structure while avoiding invalid reshape or double-sharding behavior.
## 📝 Modification - Rename quantization override terminology: - `--quantize-backbone-linear-action` -> `--quantize-non-expert-linear-action` - `quantize_backbone_linear_action` -> `quantize_non_expert_linear_action` - Update CLI help text, `UserInputConfig`, quantization config creation, and related regression tests. - Update non-expert quantization config patterns: - Rename `_BACKBONE_LINEAR_PATTERNS` to `_NON_EXPERT_LINEAR_PATTERNS`. - Keep routed MoE experts controlled by the broad `--quantize-linear-action` setting. - Keep attention, dense MLP, and shared-expert layers covered by the non-expert override. - Fix DeepSeek V4 W4A8 `wo_a` grouped projection: - When `wo_a` is quantized as W4A8, unpack int4 packed `qweight` back to its logical weight shape before the grouped einsum reshape. - Avoid using stale `in_features/out_features` values after TP wrapping. - Fix DeepSeek V4 O projection TP sharding: - Remove the duplicate V4-specific `self_attn.o_proj` TP rule. - Let the generic `o_proj` RowParallel rule handle `wo_b/o_proj` once. - Prevent `o_proj` from being sharded twice, which caused local input dim mismatch. - Add/update regression tests: - Cover W4A8 `wo_a` logical weight shape before grouped einsum. - Update DeepSeek V4 TP plan expectations. - Update quantization config and user config tests for the new non-expert terminology. ------ ## 📐 Associated Test Results Please provide the related test results, such as test reports, etc. ![image.png](https://raw.gitcode.com/user-images/assets/8428112/554930ff-0798-4331-8131-355e9d34c759/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/8428112/de48f376-7feb-49b1-96ff-daa94228f25a/image.png 'image.png') ------ ## 🌟 Use cases (Optional) If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [ ] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!373	13 days ago
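The W4A8 `wo_a` fix in the row above depends on restoring the logical weight shape before the grouped reshape. The following is only a minimal sketch of that idea in plain Python, not the repository's code: it assumes two signed int4 values per byte, low nibble first, packed along the last dimension, and the helper names `unpack_int4` and `logical_weight_shape` are hypothetical.

```python
def unpack_int4(packed_row):
    """Unpack bytes into signed int4 values, low nibble first (assumed layout)."""
    values = []
    for byte in packed_row:
        for nibble in (byte & 0x0F, (byte >> 4) & 0x0F):
            # Map the unsigned nibble 0..15 to two's-complement -8..7.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


def logical_weight_shape(packed_shape, values_per_byte=2):
    """Shape of the weight after unpacking along the last dimension."""
    *leading, last = packed_shape
    return (*leading, last * values_per_byte)


packed = [[0x21, 0xF8], [0x00, 0x7F]]            # packed shape (2, 2)
unpacked = [unpack_int4(row) for row in packed]  # logical shape (2, 4)
```

A reshape that groups the input dimension has to use the unpacked width (4 here), not the packed width (2); using the packed width is the kind of shape mismatch the row describes.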
test_repetition.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_repetition_wrappers.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Sync code from develop to master; future development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_runtime.py	perf(tensor_cast): refine sparse attention roofline Model sparse MLA and dsa_indexer paged-cache traffic with calibrated data-movement efficiency so operator and end-to-end estimates align with GLM-5.1 profiling targets. Signed-off-by: minghang_c <chiminghang@h-partners.com> Co-authored-by: minghang_c<chiminghang@h-partners.com> # message auto-generated for no-merge-commit merge: !421 merge develop-on-upstream-master into master perf(tensor_cast): refine sparse attention roofline Created-by: minghang_c Commit-by: minghang_c Merged-by: ascend-robot Description: Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers. PR Type - [ ] Feature - [ ] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [x] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation Refine TensorCast roofline modeling for sparse MLA, `dsa_indexer`, and GLM-5-series W4A8 MLA preprocessing so sparse-attention estimates better match operator profiling and end-to-end latency targets while keeping the model based on explicit data-movement and compute-efficiency assumptions. The main modeling gap is that sparse MLA KV reads and `dsa_indexer` historical-cache reads are dominated by random/paged memory access. Treating those bytes as ideal contiguous bandwidth traffic makes the analytic roofline too optimistic, especially for long-context GLM-5.1 prefill/decode scenarios. The latest GLM-5.1 W4A8 validation also showed that `mlapo_quant` needs to model packed W4 weights carefully: the tensor storage dtype is `torch.uint8`, but the logical MMA throughput should follow the INT8 compute path used by existing grouped quant matmul modeling.
Otherwise the trace can report `mlapo_quant` MMA time as zero even though the op has nonzero projection MMA work. ------ ## 📝 Modification - Add sparse/paged KV traffic accounting for MLA with separate decode and prefill data-movement efficiency. - Add `dsa_indexer` historical cache read efficiency modeling and separate append cache/scale write traffic. - Keep `dsa_indexer` block-table traffic covered by generic input memory accounting instead of a separate operator-specific model. - Use decode-only sparse page count for mixed prefill/decode sparse MLA batches. - Use raw sparse-index bytes in the quant/physical MLA path so physical KV/block-table/sparse-index accounting is consistent. - Tighten `dsa_indexer` helper signatures so `request_total_seq_lens` is required where the model depends on it. - Keep generic `tensor_cast.attention.default` accounting unchanged, so non-MLA attention models do not inherit sparse-attention calibration. - Extend GLM-5-series compile handling to cover both `GLM-5` and `GLM-5.1`, while excluding `GLM-5.2` because its config has meaningful indexer/long-context differences. - Refine `mlapo_quant` W4A8 modeling so packed `torch.uint8` weights use the logical INT8 MMA throughput path instead of losing MMA time in trace/statistics. - Add `mlapo`/`mlapo_quant` intermediate memory and static-cost accounting for the fused MLA preprocessing path. - Update related performance-model tests for sparse memory breakdowns and `mlapo`/`mlapo_quant` modeling behavior. ------ ## 📐 Associated Test Results - `uvx --python .venv/bin/python pre-commit run --files tensor_cast/performance_model/__init__.py tests/regression/tensor_cast/test_runtime.py` - Passed after auto-format rerun.
- `uv run --group ci --with socksio python -m unittest tests.benchmark.models.test_model_regression` - Log: `/tmp/msmodeling_model_regression_develop_after_pick.log` - `Ran 15 tests in 42.029s` - `OK` - `Total Cases: 15 \| Passed: 15 \| Failed: 0 \| No Baseline: 0` - `* All Operator Checks Passed ` - GLM-5.1 e2e validation across 10 query/context scenarios from 3.5k to 128k after the latest `mlapo_quant` W4A8 modeling update: - Log: `/tmp/msmodeling_glm51_e2e_after_user_change_rerun3.log` - `e2e_count=10` - `mean_e2e_err=28.717478%`, meeting the `≤30%` target. - Earlier GLM-5.1 sparse-attention e2e validation across the same 10 scenarios: - Log: `/tmp/msmodeling_glm51_e2e_26_1_0_latest.log` - `e2e_count=10` - `mean_e2e_err=27.678365%`, meeting the `≤30%` target. - GLM-5 e2e validation after applying the GLM-5-series compile override: - Log: `/tmp/msmodeling_glm5_e2e_with_glm5_override.log` - `e2e_count=10` - `mean_e2e_err=27.678365%`, matching the GLM-5.1 run with the same parameters. - Operator-level validation from the sparse MLA / `dsa_indexer` profiling set: - `mean_operator_err = 6.487008%` - `max_operator_err = 18.658699%` - Meets the `≤20%` target. - Issue #103 2.5K GLM-5.1 scenario: - Prefill analytic result: old roofline `182.377 ms` → new roofline `631.874 ms`; real wall `1225.849 ms`; new roofline/wall `51.55%`. - Decode analytic result: old roofline `48.685 ms` → new roofline `103.071 ms`; real wall `82.528 ms`; new roofline/wall `124.89%`. - Decode compared with kernel sum: new roofline `103.071 ms` vs kernel sum `117.158 ms`, ratio `87.97%`. ------ ## 🌟 Use cases (Optional) GLM-5.1 sparse attention inference latency estimation for prefill and decode scenarios from 3.5k to 128k context length.
The latest e2e analytic results were validated with: `.venv/bin/python -m cli.inference.text_generate zai-org/GLM-5.1 \ --device ATLAS_800_A3_752T_128G_DIE \ --num-devices 16 \ --tp-size 16 \ --dp-size 1 \ --ep-size 16 \ --num-queries 1 \ --num-mtp-tokens 3 \ --compile \ --quantize-linear-action W4A8_STATIC \ --dump-input-shapes \ --context-length <context> \ --query-length <query>` \| Scenario \| Query length \| Context length \| Target latency \| Analytic latency \| Relative error \| \|---\|---:\|---:\|---:\|---:\|---:\| \| 3.5k-prefill \| 3500 \| 0 \| `1553.21 ms` \| `1010.00 ms` \| `34.9734%` \| \| 3.5k-decode \| 4 \| 3500 \| `69.90 ms` \| `44.79 ms` \| `35.9270%` \| \| 16k-prefill \| 4096 \| 12000 \| `1867.68 ms` \| `1449.00 ms` \| `22.4171%` \| \| 16k-decode \| 4 \| 16000 \| `68.10 ms` \| `47.22 ms` \| `30.6637%` \| \| 32k-prefill \| 4096 \| 28000 \| `2295.99 ms` \| `1807.00 ms` \| `21.2976%` \| \| 32k-decode \| 4 \| 32000 \| `68.70 ms` \| `47.76 ms` \| `30.4862%` \| \| 64k-prefill \| 4096 \| 60000 \| `3256.48 ms` \| `2522.00 ms` \| `22.5544%` \| \| 64k-decode \| 4 \| 64000 \| `71.70 ms` \| `49.63 ms` \| `30.7768%` \| \| 128k-prefill \| 4096 \| 124000 \| `5341.23 ms` \| `3952.00 ms` \| `26.0096%` \| \| 128k-decode \| 4 \| 128000 \| `78.30 ms` \| `53.19 ms` \| `32.0690%` \| `mean_e2e_err=28.717478%` ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by validation runs and targeted regression coverage. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!421	2 days ago
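The relative errors in the scenario table above appear to follow |analytic - target| / target, and the reported mean is their average. A small sketch that recomputes them from the rounded latencies in the table is shown here; the variable and function names are illustrative only, and because the table values are rounded to two decimals the recomputed figures can differ from the listed ones in the third decimal place.

```python
# (scenario, target latency in ms, analytic latency in ms), copied from the table.
SCENARIOS = [
    ("3.5k-prefill", 1553.21, 1010.00),
    ("3.5k-decode", 69.90, 44.79),
    ("16k-prefill", 1867.68, 1449.00),
    ("16k-decode", 68.10, 47.22),
    ("32k-prefill", 2295.99, 1807.00),
    ("32k-decode", 68.70, 47.76),
    ("64k-prefill", 3256.48, 2522.00),
    ("64k-decode", 71.70, 49.63),
    ("128k-prefill", 5341.23, 3952.00),
    ("128k-decode", 78.30, 53.19),
]


def relative_error_pct(target_ms, analytic_ms):
    """Relative error of the analytic estimate against the target, in percent."""
    return abs(analytic_ms - target_ms) / target_ms * 100.0


errors = {name: relative_error_pct(t, a) for name, t, a in SCENARIOS}
mean_e2e_err = sum(errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name:>13}: {err:.4f}%")
print(f"mean_e2e_err={mean_e2e_err:.4f}%")
```

Every scenario underestimates the target latency, so the mean stays under the `≤30%` target even though four individual decode scenarios and the 3.5k prefill case exceed 30%.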
test_sampler.py	fix(tensor_cast): model MTP speculative decode shapes Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !362 merge resolve-issue-130 into master fix(tensor_cast): model MTP speculative decode shapes Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description: PR Type - [x] Bugfix - [x] Refactor - [x] Test-Cases ## 🔍 Motivation In the TensorCast simulation of MTP (Multi-Token Prediction) speculative decode, the row selection logic of lm_head and the sampler had the following problems: 1. Target and proposal rows were conflated: during MTP decode, lm_head should process only the target+bonus verification rows inside the spec window, not all packed rows. The old logic used `selected_token_indices` for prefill-level token trimming but could not distinguish target from proposal, so lm_head computed extra rows that take no part in verification. 2. The Sampler did not support the spec decode output format: the old Sampler only took the last token via greedy argmax and could not return target+bonus tokens of shape `(num_requests, num_speculative_tokens + 1)`. 3. The Kimi K2.5 MTP path ran every row through lm_head: Kimi's monkey patch sent all hidden states into lm_head (163840 vocab) in the MTP text path, so the 12×7168×163840 matrix multiplication was inflated by ~3500× during prefill. ------ ## 📝 Modification ### Core: unified row selection path - Add a `SpecDecodeMetadata` dataclass that records each batch's `logits_indices`, `num_active_requests`, and `num_speculative_tokens`. - Add `select_lm_head_hidden_states(hidden_states, sampling_metadata, mode)`: - `mode="target"`: select all rows of the verification window (for lm_head) - `mode="proposal"`: select only the last row of each request (for the MTP predictor) - `CausalLmWrapper`, `VLModelWrapper`, and `MultiTokenPredictor` all call this function, replacing the `index_select` calls that were previously scattered across the code. ### Input Generator - `generate_inputs` / `generate_inputs_varlen` build `SpecDecodeMetadata` during MTP decode, covering only the spec window rows at the tail of each request. - Short windows (query_len < num_mtp_tokens + 1) automatically fall back to ordinary decode selection. ### Sampler - When `spec_decode_metadata` is present, reshape the verification logits to `(num_requests, spec_window, vocab)`, apply greedy argmax to the target and bonus positions separately, and return a result of shape `(num_requests, spec_window)`. - Stay compatible with proposal rows (subsequent MTP layers) and the old `selected_token_indices` prefill path. ### Kimi K2.5 - Split the MTP text path: first run the language model body to obtain the full hidden states (needed by rotary/proposal), then trim to the target rows with `select_lm_head_hidden_states` before lm_head. This avoids the full 163840-vocab projection. - Remove the old `MultiTokenPredictorLayer` tuple-unpack monkey patch (already fixed upstream in mtp.py). ------ ## 📐 Associated Test Results `pytest: 26 passed, 2 warnings in 3.72s` Coverage: - target/proposal row selection is not mixed - the default `selected_token_indices=-1` sentinel is correctly ignored - wrong logits_indices length → ValueError - spec-decode sampler returns target+bonus tokens - CausalLmWrapper target row projection - MTP wrapper prefill fallback / spec-decode bonus token forward - fixed/varlen input generator MTP metadata generation - Kimi text path target rows are trimmed before the internal lm_head - Kimi default sentinel does not trigger the fast path ------ ## ✅ Checklist - [x] Linting tools used (lintrunner) - [x] Bug fixes covered by unit tests - [x] Modification covered by unit tests - [ ] Documentation updated - [x] No Chinese comments in code files See merge request: Ascend/msmodeling!362	3 days ago
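The target/proposal split described in the row above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's `select_lm_head_hidden_states`: the class and function names are hypothetical, hidden states are modeled as a plain list of rows, and `logits_indices` is assumed to hold `num_speculative_tokens + 1` consecutive window rows per request.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpecDecodeMetadataSketch:
    """Hypothetical stand-in for the metadata fields named in the PR description."""
    logits_indices: List[int]
    num_active_requests: int
    num_speculative_tokens: int


def select_rows_sketch(hidden_states: Sequence, meta: SpecDecodeMetadataSketch, mode: str):
    """Pick verification-window rows ("target") or one row per request ("proposal")."""
    window = meta.num_speculative_tokens + 1
    if len(meta.logits_indices) != meta.num_active_requests * window:
        raise ValueError("logits_indices length does not match requests * spec window")
    if mode == "target":
        picked = meta.logits_indices                      # every row in each window
    elif mode == "proposal":
        picked = meta.logits_indices[window - 1::window]  # last row of each window
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [hidden_states[i] for i in picked]


# Two requests with four packed rows each; the spec window is the last two rows.
rows = [f"row{i}" for i in range(8)]
meta = SpecDecodeMetadataSketch(logits_indices=[2, 3, 6, 7],
                                num_active_requests=2,
                                num_speculative_tokens=1)
target_rows = select_rows_sketch(rows, meta, "target")
proposal_rows = select_rows_sketch(rows, meta, "proposal")
```

In this toy batch, only four of the eight packed rows would reach lm_head in target mode, and two in proposal mode, which is the saving the PR description attributes to trimming rows before the vocabulary projection.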
test_sequence_parallel_pass.py	optimize memory peak for servingcast & support model_config from tensorcast Co-authored-by: stormchasingg<sh_ding@zju.edu.cn> # message auto-generated for no-merge-commit merge: !360 merge enhance-servingcast into master optimize memory peak for servingcast & support model_config from tensorcast Created-by: stormchasingg Commit-by: stormchasingg Merged-by: ascend-robot Description: PR Type - [x] Feature - [ ] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation This PR aligns TensorCast/ServingCast throughput simulation with vLLM-Ascend MoE optimization behavior, especially for shared expert tensor parallelism, sequence parallel configuration, and fused MoE communication paths. It also keeps ServingCast's throughput-simulation configuration consistent with TensorCast's. ------ ## 📝 Modification - Add throughput optimizer options for shared expert TP, sequence parallel, word embedding TP mode, and chrome trace output. - Propagate optimizer CLI options into `UserInputConfig` and per-parallel-search model runner configs. - Apply sequence-parallel compilation configuration inside each parallel runner task. - Add TP/DP suffixes to chrome trace filenames to avoid overwriting trace files across parallel search candidates. - Adjust MoE shared expert TP execution to decrease memory peak in servingcast. - Enable dispatch-FFN-combine fusion by default in compilation config. ------ ## 📐 Associated Test Results Omitted. Test coverage included: None. ------ ## 🌟 Use cases (Optional) This change is useful when evaluating MoE models with vLLM-style shared expert TP and sequence parallel optimizations, and when collecting chrome traces for multiple TP/DP candidates in one throughput search. `python3 -m cli.inference.throughput_optimizer $dense_model_path --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 4096 --output-length 1 --compile --tp-sizes 8 16 --batch-range 16 16 --enable-sequence-parallel --word-embedding-tp row --quantize-linear-action DISABLED --ttft-limits 2000 --log-level info 2>&1 | tee ./run_sc_1.log` `python3 -m cli.inference.throughput_optimizer $moe_model_path --device ATLAS_800_A3_752T_128G_DIE --num-devices 16 --input-length 4096 --output-length 1 --compile --quantize-linear-action W8A8_STATIC --disagg --ttft-limits 2000 --tp-sizes 8 16 --batch-range 4 4 --reserved-memory-gb 10 --enable-shared-expert-tp --word-embedding-tp row --chrome-trace trace_decode.json --log-level info 2>&1 | tee ./run_sc3_2.log` ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. See merge request: Ascend/msmodeling!360	5 days ago
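The !360 description says TP/DP suffixes are added to chrome trace filenames so that parallel search candidates do not overwrite each other's traces. A minimal sketch of that idea follows. The suffix format `_tp{N}_dp{M}`, the helper names, and the rule `dp = num_devices // tp` are all assumptions for illustration; the PR text does not state the actual naming scheme.

```python
import os
from typing import List


def trace_path_for_candidate(base_path: str, tp_size: int, dp_size: int) -> str:
    """Insert a TP/DP suffix before the file extension (hypothetical format)."""
    root, ext = os.path.splitext(base_path)
    return f"{root}_tp{tp_size}_dp{dp_size}{ext}"


def candidate_trace_paths(base_path: str, num_devices: int, tp_sizes: List[int]) -> List[str]:
    """Build one trace path per TP candidate so no two candidates share a file."""
    paths = []
    for tp in tp_sizes:
        dp = num_devices // tp  # assumption: the remaining devices form the DP dimension
        paths.append(trace_path_for_candidate(base_path, tp, dp))
    return paths


# Mirrors the flags in the second command above: --num-devices 16 --tp-sizes 8 16
paths = candidate_trace_paths("trace_decode.json", 16, [8, 16])
print(paths)
```

Each candidate then writes to its own file, so a single throughput search can keep the traces of every TP/DP combination.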
test_shape_cat_passes.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_sp_pass_unit.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago
test_swiglu_fusion_pass.py	fix(test): align swiglu fusion test with PR !362 selected_token_indices semantics Co-authored-by: Horacehxw<horacehxw@gmail.com> # message auto-generated for no-merge-commit merge: !459 merge bug-fix-mtp into master fix(test): align swiglu fusion test with PR !362 selected_token_indices semantics Created-by: Horacehxw Commit-by: Horacehxw Merged-by: ascend-robot Description: PR Type - [x] Bugfix - [x] Test-Cases ## 🔍 Motivation After PR !362 (commit 31d9de33) was merged into master, the following two nightly regression tests failed: `FAILED test_swiglu_fused_op_present_deepseek_0_deepseek_ai_DeepSeek_V3_1 FAILED test_swiglu_fused_op_present_deepseek_1_deepseek_ai_DeepSeek_V3_1 AssertionError: torch.Size([1, 100, 129280]) != (1, 1, 129280)` ### Root cause analysis PR !362 changed how `CausalLmWrapper.forward()` selects hidden-state rows before lm_head. The old code was: `if sampling_metadata and sampling_metadata.selected_token_indices is not None: hidden_states = hidden_states.index_select(1, sampling_metadata.selected_token_indices)` It was replaced by the unified `select_lm_head_hidden_states()` function, which internally decides via `_has_explicit_selected_token_indices()`: `def _has_explicit_selected_token_indices(indices): return indices is not None and indices.ndim > 0` The default value of `SamplingMetadata.selected_token_indices` is `torch.tensor(-1)` (a scalar, ndim==0). \| \| Old behavior \| New behavior \| \|---\|---\|---\| \| Check on `tensor(-1)` \| `is not None` → True → runs `index_select` \| `ndim == 0` → False → no selection \| \| lm_head input \| last 1 token selected → `(1, 1, hidden)` \| all 100 tokens → `(1, 100, hidden)` \| The change PR !362 made to the default-value semantics is reasonable in itself (the sentinel is separated from valid values, defaulting to "no selection" is fail-safe behavior, and the production code `generate_inputs` never relied on the old implicit behavior). However, it did not update the `test_swiglu_fused_op_present_deepseek` test, which builds `SamplingMetadata` by hand and relied on the old implicit `tensor(-1)` selection. In addition, the old test's `(1, 1, vocab_size)` expectation was correct only by accident: the scalar `-1` made `index_select` pick just the global last token, whereas for a packed batch of 2 sequences (`query_start_loc=[0, 55, 100]`) correct prefill behavior selects the last token of each sequence and produces 2 rows. ------ ## 📝 Modification Update the `test_swiglu_fused_op_present_deepseek` test: 1. Pass `selected_token_indices` explicitly: use `attn_meta.query_start_loc[1:] - 1` to select the last token of each sequence (matching the prefill path logic of `generate_inputs_varlen`) 2. Update the expected shape: from the hard-coded `(1, 1, vocab_size)` to the dynamically computed `(1, num_sequences, vocab_size)` The fixed test is semantically more correct than the old one: the old test happened to select the single global last token, while the new test explicitly selects the last token of each sequence. ------ ## 📐 Associated Test Results `pytest: 10 passed, 2 warnings in 60.32s` Covers all 10 test cases in `test_swiglu_fusion_pass.py` (including the 2 previously failing nightly cases). ------ ## ✅ Checklist - [x] Linting tools used (lintrunner) - [x] Bug fixes covered by unit tests - [x] Modification covered by unit tests - [ ] Documentation updated - [x] No Chinese comments in code files See merge request: Ascend/msmodeling!459	1 day ago
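The sentinel semantics and the per-sequence selection from !459 can be worked through with plain Python lists standing in for tensors. This is a sketch under stated assumptions, not the repository code: an `int` models the scalar `torch.tensor(-1)` default (ndim == 0), a `list` models an explicit 1-D index tensor, and the helper names `last_token_indices` and `rows_fed_to_lm_head` are hypothetical.

```python
from typing import List, Optional, Union

Indices = Optional[Union[int, List[int]]]

SENTINEL = -1  # stands in for the scalar default torch.tensor(-1)


def has_explicit_selected_token_indices(indices: Indices) -> bool:
    """Only a 1-D index list counts as explicit; None and the scalar sentinel do not."""
    return isinstance(indices, list)


def last_token_indices(query_start_loc: List[int]) -> List[int]:
    """List analogue of query_start_loc[1:] - 1: last token of each packed sequence."""
    return [end - 1 for end in query_start_loc[1:]]


def rows_fed_to_lm_head(num_tokens: int, indices: Indices) -> List[int]:
    """Explicit indices select rows; otherwise every token row is kept (new behavior)."""
    if has_explicit_selected_token_indices(indices):
        return list(indices)
    return list(range(num_tokens))


query_start_loc = [0, 55, 100]  # two packed sequences of 55 and 45 tokens
vocab_size = 129280             # taken from the assertion message in the PR

default_rows = rows_fed_to_lm_head(100, SENTINEL)
explicit = last_token_indices(query_start_loc)
explicit_rows = rows_fed_to_lm_head(100, explicit)

default_shape = (1, len(default_rows), vocab_size)
fixed_shape = (1, len(explicit_rows), vocab_size)
print(explicit, default_shape, fixed_shape)
```

The default path reproduces the `(1, 100, 129280)` shape from the failing assertion, and the explicit path gives one row per sequence, which is the `(1, num_sequences, vocab_size)` shape the updated test expects.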
test_text_generate.py	fix(security): add model source safety checks Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !385 merge fix/trust-remote-code-safety into master fix(security): add model source safety checks Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description: PR Type - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [x] Other ## 🔍 Motivation Security hardening. ------ ## 📝 Modification Add permission checks for local paths; add risk warnings to the logs; remove old interfaces that are no longer maintained. ------ ## 📐 Associated Test Results ![image.png](https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png 'image.png') ------ ## ✅ Checklist Before PR: - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!385	5 days ago
test_transformers_utils.py	fix(security): add model source safety checks Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !385 merge fix/trust-remote-code-safety into master fix(security): add model source safety checks Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description: PR Type - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [x] Other ## 🔍 Motivation Security hardening. ------ ## 📝 Modification Add permission checks for local paths; add risk warnings to the logs; remove old interfaces that are no longer maintained. ------ ## 📐 Associated Test Results ![image.png](https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png 'image.png') ------ ## ✅ Checklist Before PR: - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!385	5 days ago
test_user_config.py	[Bugfix] Fix issues in some deepseek-v4 quantization scenarios Co-authored-by: ChenHuiwen<chenhuiwen7@huawei.com> # message auto-generated for no-merge-commit merge: !373 merge fix-ds-v4-quant into master [Bugfix] Fix issues in some deepseek-v4 quantization scenarios Created-by: ChenHuiwen Commit-by: ChenHuiwen Merged-by: ascend-robot Description: PR Type - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [x] Test-Cases - [ ] Other ## 🔍 Motivation This PR fixes DeepSeek V4 quantization and tensor-parallel execution issues found in `text_generate` simulation. Specifically, the old `backbone` quantization override naming was ambiguous because the option is used to configure non-routed-expert linear layers such as attention projections, dense MLP layers, and shared experts. This PR renames it to `non-expert` to better match the actual behavior. In addition, the DeepSeek V4 Flash/Pro O projection path has model-specific grouped projection behavior. Under W4A8 quantization or high TP configurations, TensorCast previously could hit shape mismatches in `wo_a` / `wo_b` modeling. This PR keeps the modeled path aligned with the real DeepSeek V4 structure while avoiding invalid reshape or double-sharding behavior. ## 📝 Modification - Rename quantization override terminology: - `--quantize-backbone-linear-action` -> `--quantize-non-expert-linear-action` - `quantize_backbone_linear_action` -> `quantize_non_expert_linear_action` - Update CLI help text, `UserInputConfig`, quantization config creation, and related regression tests. - Update non-expert quantization config patterns: - Rename `_BACKBONE_LINEAR_PATTERNS` to `_NON_EXPERT_LINEAR_PATTERNS`. - Keep routed MoE experts controlled by the broad `--quantize-linear-action` setting. - Keep attention, dense MLP, and shared-expert layers covered by the non-expert override. - Fix DeepSeek V4 W4A8 `wo_a` grouped projection: - When `wo_a` is quantized as W4A8, unpack int4 packed `qweight` back to its logical weight shape before the grouped einsum reshape. - Avoid using stale `in_features/out_features` values after TP wrapping. - Fix DeepSeek V4 O projection TP sharding: - Remove the duplicate V4-specific `self_attn.o_proj` TP rule. - Let the generic `o_proj` RowParallel rule handle `wo_b/o_proj` once. - Prevent `o_proj` from being sharded twice, which caused local input dim mismatch. - Add/update regression tests: - Cover W4A8 `wo_a` logical weight shape before grouped einsum. - Update DeepSeek V4 TP plan expectations. - Update quantization config and user config tests for the new non-expert terminology. ------ ## 📐 Associated Test Results ![image.png](https://raw.gitcode.com/user-images/assets/8428112/554930ff-0798-4331-8131-355e9d34c759/image.png 'image.png') ![image.png](https://raw.gitcode.com/user-images/assets/8428112/de48f376-7feb-49b1-96ff-daa94228f25a/image.png 'image.png') ------ ## ✅ Checklist Before PR: - [ ] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [ ] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [ ] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!373	13 days ago
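The W4A8 `wo_a` fix in !373 unpacks the int4-packed `qweight` to its logical shape before the grouped reshape. The sketch below illustrates why that order matters, using nested lists instead of tensors. The packing layout (two unsigned 4-bit values per byte, low nibble first, packed along the input dimension) and the helper names are assumptions for illustration; the PR text does not specify the real layout.

```python
from typing import List

Matrix = List[List[int]]


def unpack_int4(packed: Matrix) -> Matrix:
    """Expand each byte into two 4-bit values (assumed order: low nibble, then high)."""
    logical = []
    for row in packed:
        values = []
        for byte in row:
            values.append(byte & 0x0F)
            values.append((byte >> 4) & 0x0F)
        logical.append(values)
    return logical


def group_rows(weight: Matrix, num_groups: int) -> List[Matrix]:
    """Split [out_features, in_features] into num_groups blocks of rows."""
    if len(weight) % num_groups != 0:
        raise ValueError("out_features must be divisible by num_groups")
    rows_per_group = len(weight) // num_groups
    return [
        weight[g * rows_per_group:(g + 1) * rows_per_group]
        for g in range(num_groups)
    ]


# Packed storage: 4 output rows, 2 bytes per row, so the logical in_features is 4.
packed_qweight = [
    [0x21, 0x43],
    [0x65, 0x87],
    [0x10, 0x32],
    [0x54, 0x76],
]

logical_weight = unpack_int4(packed_qweight)
grouped = group_rows(logical_weight, num_groups=2)

packed_shape = (len(packed_qweight), len(packed_qweight[0]))
logical_shape = (len(logical_weight), len(logical_weight[0]))
grouped_shape = (len(grouped), len(grouped[0]), len(grouped[0][0]))
print(packed_shape, logical_shape, grouped_shape)
```

Grouping the packed matrix directly would produce blocks with half the expected input width, which is the kind of shape mismatch the PR describes; unpacking first restores the logical `in_features` before any grouped reshape.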
test_video_generate.py	fix(security): add model source safety checks Co-authored-by: jia_ya_nan<jiayanan3@h-partners.com> # message auto-generated for no-merge-commit merge: !385 merge fix/trust-remote-code-safety into master fix(security): add model source safety checks Created-by: jia_ya_nan Commit-by: jia_ya_nan Merged-by: ascend-robot Description: PR Type - [ ] Feature - [x] Bugfix - [ ] Docs - [ ] CI/CD - [ ] Refactor - [ ] Perf - [ ] Test-Cases - [x] Other ## 🔍 Motivation Security hardening. ------ ## 📝 Modification Add permission checks for local paths; add risk warnings to the logs; remove old interfaces that are no longer maintained. ------ ## 📐 Associated Test Results ![image.png](https://raw.gitcode.com/user-images/assets/8428112/ef4f75a5-1346-4320-8de2-a19703ebedb3/image.png 'image.png') ------ ## ✅ Checklist Before PR: - [x] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests. - [x] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness. - [x] All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes. - [x] Please ensure code files contain no Chinese comments. ------ See merge request: Ascend/msmodeling!385	5 days ago
test_vl_compile.py	[Sync][Non-development code] Sync code from develop to master Co-authored-by: yydyzr<liuyuncong1@huawei.com> Co-authored-by: gcw_61YBRfIt<chuzhenxing@huawei.com> Co-authored-by: 孔炳翔<1120200577@qq.com> Co-authored-by: zhengxinqian<qianzhengxin@huawei.com> Co-authored-by: hw_whx<wanghexiang7@huawei.com> Co-authored-by: jgong5<steven.gong@gmail.com> Co-authored-by: hw_whx<2952154980@qq.com> # message auto-generated for no-merge-commit merge: !330 merge master into master [Sync][Non-development code] Sync code from develop to master Created-by: AvadaKedavrua Commit-by: liujiawang;ascend-robot;AvadaKedavrua;lutean;Horacehxw;eveyin1;minghang_c;zwt__;tt0cool;elrond-g;jia_ya_nan;zhenyu_zhang;ChenHuiwen;wangshen001;Hudingyi;wendellX;Secluded_Ocean;jhon-117;yaohan404;jiangruitao;zhenghaojie;stormchasingg;panyj1993;cmh1056291129;yuyinkai1;sunguozhong;genius52;liu_jiaxu;HongMaoShuiGuai;zhengxinqian;weixin_43368449;jsez-li-bin;jgong5;wqh17101;w00609794;yydyzr;JieZhang679;sppedforcy;gcw_61YBRfIt;Jiong Gong;hw_whx;gongjiong;孔炳翔 Merged-by: ascend-robot Description: Code is synced from develop to master; subsequent development continues on master, and packaging is supported. See merge request: Ascend/msmodeling!330	17 days ago