
Performance Regression Testing Framework

Directory Structure

tests/benchmark/models/
├── test_model_regression.py            ← Main entry: total time regression + operator regression
├── auto_baseline.py                    ← Standalone entry: auto-baseline runner (pytest)
├── __init__.py                         ← Package init
├── cases/                              ← Per-case JSON configuration files (includes operator baselines)
└── README.md                           ← This file

Core Design: One Case Definition, Two Automatic Checks

Add a JSON configuration file under the cases/ directory and the framework automatically performs two checks:

Check	Description
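Check 1: Total Time Comparison	Actual total time compared with the initial time (default tolerance 10%) and with the baseline time (default tolerance 20%)
Check 2: Operator-Level Comparison	Top-N most expensive operators compared with the stored operator baseline (default tolerance 10%)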
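
As a rough illustration of Check 1, the sketch below computes the relative difference and derives a status string. It is not the framework's implementation: the function names are hypothetical, the `FAIL(BASE)` label is invented by analogy with the `FAIL(INIT)` status shown in the Output Example section, and it assumes that only slowdowns beyond the tolerance fail and that a reference time of 0 skips that comparison.

```python
def relative_diff(actual_s: float, reference_s: float) -> float:
    """Signed relative difference of actual vs reference (0.10 means +10%)."""
    return (actual_s - reference_s) / reference_s


def total_time_status(
    actual_s: float,
    initial_s: float,
    baseline_s: float,
    initial_tolerance: float = 0.10,
    baseline_tolerance: float = 0.20,
) -> str:
    """Return "PASS", "FAIL(INIT)", or the hypothetical "FAIL(BASE)".

    Assumptions: a reference of 0 skips that comparison, and only a slowdown
    strictly greater than the tolerance counts as a failure.
    """
    if initial_s > 0 and relative_diff(actual_s, initial_s) > initial_tolerance:
        return "FAIL(INIT)"
    if baseline_s > 0 and relative_diff(actual_s, baseline_s) > baseline_tolerance:
        return "FAIL(BASE)"
    return "PASS"


# First row of the Output Example: 330.123 ms vs init 300 ms and base 322 ms.
print(total_time_status(0.330123, 0.300, 0.322))
```

With the numbers from the first row of the Output Example, the initial comparison is about +10.04%, which is just over the 10% default, so the case fails against the initial time even though it is within 20% of the baseline.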

Execution policy in this repo:

Put model-level fidelity cases only under tests/benchmark/models/
Do not add the nightly marker to these cases
run_benchmark.sh and the nightly full run execute them; the compile/regression incremental pipelines do not
Shared model configs are stored in tests/assets/model_config/

Case Configuration

Case Types

The framework supports two model types with dedicated data structures:

TextPerfRegressionCase: For text/VL/LLM models, configured via UserInputConfig
VideoPerfRegressionCase: For video diffusion models, configured with video-specific parameters

Both share common fields from BasePerfRegressionCase.

Adding a New Case

Create a JSON file under the cases/ directory. The filename should match the name field.

Text Model Example (`cases/qwen3-8B-decode.json`)

{
  "type": "text",
  "name": "qwen3-8B-decode",
  "description": "Qwen3-8B decode, 32 queries, ctx=1536, TP=2, compile",
  "initial_time_s": 0.012733,
  "baseline_time_s": 0.015406,
  "initial_tolerance": 0.10,
  "baseline_tolerance": 0.20,
  "operator_top_n": 10,
  "operator_tolerance": 0.10,
  "user_input": {
    "device": "ATLAS_800_A2_376T_64G",
    "model_id": "Qwen/Qwen3-8B",
    "num_queries": 32,
    "query_len": 1,
    "context_length": 1536,
    "do_compile": true,
    "decode": true,
    "quantize_linear_action": "DISABLED",
    "tp_size": 2,
    "world_size": 2
  }
}

Video Model Example (`cases/wan2.2-ulysses8.json`)

{
  "type": "video",
  "name": "wan2.2-ulysses8",
  "description": "Wan2.2-T2V-A14B ulysses=8, batch=1, seq=128, 720x1280x81frames, bfloat16, use_cfg",
  "initial_time_s": 8.542,
  "baseline_time_s": 7.625,
  "initial_tolerance": 0.10,
  "baseline_tolerance": 0.20,
  "operator_top_n": 10,
  "operator_tolerance": 0.10,
  "device": "ATLAS_800_A3_752T_128G_DIE",
  "model_id": "assets/model_config/Wan2.2-T2V-A14B-Diffusers",
  "seq_len": 128,
  "batch_size": 1,
  "height": 720,
  "width": 1280,
  "frame_num": 81,
  "sample_step": 1,
  "dtype": "bfloat16",
  "use_cfg": true,
  "world_size": 8,
  "ulysses_size": 8,
  "cfg_parallel": false,
  "quantize_linear_action": "DISABLED"
}

Common Fields (Base)

Field	Type	Default	Description
`type`	`str`	`"text"`	Case type: `"text"` or `"video"`
`name`	`str`	required	Unique case identifier; the operator baseline is stored in the `operators` field of this file
`description`	`str`	required	Case description, shown on failure
`initial_time_s`	`float`	`0.0`	Initial total time (seconds). Set to `0` to skip the initial comparison
`baseline_time_s`	`float`	`0.0`	Baseline total time (seconds). Set to `0` to skip the baseline comparison
`initial_tolerance`	`float`	`0.10`	Tolerance vs initial time (10%)
`baseline_tolerance`	`float`	`0.20`	Tolerance vs baseline time (20%)
`operator_top_n`	`int`	`10`	Compare top-N most expensive operators
`operator_tolerance`	`float`	`0.10`	Operator-level tolerance (10%)
`operators`	`array`	`[]`	Operator baseline data: list of `{name, total_time_s, num_calls}` objects
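
To make the defaults in the table above concrete, here is a minimal sketch of a dataclass that mirrors the common fields. `CaseCommonFields` and `parse_common_fields` are hypothetical names used only for illustration; the framework's own `BasePerfRegressionCase` may be structured differently.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class CaseCommonFields:
    """Hypothetical mirror of the common fields table, not the framework's class."""

    name: str
    description: str
    type: str = "text"
    initial_time_s: float = 0.0
    baseline_time_s: float = 0.0
    initial_tolerance: float = 0.10
    baseline_tolerance: float = 0.20
    operator_top_n: int = 10
    operator_tolerance: float = 0.10
    operators: List[Dict[str, Any]] = field(default_factory=list)


def parse_common_fields(raw: Dict[str, Any]) -> CaseCommonFields:
    """Pick the common fields out of a case dict and apply the documented defaults.

    Keys that are not common fields (for example "user_input") are ignored here.
    A missing "name" or "description" raises TypeError because both are required.
    """
    known = {f.name for f in fields(CaseCommonFields)}
    return CaseCommonFields(**{k: v for k, v in raw.items() if k in known})


case = parse_common_fields(
    {"name": "your_case_name", "description": "your description", "user_input": {}}
)
print(case.type, case.operator_top_n, case.operators)
```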

Text-Specific Fields (`user_input`)

Field	Type	Description
`device`	`str`	Target device name
`model_id`	`str`	Model identifier or path
`num_queries`	`int`	Number of queries
`query_len`	`int`	Query token length
`context_length`	`int`	Context length for decode
`do_compile`	`bool`	Enable `torch.compile`
`decode`	`bool`	Enable decode mode
`quantize_linear_action`	`str`	Quantization action: `"DISABLED"`, `"W8A8_DYNAMIC"`
`quantize_attention_action`	`str`	Attention quantization: `"DISABLED"`, `"INT8"`
`tp_size`	`int`	Tensor parallelism degree
`dp_size`	`int`	Data parallelism degree
`ep_size`	`int`	Expert parallelism degree
`world_size`	`int`	Total device count
`num_mtp_tokens`	`int`	MTP token count
`image_batch_size`	`int`	Image batch size (VL models)
`image_height`	`int`	Image height (VL models)
`image_width`	`int`	Image width (VL models)

Video-Specific Fields

Field	Type	Default	Description
`device`	`str`	`""`	Target device name
`model_id`	`str`	`""`	Path to model configuration directory
`seq_len`	`int`	`0`	Sequence length
`batch_size`	`int`	`0`	Batch size
`height`	`int`	`0`	Video height
`width`	`int`	`0`	Video width
`frame_num`	`int`	`0`	Number of frames
`sample_step`	`int`	`0`	Sampling step
`dtype`	`str`	`"float16"`	Data type
`use_cfg`	`bool`	`false`	Enable classifier-free guidance
`world_size`	`int`	`1`	Total device count
`ulysses_size`	`int`	`1`	Ulysses sequence parallelism degree
`cfg_parallel`	`bool`	`false`	Enable CFG parallel
`quantize_linear_action`	`str`	`"DISABLED"`	Quantization action

Running Tests

# Run all regression tests
python -m pytest tests/benchmark/models/test_model_regression.py -v --tb=short

# Filter by name
python -m pytest tests/benchmark/models/test_model_regression.py -k "qwen3_30b" -v --tb=short

Output Example

==============================================================================================================
  [Check 1] Total Time Regression Summary
==============================================================================================================
Case                                       Actual      Init   InitDiff%      Base   BaseDiff%         Status
--------------------------------------------------------------------------------------------------------------
qwen3_30b_a3b_prefill_w8a8_tp2_compile  330.123ms  300.000ms    +10.04%  322.000ms     +2.52%     FAIL(INIT)
qwen3_32b_prefill_w8a8_tp1              455.000ms  450.000ms     +1.11%  440.000ms     +3.41%           PASS
--------------------------------------------------------------------------------------------------------------
Total: 2 | Passed: 1 | Failed: 1 | No Baseline: 0
==============================================================================================================

==============================================================================================================
  [Check 2] Operator Regression Summary
==============================================================================================================
Case                                                         Status                              Details
--------------------------------------------------------------------------------------------------------------
qwen3_30b_a3b_prefill_w8a8_tp2_compile                       FAIL                   2 operator(s) exceeded
qwen3_32b_prefill_w8a8_tp1                                    PASS          All operators within tolerance
--------------------------------------------------------------------------------------------------------------
Total: 2 | Passed: 1 | Failed: 1 | No Baseline: 0
==============================================================================================================

*** Operator regression anomalies detected! ***

  [qwen3_30b_a3b_prefill_w8a8_tp2_compile]:
    aten::mm: +12.34% (baseline=45.123ms, actual=50.691ms)
    aten::addmm: +15.67% (baseline=32.456ms, actual=37.542ms)
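
The operator-level check that produces a report like the one above can be sketched as follows. This is an illustration under stated assumptions, not the framework's code: `find_operator_regressions` is a hypothetical name, the top-N operators are ranked by their baseline `total_time_s`, operators absent from the current run are skipped, and only slowdowns beyond the tolerance are reported.

```python
from typing import Dict, List, Tuple


def find_operator_regressions(
    baseline_ops: List[dict],
    actual_times_s: Dict[str, float],
    top_n: int = 10,
    tolerance: float = 0.10,
) -> List[Tuple[str, float]]:
    """Return (operator name, relative diff) for top-N operators over tolerance."""
    # Rank baseline operators by total time, most expensive first.
    ranked = sorted(baseline_ops, key=lambda op: op["total_time_s"], reverse=True)
    regressions = []
    for op in ranked[:top_n]:
        actual = actual_times_s.get(op["name"])
        if actual is None or op["total_time_s"] <= 0:
            continue  # assumption: missing or zero-time operators are skipped
        diff = (actual - op["total_time_s"]) / op["total_time_s"]
        if diff > tolerance:
            regressions.append((op["name"], diff))
    return regressions


# Baseline and actual values taken from the anomaly report above (in seconds).
baseline = [
    {"name": "aten::mm", "total_time_s": 0.045123, "num_calls": 64},
    {"name": "aten::addmm", "total_time_s": 0.032456, "num_calls": 32},
]
actual = {"aten::mm": 0.050691, "aten::addmm": 0.037542}
for name, diff in find_operator_regressions(baseline, actual):
    print(f"{name}: {diff:+.2%}")
```

Both operators exceed the 10% default tolerance here, which matches the two anomalies listed in the example report.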

New Case Onboarding Process

Follow this standard lifecycle when adding a new performance regression case:

Step 1: Create the Case Configuration

Create a new JSON file at cases/<case_name>.json with the appropriate type ("text" or "video") and all required fields. See the examples above for the correct format.

The framework automatically discovers and loads all *.json files from the cases/ directory — no changes to the test source code are required.
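
For intuition, file-based discovery of this kind can be sketched in a few lines. The helper below is hypothetical and is not the framework's loader; it also turns the "filename should match the name field" recommendation into a hard error, which is a choice made for this sketch only.

```python
import json
from pathlib import Path
from typing import Dict, List


def discover_cases(cases_dir: Path) -> List[Dict]:
    """Load every *.json under cases_dir, sorted by filename for stable ordering."""
    cases = []
    for path in sorted(cases_dir.glob("*.json")):
        case = json.loads(path.read_text(encoding="utf-8"))
        # Path.stem drops only the final ".json", so "wan2.2-ulysses8.json"
        # yields "wan2.2-ulysses8".
        if case.get("name") != path.stem:
            raise ValueError(
                f"{path.name}: name field {case.get('name')!r} does not match filename"
            )
        cases.append(case)
    return cases


if __name__ == "__main__":
    # Directory taken from the layout described in this README.
    for case in discover_cases(Path("tests/benchmark/models/cases")):
        print(case["name"])
```

A list built this way could then be passed to `pytest.mark.parametrize` so that each file becomes one test; whether the framework does exactly that is not stated here.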

Step 2: First Run — Generate the Baseline

The operator baseline must be generated explicitly before regression tests can pass. On the first run, the test fails with a message stating that no operator baseline was found. Capture the operator output and use it to populate the operators field in your case JSON (cases/<case_name>.json):

"operators": [
  {"name": "aten::mm", "total_time_s": 0.003200, "num_calls": 64},
  {"name": "aten::addmm", "total_time_s": 0.002100, "num_calls": 32}
]

Once the operators field is populated, subsequent runs will perform operator-level comparisons.
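
If editing the JSON by hand is error-prone, a small helper along these lines can write the captured timings into the case file. This is only a sketch: `write_operator_baseline` and the shape of its `captured` argument are hypothetical, since this README does not describe how the framework exposes the captured operator output.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def write_operator_baseline(
    case_path: Path, captured: Dict[str, Tuple[float, int]]
) -> List[dict]:
    """Store captured {name: (total_time_s, num_calls)} in the operators field.

    Operators are written most expensive first; all other fields are kept.
    """
    case = json.loads(case_path.read_text(encoding="utf-8"))
    ordered = sorted(captured.items(), key=lambda item: item[1][0], reverse=True)
    case["operators"] = [
        {"name": name, "total_time_s": round(total_s, 6), "num_calls": calls}
        for name, (total_s, calls) in ordered
    ]
    case_path.write_text(
        json.dumps(case, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return case["operators"]
```

For example, `write_operator_baseline(Path("cases/your_case_name.json"), {"aten::mm": (0.0032, 64), "aten::addmm": (0.0021, 32)})` would produce the `operators` list shown above. Passing an empty dict resets the field to `[]`, which is the starting point for a baseline refresh.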

Step 3: Second Run — Verify Stability

Run the same test a second time. The framework now has baseline data and will compare operator-level timings:

python -m pytest tests/benchmark/models/test_model_regression.py -k "your_case_name" -v --tb=short

Verify that:

Total time comparisons (initial_time_s and baseline_time_s) are within tolerance
Operator-level comparisons are stable (no unexpected regressions)
Results are reproducible across multiple runs

Step 4: Commit the Configuration

Once the case passes consistently, commit the case file:

cases/<case_name>.json — the case configuration with operator baseline data

Step 5: Refreshing Baselines

When a baseline refresh is needed (e.g., after a model update, performance optimization, or intentional operator change), clear the operators field in the case JSON and follow Steps 2–4 again:

# Manually edit the case JSON and set "operators": []

Then re-generate the operator baseline and re-verify.

Important: When committing a refreshed baseline, always include the reason in the commit message:

Model version change (e.g., "Updated Qwen3-8B to v2.1")
Performance baseline adjustment (e.g., "Adjusted baseline after compiler optimization")
Intentional operator change (e.g., "Switched from aten::mm to aten::matmul")

Auto-Baseline Runner (auto_baseline.py)

auto_baseline.py is a pytest-based runner that automatically runs each case twice: the first run, using baseline_input, establishes a baseline, and the second run, using compare_input, is compared against it (default tolerance: 5%).

Adding a Case

Edit auto_baseline.py and add an AutoBaselineCase to the AUTO_BASELINE_CASES list:

AUTO_BASELINE_CASES: List[AutoBaselineCase] = [
    AutoBaselineCase(
        name="qwen3-8B_auto",
        description="Qwen3-8B decode, baseline ctx=1536 vs compare ctx=1500",
        baseline_input=UserInputConfig(
            device="ATLAS_800_A2_376T_64G",
            model_id="Qwen/Qwen3-8B",
            num_queries=32,
            query_len=1,
            context_length=1536,
            do_compile=True,
            decode=True,
            tp_size=2,
            world_size=2,
        ),
        compare_input=UserInputConfig(
            device="ATLAS_800_A2_376T_64G",
            model_id="Qwen/Qwen3-8B",
            num_queries=32,
            query_len=1,
            context_length=1500,
            do_compile=True,
            decode=True,
            tp_size=2,
            world_size=2,
        ),
        tolerance=0.05,
    ),
]

Auto-Baseline Fields

Field	Type	Default	Description
`name`	`str`	required	Unique case identifier
`description`	`str`	required	Case description
`baseline_input`	`UserInputConfig`	required	Baseline inference configuration
`compare_input`	`UserInputConfig`	required	Comparison inference configuration
`tolerance`	`float`	`0.05`	Tolerance (5%)
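
The two-run comparison can be pictured with the sketch below. It is not the code in auto_baseline.py: `auto_baseline_check` is a hypothetical helper, the two callables stand in for whatever produces the total time for `baseline_input` and `compare_input`, and the sketch assumes a symmetric check in which a deviation in either direction beyond the tolerance fails.

```python
from typing import Callable, Tuple


def auto_baseline_check(
    run_baseline: Callable[[], float],
    run_compare: Callable[[], float],
    tolerance: float = 0.05,
) -> Tuple[bool, float]:
    """Run both configurations once and compare their total times.

    Returns (passed, relative_diff), where relative_diff is the signed
    difference of the comparison run relative to the baseline run.
    """
    baseline_s = run_baseline()
    compare_s = run_compare()
    diff = (compare_s - baseline_s) / baseline_s
    return abs(diff) <= tolerance, diff


# Stand-in timings in seconds; real values would come from the two runs.
passed, diff = auto_baseline_check(lambda: 0.0100, lambda: 0.0103)
print(passed, f"{diff:+.2%}")
```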

Running

# Run all auto-baseline cases
python -m pytest tests/benchmark/models/auto_baseline.py -v -s

# Filter by name
python -m pytest tests/benchmark/models/auto_baseline.py -k "qwen3-8B" -v -s

Quick Start

1. Add a Case

Create a JSON file under cases/:

{
  "type": "text",
  "name": "your_case_name",
  "description": "your description",
  "initial_time_s": 0.300,
  "baseline_time_s": 0.322,
  "user_input": {
    "device": "YOUR_DEVICE",
    "model_id": "your/model/id",
    "num_queries": 1,
    "query_len": 6600,
    "do_compile": true,
    "tp_size": 2,
    "world_size": 2
  }
}

2. Run

python -m pytest tests/benchmark/models/test_model_regression.py -v --tb=short

Note: The operators field in each case JSON must be populated before the regression tests can pass. Without operator baseline data, the test will fail with a clear message directing you to generate the baseline first. See the onboarding process above for details.

3. Quick Self-Test

python -m pytest tests/benchmark/models/auto_baseline.py -v -s