ascend-robot【同步】【非开发代码】代码从 develop 同步到 master

Performance Regression Testing Framework

Directory Structure

tests/benchmark/models/
├── test_model_regression.py            ← Main entry: total time regression + operator regression
├── auto_baseline.py                    ← Standalone entry: auto-baseline runner (pytest)
├── __init__.py                         ← Package init
├── cases/                              ← Per-case JSON configuration files (includes operator baselines)
└── README.md                           ← This file

Core Design: One Case Definition, Two Automatic Checks

Add a JSON configuration file under the cases/ directory and the framework automatically performs two checks:

Check	Description
Check 1: Total Time Comparison	vs initial time (default 10%) + vs baseline time (default 20%)
Check 2: Operator-Level Comparison	Top-N operators vs initial operator baseline (default 10%)

Execution policy in this repo:

Put model-level fidelity cases only under tests/benchmark/models/
Do not add nightly marker for these cases
run_benchmark.sh and nightly full run will execute them; compile/regression incremental pipelines will not
Shared model configs are stored in tests/assets/model_config/

Case Configuration

Case Types

The framework supports two model types with dedicated data structures:

TextPerfRegressionCase: For text/VL/LLM models, configured via UserInputConfig
VideoPerfRegressionCase: For video diffusion models, configured with video-specific parameters

Both share common fields from BasePerfRegressionCase.

Adding a New Case

Create a JSON file under the cases/ directory. The filename should match the name field.

Text Model Example (`cases/qwen3-8B-decode.json`)

{
  "type": "text",
  "name": "qwen3-8B-decode",
  "description": "Qwen3-8B decode, 32 queries, ctx=1536, TP=2, compile",
  "initial_time_s": 0.012733,
  "baseline_time_s": 0.015406,
  "initial_tolerance": 0.10,
  "baseline_tolerance": 0.20,
  "operator_top_n": 10,
  "operator_tolerance": 0.10,
  "user_input": {
    "device": "ATLAS_800_A2_376T_64G",
    "model_id": "Qwen/Qwen3-8B",
    "num_queries": 32,
    "query_len": 1,
    "context_length": 1536,
    "do_compile": true,
    "decode": true,
    "quantize_linear_action": "DISABLED",
    "tp_size": 2,
    "world_size": 2
  }
}

Video Model Example (`cases/wan2.2-ulysses8.json`)

{
  "type": "video",
  "name": "wan2.2-ulysses8",
  "description": "Wan2.2-T2V-A14B ulysses=8, batch=1, seq=128, 720x1280x81frames, bfloat16, use_cfg",
  "initial_time_s": 8.542,
  "baseline_time_s": 7.625,
  "initial_tolerance": 0.10,
  "baseline_tolerance": 0.20,
  "operator_top_n": 10,
  "operator_tolerance": 0.10,
  "device": "ATLAS_800_A3_752T_128G_DIE",
  "model_id": "assets/model_config/Wan2.2-T2V-A14B-Diffusers",
  "seq_len": 128,
  "batch_size": 1,
  "height": 720,
  "width": 1280,
  "frame_num": 81,
  "sample_step": 1,
  "dtype": "bfloat16",
  "use_cfg": true,
  "world_size": 8,
  "ulysses_size": 8,
  "cfg_parallel": false,
  "quantize_linear_action": "DISABLED"
}

Common Fields (Base)

Field	Type	Default	Description
`type`	`str`	`"text"`	Case type: `"text"` or `"video"`
`name`	`str`	required	Unique case identifier; operator baseline is stored in `operators` field of this file
`description`	`str`	required	Case description, shown on failure
`initial_time_s`	`float`	`0.0`	Initial total time (seconds). Set `0` to skip initial comparison
`baseline_time_s`	`float`	`0.0`	Baseline total time (seconds). Set `0` to skip baseline comparison
`initial_tolerance`	`float`	`0.10`	Tolerance vs initial time (10%)
`baseline_tolerance`	`float`	`0.20`	Tolerance vs baseline time (20%)
`operator_top_n`	`int`	`10`	Compare top-N most expensive operators
`operator_tolerance`	`float`	`0.10`	Operator-level tolerance (10%)
`operators`	`array`	`[]`	Operator baseline data: list of `{name, total_time_s, num_calls}` objects

Text-Specific Fields (`user_input`)

Field	Type	Description
`device`	`str`	Target device name
`model_id`	`str`	Model identifier or path
`num_queries`	`int`	Number of queries
`query_len`	`int`	Query token length
`context_length`	`int`	Context length for decode
`do_compile`	`bool`	Enable `torch.compile`
`decode`	`bool`	Enable decode mode
`quantize_linear_action`	`str`	Quantization action: `"DISABLED"`, `"W8A8_DYNAMIC"`
`quantize_attention_action`	`str`	Attention quantization: `"DISABLED"`, `"INT8"`
`tp_size`	`int`	Tensor parallelism degree
`dp_size`	`int`	Data parallelism degree
`ep_size`	`int`	Expert parallelism degree
`world_size`	`int`	Total device count
`num_mtp_tokens`	`int`	MTP token count
`image_batch_size`	`int`	Image batch size (VL models)
`image_height`	`int`	Image height (VL models)
`image_width`	`int`	Image width (VL models)

Video-Specific Fields

Field	Type	Default	Description
`device`	`str`	`""`	Target device name
`model_id`	`str`	`""`	Path to model configuration directory
`seq_len`	`int`	`0`	Sequence length
`batch_size`	`int`	`0`	Batch size
`height`	`int`	`0`	Video height
`width`	`int`	`0`	Video width
`frame_num`	`int`	`0`	Number of frames
`sample_step`	`int`	`0`	Sampling step
`dtype`	`str`	`"float16"`	Data type
`use_cfg`	`bool`	`false`	Enable classifier-free guidance
`world_size`	`int`	`1`	Total device count
`ulysses_size`	`int`	`1`	Ulysses sequence parallelism degree
`cfg_parallel`	`bool`	`false`	Enable CFG parallel
`quantize_linear_action`	`str`	`"DISABLED"`	Quantization action

Running Tests

# Run all regression tests
python -m pytest tests/benchmark/models/test_model_regression.py -v --tb=short

# Filter by name
python -m pytest tests/benchmark/models/test_model_regression.py -k "qwen3_30b" -v --tb=short

Output Example

==============================================================================================================
  [Check 1] Total Time Regression Summary
==============================================================================================================
Case                                       Actual      Init   InitDiff%      Base   BaseDiff%         Status
--------------------------------------------------------------------------------------------------------------
qwen3_30b_a3b_prefill_w8a8_tp2_compile  330.123ms  300.000ms    +10.04%  322.000ms     +2.52%     FAIL(INIT)
qwen3_32b_prefill_w8a8_tp1              455.000ms  450.000ms     +1.11%  440.000ms     +3.41%           PASS
--------------------------------------------------------------------------------------------------------------
Total: 2 | Passed: 1 | Failed: 1 | No Baseline: 0
==============================================================================================================

==============================================================================================================
  [Check 2] Operator Regression Summary
==============================================================================================================
Case                                                         Status                              Details
--------------------------------------------------------------------------------------------------------------
qwen3_30b_a3b_prefill_w8a8_tp2_compile                       FAIL                   2 operator(s) exceeded
qwen3_32b_prefill_w8a8_tp1                                    PASS          All operators within tolerance
--------------------------------------------------------------------------------------------------------------
Total: 2 | Passed: 1 | Failed: 1 | No Baseline: 0
==============================================================================================================

*** Operator regression anomalies detected! ***

  [qwen3_30b_a3b_prefill_w8a8_tp2_compile]:
    aten::mm: +12.34% (baseline=45.123ms, actual=50.691ms)
    aten::addmm: +15.67% (baseline=32.456ms, actual=37.542ms)

New Case Onboarding Process

Follow this standard lifecycle when adding a new performance regression case:

Step 1: Create the Case Configuration

Create a new JSON file under cases/<case_name>.json with the appropriate type ("text" or "video") and all required fields. See the examples above for the correct format.

The framework automatically discovers and loads all *.json files from the cases/ directory — no changes to the test source code are required.

Step 2: First Run — Generate the Baseline

The operator baseline must be generated explicitly before regression tests can pass. On the first run, the test will fail with a message that no operator baseline was found. You need to capture the operator output and populate the operators field in your case JSON (cases/<case_name>.json):

"operators": [
  {"name": "aten::mm", "total_time_s": 0.003200, "num_calls": 64},
  {"name": "aten::addmm", "total_time_s": 0.002100, "num_calls": 32}
]

Once the operators field is populated, subsequent runs will perform operator-level comparisons.

Step 3: Second Run — Verify Stability

Run the same test a second time. The framework now has baseline data and will compare operator-level timings:

python -m pytest tests/benchmark/models/test_model_regression.py -k "your_case_name" -v --tb=short

Verify that:

Total time comparisons (initial_time_s and baseline_time_s) are within tolerance
Operator-level comparisons are stable (no unexpected regressions)
Results are reproducible across multiple runs

Step 4: Commit the Configuration

Once the case passes consistently, commit the case file:

cases/<case_name>.json — the case configuration with operator baseline data

Step 5: Refreshing Baselines

When a baseline refresh is needed (e.g., after a model update, performance optimization, or intentional operator change), clear the operators field in the case JSON and follow Steps 2–4 again:

# Manually edit the case JSON and set "operators": []

Then re-generate the operator baseline and re-verify.

Important: When committing a refreshed baseline, always include the reason in the commit message:

Model version change (e.g., "Updated Qwen3-8B to v2.1")
Performance baseline adjustment (e.g., "Adjusted baseline after compiler optimization")
Intentional operator change (e.g., "Switched from aten::mm to aten::matmul")

Auto-Baseline Runner (auto_baseline.py)

A pytest-based runner that automatically runs each case twice: the first run establishes a baseline, the second run compares against it (default tolerance: 5%).

Adding a Case

Edit auto_baseline.py and add an AutoBaselineCase to the AUTO_BASELINE_CASES list:

AUTO_BASELINE_CASES: List[AutoBaselineCase] = [
    AutoBaselineCase(
        name="qwen3-8B_auto",
        description="Qwen3-8B decode, baseline ctx=1536 vs compare ctx=1500",
        baseline_input=UserInputConfig(
            device="ATLAS_800_A2_376T_64G",
            model_id="Qwen/Qwen3-8B",
            num_queries=32,
            query_len=1,
            context_length=1536,
            do_compile=True,
            decode=True,
            tp_size=2,
            world_size=2,
        ),
        compare_input=UserInputConfig(
            device="ATLAS_800_A2_376T_64G",
            model_id="Qwen/Qwen3-8B",
            num_queries=32,
            query_len=1,
            context_length=1500,
            do_compile=True,
            decode=True,
            tp_size=2,
            world_size=2,
        ),
        tolerance=0.05,
    ),
]

Auto-Baseline Fields

Field	Type	Default	Description
`name`	`str`	required	Unique case identifier
`description`	`str`	required	Case description
`baseline_input`	`UserInputConfig`	required	Baseline inference configuration
`compare_input`	`UserInputConfig`	required	Comparison inference configuration
`tolerance`	`float`	`0.05`	Tolerance (5%)

Running

# Run all auto-baseline cases
python -m pytest tests/benchmark/models/auto_baseline.py -v -s

# Filter by name
python -m pytest tests/benchmark/models/auto_baseline.py -k "qwen3-8B" -v -s

Quick Start

1. Add a Case

Create a JSON file under cases/:

{
  "type": "text",
  "name": "your_case_name",
  "description": "your description",
  "initial_time_s": 0.300,
  "baseline_time_s": 0.322,
  "user_input": {
    "device": "YOUR_DEVICE",
    "model_id": "your/model/id",
    "num_queries": 1,
    "query_len": 6600,
    "do_compile": true,
    "tp_size": 2,
    "world_size": 2
  }
}

2. Run

python -m pytest tests/benchmark/models/test_model_regression.py -v --tb=short

Note: The operators field in each case JSON must be populated before the regression tests can pass. Without operator baseline data, the test will fail with a clear message directing you to generate the baseline first. See the onboarding process above for details.

3. Quick Self-Test

python -m pytest tests/benchmark/models/auto_baseline.py -v -s