msmodeling Test Case Generator
Generate structurally correct test cases that conform to the msmodeling three-layer testing framework.
When to Use
- User asks to add a new test case for a feature, model, or source symbol
- User asks to extend an existing test module with additional scenarios
- User asks to create a smoke guard for a nightly-marked test
- User asks to add a benchmark precision or performance baseline case
- User provides a model ID, feature name, or source file path and wants test coverage
When NOT to Use
- User asks to modify production source code under cli/, serving_cast/, tensor_cast/, web_ui/, or scripts/helpers/: this skill only generates test code under tests/
- User asks to change pyproject.toml, conftest.py, or shell scripts: out of scope (see conftest.py Hygiene below if the user only needs guidance)
- User asks to debug a failing test: this skill generates new cases, not triage
- User asks to generate mock data, model weights, or asset files: only test .py or benchmark .json files are generated
- User asks to generate tests for third-party libraries or external dependencies: only msmodeling product code is in scope
Inputs
Collect these from the user before generating. If any is missing, ask explicitly.
| Input | Required | If Not Provided |
|---|---|---|
| What to test — feature name, model ID, or source file/symbol path | Yes | Ask the user to specify |
| Test intent — quick path check, functional verification, or precision/performance baseline | Yes | Ask the user to choose; if ambiguous, infer from the "What to test" description and confirm |
| Model ID (e.g., Qwen/Qwen3-30B-A13B) | Conditional: required if the test involves model compilation or inference | Ask the user; if they don't know, suggest checking tests/assets/model_config/ for available IDs |
| Directory preference — smoke, regression, or benchmark | No | Infer from test intent (see Decision Tree below) |
| Existing test file to extend | No | If the user mentions an existing module, extend it; otherwise create a new file |
Output Format
Produce exactly one of the following, matching the test intent:
1. Python test file (smoke or regression)
- File path under tests/smoke/ or tests/regression/
- Module docstring describing what the test covers
- Imports from tests/helpers/ shared utilities only; no copy-pasted builder or assertion logic
- A unittest.TestCase class or standalone test_* functions
- @pytest.mark.nightly only when the test runs a full compile with do_compile=True and exceeds 300 seconds
- @pytest.mark.npu only when NPU hardware is required
2. Benchmark JSON case file
- File path under tests/benchmark/models/cases/ (model-level) or tests/benchmark/ops/perf_database/ (op-level)
- JSON with name, description, user_input, baseline_time_s, and tolerance fields
- baseline_time_s = 0 when no baseline exists yet
3. Smoke guard companion
When generating a @pytest.mark.nightly test, also generate or extend a corresponding smoke test under tests/smoke/ that exercises the same path with num_hidden_layers_override=1 and do_compile=False.
Framework Conventions
Directory-Driven Layering
Test intent is expressed by directory placement, not by markers.
| Directory | Layer | When to use |
|---|---|---|
| tests/smoke/ | Smoke | Quick path validation, PR-level guard, lightweight compile checks; under 10 seconds |
| tests/regression/ | Regression | Functional / integration verification; default destination for new cases |
| tests/benchmark/models/ | Benchmark (model) | Model-level precision or performance baseline |
| tests/benchmark/ops/perf_database/ | Benchmark (op) | Operator-level performance database |
Never add layer markers (smoke, regression, benchmark). Only two markers exist:
- @pytest.mark.nightly: long-running compile paths (excluded from the ci_gate mapped/guard wave; new or modified test files still run in the changed-test wave, which filters only with -m "not npu")
- @pytest.mark.npu: requires NPU hardware (excluded from all run_*.sh scripts)
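The marker rule can be spot-checked mechanically before outputting a file. The sketch below is a hypothetical helper (find_layer_markers is not an existing tests/helpers/ API); it assumes markers are spelled as @pytest.mark.<name> decorators and does not look at module-level pytestmark assignments.

```python
from __future__ import annotations

import ast

# Layer markers that must never appear; layering is expressed by directory.
LAYER_MARKERS = {"smoke", "regression", "benchmark"}


def find_layer_markers(source: str) -> list[tuple[str, int]]:
    """Return (marker, line) for every @pytest.mark.<layer> decorator in source.

    Hypothetical helper: only decorators on functions and classes are inspected,
    and only the literal `pytest.mark.<name>` spelling (called or not) is matched.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        for deco in node.decorator_list:
            # @pytest.mark.smoke and @pytest.mark.smoke(...) share the same target.
            target = deco.func if isinstance(deco, ast.Call) else deco
            if (
                isinstance(target, ast.Attribute)
                and target.attr in LAYER_MARKERS
                and isinstance(target.value, ast.Attribute)
                and target.value.attr == "mark"
                and isinstance(target.value.value, ast.Name)
                and target.value.value.id == "pytest"
            ):
                found.append((target.attr, deco.lineno))
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(found, key=lambda item: item[1])


SAMPLE = '''import pytest

@pytest.mark.nightly
def test_full_compile():
    pass

@pytest.mark.smoke
def test_quick():
    pass
'''

print(find_layer_markers(SAMPLE))  # [('smoke', 7)]
```

Only the layer marker is reported; nightly and npu are allowed and pass through.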
Decision Tree
- Is it a quick path check? → tests/smoke/
  - Use local tiny configs from tests/assets/model_config/ with num_hidden_layers_override=1
  - Assert basic reachability: build_model succeeds, ModelRunner.run_inference returns, CLI exit code is 0
  - Keep it under 10 seconds
- Is it functional/integration verification? → tests/regression/
  - Place it under the appropriate subdirectory (tensor_cast/, serving_cast/, cli/, web_ui/, scripts/)
  - Use get_session_model / get_session_hf_config for model construction
  - Use tests/helpers/assert_utils.py for assertions
  - If the test takes more than 300 seconds with do_compile=True, add @pytest.mark.nightly and create a corresponding smoke guard under tests/smoke/
- Is it a precision/performance baseline? → tests/benchmark/
  - Model-level: create a JSON config under tests/benchmark/models/cases/
  - Op-level: add to tests/benchmark/ops/perf_database/
  - Set baseline_time_s and tolerance in the JSON; use 0 if no baseline exists yet
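The decision tree can be read as a small pure function from intent to placement. This is a hypothetical sketch: place_test, Placement, the intent strings ("quick", "functional", "baseline"), and the est_runtime_s estimate are illustrative assumptions, not part of the msmodeling test helpers.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NIGHTLY_THRESHOLD_S = 300  # nightly only for do_compile=True runs longer than this


@dataclass
class Placement:
    directory: str
    markers: list[str] = field(default_factory=list)
    needs_smoke_guard: bool = False


def place_test(
    intent: str,
    *,
    subdir: str = "tensor_cast",
    op_level: bool = False,
    do_compile: bool = False,
    est_runtime_s: float = 0.0,
    needs_npu: bool = False,
) -> Placement:
    """Map a test intent ("quick", "functional", "baseline") to a placement."""
    if intent == "baseline":
        # Benchmark cases are JSON files; pytest markers do not apply.
        if op_level:
            return Placement("tests/benchmark/ops/perf_database/")
        return Placement("tests/benchmark/models/cases/")

    npu = ["npu"] if needs_npu else []
    if intent == "quick":
        return Placement("tests/smoke/", npu)
    if intent == "functional":
        nightly = do_compile and est_runtime_s > NIGHTLY_THRESHOLD_S
        markers = (["nightly"] if nightly else []) + npu
        # Every nightly case needs a smoke counterpart under tests/smoke/.
        return Placement(f"tests/regression/{subdir}/", markers, needs_smoke_guard=nightly)
    raise ValueError(f"unknown intent: {intent!r}")


print(place_test("functional", do_compile=True, est_runtime_s=900))
# Placement(directory='tests/regression/tensor_cast/', markers=['nightly'], needs_smoke_guard=True)
```

Note that a slow test without do_compile=True still gets no nightly marker, matching the rule that nightly is reserved for long-running compile paths.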
Shared Helpers
Always prefer these over copy-paste:
| Module | Purpose | Key API |
|---|---|---|
| tests/helpers/config_factory.py | Build UserInputConfig | create_user_config(model_id, **overrides) |
| tests/helpers/model_builder.py | Build TransformerModel | build_transformer_model(user_config) |
| tests/helpers/assert_utils.py | Assert model metrics | assert_model_metrics_valid(result, test_name) |
| tests/helpers/op_registry.py | Op registry for unit tests | build_op_registry(cfg_registry) |
| tests/helpers/fake_subprocess.py | Subprocess stubs for CLI tests | FakeSubprocess |
Session-Level Fixtures (regression)
Regression tests under tests/regression/tensor_cast/ can reuse session-scoped caches:
```python
from tests.regression.tensor_cast.conftest import get_session_model, get_session_hf_config
```
- get_session_model(user_config): returns a cached TransformerModel (built once per session)
- get_session_hf_config(model_id): returns a cached HuggingFace config
Always use these instead of calling build_model() inside each test function.
conftest.py Hygiene
When the user asks for mocks, torch avoidance, or a new conftest.py under a regression subdirectory:
| Do | Do not |
|---|---|
| Use fixture-scoped monkeypatch / @patch in the test file | Set sys.modules["tensor_cast"] = MagicMock() (or similar) at conftest import time |
| Rely on real torch / tensor_cast (project dependencies) | Assume web_ui or CLI tests can mock product packages globally |
| Add directory-local fixtures only | Add pytest_plugins in a subdirectory conftest (only valid in tests/conftest.py) |
| Mention that tests/**/conftest.py changes trigger the full CI suite | Expect the incremental CI gate to catch cross-directory pollution from conftest alone |
Regression guard: tests/smoke/test_conftest_hygiene.py. Root pytest_plugins in tests/conftest.py shares tensor_cast / serving_cast fixtures across layers — that is separate from import mocking.
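A minimal sketch of the "Do" column, using the standard-library unittest.mock.patch.dict decorator so a stub exists only while one test runs. The module name hypothetical_optional_dep is illustrative and deliberately not a product package; the trailing call exists only to demonstrate that sys.modules is restored afterwards.

```python
import sys
from unittest.mock import MagicMock, patch

# Do NOT do this in a conftest (it leaks into every test collected afterwards):
#     sys.modules["tensor_cast"] = MagicMock()
# Instead, scope any stub to a single test.


@patch.dict(sys.modules, {"hypothetical_optional_dep": MagicMock()})
def test_uses_scoped_stub():
    import hypothetical_optional_dep  # resolved from the patched sys.modules

    hypothetical_optional_dep.ping.return_value = "pong"
    assert hypothetical_optional_dep.ping() == "pong"


test_uses_scoped_stub()
# patch.dict restored sys.modules on exit, so the stub is gone again.
print("hypothetical_optional_dep" in sys.modules)  # False
```

pytest's monkeypatch.setitem(sys.modules, ...) inside a test or fixture gives the same scoped cleanup.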
Templates
Smoke Test
```python
"""Smoke test for <feature>."""
from tests.helpers.config_factory import create_user_config
from tests.helpers.model_builder import build_transformer_model


def test_<feature>_smoke():
    user_config = create_user_config(
        "<model_id>",
        num_hidden_layers_override=1,  # tiny config keeps the run under 10 seconds
        do_compile=False,
    )
    model = build_transformer_model(user_config)
    assert model is not None
```
Regression Test
```python
"""Regression test for <feature>."""
import unittest

from tests.helpers.config_factory import create_user_config
from tests.helpers.assert_utils import assert_model_metrics_valid
from tests.regression.tensor_cast.conftest import get_session_model


class Test<Feature>(unittest.TestCase):
    def test_<scenario>(self):
        user_config = create_user_config("<model_id>", do_compile=False)
        model = get_session_model(user_config)  # cached once per session
        result = model.run_inference(...)  # fill in the inference inputs
        assert_model_metrics_valid(result, "test_<scenario>")
```
Nightly Regression Test
```python
"""Nightly regression test for <feature> (full compile)."""
import unittest

import pytest

from tests.helpers.config_factory import create_user_config
from tests.helpers.assert_utils import assert_model_metrics_valid
from tests.regression.tensor_cast.conftest import get_session_model


@pytest.mark.nightly  # full compile, > 300 seconds; pair with a smoke guard
class Test<Feature>Nightly(unittest.TestCase):
    def test_<scenario>_nightly(self):
        user_config = create_user_config("<model_id>", do_compile=True)
        model = get_session_model(user_config)
        result = model.run_inference(...)  # fill in the inference inputs
        assert_model_metrics_valid(result, "test_<scenario>_nightly")
```
Benchmark JSON Case
```json
{
  "name": "<case_name>",
  "description": "<what this case validates>",
  "user_input": {
    "model_id": "<model_id>",
    "do_compile": true
  },
  "baseline_time_s": 0,
  "tolerance": 0.20
}
```
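Before writing a benchmark case, its shape can be checked against the required fields listed under Output Format. validate_case is a hypothetical helper, and rejecting negative baseline_time_s or tolerance values is an assumption (the document only states that 0 means no baseline yet).

```python
from __future__ import annotations

import json

# Field name -> accepted JSON types, per the Benchmark JSON case format.
REQUIRED_FIELDS = {
    "name": (str,),
    "description": (str,),
    "user_input": (dict,),
    "baseline_time_s": (int, float),
    "tolerance": (int, float),
}


def validate_case(text: str) -> list[str]:
    """Return human-readable problems in a benchmark case JSON string ([] if none)."""
    case = json.loads(text)
    problems = []
    for key, types in REQUIRED_FIELDS.items():
        if key not in case:
            problems.append(f"missing field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(case[key], bool) or not isinstance(case[key], types):
            problems.append(f"wrong type for {key}: {type(case[key]).__name__}")
    for key in ("baseline_time_s", "tolerance"):
        value = case.get(key)
        # Assumption: negative values are never meaningful; 0 means "no baseline yet".
        if isinstance(value, (int, float)) and not isinstance(value, bool) and value < 0:
            problems.append(f"{key} must be >= 0")
    return problems


TEMPLATE = """{
  "name": "<case_name>",
  "description": "<what this case validates>",
  "user_input": {"model_id": "<model_id>", "do_compile": true},
  "baseline_time_s": 0,
  "tolerance": 0.20
}"""

print(validate_case(TEMPLATE))  # []
```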
Boundaries — What NOT to Do
- Do not modify production source code; this skill generates test code only
- Do not add layer markers (smoke, regression, benchmark); layering is directory-driven
- Do not copy-paste builder or assertion logic; always import from tests/helpers/
- Do not call build_model() directly in regression tests; use the get_session_model() or get_session_hf_config() session fixtures
- Do not add @pytest.mark.nightly to tests that complete in under 300 seconds; nightly is for long-running compile paths only
- Do not add @pytest.mark.npu unless the test truly requires NPU hardware and cannot run on CPU
- Do not create new helper modules under tests/helpers/ without checking existing ones first
- Do not hardcode model weights or file paths; use tests/assets/model_config/ for configs and create_user_config() for construction
- Do not generate tests for third-party library internals; only test msmodeling product code
- Do not skip the smoke guard when generating a nightly test; every @pytest.mark.nightly case must have a corresponding smoke counterpart
- Do not generate conftest code that mutates sys.modules for tensor_cast, serving_cast, or other product packages at import time
- Do not suggest pytest_plugins in subdirectory conftest files; register cross-layer fixtures only in tests/conftest.py
Checklist (verify before outputting)
- Case is in the correct directory
- No layer markers; only nightly or npu when applicable
- Shared helpers used (no copy-paste of builder/assertion logic)
- Session fixtures used for model construction in regression tests
- If @pytest.mark.nightly is used, a smoke guard is mentioned or co-generated
- No generated conftest uses module-level sys.modules mocks for product packages
- New product symbols are covered, listed in exemptions.sources, omitted via pyproject.toml [tool.coverage.run] omit when appropriate, or satisfied by the .coverage fallback during pre-run policy; failing or blocked test nodes may be listed under exemptions.tests (pytest node id with ::)
CI Gate Policy
When adding tests for product code under cli/, tensor_cast/, serving_cast/, web_ui/, scripts/, or tools/:
- Prefer real coverage: regression tests should execute changed symbols so that the nightly test_map maps them.
- ci_gate flow (scripts/helpers/ci_gate/main.py):
  - Pre-run policy: classify the diff → build_ci_gate_plan → if blocking_errors is non-empty, exit 1 and skip pytest.
  - Execution (policy passed): compute_execution_plan unions changed-test nodes, mapped regression nodes, and deleted-source guard nodes, deduplicates them by node id, and runs up to two waves: changed tests with -m "not npu", then mapped/guard tests with -m "not npu and not nightly and not network" (respecting exemptions.tests). A config change runs a single wave over tests/ with -m "not npu". Runs are collect-first with xdist, -vv, and no --cov. A selected-test failure prints copy-paste exemptions.tests YAML; a full-suite failure does not.
- tests/.ci/gate_policy.yaml:
  - roots: product source prefixes (SSOT for diff classification, test_map keys, and coverage packages); each must end with /
  - exemptions.sources: temporary product-symbol waivers (path::symbol under roots); skip test_map coverage checks for matching source symbols during pre-run policy
  - exemptions.tests: temporary pytest-node waivers; each symbols entry is a pytest node id (tests/.../test_foo.py::test_bar); matching nodes are skipped in changed-test collection and mapped/guard scheduling. Both exemption kinds require reason, applicant, approver, and deadline.
  - Symbol formats: exemptions.sources uses product/path.py::qualified_name (exactly one ::). exemptions.tests uses a pytest node id with :: (file plus test function or unittest method). Parametrized bracket ids ([...]) and class-only ids (::TestClass without a method) are not allowed; register the concrete test node id instead.
  - test_discovery: which paths under tests/ count as gate test modules
  - Changing gate_policy.yaml does not trigger full-suite pytest; approver validation runs via validate_gate_policy_if_changed
- Source omit SSOT: pyproject.toml [tool.coverage.run] omit (not gate_policy); e.g. `*/builtin_model/*` skips gate checks and test_map collection for matching product paths under roots.
- Coverage fallback: during pre-run policy, an unmapped changed symbol may pass when the repo-root .coverage shows an executed line in the symbol span (import-time/conftest empty context via symbol_lines_covered_in_data). This is not a separate Phase 0 pytest+cov step; nightly maintains the external test_map.
- Local verify: shell scripts use -o addopts= to clear the pyproject default markers, then apply their own -m; run smoke/regression with -vv before merge.
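The two-wave scheduling described above can be approximated as follows. This is a hypothetical sketch of the behaviour, not compute_execution_plan itself; in particular, the rule that a node which is both changed and mapped runs once, in the changed-test wave, is an assumption.

```python
from __future__ import annotations

CHANGED_EXPR = "not npu"
MAPPED_EXPR = "not npu and not nightly and not network"


def plan_waves(
    changed: list[str],
    mapped: list[str],
    guard: list[str],
    exempt: set[str],
    config_changed: bool = False,
) -> list[tuple[list[str], str]]:
    """Return (node_ids, -m expression) for each wave that would run."""
    if config_changed:
        # Config change: one wave over the whole tests/ tree.
        return [(["tests/"], CHANGED_EXPR)]

    # dict.fromkeys deduplicates while preserving first-seen order.
    changed_nodes = [n for n in dict.fromkeys(changed) if n not in exempt]
    already = set(changed_nodes)
    mapped_nodes = [
        n for n in dict.fromkeys([*mapped, *guard])
        if n not in exempt and n not in already
    ]

    waves = []
    if changed_nodes:
        waves.append((changed_nodes, CHANGED_EXPR))
    if mapped_nodes:
        waves.append((mapped_nodes, MAPPED_EXPR))
    return waves


for nodes, expr in plan_waves(
    changed=["tests/regression/cli/test_run.py::test_run"],
    mapped=["tests/regression/cli/test_run.py::test_run", "tests/regression/cli/test_args.py::test_parse"],
    guard=[],
    exempt=set(),
):
    print(expr, nodes)
```

This shows why a nightly-marked test file you just edited still runs in CI: it lands in the changed-test wave, whose expression does not exclude nightly.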
Example test exemption (node-level):
```yaml
exemptions:
  tests:
    - symbols:
        - tests/regression/cli/test_run.py::test_run
      reason: "Blocked on upstream fixture; tracked in issue-123"
      applicant: alice
      approver: fangkai
      deadline: 2026-12-31
      ticket: "issue-123"
```
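An exemption entry can be sanity-checked against the symbol-format and required-field rules before opening a PR. The helpers below are hypothetical (the real checks live in the ci_gate tooling), and treating a last component that starts with "Test" as a class-only id is a heuristic assumption.

```python
from __future__ import annotations

REQUIRED_KEYS = ("reason", "applicant", "approver", "deadline")


def node_id_error(node_id: str) -> str | None:
    """Check one exemptions.tests symbol; return an error message or None."""
    if "::" not in node_id:
        return "must be a pytest node id containing '::'"
    if "[" in node_id:
        return "parametrized bracket ids are not allowed"
    # Heuristic (assumption): a last component starting with "Test" is a class,
    # i.e. a class-only id with no method.
    if node_id.rsplit("::", 1)[1].startswith("Test"):
        return "class-only id; register the concrete test method instead"
    return None


def source_symbol_error(symbol: str) -> str | None:
    """Check one exemptions.sources symbol (product/path.py::qualified_name)."""
    if symbol.count("::") != 1:
        return "must contain exactly one '::'"
    return None


def entry_errors(entry: dict, kind: str) -> list[str]:
    """Validate one exemption entry; kind is "tests" or "sources"."""
    errors = [f"missing {key}" for key in REQUIRED_KEYS if not entry.get(key)]
    check = node_id_error if kind == "tests" else source_symbol_error
    for symbol in entry.get("symbols", []):
        message = check(symbol)
        if message:
            errors.append(f"{symbol}: {message}")
    return errors


# The example entry above, written as a Python dict.
EXAMPLE = {
    "symbols": ["tests/regression/cli/test_run.py::test_run"],
    "reason": "Blocked on upstream fixture; tracked in issue-123",
    "applicant": "alice",
    "approver": "fangkai",
    "deadline": "2026-12-31",
    "ticket": "issue-123",
}
print(entry_errors(EXAMPLE, "tests"))  # []
```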