Design Document: TensorCast New Model Adaptation Efficiency

Revision History

Date	Version	Change Description	Author	RFC Document
2026-06-03	1.0	Initial design for the model adaptation efficiency workflow	kai1949, codex	N/A

1. Background

TensorCast model adaptation currently depends on repeated manual investigation: adapter authors inspect HuggingFace model source, infer ModelProfile fields, discover runtime incompatibilities, map MindStudio Insight kernels to TensorCast semantic operators, and then construct regression evidence. This process is costly because each model has different module naming, multimodal layouts, MoE/MLA/MTP details, runtime checks, and backend fusion behavior.

The proposed feature introduces a reusable adaptation workflow that combines deterministic adapter programs with large-model skills. The deterministic programs shall parse command and profiling inputs, inspect model structure, validate profile candidates, classify failures, generate evidence drafts, and verify actual TensorCast behavior. The large-model skill shall assist only in the areas where deterministic analysis is insufficient, such as source-code semantic reasoning, patch-method authoring, and uncertain operator mapping review. AI output shall remain a draft until deterministic gates and human review accept it.

The primary goals are:

Goal	Design Direction	Success Signal
Reduce required user input	Require only a TensorCast simulation command and matching MindStudio Insight raw profiling export	Doctor report can be produced from the two inputs
Make adaptation traceable	Preserve normalized command, raw profiling provenance, hints, profile candidates, and verification reports	Every generated artifact references its evidence source
Improve correctness	Gate AI-generated profile, patch, and evidence drafts through validation, dry-run, and verifier checks	Failures are classified with actionable next steps
Enable reusable regression	Convert verified evidence into ST guardrail cases	Adapted models have repeatable count and latency checks
Support blind replay	Validate the workflow by re-adapting an already supported model while hiding its existing profile	Replay discovers the expected Qwen3-VL structure and patch needs

The design intentionally separates user-facing operating instructions from software design. Detailed commands, input formats, and output review steps are documented in docs/en/user_guide/msmodeling_tensor_cast_new_model_adaptation_user_guide.md.

2. Design

2.1 Design Principles

The adaptation system shall follow four principles:

Principle	Meaning
Deterministic first	Programmatic inspection, validation, parsing, and verification run before AI reasoning is accepted.
Minimal human checkpoint	When uncertainty remains, the system asks for the smallest confirmed fact instead of asking the user to explain a full model.
Provenance preserving	Reports retain the raw command, raw profiling source, hints, candidate fields, and confidence values.
Replayable by design	Existing profiles may be ignored only in replay or audit mode, so the process can be tested without reading the known answer.

2.2 System Context

flowchart LR
    User["Adapter Author"]
    Skill["Large-Model Skill"]
    CLI["model_adapter CLI"]
    Doctor["Doctor Engine"]
    Registry["ModelProfile Registry"]
    HF["Installed Transformers Source"]
    Insight["MindStudio Insight Raw Export"]
    Evidence["Evidence YAML"]
    Verifier["Evidence Verifier"]
    ST["ST Guardrail Case"]

    User -->|"simulation command"| CLI
    User -->|"raw profiling export"| CLI
    User -->|"optional hints"| CLI
    CLI --> Doctor
    Insight --> Doctor
    Doctor --> HF
    Doctor --> Registry
    Doctor -->|"candidate profile, questions, AI tasks"| User
    Doctor -->|"bounded prompt + deterministic findings"| Skill
    Skill -->|"reviewable patch or mapping draft"| User
    User -->|"reviewed profile or hints"| Registry
    Doctor -->|"evidence draft"| Evidence
    Evidence --> Verifier
    Verifier -->|"classified result"| User
    Verifier -->|"verified report"| ST

The CLI is the public entry point. The doctor engine coordinates deterministic subsystems. The large-model skill is not a replacement for the doctor; it is a bounded assistant that receives structured evidence and produces reviewable drafts.

2.3 Component Responsibilities

Component	Responsibility	Non-Goal
Adaptation Context	Parse a TensorCast simulation command into normalized model, workload, device, parallelism, quantization, and multimodal parameters	Guess a workload that was not supplied
Raw Insight Importer	Convert Insight raw export rows into normalized kernel summaries and total forward timing	Require the user to hand-write evidence
Structure Inspector	Scan the installed model tree for attention, MoE, MLA, MTP, and VL facts	Infer fields from model names alone
Profile Candidate Materializer	Create a minimal `ModelProfile` candidate with review-friendly fields	Mark a profile as verified before validation and runtime checks
Profile Validator	Check profile field types, required fields, default elision, and callable patch methods	Prove runtime semantic equivalence
Patch Discovery	Classify dry-run or smoke failures and produce bounded AI assistance tasks	Generate model-specific patch code directly
Evidence Builder	Merge raw profiling, normalized command, actual summaries, and hints into an evidence draft	Hide low-confidence mappings
Evidence Verifier	Compare expected evidence with actual TensorCast summaries and classify mismatches	Treat all mismatches as generic failures
ST Case Generator	Convert verified reports into regression guardrail case drafts or verified cases	Generate verified cases from unverified evidence
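
The Adaptation Context responsibility can be illustrated with a minimal sketch. It assumes that options use a `--key value` or `--key=value` form and that the last positional token is the model id; the function name, the sample command, and its flags are hypothetical placeholders, not the real TensorCast command line. The command is tokenized with `shlex` and never executed, and the original text is kept verbatim for provenance.

```python
import shlex
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AdaptationContext:
    model_id: str
    raw_command: str  # original text, preserved verbatim for provenance
    normalized_args: dict = field(default_factory=dict)


def parse_simulation_command(raw_command: str) -> AdaptationContext:
    """Tokenize a command string (without running it) and normalize its options."""
    tokens = shlex.split(raw_command)
    normalized = {}
    positionals = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            name, sep, value = tok[2:].partition("=")
            key = name.replace("-", "_")
            if sep:  # --key=value
                normalized[key] = value
            elif i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                normalized[key] = tokens[i + 1]  # --key value
                i += 1
            else:
                normalized[key] = True  # bare flag
        else:
            positionals.append(tok)
        i += 1
    if not positionals:
        raise ValueError("no model id found in command")
    # Assumption: the last positional token names the model.
    return AdaptationContext(positionals[-1], raw_command, normalized)


# Hypothetical command text, used only to exercise the sketch.
ctx = parse_simulation_command(
    "simulate example-org/ExampleVL-7B --device fake-device --tp-size=2 --compile"
)
print(ctx.model_id, ctx.normalized_args)
```

Because the parser only reads supplied tokens, a workload that was not supplied stays absent from `normalized_args` instead of being guessed.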

2.4 4+1 Architecture Views

The design can be described with a 4+1 view model so that functional behavior, runtime execution, source organization, deployment dependencies, and validation scenarios are reviewed separately.

View	Design Focus	Main Stakeholder	Key Artifact
Logical view	Domain abstractions, report schema, profile/evidence model	Adapter author, reviewer	Data model and UML class diagrams
Process view	Runtime workflow, iteration states, AI handoff boundaries	Adapter author, test owner	Sequence and state diagrams
Development view	Source modules and dependency direction	Maintainer	Component/module dependency diagram
Physical view	Local runtime, external artifacts, optional AI assistant	Tooling owner	Deployment diagram
Scenario view	Representative model onboarding and replay/audit cases	Reviewer, acceptance owner	Usage case and test scenarios

2.4.1 Logical View

The logical view centers on immutable evidence and reviewable drafts. The doctor report shall preserve raw inputs, deterministic findings, candidate profile fields, validation results, AI task packages, and evidence drafts in one reviewable object. A profile shall not be treated as verified merely because it is materialized; it moves toward verification only through validation, runtime checks, evidence review, and verifier results.

flowchart TB
    Command["Simulation Command"]
    RawInsight["Raw Insight Export"]
    Hints["Hints Ledger"]
    Context["AdaptationContext"]
    Structure["ModelStructureFacts"]
    Candidate["ProfileCandidate"]
    Profile["ModelProfile"]
    DoctorReport["DoctorReport"]
    EvidenceDraft["Evidence Draft"]
    EvidenceYaml["Evidence YAML"]
    VerifyReport["VerificationReport"]

    Command --> Context
    RawInsight --> EvidenceDraft
    Hints --> EvidenceDraft
    Context --> DoctorReport
    Structure --> Candidate
    Candidate --> Profile
    Candidate --> DoctorReport
    Profile --> DoctorReport
    EvidenceDraft --> DoctorReport
    DoctorReport --> EvidenceYaml
    EvidenceYaml --> VerifyReport

2.4.2 Process View

The process view is an iterative control loop. Deterministic tooling narrows the unknowns first. Human review and large-model skill assistance are used only at explicit checkpoints.

flowchart LR
    Intake["Input Intake"]
    Doctor["Doctor Run"]
    Review["Human Review"]
    AITask["AI Task Handoff"]
    Register["Profile Registration"]
    Evidence["Evidence Review"]
    Verify["Verification"]
    ST["ST Case"]
    Iterate["Classified Iteration"]

    Intake --> Doctor
    Doctor --> Review
    Review -->|"profile accepted"| Register
    Review -->|"patch needed"| AITask
    AITask --> Register
    Register --> Doctor
    Doctor --> Evidence
    Evidence --> Verify
    Verify -->|"passed or accepted gap"| ST
    Verify -->|"classified mismatch"| Iterate
    Iterate --> Review

2.4.3 Development View

The development view keeps adapter automation isolated from model execution and registry internals. The CLI depends on adapter services; adapter services depend on TensorCast model building, registry, runtime summaries, and profiling parsers. Built-in model profiles remain the extension point for model-specific metadata and reviewed patch methods.

flowchart TB
    CLI["cli/inference/model_adapter.py"]
    Adapter["tensor_cast/adapter/*"]
    Context["context.py"]
    Insight["insight.py"]
    Inspect["inspect.py"]
    Profile["profile.py / profile_draft.py"]
    Patch["patch_discovery.py / ai_task.py"]
    Evidence["evidence_builder.py / evidence_export.py / evidence.py"]
    Verify["verifier.py / runner.py / st_case.py"]
    Builder["tensor_cast/core/model_builder.py"]
    Registry["tensor_cast/transformers/custom_model_registry.py"]
    Builtins["tensor_cast/transformers/builtin_model/*.py"]
    Runtime["tensor_cast/runtime.py"]

    CLI --> Adapter
    Adapter --> Context
    Adapter --> Insight
    Adapter --> Inspect
    Adapter --> Profile
    Adapter --> Patch
    Adapter --> Evidence
    Adapter --> Verify
    Inspect --> Builder
    Profile --> Registry
    Registry --> Builtins
    Verify --> Runtime
    Builder --> Registry

2.4.4 Physical View

The physical view assumes a local development host. Model source, TensorCast code, raw profiling files, reports, and optional AI assistance are separate runtime concerns. The deterministic tools shall not require network access after the model source and dependencies are installed.

flowchart TB
    subgraph Host["Linux/WSL Development Host"]
        Repo["msmodeling repository"]
        Python["Project Python Environment"]
        Reports["reports/<case_name>/ artifacts"]
        CLI["model_adapter CLI process"]
        TC["TensorCast runtime/model builder"]
        Transformers["Installed transformers package"]
    end

    Insight["MindStudio Insight raw export"]
    Skill["Large-Model Skill or AI Assistant"]
    OptionalModelCache["Optional local model/config cache"]

    Insight --> Reports
    OptionalModelCache --> Transformers
    Repo --> Python
    Python --> CLI
    CLI --> Reports
    CLI --> TC
    TC --> Transformers
    CLI -->|"bounded prompt text"| Skill
    Skill -->|"reviewable draft"| Reports

2.4.5 Scenario View

The scenario view validates the architecture against representative cases:

Scenario	Purpose	Expected Design Behavior
New dense text model	Confirm minimal-profile path	Doctor emits only source-backed required fields and evidence draft
New MoE or MLA model	Confirm structure discovery and profile validation	Candidate fields include module names, expert keys, and validation issues if any
New VL model	Confirm visual path and linear mapping discovery	Candidate includes visual/language/layer paths and mapping patterns
Runtime patch failure	Confirm AI task boundary	Doctor classifies failure and emits bounded AI task, not final patch code
Qwen3-VL blind replay	Confirm no answer leakage	Existing Qwen3-VL profile is hidden during discovery and used only as final oracle

2.5 Current Module and Surrounding Component Relationships

The feature is designed as an adapter automation layer around existing TensorCast responsibilities. It shall not replace model building, profile registration, quantization, runtime execution, or performance modeling. Instead, it coordinates those components and records structured adaptation evidence.

Existing Area	Relationship to Adapter Workflow	Design Constraint
`cli/inference/text_generate`	Provides the simulation command whose semantics are parsed into `AdaptationContext`	The adapter command parser shall preserve the original command text
`tensor_cast/core/model_builder.py`	Builds the model instance used for structure inspection and runtime verification	Doctor shall use the same model-building path as normal simulation when possible
`tensor_cast/transformers/custom_model_registry.py`	Stores and resolves `ModelProfile` entries	Replay/audit profile hiding shall be scoped and reversible
`tensor_cast/transformers/builtin_model/*.py`	Holds reviewed model-specific profiles and patch methods	Generated drafts shall be review aids, not automatically trusted code
`tensor_cast/transformers/transformations.py`	Applies model transformations and may emit patch reports	Patch reports shall become review and blocking evidence
`tensor_cast/runtime.py`	Records actual runtime events for verification	Verification shall summarize actual operator counts and timings from runtime events
Performance model modules	Provide analytic or profiling-based latency estimates	Evidence verification shall distinguish profile errors from performance model coverage gaps
MindStudio Insight exports	Provide external measured kernel evidence	Parser shall preserve raw kernel names, counts, and total forward timing

flowchart LR
    TextGenerate["text_generate command"]
    AdapterCLI["model_adapter CLI"]
    ModelBuilder["Model Builder"]
    Registry["Profile Registry"]
    BuiltinProfile["Built-in Profiles"]
    Transformations["Transformations and Patch Reports"]
    Runtime["Runtime Events"]
    PerfModel["Performance Models"]
    Insight["Raw Insight"]
    Evidence["Evidence Verification"]

    TextGenerate --> AdapterCLI
    Insight --> AdapterCLI
    AdapterCLI --> ModelBuilder
    ModelBuilder --> Registry
    Registry --> BuiltinProfile
    ModelBuilder --> Transformations
    Transformations --> Runtime
    Runtime --> Evidence
    PerfModel --> Evidence
    AdapterCLI --> Evidence

2.6 Interface Design

The design exposes three user-facing CLI operations and several structured artifact interfaces. Detailed command examples are maintained in the usage guide; this section defines the design contract.

2.6.1 CLI Operations

Operation	Required Inputs	Optional Inputs	Output	Failure Contract
`doctor`	model id or command file	raw Insight file, hints file, failure log, ignored profiles, profile draft output	JSON doctor report and optional Python profile draft	Must report field-level validation, hint conflicts, or classified patch failure rather than a generic exception where possible
`export-evidence`	doctor report with `evidence_draft`	output path	YAML evidence document	Must fail if the report has no exportable evidence draft
`verify`	evidence YAML and resolvable model id	device/runtime overrides, ST case output	JSON verification report and optional ST case JSON	Must classify mismatch categories and return non-zero on failed verification
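
The failure contracts for `export-evidence` and `verify` can be sketched as two small functions. This is an illustration under stated assumptions: the function names are hypothetical, reports are modeled as plain dictionaries whose keys follow the field names used in this document, and the exit code `1` stands in for "non-zero".

```python
def export_evidence(doctor_report: dict) -> dict:
    """Return an evidence document, or fail if there is no exportable draft."""
    draft = doctor_report.get("evidence_draft")
    if not draft or not draft.get("cases"):
        raise ValueError("doctor report has no exportable evidence draft")
    context = doctor_report.get("adaptation_context", {})
    return {
        "model": {"model_id": context.get("model_id")},
        "cases": list(draft["cases"]),
    }


def verify_exit_code(verification_report: dict) -> int:
    """Map a verification report to a process exit code (non-zero on failure)."""
    return 0 if verification_report.get("passed") else 1


report = {
    "adaptation_context": {"model_id": "example-org/ExampleVL-7B"},
    "evidence_draft": {"cases": [{"name": "prefill"}]},
}
print(export_evidence(report))
print(verify_exit_code({"passed": False, "issues": ["count mismatch"]}))
```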

2.6.2 Report Interfaces

Artifact	Producer	Consumer	Required Design Fields
`AdaptationContext`	command parser	doctor, evidence builder, verifier	`model_id`, `raw_command`, `normalized_args`, artifact paths
`RawInsightSummary`	raw Insight parser	evidence builder, hint conflict checker	`totals`, normalized kernels, categories, counts, timing
`DoctorReport`	doctor engine	human reviewer, skill, evidence exporter	context, raw summary, candidate profile, validation, patch discovery, AI tasks, evidence draft
`AiAssistanceTask`	patch discovery or unsupported-semantics classifier	large-model skill, human reviewer	task type, deterministic evidence, suspected locations, constraints, required output, verification commands, prompt text
`EvidenceDocument`	evidence exporter or reviewer	verifier, ST case generator	model metadata, cases, expected total forward, major ops, tolerances, confidence, accepted gaps
`VerificationReport`	verifier	reviewer, ST case generator	pass/fail, issue categories, severities, suggestions, actual summaries

2.6.3 Interface Stability Rules

Rule	Rationale
Reports shall use explicit field names and source/confidence annotations	Reviewers and tests need stable provenance
Draft artifacts shall be distinguishable from verified artifacts	Prevent accidental promotion of unreviewed AI or heuristic output
Replay/audit options shall be explicit opt-in flags	Avoid silently hiding production profiles
Optional hints shall be additive and conflict-reporting	Preserve deterministic facts from command and raw profiling inputs
AI prompt text shall be generated from structured tasks	Keep AI assistance bounded and reproducible
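
The rule that hints are additive and conflict-reporting can be sketched as follows. The `Hint` fields, the conflict kinds, and the kernel names are hypothetical; the point is that raw profiling counts are never overwritten, and each disagreement is reported together with the provenance of the hint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    kernel: str
    expected_count: int
    source: str  # provenance of the hint, e.g. a hints file name


def find_hint_conflicts(hints, raw_counts):
    """Report hints that disagree with raw profiling counts; never edit the counts."""
    conflicts = []
    for hint in hints:
        observed = raw_counts.get(hint.kernel)
        if observed is None:
            conflicts.append(
                {"kernel": hint.kernel, "kind": "missing_kernel", "source": hint.source}
            )
        elif observed != hint.expected_count:
            conflicts.append(
                {
                    "kernel": hint.kernel,
                    "kind": "count_mismatch",
                    "hinted": hint.expected_count,
                    "observed": observed,
                    "source": hint.source,
                }
            )
    return conflicts


raw_counts = {"FlashAttention": 4, "MatMul": 12}
hints = [
    Hint("FlashAttention", 4, "hints.yaml"),
    Hint("MatMul", 10, "hints.yaml"),
    Hint("TopK", 2, "hints.yaml"),
]
for conflict in find_hint_conflicts(hints, raw_counts):
    print(conflict)
```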

2.7 Data Model

classDiagram
    class AdaptationContext {
        +str model_id
        +str raw_command
        +dict normalized_args
        +dict artifacts
        +to_dict()
    }

    class RawInsightSummary {
        +KernelTotals totals
        +list kernels
        +float total_wall_duration_ms
        +to_dict(top_n)
    }

    class HintLedger {
        +list hints
        +conflicts_with_raw_insight(summary)
        +to_dict()
    }

    class ModelStructureFacts {
        +str model_type
        +dict module_paths
        +dict expert_fields
        +dict visual_paths
    }

    class ProfileCandidate {
        +CandidateField model_type
        +CandidateField moe_module_name
        +CandidateField visual_module_path
        +CandidateField patch_method
    }

    class DoctorReport {
        +dict adaptation_context
        +dict raw_insight_summary
        +dict candidate_profile
        +dict candidate_profile_validation
        +dict evidence_draft
        +list human_questions
        +list ai_tasks
    }

    class EvidenceDocument {
        +dict model
        +list cases
    }

    class VerificationReport {
        +bool passed
        +list issues
        +list suggestions
    }

    AdaptationContext --> DoctorReport
    RawInsightSummary --> DoctorReport
    HintLedger --> DoctorReport
    ModelStructureFacts --> ProfileCandidate
    ProfileCandidate --> DoctorReport
    DoctorReport --> EvidenceDocument
    EvidenceDocument --> VerificationReport

The report schema shall preserve both candidate data and validation results, so that reviewers can tell whether a field is still a draft (detected but unreviewed), has been reviewed, or has been verified.

The main data entities shall use the following lifecycle semantics:

Entity	Draft State	Reviewed State	Verified State
`ProfileCandidate`	Generated by structure inspection	Human-reviewed and materialized as `ModelProfile`	Validated and exercised by dry-run/smoke/verification
`AiAssistanceTask`	Generated by deterministic failure classification	Prompt response reviewed by adapter author	Resulting patch passes doctor and verification
`EvidenceDocument`	Exported from `evidence_draft`	Counts, confidence, tolerances, and accepted gaps reviewed	Verifier passes or gaps are accepted
ST case	Generated from verifier report	Reviewed for workload and tolerance	Marked verified only after passing evidence verification

2.8 Core Workflow

sequenceDiagram
    autonumber
    participant A as Adapter Author
    participant C as model_adapter CLI
    participant D as Doctor Engine
    participant M as Installed Model Source
    participant S as Large-Model Skill
    participant V as Evidence Verifier

    A->>C: Provide command file and raw Insight export
    C->>D: Build AdaptationContext and RawInsightSummary
    D->>M: Build/inspect model tree
    M-->>D: Structure facts
    D->>D: Materialize and validate profile candidate
    D->>D: Build evidence draft and human questions
    D-->>A: Doctor report
    alt runtime failure requires patch
        A->>C: Provide failure log
        C->>D: Run patch discovery
        D-->>A: AI assistance task with deterministic evidence
        A->>S: Submit bounded prompt
        S-->>A: Patch-method draft
        A->>A: Human review and profile update
    end
    A->>C: Export reviewed evidence
    C->>V: Run actual TensorCast case against evidence
    V-->>A: Pass/fail report with classified issues

The workflow is iterative. A failed verifier result shall route to a specific next action: adjust profile fields, revise a patch, add hints, update operator mapping, accept a documented fusion gap, or regenerate ST evidence.
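
The routing from a classified mismatch to a next action can be sketched as below. The category names, severities, and operator names are hypothetical, and the sketch compares operator counts only; latency tolerances are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical categories mapped to next actions named in this design.
NEXT_ACTION = {
    "missing_op": "update operator mapping or adjust profile fields",
    "count_mismatch": "adjust profile fields or add hints",
    "accepted_gap": "none; the gap is documented and reviewed",
}


@dataclass(frozen=True)
class Issue:
    op: str
    category: str
    severity: str  # "error" blocks verification, "info" does not
    next_action: str


def verify_counts(expected, actual, accepted_gaps=()):
    """Compare expected op counts with actual ones and classify each mismatch."""
    issues = []
    for op, want in expected.items():
        got = actual.get(op, 0)
        if got == want:
            continue
        if op in accepted_gaps:
            category, severity = "accepted_gap", "info"
        elif got == 0:
            category, severity = "missing_op", "error"
        else:
            category, severity = "count_mismatch", "error"
        issues.append(Issue(op, category, severity, NEXT_ACTION[category]))
    passed = all(issue.severity != "error" for issue in issues)
    return passed, issues


passed, issues = verify_counts(
    {"attention": 4, "moe_gate": 2, "fused_norm": 1},
    {"attention": 4, "moe_gate": 1},
    accepted_gaps={"fused_norm"},
)
print(passed)
for issue in issues:
    print(issue.op, issue.category, "->", issue.next_action)
```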

2.9 State Model

stateDiagram-v2
    [*] --> InputsReady
    InputsReady --> DoctorDrafted: context + profiling parsed
    DoctorDrafted --> NeedsHumanHint: low confidence or conflicts
    DoctorDrafted --> NeedsPatch: runtime failure classified
    NeedsHumanHint --> DoctorDrafted: hints added
    NeedsPatch --> PatchDrafted: AI task completed
    PatchDrafted --> ProfileRegistered: human review accepted
    DoctorDrafted --> ProfileRegistered: candidate accepted
    ProfileRegistered --> EvidenceReviewed: evidence exported and reviewed
    EvidenceReviewed --> Verified: verifier passed
    EvidenceReviewed --> GapAccepted: reviewed accepted gap
    EvidenceReviewed --> DoctorDrafted: mismatch requires iteration
    Verified --> STGenerated
    GapAccepted --> STGenerated
    STGenerated --> [*]

This state model prevents accidental promotion of drafts. Only Verified or a reviewed GapAccepted state may produce a verified ST guardrail.
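
The state diagram translates directly into a transition table. The sketch below is illustrative only: the state names come from the diagram, while the helper names `advance` and `replay` are hypothetical. Any transition that is not listed raises an error, which is how a draft is kept from jumping straight to an ST case.

```python
# Allowed transitions, copied from the state diagram.
TRANSITIONS = {
    "InputsReady": {"DoctorDrafted"},
    "DoctorDrafted": {"NeedsHumanHint", "NeedsPatch", "ProfileRegistered"},
    "NeedsHumanHint": {"DoctorDrafted"},
    "NeedsPatch": {"PatchDrafted"},
    "PatchDrafted": {"ProfileRegistered"},
    "ProfileRegistered": {"EvidenceReviewed"},
    "EvidenceReviewed": {"Verified", "GapAccepted", "DoctorDrafted"},
    "Verified": {"STGenerated"},
    "GapAccepted": {"STGenerated"},
    "STGenerated": set(),
}


def advance(state: str, target: str) -> str:
    """Move to `target` only if the diagram allows it."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target


def replay(path):
    """Walk a sequence of states and return the final one."""
    state = path[0]
    for target in path[1:]:
        state = advance(state, target)
    return state


final = replay(
    [
        "InputsReady", "DoctorDrafted", "NeedsPatch", "PatchDrafted",
        "ProfileRegistered", "EvidenceReviewed", "Verified", "STGenerated",
    ]
)
print(final)
```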

2.10 Large-Model Skill and Deterministic Adapter Cooperation

The design uses a two-lane cooperation model:

Lane	Inputs	Outputs	Trust Boundary
Deterministic adapter program	Command file, raw profiling export, hints, installed model tree, failure log	Candidate profile, validation report, evidence draft, AI task package, verification report	Authoritative for parsing, validation, counts, and pass/fail classification
Large-model skill	Structured AI task, failure evidence, source snippets, constraints, required outputs	Patch-method draft, mapping review draft, explanation of uncertain semantics	Advisory only; must be reviewed and rechecked by deterministic gates

The doctor shall package AI tasks with deterministic findings, suspected locations, constraints, and verification commands. The skill shall not invent a profile from the model name alone, shall not hand-write evidence from scratch, and shall not bypass validation. This cooperation is the main efficiency mechanism: deterministic tools narrow the problem, while the skill accelerates the small amount of semantic work that remains.
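
A minimal sketch of an AI task package follows, assuming the prompt is rendered purely from structured fields so that identical tasks produce identical prompts. The attribute names mirror the `AiAssistanceTask` fields listed in the report interfaces, but their Python spelling, the prompt layout, and the placeholder values in angle brackets are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiAssistanceTask:
    task_type: str
    deterministic_evidence: tuple
    suspected_locations: tuple
    constraints: tuple
    required_output: str
    verification_commands: tuple

    def prompt_text(self) -> str:
        """Render a bounded prompt from the structured fields only."""
        sections = [
            ("Task type", (self.task_type,)),
            ("Deterministic evidence", self.deterministic_evidence),
            ("Suspected locations", self.suspected_locations),
            ("Constraints", self.constraints),
            ("Required output", (self.required_output,)),
            ("Verification commands", self.verification_commands),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


task = AiAssistanceTask(
    task_type="PATCH_METHOD_AUTHORING",
    deterministic_evidence=("dry-run failed on data-dependent placeholder masking",),
    suspected_locations=("<model source file>",),
    constraints=("preserve normal tensor semantics", "do not bypass validation"),
    required_output="patch-method draft for human review",
    verification_commands=("<doctor command from the user guide>",),
)
print(task.prompt_text())
```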

2.11 Quality Attribute Design

The following quality attributes are part of the design, not post-hoc implementation preferences.

Attribute	Design Mechanism	Review Signal
Maintainability	Keep adapter automation in `tensor_cast/adapter/*`; keep model-specific metadata in built-in profile files; use structured dataclasses and explicit report schemas	New model rules do not require broad changes across unrelated modules
Extensibility	Add new inspectors, hint kinds, evidence categories, and failure classifiers behind stable report interfaces	New model families can extend candidate materialization without changing CLI contracts
Applicability	Support dense, MoE, MLA, MTP, VL, quantized, compiled, and parallel workloads through normalized command fields and optional profile fields	Unsupported model behavior is reported as a bounded task or accepted limitation
Testability	Every deterministic stage has a serializable artifact and can be unit tested independently	Tests can assert parser output, candidate fields, validation issues, evidence, verifier classifications, and replay behavior
Security	Treat raw commands, profiling files, hints, and AI drafts as untrusted inputs; avoid executing AI output automatically; keep private paths and raw internal notes out of committed artifacts	Generated prompts are bounded; reviewed code is required before patch methods enter profiles
Traceability	Preserve source, confidence, raw command, raw profiling path, and hint provenance	A reviewer can explain why each profile/evidence field exists
Reproducibility	Use case directories, stable JSON/YAML artifacts, and replay profile hiding	A case can be rerun and compared with the same inputs

2.11.1 Maintainability and Extensibility

The adapter layer shall follow extension points rather than hard-coded model-specific branches whenever possible:

Extension Point	Intended Extension	Guardrail
Structure inspector	New module pattern detectors for model families	Candidate fields must include source and confidence
Profile materializer	New recipe hints for MoE/MLA/VL/MTP families	Defaults and empty overrides must be omitted from review output
Patch discovery classifier	New failure taxonomy entries	AI tasks must include deterministic evidence and verification commands
Evidence builder	New kernel category or mapping rules	Low-confidence mappings must remain visible
Verifier	New issue categories or accepted-gap policy	Failed verification must provide next actions
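
Two of these guardrails, source/confidence annotations and default elision, can be sketched together. The `CandidateField` layout, the `review_dict` helper, and the sample field names and defaults are assumptions made for illustration; they are not the actual profile schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateField:
    value: object
    source: str        # where the value was detected
    confidence: float  # 0.0 to 1.0


def is_empty(value) -> bool:
    """True for None and for empty strings or containers."""
    return value is None or value == "" or value == {} or value == []


def review_dict(candidate: dict, defaults: dict) -> dict:
    """Keep only fields that carry information a reviewer must check."""
    out = {}
    for name, fld in candidate.items():
        if is_empty(fld.value):
            continue  # empty override: nothing to review
        if name in defaults and fld.value == defaults[name]:
            continue  # equals the default: elided from review output
        out[name] = {
            "value": fld.value,
            "source": fld.source,
            "confidence": fld.confidence,
        }
    return out


candidate = {
    "model_type": CandidateField("example_vl", "model config", 0.95),
    "field_overrides": CandidateField({}, "structure inspector", 0.5),
    "expert_count_key": CandidateField("num_experts", "default", 1.0),
}
print(review_dict(candidate, defaults={"expert_count_key": "num_experts"}))
```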

2.11.2 Security and Safety

The workflow handles local commands, model paths, profiling exports, and AI drafts. The design therefore treats raw commands, profiling files, hints, and AI drafts as untrusted input until they are parsed, validated, and reviewed.

Risk	Mitigation
Raw command contains unintended shell behavior	Command parser shall normalize supported TensorCast arguments rather than execute arbitrary shell text during parsing
Raw Insight or hints contain malformed data	Parsers shall validate required fields and report conflicts
AI patch draft changes real-model semantics	Patch methods require human review and deterministic verification
Private paths leak into committed artifacts	Reports and prompts should use repo-relative paths where possible; submission checklist rejects local-only notes
Replay mode hides production profiles accidentally	`--ignore-existing-profile` is explicit, scoped, and restored after the replay context
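
Scoped and reversible profile hiding can be sketched with a context manager. A plain dictionary stands in for the real profile registry, and the function name is hypothetical; the `finally` clause restores hidden entries even when the replay raises.

```python
from contextlib import contextmanager


@contextmanager
def ignore_existing_profiles(registry: dict, names):
    """Temporarily hide the named profiles; always restore them on exit."""
    hidden = {name: registry.pop(name) for name in names if name in registry}
    try:
        yield registry
    finally:
        registry.update(hidden)  # restore, even if the replay failed


registry = {"qwen3_vl": "reviewed profile", "other_model": "reviewed profile"}

with ignore_existing_profiles(registry, ["qwen3_vl"]) as scoped:
    print("inside replay:", sorted(scoped))

print("after replay:", sorted(registry))
```

Restoring in `finally` means a profile registered under the same name during replay is overwritten by the production profile on exit, which matches the requirement that hiding is reversible.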

2.12 Usage Case

The following example illustrates intended usage without prescribing command details, which are covered in the user guide.

An adapter author wants to onboard a hypothetical model, ExampleVL-7B. The author collects a TensorCast simulation command for a prefill workload and the matching MindStudio Insight raw profiling export. The doctor builds an adaptation context, scans the installed transformers implementation, detects a VL module path, drafts a minimal ModelProfile, and emits an evidence draft for the top attention and visual MLP kernels. A dry-run failure reveals data-dependent placeholder masking in the model source, so patch discovery emits a PATCH_METHOD_AUTHORING task. The author gives the bounded prompt to the model-adaptation skill, reviews the patch draft, registers the profile, exports evidence, and runs verification. When verification passes, the verified report is converted into a regression guardrail case.

2.13 Scope, Applicability, and Constraints

Topic	Constraint
Required inputs	The workflow shall start from a simulation command and a matching raw Insight export.
Existing profiles	Normal adaptation may use existing profiles and recipes as references. Replay/audit mode may ignore named profiles to avoid circular validation.
Patch methods	Patch methods shall target TensorCast simulation compatibility and preserve normal tensor semantics as much as possible.
Evidence confidence	Low-confidence mappings shall be explicit and may produce review questions or warnings instead of false certainty.
Raw profiling	The raw Insight export shall include a `Totals` row so total forward latency can be compared.
Verification	Passing verification depends on available TensorCast operator coverage and performance model coverage; unsupported backend fusion may be recorded as an accepted gap after review.
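
The `Totals` constraint can be sketched with a small parser. The real export layout is not described in this document, so the CSV format and the column names `name`, `count`, and `duration_ms` are purely hypothetical; only the rule that the `Totals` row must precede kernel rows is taken from the design.

```python
import csv
import io


def parse_raw_insight(text: str) -> dict:
    """Parse a hypothetical CSV export whose first data row is `Totals`."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows or rows[0]["name"] != "Totals":
        raise ValueError("expected a `Totals` row before any kernel row")
    kernels = [
        {
            "raw_name": row["name"],  # raw kernel name preserved as-is
            "count": int(row["count"]),
            "duration_ms": float(row["duration_ms"]),
        }
        for row in rows[1:]
    ]
    return {"total_forward_ms": float(rows[0]["duration_ms"]), "kernels": kernels}


sample = (
    "name,count,duration_ms\n"
    "Totals,3,1.5\n"
    "MatMul,2,1.0\n"
    "Softmax,1,0.5\n"
)
summary = parse_raw_insight(sample)
print(summary["total_forward_ms"], [k["raw_name"] for k in summary["kernels"]])
```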

Applicability by model category:

Model Category	Supported Adaptation Focus	Notes
Dense decoder-only text model	`model_type`, attention/runtime evidence	Minimal profile may be enough
MoE model	MoE module name, expert count key, field overrides, routing evidence	Expert storage patterns may need custom expert wrappers
MLA model	MLA module name, TensorCast MLA class, field overrides	Validation shall reject incomplete MLA profile fields
MTP/speculative model	MTP block path and repeated-block behavior	Verification should include count-sensitive cases
Vision-language model	Visual/language paths, visual layer path, merger/MLP mappings, placeholder patch tasks	Qwen3-VL replay is the reference stress case
Quantized or parallel workload	Normalized quantization and parallelism arguments (TP/DP/EP/MoE parallel)	Evidence must account for communication or accepted gaps

3. Usage Instructions

This design document provides only the conceptual usage case and constraints. All step-by-step guidance, including exact commands, required files, optional hints, doctor outputs, patch authoring handoff, evidence export, verification, ST case generation, and Qwen3-VL replay/audit procedure, shall be maintained in docs/en/user_guide/msmodeling_tensor_cast_new_model_adaptation_user_guide.md.

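The user-facing workflow shall expose these public operations. ST case output and replay/audit are exposed as options of `verify` and `doctor` respectively, not as separate commands: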

Operation	Purpose	Expected Artifact
Doctor	Inspect inputs, model structure, candidate profile, evidence draft, questions, and AI tasks	JSON doctor report
Export Evidence	Convert reviewed doctor evidence draft to YAML	Evidence YAML
Verify	Run TensorCast and compare actual behavior with evidence	JSON verification report
ST Case Output	Generate verified guardrail cases from verified reports, or draft cases from unverified reports	ST case JSON
Replay/Audit	Hide an existing profile and re-run adaptation discovery	Replay doctor report

The guide shall also describe the required review gates:

Gate	Reviewer Checks
Candidate profile	Field values are minimal, source-backed, and validation passes
Patch draft	Patch is scoped to the failing simulation path and preserves expected semantics
Evidence YAML	Case input, expected counts, total latency, confidence, and accepted gaps are reviewed
Verification report	Failures are classified and resolved or explicitly accepted
ST case	Verified status is used only when evidence verification passes or gaps are reviewed

4. Test Design

4.1 Test Strategy

The test design shall cover deterministic units, integration flows, and end-to-end replay. Tests should verify that the workflow can progress from the two required inputs to a reviewed report without relying on hidden local state or existing profile answers.

The test pyramid shall align with the architecture views:

Test Layer	Architecture View Covered	Primary Risk
Unit tests	Logical and development views	Parser, validator, materializer, evidence, and verifier logic regressions
Integration tests	Process and development views	Doctor report assembly, profile registry scope, export/verify handoff
End-to-end tests	Scenario view	New-model workflow and replay/audit behavior
Security and robustness tests	Physical view and interface design (Section 2.6)	Malformed inputs, untrusted AI drafts, private-path leakage
Documentation and contract checks	Interface design (Section 2.6)	CLI/report contract drift from usage guidance

4.2 Unit Test Cases

Area	Case	Expected Result
Command parser	Parse a TensorCast simulation command with device, workload, quantization, and parallelism options	`AdaptationContext.normalized_args` matches the command
Raw Insight parser	Parse `Totals` and kernel rows	Total forward timing and normalized kernel names are preserved
Raw Insight validation	Kernel row appears before `Totals`	Parser rejects the file with an actionable error
Hints merge	Hints conflict with raw profiling counts or refer to kernels missing from the export	Conflicts are reported with provenance
Profile review	Default fields and empty overrides are omitted	Review dict contains only required and non-default fields
Profile validation	Invalid MoE/MLA override or non-callable patch method	Validation report contains field-specific errors
Patch discovery	Meta tensor placeholder and dynamic-mask failure log	`PATCH_METHOD_AUTHORING` task is generated
Evidence builder	Raw kernels plus hints produce major op evidence	Counts, confidence, and source are present
Verifier	Expected major op is missing, low confidence, or accepted gap	Issue severity and pass/fail result match policy
ST generator	Passed and failed verification reports	Passed reports create `verified` cases; failed reports create `draft` cases
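
The raw Insight validation cases can be sketched as a small ordering check. This is an illustration only: the row shape (a list of `(name, duration)` string pairs), the `RawInsightError` type, and `validate_rows` are hypothetical names, and the real export layout and parser API may differ.

```python
from typing import List, Tuple


class RawInsightError(ValueError):
    """Hypothetical error type that carries an actionable message."""


def validate_rows(rows: List[Tuple[str, str]]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return (total_time, kernels) or raise RawInsightError.

    Assumed layout: a `Totals` row first, followed by kernel rows.
    """
    total = None
    kernels = []
    for index, (name, raw_value) in enumerate(rows, start=1):
        try:
            value = float(raw_value)
        except ValueError:
            raise RawInsightError(
                f"row {index}: non-numeric duration {raw_value!r} for {name!r}"
            ) from None
        if name == "Totals":
            total = value
        elif total is None:
            # A kernel row was seen before any Totals row.
            raise RawInsightError(
                f"row {index}: kernel {name!r} appears before the Totals row; "
                "re-export the profile so that Totals comes first"
            )
        else:
            kernels.append((name.strip(), value))
    if total is None:
        raise RawInsightError("no Totals row found; the export may be truncated")
    return total, kernels
```

A unit test would then assert both the accepted shape and the rejection message, so that the "actionable error" expectation is checked rather than only the exception type.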

4.3 Integration Test Cases

Area	Case	Expected Result
Doctor report	Context, raw Insight, hints, and model inspection are combined	Report includes candidate profile, validation, evidence draft, questions, and suggestions
Doctor with failure log	Failure taxonomy detects patch need	Report includes patch discovery and AI task fields
Export evidence	Doctor report has an evidence draft	YAML evidence document is generated without losing case data
Verify with model ID from evidence	Verify command omits positional model ID	Model ID is read from evidence metadata
Actual runner isolation	Verification runs multiple cases	Shared user input is not mutated across cases
Replay registry isolation	Existing profile is ignored inside replay scope only	Registry state is restored after the replay scope
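
The replay registry isolation case can be illustrated with a scoped context manager. The sketch below assumes the registry is a plain dictionary keyed by model name. The name `ignore_existing_profile` is taken from this document, but the signature and the registry shape shown here are hypothetical.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

_MISSING = object()


@contextmanager
def ignore_existing_profile(registry: Dict[str, dict], model_key: str) -> Iterator[None]:
    """Hide one profile inside the `with` block and restore it afterwards."""
    hidden = registry.pop(model_key, _MISSING)
    try:
        yield
    finally:
        # Drop anything registered under the key inside the scope,
        # then put the original entry back if there was one.
        registry.pop(model_key, None)
        if hidden is not _MISSING:
            registry[model_key] = hidden
```

Because the restore runs in `finally`, the registry returns to its prior content even when the replay raises, which is the property the integration test is meant to assert.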

4.4 End-to-End and Replay Test Cases

Scenario	Design
Existing adapted model replay	Select an already adapted text or MoE model, temporarily remove or ignore its profile, re-run structure inspection and candidate materialization, then compare key TensorCast operators with the registered-profile baseline.
Qwen3-VL blind replay	Treat an already adapted Qwen3-VL model as unadapted by using replay/audit profile hiding. The test shall use a tiny config-only fixture to build the installed Qwen3-VL model tree without downloading weights, shall not use the existing TensorCast `qwen3_vl.py` profile as an input, and shall verify that the doctor rediscovers VL paths, visual merger/MLP linear mappings, model family, and patch-authoring evidence.
Qwen3-VL comparison	After blind replay produces a candidate, compare the replay candidate and patch task expectations with the known adapted Qwen3-VL behavior as an oracle. The oracle is used only after replay discovery completes.
Raw Insight evidence flow	Use a representative raw Insight export and command to generate evidence, export YAML, run verification, and classify remaining gaps.

The Qwen3-VL replay test is important because it simulates the exact risk the feature is meant to reduce: adapting a complex VL model without reading an existing TensorCast answer. It should demonstrate that deterministic discovery and the model-adaptation skill can cooperate to recover the same adaptation shape with bounded human review.

4.5 Acceptance Criteria

Category	Criteria
Functional	Doctor can create a complete report from a command file and matching raw Insight export.
Functional	Candidate profiles include source-backed fields and validation results.
Functional	Patch discovery produces AI assistance tasks instead of direct model-specific patch code.
Functional	Evidence export and verification preserve reviewed case data and classify mismatches.
Functional	Verified reports can produce ST guardrail cases.
Quality	AI-generated drafts are not trusted until profile validation, dry-run, evidence verification, and human review pass.
Quality	Replay mode does not depend on the existing profile of the model being replayed.
Quality	Qwen3-VL blind replay can rediscover the expected VL structure and patch need under profile hiding.
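
The trust criterion for AI-generated drafts can be read as an ordered gate check. The following is a minimal sketch under that reading. The gate identifiers and both function names are hypothetical, and only the gate order is taken from the criterion above.

```python
from typing import Mapping, Optional

# Order follows the acceptance criterion: validation, dry-run, verification, review.
REQUIRED_GATES = (
    "profile_validation",
    "dry_run",
    "evidence_verification",
    "human_review",
)


def first_blocking_gate(results: Mapping[str, bool]) -> Optional[str]:
    """Return the first gate that has not passed, or None if all passed."""
    for gate in REQUIRED_GATES:
        if results.get(gate) is not True:
            return gate
    return None


def draft_is_trusted(results: Mapping[str, bool]) -> bool:
    """A draft is trusted only when every required gate has passed."""
    return first_blocking_gate(results) is None
```

Treating a missing gate result the same as a failed one keeps the default conservative: a draft with no recorded human review is never reported as trusted.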

4.6 Quality Attribute Test Matrix

Quality Attribute	Test Design	Acceptance Signal
Maintainability	Add a new synthetic model family detector through adapter modules only	No unrelated model builder or runtime changes are required
Extensibility	Add a new hint kind or failure category with focused unit tests	Existing doctor/export/verify interfaces remain compatible
Applicability	Run dense, MoE, MLA, and VL fixture cases through candidate materialization	Each category produces minimal source-backed fields
Testability	Assert every major stage can serialize a deterministic artifact	Tests can inspect JSON/YAML or dataclass dictionaries without running full E2E
Security	Feed malformed raw Insight, conflicting hints, and suspicious failure logs	Parser reports structured errors or AI tasks without executing untrusted content
Traceability	Check generated evidence and profile review output for source/confidence fields	Reviewer can trace every non-default field to a source
Reproducibility	Rerun replay/audit with profile hiding and compare deterministic fields	Registry state is restored and candidate fields are stable
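
The testability and traceability rows both rely on artifacts that serialize deterministically and carry source fields. A minimal sketch of such a check is shown below. The dataclass names and their fields are hypothetical stand-ins, not the actual TensorCast types.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CandidateField:
    name: str
    value: object
    source: str        # where the value was derived from
    confidence: float


@dataclass
class ProfileCandidate:
    model_id: str
    fields: List[CandidateField] = field(default_factory=list)


def to_stable_json(candidate: ProfileCandidate) -> str:
    """Serialize with sorted fields and keys so reruns compare byte for byte."""
    payload = asdict(candidate)
    payload["fields"].sort(key=lambda item: item["name"])
    return json.dumps(payload, sort_keys=True, indent=2)


def untraced_fields(candidate: ProfileCandidate) -> List[str]:
    """Names of fields that carry no source and therefore cannot be traced."""
    return [item.name for item in candidate.fields if not item.source]
```

A reproducibility test can compare `to_stable_json` output across two replay runs, while a traceability test can assert that `untraced_fields` is empty for every reviewed candidate.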

4.7 Interface and Contract Tests

Contract	Test
CLI help remains available	`doctor --help`, `export-evidence --help`, and `verify --help` load successfully
Doctor report schema is stable	Focused tests assert required top-level fields and nested validation fields
Evidence export requires an evidence draft	Export test rejects reports without `evidence_draft`
Verify reads model ID from evidence	Verify test omits positional model ID and confirms metadata is used
ST case status follows verification result	Passed reports produce `verified`; failed reports produce `draft`
Replay profile hiding is scoped	Registry state before and after `ignore_existing_profile` is identical
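
Three of these contracts are small enough to sketch directly. The dictionary layout (`metadata`, `model_id`, `cases`) and the function names below are assumptions made for illustration. Only `evidence_draft` and the `verified`/`draft` statuses come from this document.

```python
from typing import Optional


def export_evidence(report: dict) -> dict:
    """Build an evidence document; reject reports without an evidence draft."""
    draft = report.get("evidence_draft")
    if not draft:
        raise ValueError("doctor report has no evidence_draft; nothing to export")
    return {
        "metadata": {"model_id": report.get("model_id")},
        "cases": list(draft),
    }


def resolve_model_id(evidence: dict, positional: Optional[str] = None) -> str:
    """Prefer the positional model ID; otherwise fall back to evidence metadata."""
    model_id = positional or evidence.get("metadata", {}).get("model_id")
    if not model_id:
        raise ValueError("model ID missing from both the command line and evidence")
    return model_id


def st_case_status(passed: bool) -> str:
    """Map a verification result to the ST case status."""
    return "verified" if passed else "draft"
```

A contract test would call `resolve_model_id(evidence)` without a positional argument and assert that the metadata value is returned, mirroring the "Verify reads model ID from evidence" row.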

4.8 Security and Robustness Test Cases

Risk	Test Case	Expected Result
Malformed raw Insight	Missing `Totals`, bad numeric values, or kernel rows before `Totals`	Parser rejects input with an actionable error
Conflicting hints	Hint count or mapping conflicts with raw profiling	Doctor reports conflicts and human questions
Untrusted AI output	Patch draft enters the codebase only as reviewed profile code and is never executed by doctor	Doctor emits a task package; no code is executed automatically
Private path leakage	Reports and generated prompts are checked for local-only paths before submission	Local-only artifacts are not staged
Replay misuse	`--ignore-existing-profile` is used outside scoped replay context	Registry is restored and the ignored profile is reported explicitly
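
The private path leakage check can be approximated with a text scan over reports and generated prompts. The sketch below is a rough illustration. Which prefixes count as local-only is an assumption (user home directories on Linux, macOS, and Windows), and `find_local_paths` is a hypothetical helper.

```python
import re
from typing import List

# Assumed local-only prefixes: per-user home directories.
LOCAL_PATH_PATTERN = re.compile(
    r"/home/[^\s/]+|/Users/[^\s/]+|[A-Za-z]:\\Users\\[^\s\\]+"
)


def find_local_paths(text: str) -> List[str]:
    """Return the distinct home-directory prefixes found in the text, sorted."""
    return sorted({match.group(0) for match in LOCAL_PATH_PATTERN.finditer(text)})
```

A robustness test can run this over every generated report and prompt and fail when the returned list is not empty, which keeps local-only artifacts from being staged.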