Design Document: TensorCast New Model Adaptation Efficiency
Revision History
| Date | Version | Change Description | Author | RFC Document |
|---|---|---|---|---|
| 2026-06-03 | 1.0 | Initial design for the model adaptation efficiency workflow | kai1949, codex | N/A |
1. Background
TensorCast model adaptation currently depends on repeated manual investigation:
adapter authors inspect HuggingFace model source, infer ModelProfile fields,
discover runtime incompatibilities, map MindStudio Insight kernels to
TensorCast semantic operators, and then construct regression evidence. This
process is costly because each model has different module naming, multimodal
layouts, MoE/MLA/MTP details, runtime checks, and backend fusion behavior.
The proposed feature introduces a reusable adaptation workflow that combines deterministic adapter programs with large-model skills. The deterministic programs shall parse command and profiling inputs, inspect model structure, validate profile candidates, classify failures, generate evidence drafts, and verify actual TensorCast behavior. The large-model skill shall assist only in the areas where deterministic analysis is insufficient, such as source-code semantic reasoning, patch-method authoring, and uncertain operator mapping review. AI output shall remain a draft until deterministic gates and human review accept it.
The primary goals are:
| Goal | Design Direction | Success Signal |
|---|---|---|
| Reduce required user input | Require only a TensorCast simulation command and matching MindStudio Insight raw profiling export | Doctor report can be produced from the two inputs |
| Make adaptation traceable | Preserve normalized command, raw profiling provenance, hints, profile candidates, and verification reports | Every generated artifact references its evidence source |
| Improve correctness | Gate AI-generated profile, patch, and evidence drafts through validation, dry-run, and verifier checks | Failures are classified with actionable next steps |
| Enable reusable regression | Convert verified evidence into ST guardrail cases | Adapted models have repeatable count and latency checks |
| Support blind replay | Validate the workflow by re-adapting an already supported model while hiding its existing profile | Replay discovers the expected Qwen3-VL structure and patch needs |
The design intentionally separates user-facing operating instructions from
software design. Detailed commands, input formats, and output review steps are
documented in docs/en/user_guide/msmodeling_tensor_cast_new_model_adaptation_user_guide.md.
2. Design
2.1 Design Principles
The adaptation system shall follow four principles:
| Principle | Meaning |
|---|---|
| Deterministic first | Programmatic inspection, validation, parsing, and verification run before AI reasoning is accepted. |
| Minimal human checkpoint | When uncertainty remains, the system asks for the smallest confirmed fact instead of asking the user to explain a full model. |
| Provenance preserving | Reports retain the raw command, raw profiling source, hints, candidate fields, and confidence values. |
| Replayable by design | Existing profiles may be ignored only in replay or audit mode, so the process can be tested without reading the known answer. |
2.2 System Context
flowchart LR
User["Adapter Author"]
Skill["Large-Model Skill"]
CLI["model_adapter CLI"]
Doctor["Doctor Engine"]
Registry["ModelProfile Registry"]
HF["Installed Transformers Source"]
Insight["MindStudio Insight Raw Export"]
Evidence["Evidence YAML"]
Verifier["Evidence Verifier"]
ST["ST Guardrail Case"]
User -->|"simulation command"| CLI
User -->|"raw profiling export"| CLI
User -->|"optional hints"| CLI
CLI --> Doctor
Insight --> Doctor
Doctor --> HF
Doctor --> Registry
Doctor -->|"candidate profile, questions, AI tasks"| User
Doctor -->|"bounded prompt + deterministic findings"| Skill
Skill -->|"reviewable patch or mapping draft"| User
User -->|"reviewed profile or hints"| Registry
Doctor -->|"evidence draft"| Evidence
Evidence --> Verifier
Verifier -->|"classified result"| User
Verifier -->|"verified report"| ST
The CLI is the public entry point. The doctor engine coordinates deterministic subsystems. The large-model skill is not a replacement for the doctor; it is a bounded assistant that receives structured evidence and produces reviewable drafts.
2.3 Component Responsibilities
| Component | Responsibility | Non-Goal |
|---|---|---|
| Adaptation Context | Parse a TensorCast simulation command into normalized model, workload, device, parallelism, quantization, and multimodal parameters | Guess a workload that was not supplied |
| Raw Insight Importer | Convert Insight raw export rows into normalized kernel summaries and total forward timing | Require the user to hand-write evidence |
| Structure Inspector | Scan the installed model tree for attention, MoE, MLA, MTP, and VL facts | Infer fields from model names alone |
| Profile Candidate Materializer | Create a minimal ModelProfile candidate with review-friendly fields |
Mark a profile as verified before validation and runtime checks |
| Profile Validator | Check profile field type, required fields, default elision, and callable patch methods | Prove runtime semantic equivalence |
| Patch Discovery | Classify dry-run or smoke failures and produce bounded AI assistance tasks | Generate model-specific patch code directly |
| Evidence Builder | Merge raw profiling, normalized command, actual summaries, and hints into an evidence draft | Hide low-confidence mappings |
| Evidence Verifier | Compare expected evidence with actual TensorCast summaries and classify mismatches | Treat all mismatches as generic failures |
| ST Case Generator | Convert verified reports into regression guardrail case drafts or verified cases | Generate verified cases from unverified evidence |
2.4 4+1 Architecture Views
The design can be described with a 4+1 view model so that functional behavior, runtime execution, source organization, deployment dependencies, and validation scenarios are reviewed separately.
| View | Design Focus | Main Stakeholder | Key Artifact |
|---|---|---|---|
| Logical view | Domain abstractions, report schema, profile/evidence model | Adapter author, reviewer | Data model and UML class diagrams |
| Process view | Runtime workflow, iteration states, AI handoff boundaries | Adapter author, test owner | Sequence and state diagrams |
| Development view | Source modules and dependency direction | Maintainer | Component/module dependency diagram |
| Physical view | Local runtime, external artifacts, optional AI assistant | Tooling owner | Deployment diagram |
| Scenario view | Representative model onboarding and replay/audit cases | Reviewer, acceptance owner | Usage case and test scenarios |
2.4.1 Logical View
The logical view centers on immutable evidence and reviewable drafts. The doctor report shall preserve raw inputs, deterministic findings, candidate profile fields, validation results, AI task packages, and evidence drafts in one reviewable object. A profile shall not be treated as verified merely because it is materialized; it moves toward verification only through validation, runtime checks, evidence review, and verifier results.
flowchart TB
Command["Simulation Command"]
RawInsight["Raw Insight Export"]
Hints["Hints Ledger"]
Context["AdaptationContext"]
Structure["ModelStructureFacts"]
Candidate["ProfileCandidate"]
Profile["ModelProfile"]
DoctorReport["DoctorReport"]
EvidenceDraft["Evidence Draft"]
EvidenceYaml["Evidence YAML"]
VerifyReport["VerificationReport"]
Command --> Context
RawInsight --> EvidenceDraft
Hints --> EvidenceDraft
Context --> DoctorReport
Structure --> Candidate
Candidate --> Profile
Candidate --> DoctorReport
Profile --> DoctorReport
EvidenceDraft --> DoctorReport
DoctorReport --> EvidenceYaml
EvidenceYaml --> VerifyReport
2.4.2 Process View
The process view is an iterative control loop. Deterministic tooling narrows the unknowns first. Human review and large-model skill assistance are used only at explicit checkpoints.
flowchart LR
Intake["Input Intake"]
Doctor["Doctor Run"]
Review["Human Review"]
AITask["AI Task Handoff"]
Register["Profile Registration"]
Evidence["Evidence Review"]
Verify["Verification"]
ST["ST Case"]
Iterate["Classified Iteration"]
Intake --> Doctor
Doctor --> Review
Review -->|"profile accepted"| Register
Review -->|"patch needed"| AITask
AITask --> Register
Register --> Doctor
Doctor --> Evidence
Evidence --> Verify
Verify -->|"passed or accepted gap"| ST
Verify -->|"classified mismatch"| Iterate
Iterate --> Review
2.4.3 Development View
The development view keeps adapter automation isolated from model execution and registry internals. The CLI depends on adapter services; adapter services depend on TensorCast model building, registry, runtime summaries, and profiling parsers. Built-in model profiles remain the extension point for model-specific metadata and reviewed patch methods.
flowchart TB
CLI["cli/inference/model_adapter.py"]
Adapter["tensor_cast/adapter/*"]
Context["context.py"]
Insight["insight.py"]
Inspect["inspect.py"]
Profile["profile.py / profile_draft.py"]
Patch["patch_discovery.py / ai_task.py"]
Evidence["evidence_builder.py / evidence_export.py / evidence.py"]
Verify["verifier.py / runner.py / st_case.py"]
Builder["tensor_cast/core/model_builder.py"]
Registry["tensor_cast/transformers/custom_model_registry.py"]
Builtins["tensor_cast/transformers/builtin_model/*.py"]
Runtime["tensor_cast/runtime.py"]
CLI --> Adapter
Adapter --> Context
Adapter --> Insight
Adapter --> Inspect
Adapter --> Profile
Adapter --> Patch
Adapter --> Evidence
Adapter --> Verify
Inspect --> Builder
Profile --> Registry
Registry --> Builtins
Verify --> Runtime
Builder --> Registry
2.4.4 Physical View
The physical view assumes a local development host. Model source, TensorCast code, raw profiling files, reports, and optional AI assistance are separate runtime concerns. The deterministic tools shall not require network access after the model source and dependencies are installed.
flowchart TB
subgraph Host["Linux/WSL Development Host"]
Repo["msmodeling repository"]
Python["Project Python Environment"]
Reports["reports/<case_name>/ artifacts"]
CLI["model_adapter CLI process"]
TC["TensorCast runtime/model builder"]
Transformers["Installed transformers package"]
end
Insight["MindStudio Insight raw export"]
Skill["Large-Model Skill or AI Assistant"]
OptionalModelCache["Optional local model/config cache"]
Insight --> Reports
OptionalModelCache --> Transformers
Repo --> Python
Python --> CLI
CLI --> Reports
CLI --> TC
TC --> Transformers
CLI -->|"bounded prompt text"| Skill
Skill -->|"reviewable draft"| Reports
2.4.5 Scenario View
The scenario view validates the architecture against representative cases:
| Scenario | Purpose | Expected Design Behavior |
|---|---|---|
| New dense text model | Confirm minimal-profile path | Doctor emits only source-backed required fields and evidence draft |
| New MoE or MLA model | Confirm structure discovery and profile validation | Candidate fields include module names, expert keys, and validation issues if any |
| New VL model | Confirm visual path and linear mapping discovery | Candidate includes visual/language/layer paths and mapping patterns |
| Runtime patch failure | Confirm AI task boundary | Doctor classifies failure and emits bounded AI task, not final patch code |
| Qwen3-VL blind replay | Confirm no answer leakage | Existing Qwen3-VL profile is hidden during discovery and used only as final oracle |
2.5 Current Module and Surrounding Component Relationships
The feature is designed as an adapter automation layer around existing TensorCast responsibilities. It shall not replace model building, profile registration, quantization, runtime execution, or performance modeling. Instead, it coordinates those components and records structured adaptation evidence.
| Existing Area | Relationship to Adapter Workflow | Design Constraint |
|---|---|---|
cli/inference/text_generate |
Provides the simulation command whose semantics are parsed into AdaptationContext |
The adapter command parser shall preserve the original command text |
tensor_cast/core/model_builder.py |
Builds the model instance used for structure inspection and runtime verification | Doctor shall use the same model-building path as normal simulation when possible |
tensor_cast/transformers/custom_model_registry.py |
Stores and resolves ModelProfile entries |
Replay/audit profile hiding shall be scoped and reversible |
tensor_cast/transformers/builtin_model/*.py |
Holds reviewed model-specific profiles and patch methods | Generated drafts shall be review aids, not automatically trusted code |
tensor_cast/transformers/transformations.py |
Applies model transformations and may emit patch reports | Patch reports shall become review and blocking evidence |
tensor_cast/runtime.py |
Records actual runtime events for verification | Verification shall summarize actual operator counts and timings from runtime events |
| Performance model modules | Provide analytic or profiling latency estimates | Evidence verification shall distinguish profile errors from performance model coverage gaps |
| MindStudio Insight exports | Provide external measured kernel evidence | Parser shall preserve raw kernel names, counts, and total forward timing |
flowchart LR
TextGenerate["text_generate command"]
AdapterCLI["model_adapter CLI"]
ModelBuilder["Model Builder"]
Registry["Profile Registry"]
BuiltinProfile["Built-in Profiles"]
Transformations["Transformations and Patch Reports"]
Runtime["Runtime Events"]
PerfModel["Performance Models"]
Insight["Raw Insight"]
Evidence["Evidence Verification"]
TextGenerate --> AdapterCLI
Insight --> AdapterCLI
AdapterCLI --> ModelBuilder
ModelBuilder --> Registry
Registry --> BuiltinProfile
ModelBuilder --> Transformations
Transformations --> Runtime
Runtime --> Evidence
PerfModel --> Evidence
AdapterCLI --> Evidence
2.6 Interface Design
The design exposes three user-facing CLI operations and several structured artifact interfaces. Detailed command examples are maintained in the usage guide; this section defines the design contract.
2.6.1 CLI Operations
| Operation | Required Inputs | Optional Inputs | Output | Failure Contract |
|---|---|---|---|---|
doctor |
model id or command file | raw Insight file, hints file, failure log, ignored profiles, profile draft output | JSON doctor report and optional Python profile draft | Must report field-level validation, hint conflicts, or classified patch failure rather than a generic exception where possible |
export-evidence |
doctor report with evidence_draft |
output path | YAML evidence document | Must fail if the report has no exportable evidence draft |
verify |
evidence YAML and resolvable model id | device/runtime overrides, ST case output | JSON verification report and optional ST case JSON | Must classify mismatch categories and return non-zero on failed verification |
2.6.2 Report Interfaces
| Artifact | Producer | Consumer | Required Design Fields |
|---|---|---|---|
AdaptationContext |
command parser | doctor, evidence builder, verifier | model_id, raw_command, normalized_args, artifact paths |
RawInsightSummary |
raw Insight parser | evidence builder, hint conflict checker | totals, normalized kernels, categories, counts, timing |
DoctorReport |
doctor engine | human reviewer, skill, evidence exporter | context, raw summary, candidate profile, validation, patch discovery, AI tasks, evidence draft |
AiAssistanceTask |
patch discovery or unsupported-semantics classifier | large-model skill, human reviewer | task type, deterministic evidence, suspected locations, constraints, required output, verification commands, prompt text |
EvidenceDocument |
evidence exporter or reviewer | verifier, ST case generator | model metadata, cases, expected total forward, major ops, tolerances, confidence, accepted gaps |
VerificationReport |
verifier | reviewer, ST case generator | pass/fail, issue categories, severities, suggestions, actual summaries |
2.6.3 Interface Stability Rules
| Rule | Rationale |
|---|---|
| Reports shall use explicit field names and source/confidence annotations | Reviewers and tests need stable provenance |
| Draft artifacts shall be distinguishable from verified artifacts | Prevent accidental promotion of unreviewed AI or heuristic output |
| Replay/audit options shall be explicit opt-in flags | Avoid silently hiding production profiles |
| Optional hints shall be additive and conflict-reporting | Preserve deterministic facts from command and raw profiling inputs |
| AI prompt text shall be generated from structured tasks | Keep AI assistance bounded and reproducible |
2.7 Data Model
classDiagram
class AdaptationContext {
+str model_id
+str raw_command
+dict normalized_args
+dict artifacts
+to_dict()
}
class RawInsightSummary {
+KernelTotals totals
+list kernels
+float total_wall_duration_ms
+to_dict(top_n)
}
class HintLedger {
+list hints
+conflicts_with_raw_insight(summary)
+to_dict()
}
class ModelStructureFacts {
+str model_type
+dict module_paths
+dict expert_fields
+dict visual_paths
}
class ProfileCandidate {
+CandidateField model_type
+CandidateField moe_module_name
+CandidateField visual_module_path
+CandidateField patch_method
}
class DoctorReport {
+dict adaptation_context
+dict raw_insight_summary
+dict candidate_profile
+dict candidate_profile_validation
+dict evidence_draft
+list human_questions
+list ai_tasks
}
class EvidenceDocument {
+dict model
+list cases
}
class VerificationReport {
+bool passed
+list issues
+list suggestions
}
AdaptationContext --> DoctorReport
RawInsightSummary --> DoctorReport
HintLedger --> DoctorReport
ModelStructureFacts --> ProfileCandidate
ProfileCandidate --> DoctorReport
DoctorReport --> EvidenceDocument
EvidenceDocument --> VerificationReport
The report schema shall preserve both candidate data and validation results. This enables reviewers to distinguish "detected", "reviewed", and "verified" states.
The main data entities shall use the following lifecycle semantics:
| Entity | Draft State | Reviewed State | Verified State |
|---|---|---|---|
ProfileCandidate |
Generated by structure inspection | Human-reviewed and materialized as ModelProfile |
Validated and exercised by dry-run/smoke/verification |
AiAssistanceTask |
Generated by deterministic failure classification | Prompt response reviewed by adapter author | Resulting patch passes doctor and verification |
EvidenceDocument |
Exported from evidence_draft |
Counts, confidence, tolerances, and accepted gaps reviewed | Verifier passes or gaps are accepted |
| ST case | Generated from verifier report | Reviewed for workload and tolerance | Marked verified only after passing evidence verification |
2.8 Core Workflow
sequenceDiagram
autonumber
participant A as Adapter Author
participant C as model_adapter CLI
participant D as Doctor Engine
participant M as Installed Model Source
participant S as Large-Model Skill
participant V as Evidence Verifier
A->>C: Provide command file and raw Insight export
C->>D: Build AdaptationContext and RawInsightSummary
D->>M: Build/inspect model tree
M-->>D: Structure facts
D->>D: Materialize and validate profile candidate
D->>D: Build evidence draft and human questions
D-->>A: Doctor report
alt runtime failure requires patch
A->>C: Provide failure log
C->>D: Run patch discovery
D-->>A: AI assistance task with deterministic evidence
A->>S: Submit bounded prompt
S-->>A: Patch-method draft
A->>A: Human review and profile update
end
A->>C: Export reviewed evidence
C->>V: Run actual TensorCast case against evidence
V-->>A: Pass/fail report with classified issues
The workflow is iterative. A failed verifier result shall route to a specific next action: adjust profile fields, revise a patch, add hints, update operator mapping, accept a documented fusion gap, or regenerate ST evidence.
2.9 State Model
stateDiagram-v2
[*] --> InputsReady
InputsReady --> DoctorDrafted: context + profiling parsed
DoctorDrafted --> NeedsHumanHint: low confidence or conflicts
DoctorDrafted --> NeedsPatch: runtime failure classified
NeedsHumanHint --> DoctorDrafted: hints added
NeedsPatch --> PatchDrafted: AI task completed
PatchDrafted --> ProfileRegistered: human review accepted
DoctorDrafted --> ProfileRegistered: candidate accepted
ProfileRegistered --> EvidenceReviewed: evidence exported and reviewed
EvidenceReviewed --> Verified: verifier passed
EvidenceReviewed --> GapAccepted: reviewed accepted gap
EvidenceReviewed --> DoctorDrafted: mismatch requires iteration
Verified --> STGenerated
GapAccepted --> STGenerated
STGenerated --> [*]
This state model prevents accidental promotion of drafts. Only Verified or a
reviewed GapAccepted state may produce a verified ST guardrail.
2.10 Large-Model Skill and Deterministic Adapter Cooperation
The design uses a two-lane cooperation model:
| Lane | Inputs | Outputs | Trust Boundary |
|---|---|---|---|
| Deterministic adapter program | Command file, raw profiling export, hints, installed model tree, failure log | Candidate profile, validation report, evidence draft, AI task package, verification report | Authoritative for parsing, validation, counts, and pass/fail classification |
| Large-model skill | Structured AI task, failure evidence, source snippets, constraints, required outputs | Patch-method draft, mapping review draft, explanation of uncertain semantics | Advisory only; must be reviewed and rechecked by deterministic gates |
The doctor shall package AI tasks with deterministic findings, suspected locations, constraints, and verification commands. The skill shall not invent a profile from the model name alone, shall not hand-write evidence from scratch, and shall not bypass validation. This cooperation is the main efficiency mechanism: deterministic tools narrow the problem, while the skill accelerates the small amount of semantic work that remains.
2.11 Quality Attribute Design
The following quality attributes are part of the design, not post-hoc implementation preferences.
| Attribute | Design Mechanism | Review Signal |
|---|---|---|
| Maintainability | Keep adapter automation in tensor_cast/adapter/*; keep model-specific metadata in built-in profile files; use structured dataclasses and explicit report schemas |
New model rules do not require broad changes across unrelated modules |
| Extensibility | Add new inspectors, hint kinds, evidence categories, and failure classifiers behind stable report interfaces | New model families can extend candidate materialization without changing CLI contracts |
| Applicability | Support dense, MoE, MLA, MTP, VL, quantized, compile, and parallel workloads through normalized command fields and optional profile fields | Unsupported model behavior is reported as a bounded task or accepted limitation |
| Testability | Every deterministic stage has a serializable artifact and can be unit tested independently | Tests can assert parser output, candidate fields, validation issues, evidence, verifier classifications, and replay behavior |
| Security | Treat raw commands, profiling files, hints, and AI drafts as untrusted inputs; avoid executing AI output automatically; keep private paths and raw internal notes out of committed artifacts | Generated prompts are bounded; reviewed code is required before patch methods enter profiles |
| Traceability | Preserve source, confidence, raw command, raw profiling path, and hint provenance | A reviewer can explain why each profile/evidence field exists |
| Reproducibility | Use case directories, stable JSON/YAML artifacts, and replay profile hiding | A case can be rerun and compared with the same inputs |
2.11.1 Maintainability and Extensibility
The adapter layer shall follow extension points rather than hard-coded model-specific branches whenever possible:
| Extension Point | Intended Extension | Guardrail |
|---|---|---|
| Structure inspector | New module pattern detectors for model families | Candidate fields must include source and confidence |
| Profile materializer | New recipe hints for MoE/MLA/VL/MTP families | Defaults and empty overrides must be omitted from review output |
| Patch discovery classifier | New failure taxonomy entries | AI tasks must include deterministic evidence and verification commands |
| Evidence builder | New kernel category or mapping rules | Low-confidence mappings must remain visible |
| Verifier | New issue categories or accepted-gap policy | Failed verification must provide next actions |
2.11.2 Security and Safety
The workflow handles local commands, model paths, profiling exports, and AI drafts. The design therefore treats all non-code artifacts as untrusted input until parsed and reviewed.
| Risk | Mitigation |
|---|---|
| Raw command contains unintended shell behavior | Command parser shall normalize supported TensorCast arguments rather than execute arbitrary shell text during parsing |
| Raw Insight or hints contain malformed data | Parsers shall validate required fields and report conflicts |
| AI patch draft changes real-model semantics | Patch methods require human review and deterministic verification |
| Private paths leak into committed artifacts | Reports and prompts should use repo-relative paths where possible; submission checklist rejects local-only notes |
| Replay mode hides production profiles accidentally | --ignore-existing-profile is explicit, scoped, and restored after the replay context |
2.12 Usage Case
The following example illustrates intended usage without prescribing command details, which are covered in the user guide.
An adapter author wants to onboard a virtual model ExampleVL-7B. The author
collects a TensorCast simulation command for a prefill workload and the matching
MindStudio Insight raw profiling export. The doctor builds an adaptation
context, scans the installed transformers implementation, detects a VL module
path, drafts a minimal ModelProfile, and emits evidence for the top attention
and visual MLP kernels. A dry-run failure shows data-dependent placeholder
masking in the model source, so patch discovery emits a PATCH_METHOD_AUTHORING
task. The author gives the bounded prompt to the model-adaptation skill, reviews
the patch draft, registers the profile, exports evidence, and runs verification.
When verification passes, the verified report becomes a regression guardrail
case.
2.13 Scope, Applicability, and Constraints
| Topic | Constraint |
|---|---|
| Required inputs | The workflow shall start from a simulation command and a matching raw Insight export. |
| Existing profiles | Normal adaptation may use existing profiles and recipes as references. Replay/audit mode may ignore named profiles to avoid circular validation. |
| Patch methods | Patch methods shall target TensorCast simulation compatibility and preserve normal tensor semantics as much as possible. |
| Evidence confidence | Low-confidence mappings shall be explicit and may produce review questions or warnings instead of false certainty. |
| Raw profiling | The raw Insight export shall include a Totals row so total forward latency can be compared. |
| Verification | Passing verification depends on available TensorCast operator coverage and performance model coverage; unsupported backend fusion may be recorded as an accepted gap after review. |
Applicability by model category:
| Model Category | Supported Adaptation Focus | Notes |
|---|---|---|
| Dense decoder-only text model | model_type, attention/runtime evidence |
Minimal profile may be enough |
| MoE model | MoE module name, expert count key, field overrides, routing evidence | Expert storage patterns may need custom expert wrappers |
| MLA model | MLA module name, TensorCast MLA class, field overrides | Validation shall reject incomplete MLA profile fields |
| MTP/speculative model | MTP block path and repeated-block behavior | Verification should include count-sensitive cases |
| Vision-language model | Visual/language paths, visual layer path, merger/MLP mappings, placeholder patch tasks | Qwen3-VL replay is the reference stress case |
| Quantized or parallel workload | Quantization and TP/DP/EP/MoE parallel normalized args | Evidence must account for communication or accepted gaps |
3. Usage Instructions
This design document provides only the conceptual usage case and constraints.
All step-by-step guidance, including exact commands, required files, optional
hints, doctor outputs, patch authoring handoff, evidence export, verification,
ST case generation, and Qwen3-VL replay/audit procedure, shall be maintained in
docs/en/user_guide/msmodeling_tensor_cast_new_model_adaptation_user_guide.md.
The user-facing workflow shall expose these public operations:
| Operation | Purpose | Expected Artifact |
|---|---|---|
| Doctor | Inspect inputs, model structure, candidate profile, evidence draft, questions, and AI tasks | JSON doctor report |
| Export Evidence | Convert reviewed doctor evidence draft to YAML | Evidence YAML |
| Verify | Run TensorCast and compare actual behavior with evidence | JSON verification report |
| ST Case Output | Generate guardrail cases from verified or draft reports | ST case JSON |
| Replay/Audit | Hide an existing profile and re-run adaptation discovery | Replay doctor report |
The guide shall also describe the required review gates:
| Gate | Reviewer Checks |
|---|---|
| Candidate profile | Field values are minimal, source-backed, and validation passes |
| Patch draft | Patch is scoped to the failing simulation path and preserves expected semantics |
| Evidence YAML | Case input, expected counts, total latency, confidence, and accepted gaps are reviewed |
| Verification report | Failures are classified and resolved or explicitly accepted |
| ST case | Verified status is used only when evidence verification passes or gaps are reviewed |
4. Test Design
4.1 Test Strategy
The test design shall cover deterministic units, integration flows, and end-to-end replay. Tests should verify that the workflow can progress from the two required inputs to a reviewed report without relying on hidden local state or existing profile answers.
The test pyramid shall align with the architecture views:
| Test Layer | Architecture View Covered | Primary Risk |
|---|---|---|
| Unit tests | Logical and development views | Parser, validator, materializer, evidence, and verifier logic regressions |
| Integration tests | Process and development views | Doctor report assembly, profile registry scope, export/verify handoff |
| End-to-end tests | Scenario view | New-model workflow and replay/audit behavior |
| Security and robustness tests | Physical and interface views | Malformed inputs, untrusted AI drafts, private-path leakage |
| Documentation and contract checks | Interface view | CLI/report contract drift from usage guidance |
4.2 Unit Test Cases
| Area | Case | Expected Result |
|---|---|---|
| Command parser | Parse a TensorCast simulation command with device, workload, quantization, and parallelism options | AdaptationContext.normalized_args matches the command |
| Raw Insight parser | Parse Totals and kernel rows |
Total forward timing and normalized kernel names are preserved |
| Raw Insight validation | Kernel row appears before Totals |
Parser rejects the file with an actionable error |
| Hints merge | Hints conflict with raw profiling counts or missing kernels | Conflicts are reported with provenance |
| Profile review | Default fields and empty overrides are omitted | Review dict contains only required fields |
| Profile validation | Invalid MoE/MLA override or non-callable patch method | Validation report contains field-specific errors |
| Patch discovery | Meta tensor placeholder and dynamic-mask failure log | PATCH_METHOD_AUTHORING task is generated |
| Evidence builder | Raw kernels plus hints produce major op evidence | Counts, confidence, and source are present |
| Verifier | Expected major op is missing, low confidence, or accepted gap | Issue severity and pass/fail result match policy |
| ST generator | Verified and unverified reports | Verified reports create verified cases; failed reports create draft cases |
4.3 Integration Test Cases
| Area | Case | Expected Result |
|---|---|---|
| Doctor report | Context, raw Insight, hints, and model inspection are combined | Report includes candidate profile, validation, evidence draft, questions, and suggestions |
| Doctor with failure log | Failure taxonomy detects patch need | Report includes patch discovery and AI task fields |
| Export evidence | Doctor report has an evidence draft | YAML evidence document is generated without losing case data |
| Verify with model ID from evidence | Verify command omits positional model ID | Model ID is read from evidence metadata |
| Actual runner isolation | Verification runs multiple cases | Shared user input is not mutated across cases |
| Replay registry isolation | Existing profile is ignored inside replay scope only | Registry state is restored after the replay scope |
4.4 End-to-End and Replay Test Cases
| Scenario | Design |
|---|---|
| Existing adapted model replay | Select an already adapted text or MoE model, temporarily remove or ignore its profile, re-run structure inspection and candidate materialization, then compare key TensorCast operators with the registered-profile baseline. |
| Qwen3-VL blind replay | Treat an already adapted Qwen3-VL model as unadapted by using replay/audit profile hiding. The test shall use a tiny config-only fixture to build the installed Qwen3-VL model tree without downloading weights, shall not use the existing TensorCast qwen3_vl.py profile as an input, and shall verify that the doctor rediscovers VL paths, visual merger/MLP linear mappings, model family, and patch-authoring evidence. |
| Qwen3-VL comparison | After blind replay produces a candidate, compare the replay candidate and patch task expectations with the known adapted Qwen3-VL behavior as an oracle. The oracle is used only after replay discovery completes. |
| Raw Insight evidence flow | Use a representative raw Insight export and command to generate evidence, export YAML, run verification, and classify remaining gaps. |
The Qwen3-VL replay test is important because it simulates the exact risk the feature is meant to reduce: adapting a complex VL model without reading an existing TensorCast answer. It should demonstrate that deterministic discovery and the model-adaptation skill can cooperate to recover the same adaptation shape with bounded human review.
4.5 Acceptance Criteria
| Category | Criteria |
|---|---|
| Functional | Doctor can create a complete report from a command file and matching raw Insight export. |
| Functional | Candidate profiles include source-backed fields and validation results. |
| Functional | Patch discovery produces AI assistance tasks instead of direct model-specific patch code. |
| Functional | Evidence export and verification preserve reviewed case data and classify mismatches. |
| Functional | Verified reports can produce ST guardrail cases. |
| Quality | AI-generated drafts are not trusted until profile validation, dry-run, evidence verification, and human review pass. |
| Quality | Replay mode does not depend on the existing profile of the model being replayed. |
| Quality | Qwen3-VL blind replay can rediscover the expected VL structure and patch need under profile hiding. |
4.6 Quality Attribute Test Matrix
| Quality Attribute | Test Design | Acceptance Signal |
|---|---|---|
| Maintainability | Add a new synthetic model family detector through adapter modules only | No unrelated model builder or runtime changes are required |
| Extensibility | Add a new hint kind or failure category with focused unit tests | Existing doctor/export/verify interfaces remain compatible |
| Applicability | Run dense, MoE, MLA, and VL fixture cases through candidate materialization | Each category produces minimal source-backed fields |
| Testability | Assert every major stage can serialize a deterministic artifact | Tests can inspect JSON/YAML or dataclass dictionaries without running full E2E |
| Security | Feed malformed raw Insight, conflicting hints, and suspicious failure logs | Parser reports structured errors or AI tasks without executing untrusted content |
| Traceability | Check generated evidence and profile review output for source/confidence fields | Reviewer can trace every non-default field to a source |
| Reproducibility | Rerun replay/audit with profile hiding and compare deterministic fields | Registry state is restored and candidate fields are stable |
4.7 Interface and Contract Tests
| Contract | Test |
|---|---|
| CLI help remains available | doctor --help, export-evidence --help, and verify --help load successfully |
| Doctor report schema is stable | Focused tests assert required top-level fields and nested validation fields |
| Evidence export requires an evidence draft | Export test rejects reports without evidence_draft |
| Verify reads model ID from evidence | Verify test omits positional model id and confirms metadata is used |
| ST case status follows verification result | Passed reports produce verified; failed reports produce draft |
| Replay profile hiding is scoped | Registry state before and after ignore_existing_profile is identical |
4.8 Security and Robustness Test Cases
| Risk | Test Case | Expected Result |
|---|---|---|
| Malformed raw Insight | Missing Totals, bad numeric values, or kernel rows before totals |
Parser rejects input with actionable error |
| Conflicting hints | Hint count or mapping conflicts with raw profiling | Doctor reports conflicts and human questions |
| Untrusted AI output | Patch draft is represented only as reviewed profile code, not executed by doctor | Doctor emits task package; no automatic code execution |
| Private path leakage | Reports and generated prompts are checked for local-only paths before submission | Local-only artifacts are not staged |
| Replay misuse | --ignore-existing-profile is used outside scoped replay context |
Tests verify registry restoration and explicit ignored profile reporting |