RFC: Profiling Operator Interpolation Module Design (Phase 1)
Metadata
| Field | Content |
|---|---|
| RFC name | Profiling Operator Interpolation Module Design (Phase 1: compute and attention_special multidimensional interpolation) |
| Status | Draft |
| Phase | Phase 1 / 3 |
| Target module | tensor_cast/performance_model/profiling_database |
| Target branch | develop |
| Author(s) | @zhenyu_zhang |
| Created | 2026-05-26 |
| Updated | 2026-06-13 |
| Related Issue/PR | #229 |
Document Alignment Note
The Chinese RFC is the detailed master. This English version covers the same design boundaries, default behavior, rollback strategy, test requirements and compatibility conclusions, but some long Chinese design sections are split into smaller English sections for readability. If the two versions ever diverge, the Chinese design constraints are authoritative and the English version must be updated without introducing different defaults or acceptance criteria.
1. Overview
This RFC defines the first phase of the profiling operator interpolation module. Phase 1 is not limited to a single MatMul/GEMM M/K 2D case. It builds a runnable, reversible, observable interpolation framework and delivers two core capabilities:
- Multidimensional interpolation for
computeoperators, focusing on MatMul/GEMM M/K/N 1D, 2D and 3D interpolation. - 1D, 2D and 3D interpolation paths for
attention_special, based on the current query semantics(batch, seq, heads, head_dim).seq/avg_seq_lenis the baseline axis.batch,headsandhead_dimrequire runtime field derivation, CSV enrichment or synthetic test CSVs. Runtime must still verify field derivability, candidate availability, boundary/grid completeness and latency validity.
Phase 1 intentionally keeps the scope narrow: communication operators are not wrapped by another interpolation layer, extrapolation is not enabled, and fallback_order is not exposed as user configuration. The phase introduces scipy.interpolate.griddata as a general 2D/3D interpolation backend, following the capability pattern used by aiconfigurator.
The main query flow is:
profiling datasource entry
-> exact lookup first
-> exact miss / partial enters interpolation fallback
-> compute or attention_special allowlist
-> ComputeIndex / AttentionIndex candidate lookup
-> 1D linear / scipy.griddata 2D-3D / regular grid fallback
-> QuerySource.INTERPOLATED
-> observable details / shape_match_info / debug log
2. Three-Phase Plan
Phase 1: compute and attention_special multidimensional interpolation
Phase 1 delivers:
- Wire
InterpolatingDataSourceinto the profiling path. - Add
--disable-profiling-interpolationas a baseProfilingDataSourcerollback switch. - Add the common math module
interpolation_math.py. - Use
scipy.interpolate.griddataas the general 2D/3D interpolation backend. - Add a candidate index layer for compute and attention_special candidates.
- Support compute 1D/2D/3D interpolation, focusing on MatMul/GEMM M/K/N.
- Support attention_special 1D/2D/3D interpolation paths. The 2D/3D paths must be implemented and tested in Phase 1. If production CSVs cannot reliably express batch or the third axis, runtime downgrades to
seq/avg_seq_len1D or another provably safe lower-dimensional combination. - Emit
source,confidence,details,shape_match_infoand debug logs. - Reserve the extrapolation guard interface but keep extrapolation disabled. Targets outside boundary/convex hull only record the failure reason and fall back. Phase 1 never returns
EXTRAPOLATED.
Phase 2: MoE/DFC, elementwise and composite expansion
Phase 2 extends the Phase 1 framework:
query_mode: moe_fusedandDispatchFFNCombineinterpolation.- Elementwise 1D / conditional 2D / conditional 3D interpolation.
- Composite decomposition reuses each sub-kernel's interpolation capability.
- FP8 quant scale compute subcases, such as
compute_scaleandscale_matrix, when the M/K schema is stable.
Communication remains outside wrapper interpolation. It continues to use the existing base exact / alpha-beta(message_bytes) path.
Phase 3: extrapolation, accuracy evaluation and engineering convergence
Phase 3 delivers:
- Opt-in extrapolation, disabled by default.
- Ratio guard / max bound protection.
- Holdout accuracy evaluation and error reports.
- Performance budgets, cache policy, stale rebuild and concurrent access convergence.
- Benchmarks for exact hit, 1D/2D/3D interpolation, schema miss, policy miss, index build, stale rebuild and memory overhead.
- Metrics, warnings and debug log convergence.
- Policy field cleanup and documentation convergence.
Overall Architecture Across Three Phases
User / CLI / Config
|
|-- --performance-model profiling
|-- --profiling-database <dir>
|-- --disable-profiling-interpolation [Phase 1]
v
ModelRunner / PerformanceModel Factory
|
|-- disable_profiling_interpolation = true
| |
| v
| ProfilingDataSource [base CSV/existing path]
| |
| |-- CSV exact match ----------------------------> QueryResult(source=MEASURED)
| |-- base internal derived path -----------------> QueryResult(source=INTERPOLATED)
| | for example communication alpha-beta/message_bytes
| |-- partial composite hit ----------------------> QueryResult(source=PARTIAL)
| |-- miss ---------------------------------------> None
|
|-- disable_profiling_interpolation = false
|
v
InterpolatingDataSource [Phase 1]
|
| 1. call base.lookup(op) first
v
ProfilingDataSource [existing]
|
|-- MEASURED -----------------------------------> return base result
|-- INTERPOLATED -------------------------------> return base result
| base communication interpolation remains authoritative
|-- EXTRAPOLATED -------------------------------> return base result if ever produced by base
| Phase 1 wrapper itself must not produce EXTRAPOLATED
|-- PARTIAL ------------------------------------> try wrapper interpolation fallback
|-- None ---------------------------------------> try wrapper interpolation fallback
|
v
Interpolation Fallback Dispatcher [Phase 1]
|
|-- category/query_mode allowlist
|-- policy merge:
| built-in defaults
| + global interpolation_policy
| + operator_mappings.<op>.interpolation_policy
|
+---------------------------------------------------------------+
| |
v v
communication compute
category == communication category == compute
[base only; no wrapper 1D/2D/3D interpolation] [Phase 1]
| |
|-- base exact / _query_comm_csv |-- CandidateIndex / ComputeIndex
|-- base alpha-beta(message_bytes) may return INTERPOLATED |-- axes: M / K / N
|-- wrapper returns base result or None/PARTIAL unchanged |-- regime keys:
| | kernel_type, dtype, format,
| | layout, transpose, exact fields
| |-- algorithms:
| | 1D linear
| | 2D/3D scipy.griddata(linear)
| |
| |-- FP8 quant scale subcases:
| | compute_scale / scale_matrix
| | [Phase 2 if schema/policy is not
| | stable enough in Phase 1]
| |
| v
| QueryResult(source=INTERPOLATED)
|
+---------------------------------------------------------------+
|
v
attention_special
query_mode == attention_special [Phase 1]
|
|-- CandidateIndex / AttentionIndex
|-- axes from current query semantics and CSV enrichment:
| seq / avg_seq_len mandatory
| batch / heads / head_dim only when derivable and varying
|-- regime keys:
| kernel_type, dtype, sparse_mode, kv_heads,
| quant_mode, layout, mask/cache regime
|-- supported paths:
| direct attention_special ops
| attention sub-kernel fallback from decomposer
|-- algorithms:
| 1D linear
| 2D/3D scipy.griddata(linear)
|-- optional axis_transform:
| sqrt(seq) for known O(seq^2) kernels
|
v
QueryResult(source=INTERPOLATED)
Phase 2 Extensions
|
|-- moe_fused / DispatchFFNCombine
|-- elementwise 1D / conditional 2D / conditional 3D
|-- composite sub-kernel interpolation reuse
|-- FP8 quant scale compute subcases
Phase 3 Extensions
|
|-- ExtrapolationPolicy
| |-- allow_extrapolation default false
| |-- opt-in only
| |-- ratio guard / max bound
| |-- QuerySource.EXTRAPOLATED
| |-- lower confidence than interpolation
|
|-- Accuracy / Holdout Evaluation
|-- Performance / Cache Governance
|-- Observability Convergence
|-- Policy / Documentation Cleanup
3. Phase 1 Scope
In scope:
- Default profiling path wraps
ProfilingDataSourcewithInterpolatingDataSource. - Exact hit remains the highest priority and returns
MEASURED. - Interpolation is only a fallback for
NoneorPARTIAL. - Compute operators support 1D/2D/3D interpolation, with MatMul/GEMM M/K/N as the main target.
- Attention_special supports 1D/2D/3D code paths and tests.
- Interpolation success returns
QuerySource.INTERPOLATED. - Output includes
details,shape_match_info,confidence, source and debug logs. confidenceis display-only in Phase 1. It does not drive result selection or fallback.
Out of scope:
- No wrapper interpolation for communication operators.
- No extrapolated result.
QuerySource.EXTRAPOLATEDis not returned in Phase 1. - No full MoE/DFC/elementwise/composite multidimensional implementation.
- No user-configurable
fallback_order; the order is fixed asexact -> 1D -> 2D -> 3D. - No requirement to modify every
op_mapping.yamlentry at once.
4. Current State and Problems
The current profiling datasource is reliable for exact matches and a few existing derived paths, but it has limited behavior after exact miss:
- Measured CSV rows are used only when the runtime query matches the expected schema and regime.
- Compute operators such as MatMul/GEMM often have nearby measured points across M/K/N axes, but the current path cannot reuse them for multidimensional interpolation.
- Attention_special sub-kernels can often derive
seq/avg_seq_len, while batch/head/head_dim information is only partially available from runtime fields or enriched CSVs. - Existing fallback behavior is conservative but opaque: users cannot easily see whether a result is measured, interpolated by an existing base path, partially decomposed or missing.
- Communication already has alpha-beta/message_bytes behavior in the base datasource, so wrapping communication with another shape interpolation layer would mix two different models and increase risk.
Phase 1 addresses only the first safe slice: compute and attention_special interpolation behind exact-first lookup, explicit allowlists, no extrapolation and a rollback switch.
5. Main Path Design
--performance-model profiling should use InterpolatingDataSource by default. The wrapper calls base lookup first. Exact measured results are returned unchanged.
Add the rollback CLI:
--disable-profiling-interpolation
and the matching user config field:
disable_profiling_interpolation: bool = False
When the switch is enabled, profiling uses the base ProfilingDataSource path without the wrapper's compute/attention interpolation fallback. This is not a guarantee that every base result is CSV-exact: the base path may still contain existing specialized derived paths, such as communication alpha-beta/message_bytes handling.
Phase 1 wrapper behavior is restricted to compute and attention_special allowlist scenarios. Unsupported operators keep the base ProfilingDataSource behavior.
6. Query Flow
lookup(op)
-> base.lookup(op)
-> MEASURED: return base result
-> INTERPOLATED: return base result
# possible from existing base communication paths
-> EXTRAPOLATED: return base result if base ever produces it
# Phase 1 wrapper itself must not produce EXTRAPOLATED
-> PARTIAL: try wrapper interpolation fallback
-> None: try wrapper interpolation fallback
-> if op is communication:
return base result
-> if op is not compute or attention_special allowlist:
return base result
-> build interpolation target
-> query candidate index
-> try 1D / 2D / 3D by fixed default order
-> if interpolation succeeds:
return QueryResult(source=INTERPOLATED)
else:
return base result
The fixed attempt order is:
exact -> 1D -> 2D -> 3D
This means the runtime first searches for candidates where only one continuous axis needs interpolation. If low-dimensional candidates cannot form a boundary because the fixed axes are too strict, the search expands to 2D and then 3D. This order is chosen to minimize error and behavior change. The index layer should avoid full table rescans per dimension and should return boundary/grid availability for the current regime.
PARTIAL means the base ProfilingDataSource produced a partial result. Phase 1 does not merge wrapper interpolation into missing sub-kernels. Instead, PARTIAL is treated as an entry point for a whole-wrapper retry: if the wrapper can independently build a complete interpolation result, it returns QuerySource.INTERPOLATED with details["fallback_from"] = "partial"; otherwise the original base PARTIAL is returned unchanged.
7. Candidate Index Layer
InterpolatingDataSource remains the datasource wrapper. The candidate index is an internal helper. It is not a new performance model and does not inherit from DataSourcePerformanceModel.
Recommended structure:
DataSourcePerformanceModel
^
|
InterpolatingDataSource
|-- ComputeIndex
|-- AttentionIndex
Candidate point shape:
CandidatePoint(
regime_key=...,
axes={"M": 1024, "K": 4096},
latency=...,
row_meta={...},
)
Cache policy:
- Build lazily. Do not scan all CSVs in
InterpolatingDataSource.__init__. - Reuse DataFrames returned by
ProfilingDataSource._load_csv(). - Store the index cache inside the wrapper instance; no global cache.
- Do not use
id(df)as the cache key. Use a stable content fingerprint such as(kernel_type, df_content_hash, policy_hash, index_kind, tc_input_count).df_content_hashshould cover DataFrame row count, column names, index and values, not only first/last row, so that middle-row changes rebuild the index. CSV path, mtime or profiling database version may also be included when available. - If base
_csv_cacheis explicitly cleared, the wrapper index cache must also be cleared.
policy_hash is not a user-facing field. It is a deterministic hash generated by the index builder from the normalized effective interpolation policy after merging built-in defaults, global interpolation_policy, and operator-level interpolation_policy. Recommended serialization is canonical JSON with sorted keys and stable representations for missing values before applying SHA-256. It must change whenever effective axes, exact fields, max interpolation dimension, axis transform, extrapolation flags, or duplicate aggregation behavior changes.
regime_key must use a stable field order. Missing fields must be encoded explicitly as None or "missing" to avoid splitting equivalent candidates unpredictably.
8. Math Module
Add:
tensor_cast/performance_model/profiling_database/interpolation_math.py
Phase 1 functions:
| Function | Responsibility |
|---|---|
find_boundary(values, target) |
Find the lower/upper boundary values containing target. Out-of-range targets fail in Phase 1. |
linear_interp(x, x0, y0, x1, y1) |
1D linear interpolation. |
griddata_linear_interp(points, values, target) |
General 2D/3D linear interpolation using scipy.interpolate.griddata(..., method="linear"). |
validate_latency(value) |
Reject NaN, infinite and negative latency. |
estimate_confidence(...) |
Produce a display-only confidence value based on dimension, boundary width, candidate density, transform and downgrade information. |
build_interpolation_details(...) |
Build normalized details with method, dimension, axes, target, boundary, candidates, confidence and failure reason. |
check_extrapolation_guard(target, bounds, policy) |
Reserved extrapolation guard. Phase 1 only returns guard status/reason and never produces extrapolated results. |
griddata constraints:
- Only
method="linear"is enabled in Phase 1. - Do not use
griddatafor extrapolation. - If target is outside the convex hull,
griddatareturns NaN, or scipy raises Qhull/ValueError, the attempt fails and falls back. - 2D griddata needs at least three non-collinear points. 3D griddata needs at least four non-coplanar points.
- Phase 1 keeps only
griddata_linear_interp(points, values, target)as the public 2D/3D math interface. Dedicated regular-grid bilinear/trilinear functions are deferred as an open follow-up if they later provide clear accuracy, performance or explainability value.
9. Compute Multidimensional Interpolation
MatMul/GEMM axes:
activation: [..., M, K]
weight: [K, N] or [N, K]
output: [..., M, N]
Continuous axes:
M: token/row dimension.K: reduction dimension.N: output column dimension.
Exact/regime fields:
- kernel type.
- dtype.
- format.
- layout.
- canonicalized transpose / weight layout result, without becoming stricter than exact lookup.
- quant regime.
- other fields that change kernel semantics.
input_formats is CSV physical-format metadata and should remain visible in row_meta/details. Phase 1 should not reject all canonicalizable FRACTAL_NZ candidates only because the runtime tensor is logical ND. Candidate groups may still be separated by CSV input_formats, and if both ND and non-ND groups are available for the same target, the ND group should be preferred. Cross-format reuse is allowed only when shape canonicalization proves the same M/K/N semantics.
M/K/N extraction:
| Scenario | M | K | N | Requirement |
|---|---|---|---|---|
ND [M,K] x [K,N] |
input0[0] | input0[1] | input1[1] | source_layout=KN only goes to row_meta/details unless exact lookup also distinguishes it. |
ND [M,K] x [N,K] |
input0[0] | input0[1] | input1[0] | F.linear / transposed weight form. source_layout=NK must not split buckets more strictly than exact lookup. |
BatchMatMulV2 / TransposeBatchMatMul 3D (H,M,K) x (H,K,N) |
input0[1] | input0[2] | input1[2] | H is a batch/head regime or exact field. If input1 is (H,N,K), parse N=input1[1]. |
_FLATTEN_BATCH_KERNELS batched / flatten batch |
prod(batch_dims) * M |
input0[-1] | Parse only by exact-supported rules | Applies only to kernels where exact lookup already supports flatten batch. MatMul kernels must not use this rule today. |
| FRACTAL_NZ weight | parsed from input0 | parsed from input0 | restore with fractal_nz_to_nd() first |
Never extract axes from tiled shape directly. |
| Unknown weight layout | none | none | none | Do not enter M/K/N multidimensional interpolation. Fall back or keep safe 1D behavior. |
CSV candidates and runtime targets must use the same extraction rule. If K can be inferred from both input0 and input1 and the values differ, the candidate is invalid.
Accelerator Core appears in some CSVs, but current exact lookup does not use it as a match condition. Phase 1 interpolation must not put it in regime_key; otherwise interpolation would be stricter than exact lookup. If AIC/AIV/MIX_AIC distinction becomes necessary later, exact lookup should be extended first and the interpolation index should follow.
RoPE, SwiGlu and block padding special cases must stay aligned with _inputs_match. If Phase 1 does not implement dedicated multidimensional axis extractors for these kernels, they must be excluded from the multidimensional allowlist.
Compute query order:
exact -> 1D -> 2D -> 3D
1D examples:
Mvaries,K/Nfixed.Kvaries,M/Nfixed.Nvaries,M/Kfixed.
2D examples:
M/Kvary,Nfixed.M/Nvary,Kfixed.K/Nvary,Mfixed.
3D:
M/K/Nall vary.- Use
griddata_linear_interp()for the valid 3D point set. Dedicated regular-grid trilinear interpolation is deferred from Phase 1. - Downgrade or fall back on insufficient or degenerate candidates.
10. Attention_special Multidimensional Interpolation
Current attention_special query semantics are (batch, seq, heads, head_dim). Phase 1 should implement and test 1D/2D/3D paths instead of staying at 1D. Runtime still downgrades if production CSVs lack fields or candidate points.
Continuous axes:
seqoravg_seq_len.batch, only when runtime and CSV can derive it reliably.heads, only when multiple values exist and can form a boundary around the target.head_dim, only when multiple values exist and can form a boundary around the target.
Before using Runtime actual_seq_lengths_shape or Runtime actual_seq_lengths_values as a batch source, implementation must inspect the current CSV distribution and verify that the field represents batch size rather than a scalar value, ndim or one seq_len value. If the field is always batch=1 or cannot be reliably interpreted, production seq + batch 2D should not trigger, though synthetic/enrichment CSVs must still test the code path.
Exact/regime fields:
- kernel type.
- dtype.
- sparse mode / mask mode.
- KV-head regime.
- quant mode.
- input layout.
- KV cache mode.
- attention state.
- other fields that change attention kernel semantics.
q_tokens is not part of the attention regime_key, otherwise the same attention scenario can be split into overly narrow candidate groups. It remains in target/candidate axes as a non-interpolation axis for exact filtering. Phase 1 does not interpolate along q_tokens and must not aggregate candidates from different q_tokens values.
Covered entry points:
- Direct
query_mode: attention_specialmappings such astensor_cast.attention.defaultandtensor_cast.attention_quant.default. - Attention sub-kernel fallback produced by composite decomposers, such as MLA decomposition generating
FusedInferAttentionScoresub-queries. Full composite interpolation is Phase 2 scope.
Recommended axis priority:
seq/avg_seq_len -> batch(if derivable and varying) -> heads/head_dim(if varying)
Phase 1 must implement:
- 1D:
seq/avg_seq_len. - 2D:
seq + batch,seq + heads,seq + head_dim, selected by available fields. - 3D:
seq + batch + headsorseq + batch + head_dim, selected by available fields.
These paths are required capabilities, not only reserved interfaces. Runtime must verify field existence, derivability, multi-value candidates, boundary/plane completeness and valid latency. If any condition fails, downgrade or return the base result.
Axis transform:
- Default to raw seq linear interpolation.
- For known O(seq^2) attention kernels,
sqrt(seq)may be used through built-in or kernel override policy. - In 2D/3D, the same transform must be applied to both CSV candidate coordinates and runtime target coordinates.
- Record
axis_transformin details and lower confidence. Confidence remains display-only.
Decode attention note:
- Decode attention commonly has
query_len=1; the interpolation axis is not the one-token query length but the average context length or past-KV length represented byavg_seq_lenor equivalent runtime metadata. - If
avg_seq_lenis estimated from token totals, the implementation must align with the current profiling matcher's block-padding tolerance. The existing FIA matching logic accepts a small block-level gap, for example the_fia_avg_seq_len_gapstyle tolerance around 16 tokens. Phase 1 must not make interpolation stricter than exact lookup by rejecting candidates that exact lookup would accept after block padding. - The original and transformed coordinates should both be recorded in details when
sqrt(seq)is used, so block-padding effects remain debuggable.
11. Minimal Policy
Phase 1 keeps policy minimal:
| Field | Phase 1 usage |
|---|---|
axes |
Allowed continuous axes. Compute defaults to ["M", "K", "N"]; attention_special defaults to ["seq", "batch", "heads", "head_dim"], with runtime choosing the actual dimension by derivability and candidates. |
exact_fields / regime_keys |
Discrete fields that must match exactly. |
max_interpolation_dim |
Maximum interpolation dimension. Compute/MatMul can use 3. Attention_special defaults to 3 but only uses 2D/3D when CSV enrichment and candidates are sufficient. |
allow_extrapolation |
Reserved extrapolation field. Phase 1 treats it as an internal guard parameter, not an effective user option. Default false; no extrapolated result is returned. If YAML already contains it, treat as reserved/no-effect. |
axis_transform |
Used only when needed, for example attention sqrt(seq). |
Unsupported in Phase 1:
- User-configurable
fallback_order. - Communication multidimensional policy.
- Extrapolated result return.
Policy merge order:
built-in default policy
-> global interpolation_policy
-> operator_mappings.<op>.interpolation_policy
Most operators do not need explicit policy in Phase 1. Conservative defaults should be synthesized from category/query_mode.
12. Observability
Interpolation success returns:
source = QuerySource.INTERPOLATED
details should include:
{
"method": "griddata_linear",
"interpolation_dim": 2,
"axes": ["M", "K"],
"target": {"M": M, "K": K, "N": N},
"boundary": {
"M": [M_lo, M_hi],
"K": [K_lo, K_hi],
},
"corner_points": [...],
"confidence": 0.7,
"axis_transform": None,
"fallback_from": "exact_miss",
}
shape_match_info should record the interpolation rule, source, target shape, matched candidate shapes and fallback reason.
Final user-visible output must show source and confidence for profiling results, for example:
op_name source confidence method axes latency_us
torch.ops.npu.npu_transpose_batchmatmul MEASURED 1.00 exact - 120.3
torch.ops.npu.npu_matmul INTERPOLATED 0.70 griddata_linear M,K 145.8
confidence is not a statistical confidence interval in Phase 1. It is an observability tag. It must not affect result selection, warning behavior, policy acceptance or analytic fallback.
13. Extrapolation Interface
Phase 1 reserves the extrapolation interface but keeps it disabled:
allow_extrapolation = false
Therefore:
- Do not estimate targets outside the measured range.
- Do not return
QuerySource.EXTRAPOLATED. - If target is outside boundary, record
outside_boundaryand fall back.
Phase 3 may enable opt-in extrapolation with ratio guard / max bound.
14. Communication Handling
Communication operators do not enter wrapper interpolation in Phase 1. This does not mean communication has no interpolation-like behavior at all. The existing base ProfilingDataSource may return INTERPOLATED for alpha-beta/message_bytes paths. Phase 1 only avoids adding another 1D/2D/3D wrapper interpolation layer on top of communication.
15. Candidate Aggregation and Validation
Candidate aggregation is split into semantic filtering and interpolation-structure validation:
- Canonicalize CSV rows and runtime target using the same shape rules.
- Filter by policy/regime fields.
- Validate latency.
- Group candidates by axis tuple.
- Aggregate duplicate points, defaulting to mean latency.
- Build 1D boundary, 2D grid/convex hull or 3D grid/convex hull.
- Run interpolation only when the structure is complete.
- Record raw count, filtered count, aggregated count, dropped count, boundary, corner points and method in details.
The first filter excludes semantically inconsistent candidates. The second validation checks whether the remaining candidates can form a valid interpolation structure.
16. Test Plan
Math tests:
find_boundary()hit, boundary hit and out-of-range failure.linear_interp()midpoint / quarter point.griddata_linear_interp()2D and 3D success.griddata_linear_interp()outside convex hull failure.griddata_linear_interp()covers both regular-grid and irregular point sets. Dedicated bilinear/trilinear functions are not Phase 1 test targets.validate_latency()rejects NaN, infinity and negative values.estimate_confidence()is display-only.
Candidate index tests:
- Build M/K/N buckets from compute CSV rows.
- Validate
BatchMatMulV2andTransposeBatchMatMul(H,M,K) x (H,K,N)extraction. - Ensure
_FLATTEN_BATCH_KERNELSflattening is not accidentally applied to MatMul kernels. - Ensure RoPE/SwiGlu kernels do not accidentally enter M/K/N multidimensional interpolation without dedicated extractors.
- Build attention
seq/avg_seq_len,seq+batch,seq+heads/head_dim,seq+batch+heads/head_dimbuckets. - Ensure
Accelerator Coreis not a regime key unless exact lookup uses it. - Ensure transpose/layout does not split buckets more strictly than exact lookup.
Integration tests:
- Exact hit returns
MEASURED. - Compute exact miss with complete candidates returns
INTERPOLATED. - Attention_special exact miss with complete candidates returns
INTERPOLATED. - Missing candidates fall back to base behavior.
--disable-profiling-interpolationdisables wrapper interpolation.- Communication operators do not enter wrapper interpolation.
- Details and shape_match_info include dimension, axes, method, candidates and confidence.
- Attention
axis_transform=sqrtapplies to both candidates and target. actual_seq_lengths_shape/valuesthat cannot reliably express batch downgrades rather than being misread.
Holdout tests:
- Compute holdout: build a MatMul/GEMM 2D/3D grid, remove one target point and verify interpolation.
- Attention holdout: cover
avg_seq_len1D and synthetic/enriched 2D/3D grids with batch/head/head_dim.
The implementation should produce a visible holdout / accuracy-loss report, either markdown or JSON. At minimum it should list target axes, removed measured latency, interpolated latency, absolute error, relative error, source, confidence, method and candidate count for every holdout sample.
17. Implementation Steps
Recommended Phase 1 steps:
- Add user config and CLI rollback switch.
- Wrap the profiling datasource construction path with
InterpolatingDataSource. - Add
interpolation_math.py. - Add the candidate index structure.
- Implement compute M/K/N target extraction and 1D/2D/3D lookup.
- Implement attention_special
seq/avg_seq_len,batch,heads/head_dimtarget extraction and 1D/2D/3D lookup, with downgrade when CSV fields or candidates are insufficient. - Populate
QueryResult.detailsandshape_match_info. - Add debug log and final source/confidence display.
- Add unit tests, datasource integration tests and synthetic holdout tests.
18. Open Questions
Phase 1 keeps only a few review questions open:
- Whether the wrapper should be enabled by default as recommended, with
--disable-profiling-interpolationas rollback. - Whether attention CSV enrichment should add an explicit
Runtime batch_sizecolumn to improve production 2D/3D hit rate for the batch axis. - Whether the candidate index should be a generic
CandidateIndexor split intoComputeIndexandAttentionIndex. - Whether dedicated regular-grid bilinear/trilinear functions are worth adding later. Phase 1 uses
griddata_linear_interponly. - Cross-format shape canonicalization/transform remains unresolved. For example, if measured CSV points only exist in NZ/FRACTAL_NZ format but runtime queries are ND, interpolation reuse should only be considered after exact lookup rules and data validation prove a reliable canonicalization.
19. Migration and Compatibility
19.1 Default Behavior
Phase 1 wraps the profiling datasource construction path with InterpolatingDataSource by default, but it does not blindly interpolate every existing op_mapping.yaml entry. Runtime behavior is still constrained by:
- Exact-first lookup.
MEASURED, existing baseINTERPOLATEDand future baseEXTRAPOLATEDresults are returned as-is. - Wrapper fallback only covers the compute and attention_special allowlists.
- Entries without explicit
interpolation_policyuse conservative built-in defaults. - Schema, regime-key, candidate-structure and latency validation failures downgrade or fall back to base behavior.
Existing entries can continue to run unchanged. New policy fields are optional and are used only to narrow or explicitly document interpolation behavior.
19.2 QueryResult and Downstream Compatibility
QueryResult remains backward compatible:
sourcecontinues to distinguishMEASURED,INTERPOLATED,PARTIALand other result origins.confidenceis a display/observability field. It does not select winners or change scheduling decisions.detailsandshape_match_infomay gain optional keys such as interpolation dimension, axes, method, candidate counts, boundary and fallback reason.- Downstream consumers should read these fields as optional dictionaries and must not assume a fixed details key set.
Reports such as throughput_optimizer may display the new information, but they must not change optimization decisions solely because confidence or a details key is missing.
19.3 CSV Schema Compatibility
Phase 1 does not require all existing CSVs to be versioned or enriched immediately:
- Existing CSV exact hits keep exact behavior.
- Missing high-dimensional attention fields downgrade to
seq/avg_seq_len1D or fall back. - New candidate fields are optional columns; old CSVs with missing columns must not fail to load.
- If a later phase introduces required CSV columns or a CSV version, it must also provide compatible readers and migration tooling.
19.4 Rollout and Rollback
Phase 1 provides --disable-profiling-interpolation as the hard rollback switch. Recommended rollout:
- Shadow/dry-run: record candidates and predicted latency, but keep using the base result.
- Allowlist enablement: return wrapper
INTERPOLATEDonly for validated compute / attention_special kernels or model families. - Default enablement: keep the disable switch and debug details while monitoring hit rate, fallback reason and holdout error.
If behavior regresses, users can disable the wrapper immediately, and maintainers can narrow impact through policy allowlists or stricter regime keys.
20. Risks and Mitigation
| Risk | Impact | Mitigation |
|---|---|---|
| Default wrapper changes old behavior | Exact miss may return interpolation instead of the old fallback | Exact-first behavior, compute/attention_special allowlist, and --disable-profiling-interpolation. |
| Shape normalization diverges from exact lookup | Incorrect cross-regime interpolation | Reuse or strictly align current exact lookup normalizers; test special cases. |
| 2D/3D candidates are incomplete or degenerate | Invalid interpolation result | Downgrade or fall back. Never extrapolate in Phase 1. Record reasons in details/debug log. |
21. Acceptance Criteria
- Compute multidimensional interpolation is end-to-end usable: in
--performance-model profiling, MatMul/GEMM compute operators with CSV exact miss and complete candidates can returnQuerySource.INTERPOLATEDfor M/K/N 1D, 2D and 3D interpolation. - Attention_special multidimensional interpolation is end-to-end usable: direct
query_mode: attention_specialoperators and attention sub-kernel fallback from decomposers can executeseq/avg_seq_len1D,seq+batchorseq+heads/head_dim2D, andseq+batch+heads/head_dim3D code paths. Production queries returnINTERPOLATEDonly when fields are derivable, candidates are complete and regime matches; otherwise they downgrade or fall back predictably. - Interpolation results are reversible and explainable: users can switch back to the base
ProfilingDataSourcebehavior with--disable-profiling-interpolation, and output/report/details/debug logs showsource,confidence, interpolation axes, method, candidates and fallback reason. Confidence is displayed only and does not affect result selection.