AI tool-schema normalization
@oh-my-pi/pi-ai exposes one unified schema normalizer that providers consume
before tools are sent on the wire. All walkers live in
packages/ai/src/utils/schema/normalize.ts; the operational contract is
packages/ai/src/utils/schema/CONSTRAINTS.md.
There is no separate strict-mode.ts module any more — OpenAI strict-mode
sanitization, OpenAI Responses oneOf rewriting, Google/Vertex/Gemini-CLI
sanitization, Cloud Code Assist Claude sanitization, and MCP sanitization all
share the same option-driven walk.
Entry points
All exports live under @oh-my-pi/pi-ai/utils/schema:
- `normalizeSchema(value, options)`: generic option-driven walker.
- `normalizeSchemaForGoogle(value)`: Gemini / Vertex / Gemini CLI.
- `normalizeSchemaForCCA(value)`: Cloud Code Assist Claude (Antigravity + GCA).
- `normalizeSchemaForMCP(value)`: MCP `inputSchema`s before they enter the custom-tool registry. `tool-bridge.ts` runs every MCP `inputSchema` through this dispatcher.
- `normalizeSchemaForOpenAIResponses(schema)` (alias `sanitizeSchemaForOpenAIResponses`): rewrites `oneOf` → `anyOf` for the Responses family.
- `sanitizeSchemaForStrictMode(schema)` and `enforceStrictSchema(schema)` / `tryEnforceStrictSchema(schema)`: the OpenAI strict-mode pipeline (sanitize → enforce). All three are exported from `normalize.ts`.
- `adaptSchemaForStrict(schema, strict)` from `./adapt`: a thin composer that wraps `tryEnforceStrictSchema` for provider call sites and consults the `PI_NO_STRICT` environment variable for the global bypass.
Removed in the unified-flow refactor:
- `strict-mode.ts` (merged into `normalize.ts`).
- `sanitize-google.ts` and `normalize-cca.ts` (replaced by the `normalizeSchemaFor*` dispatchers).
- `StringEnum` helper: use `z.enum([...])` directly. Zod's emitted JSON Schema is already wire-compatible with Google and other providers.
- `sanitizeSchemaFor{Google,CCA,MCP}` / `prepareSchemaForCCA`: renamed to `normalizeSchemaFor{Google,CCA,MCP}`.
Dispatcher mapping
| Provider transport(s) | Dispatcher |
|---|---|
| `openai-completions`, `openai-responses`, `openai-codex-responses` | `adaptSchemaForStrict` (sanitize + enforce) |
| `openai-responses` family (`oneOf` → `anyOf` only) | `normalizeSchemaForOpenAIResponses` |
| `google-generative-ai`, `google-vertex`, Gemini CLI | `normalizeSchemaForGoogle` |
| Cloud Code Assist Claude (Antigravity + GCA, `claude-*` model ids) | `normalizeSchemaForCCA` |
| MCP `inputSchema` ingestion | `normalizeSchemaForMCP` |
| `anthropic-messages` (native, not CCA) | per-provider whitelist in `anthropic.ts` |
Gemini CLI / Antigravity CCA MUST run the full normalizeSchemaForCCA
pipeline (not just the first keyword-stripping pass) to keep parity with the
shared Google Claude path.
Walk semantics
normalizeSchema first detoxifies serialized Zod-instance-shaped inputs, upgrades them to
JSON Schema 2020-12, dereferences the tree, then walks it with the option set
pinned by the dispatcher. Each node:
- Renames `snake_case` combinator/property keys to camelCase (`any_of` → `anyOf`, etc.). Collisions follow `python-genai`'s `pop(from)`/`set(to)` semantics, so the snake_case value wins.
- Applies the `handle_null_fields` collapse for nullable unions before recursing into children.
- Strips keys the target provider does not support. It can optionally lift human-meaningful keys (`pattern`, `format`, min/max, `default`, `examples`, ...) into the sibling `description` via the spill formatter (`spill.ts`). Structural/meta keys (`$ref`, `$defs`, `additionalProperties`) are not spilled.
- Normalizes type unions: `type: ["T", "null"]` becomes `type: "T"` plus a nullable marker on Google, and plain `type: "T"` on CCA.
- Collapses object-only / same-type combiners, optionally lossy-collapses mixed-type combiners (CCA only), and runs the residual-combiner fixpoint.
- Validates against AJV 2020 when `validateAndFallback` is set (CCA path). On residual incompatibility it emits the per-tool fallback `{ "type": "object", "properties": {} }`. Residual incompatibility means a `type` array, `type: "null"`, a `nullable` key, or any remaining `anyOf`/`oneOf`/`allOf`.
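The sketch below illustrates two of the per-node steps above: the snake_case rename, where snake_case wins on a collision, and the `["T", "null"]` type-union collapse. It is a minimal sketch under stated assumptions. The `JsonSchema` type, the helper names and the three-entry rename map are hypothetical. Using `nullable: true` as Google's nullable marker is also an assumption. This is not the code in `normalize.ts`.

```typescript
// Illustrative sketch only: JsonSchema, SNAKE_TO_CAMEL and both helpers are hypothetical.
type JsonSchema = { [key: string]: unknown };

// Hypothetical subset of snake_case → camelCase renames (the doc names any_of → anyOf).
const SNAKE_TO_CAMEL: Record<string, string> = {
  any_of: "anyOf",
  one_of: "oneOf",
  all_of: "allOf",
};

// pop(from) / set(to): the snake_case value is removed and written onto the
// camelCase key, overwriting any existing camelCase value (snake_case wins).
function renameSnakeKeys(node: JsonSchema): JsonSchema {
  const out: JsonSchema = { ...node };
  for (const [from, to] of Object.entries(SNAKE_TO_CAMEL)) {
    if (from in out) {
      const value = out[from];
      delete out[from];
      out[to] = value;
    }
  }
  return out;
}

type Target = "google" | "cca";

// Collapse type: ["T", "null"] to type: "T". Google keeps a nullable marker
// (assumed here to be nullable: true); CCA drops nullability entirely.
function normalizeTypeUnion(node: JsonSchema, target: Target): JsonSchema {
  const types = node.type;
  if (!Array.isArray(types) || types.length !== 2 || !types.includes("null")) return node;
  const nonNull = types.filter((t) => t !== "null");
  if (nonNull.length !== 1) return node; // e.g. ["null", "null"]: leave untouched
  const out: JsonSchema = { ...node, type: nonNull[0] };
  if (target === "google") out.nullable = true;
  return out;
}
```

For example, `renameSnakeKeys({ any_of: [A], anyOf: [B] })` yields `{ anyOf: [A] }`.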
OpenAI strict-mode pipeline
adaptSchemaForStrict(schema, strict) runs tryEnforceStrictSchema,
which composes:
- Sanitize (`sanitizeSchemaForStrictMode`): strips non-structural keywords (`format`, `pattern`, min/max, `examples`, `default`, `if`/`then`/`else`, `not`, `unevaluated*`, `patternProperties`, `dependent*`, `content*`, `minProperties`/`maxProperties`, `$dynamicRef`, etc.). Before `default` is dropped, its value is inlined into the sibling `description` as `(default: X)`. This is skipped if `description` already contains `(default:` or no `description` exists.
- Enforce (`enforceStrictSchema`): every object node gets `additionalProperties: false` and every property goes into `required`. Optional properties become nullable unions (`anyOf: [<original>, { "type": "null" }]`). Tuple `prefixItems` are strictified recursively.
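The sanitize → enforce composition can be sketched on plain object schemas as follows. The assumptions:

- `sanitizeNode`, `enforceNode` and `isObject` are hypothetical helpers.
- Only a few of the listed keywords are stripped.
- The default is formatted with `JSON.stringify`.
- `$ref`, combinators, `prefixItems` and the cache/cycle guards of the real passes are all skipped.

```typescript
// Sketch of sanitize → enforce; hypothetical helpers, no $ref/combinator/prefixItems/cycle handling.
type JsonSchema = { [key: string]: unknown };

function isObject(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// A small subset of the non-structural keywords the real sanitizer strips.
const STRIPPED_KEYWORDS = ["format", "pattern", "minLength", "maxLength", "minimum", "maximum", "examples", "default"];

function sanitizeNode(node: JsonSchema): JsonSchema {
  const out: JsonSchema = { ...node };
  const description = out.description;
  // Inline the default into an existing description unless it already mentions one
  // (JSON.stringify formatting of the value is an assumption of this sketch).
  if ("default" in out && typeof description === "string" && !description.includes("(default:")) {
    out.description = `${description} (default: ${JSON.stringify(out.default)})`;
  }
  for (const key of STRIPPED_KEYWORDS) delete out[key];
  const properties = out.properties;
  if (isObject(properties)) {
    const sanitized: JsonSchema = {};
    for (const [name, child] of Object.entries(properties)) {
      sanitized[name] = isObject(child) ? sanitizeNode(child) : child;
    }
    out.properties = sanitized;
  }
  const items = out.items;
  if (isObject(items)) out.items = sanitizeNode(items);
  return out;
}

function enforceNode(node: JsonSchema): JsonSchema {
  const items = node.items;
  if (node.type === "array" && isObject(items)) return { ...node, items: enforceNode(items) };
  if (node.type !== "object") return node;
  const rawProperties = node.properties;
  const properties: JsonSchema = isObject(rawProperties) ? rawProperties : {};
  const rawRequired = node.required;
  const originallyRequired = new Set(Array.isArray(rawRequired) ? rawRequired : []);
  const strictProperties: JsonSchema = {};
  for (const [name, child] of Object.entries(properties)) {
    const enforced = isObject(child) ? enforceNode(child) : child;
    // Optional properties become nullable so every property can be listed in `required`.
    strictProperties[name] = originallyRequired.has(name) ? enforced : { anyOf: [enforced, { type: "null" }] };
  }
  return { ...node, properties: strictProperties, required: Object.keys(properties), additionalProperties: false };
}
```

Sanitizing first means the enforce pass only ever sees structural keywords. This is also why the optional `age` property in a schema like `{ name (required), age }` ends up as `anyOf: [{ type: "integer" }, { type: "null" }]` rather than being left out of `required`.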
The two passes use cache/cycle guards, so refs, allOf, and nullable wrapping
stay deterministic without recursing forever. `tryEnforceStrictSchema` is
fail-open: if anything throws, it returns `{ strict: false, schema: upgraded }`.
Callers MUST therefore emit `strict: true` only when enforcement actually
succeeded, that is, when the returned `strict` flag is `true`.
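The fail-open contract and a call site that honors it can be sketched as below. `tryEnforce`, `toToolDefinition` and the injected `enforce` function are hypothetical. The real `tryEnforceStrictSchema` returns the upgraded schema on failure. This sketch has no upgrade step, so it returns its input unchanged.

```typescript
// Hypothetical sketch of the fail-open contract; `enforce` stands in for real strict enforcement.
type JsonSchema = { [key: string]: unknown };
type StrictResult = { strict: boolean; schema: JsonSchema };

function tryEnforce(schema: JsonSchema, enforce: (s: JsonSchema) => JsonSchema): StrictResult {
  try {
    return { strict: true, schema: enforce(schema) };
  } catch {
    // Never throw: fall back to the (un-enforced) schema and report strict: false.
    return { strict: false, schema };
  }
}

// Hypothetical call site: the tool definition advertises strict only when enforcement succeeded.
function toToolDefinition(name: string, parameters: JsonSchema, enforce: (s: JsonSchema) => JsonSchema) {
  const result = tryEnforce(parameters, enforce);
  return { type: "function", name, parameters: result.schema, strict: result.strict };
}
```

Threading the `strict` flag from the result, instead of hardcoding `strict: true`, keeps a failed enforcement from being sent to OpenAI as a strict tool.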
Edge cases the strict-mode normalizer handles
- Local `$ref` inlining. OpenAI strict mode rejects `{ "$ref": "...", "description": "..." }` with sibling keys. The sanitizer pre-resolves local `#/...` refs against the root and merges them, with sibling keys winning over the resolved def. This is the same precedence as `openai-python`'s `_ensure_strict_json_schema`. Recursive refs are guarded by the per-walk epoch.
- Single-item `allOf`. A `{ "allOf": [X], ...siblings }` collapses to `{ ...siblings, ...X }`, with the inlined entry's keys winning over the original siblings (matches `openai-python`'s `_pydantic.py:79-83`). Multi-item `allOf` is left intact for the downstream validator to reject if needed.
- Type-array branches and nullable unions. When a node has `type: ["T", "U"]`, the sanitizer emits one variant schema per type and prunes type-specific keywords. For example, `properties`/`required` stay only on the `object` variant, and `items` only on the `array` variant. The shared `description` is hoisted onto the `anyOf` wrapper instead of being duplicated on every branch. A strict nullable union therefore becomes `{ anyOf: [T, { type: "null" }], description: "..." }`, not `anyOf: [{ ..., description }, { ..., description }]`.
- Enum/const without a `type`. Both the sanitize and enforce paths call `inferStrictPrimitiveTypeFromEnumOrConst` to infer the primitive `type` from `enum`/`const` values. Some values cannot be described by a single `type` keyword: mixed-primitive enums (`[1, "two", null]`), enums containing objects/arrays, and non-primitive `const` values (`{a:1}`, `[1,2,3]`). These trigger the strict-mode fail-open path, because OpenAI would reject a typeless schema on the wire.
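The precedence rules for `$ref` siblings and single-item `allOf` point in opposite directions and are easy to flip by accident. Here is a minimal sketch of both, plus enum/const primitive-type inference. The assumptions:

- All helper names are hypothetical.
- There is no recursion and no per-walk epoch guard.
- Every numeric value maps to `"number"`. Whether the real `inferStrictPrimitiveTypeFromEnumOrConst` distinguishes `integer` is not stated here.

```typescript
// Hypothetical helpers illustrating merge precedence and enum/const type inference.
type JsonSchema = { [key: string]: unknown };

function isObject(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Resolve a local JSON Pointer ref such as "#/$defs/Item" against the root schema.
function resolveLocalRef(root: JsonSchema, ref: string): JsonSchema {
  if (!ref.startsWith("#/")) throw new Error(`not a local ref: ${ref}`);
  let current: unknown = root;
  for (const token of ref.slice(2).split("/")) {
    if (typeof current !== "object" || current === null) throw new Error(`unresolvable ref: ${ref}`);
    current = (current as Record<string, unknown>)[token.replace(/~1/g, "/").replace(/~0/g, "~")];
  }
  if (!isObject(current)) throw new Error(`ref does not point at a schema: ${ref}`);
  return current;
}

// $ref with sibling keys: siblings win over the resolved definition.
function inlineRef(node: JsonSchema, root: JsonSchema): JsonSchema {
  const ref = node.$ref;
  if (typeof ref !== "string" || Object.keys(node).length < 2) return node;
  const siblings: JsonSchema = { ...node };
  delete siblings.$ref;
  return { ...resolveLocalRef(root, ref), ...siblings };
}

// Single-item allOf: the entry's keys win over the original siblings.
function collapseSingleAllOf(node: JsonSchema): JsonSchema {
  const allOf = node.allOf;
  if (!Array.isArray(allOf) || allOf.length !== 1 || !isObject(allOf[0])) return node; // multi-item stays
  const siblings: JsonSchema = { ...node };
  delete siblings.allOf;
  return { ...siblings, ...allOf[0] };
}

type PrimitiveType = "string" | "number" | "boolean" | "null";

// One primitive type from enum/const values, or undefined when no single type fits
// (the caller would then take the strict-mode fail-open path).
function inferPrimitiveType(node: JsonSchema): PrimitiveType | undefined {
  const values: unknown[] = Array.isArray(node.enum) ? node.enum : "const" in node ? [node.const] : [];
  const seen = new Set<PrimitiveType>();
  for (const value of values) {
    if (value === null) seen.add("null");
    else if (typeof value === "string") seen.add("string");
    else if (typeof value === "number") seen.add("number");
    else if (typeof value === "boolean") seen.add("boolean");
    else return undefined; // objects and arrays have no primitive type
  }
  return seen.size === 1 ? Array.from(seen)[0] : undefined;
}
```

The spread order carries the whole precedence rule. In `inlineRef` the siblings are spread last, so they win. In `collapseSingleAllOf` the `allOf` entry is spread last, so it wins.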
Performance: static fingerprint cache
resolveProviderModels in packages/ai/src/model-manager.ts and
readModelCache/writeModelCache in model-cache.ts cooperate via a
schema-v3 static_fingerprint column on the model_cache SQLite table.
- `fingerprintStatic(staticModels)` hashes the static catalog slice (`Bun.hash(JSON.stringify(models))` in base36). It memoizes the result in a per-process `WeakMap` keyed by the array reference. Multiple cold-start arms that call `resolveProviderModels` with the same `staticModels` array pay the JSON+hash cost only once.
- On cache read, `resolveProviderModels` can return the cached models verbatim. This happens when three conditions hold: the network fetch is being skipped, the cached row is fresh and authoritative, and the cached `static_fingerprint` matches the current one. The cache already incorporates the same static state, so re-running `mergeDynamicModels(static, cache)` would just rebuild the same objects.
- `mergeModelSources` and `mergeDynamicModels` short-circuit on empty-source inputs, avoiding Map churn entirely. Empty sources are the common shape after `(static, [])` and for providers without a static catalog.
Cache rows written before schema v3 are dropped by the cache-version
check; the column defaults to '' for any row that survives a version
upgrade so the fingerprint-equality check naturally fails closed and the
full merge re-runs.
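The sketch below shows the fingerprint memoization and the fast-path condition. It makes several assumptions:

- `fnv1a` (32-bit FNV-1a) stands in for `Bun.hash` so the example runs outside Bun.
- The `Model` and `CacheRow` shapes are hypothetical.
- `canReuseCache` is a hypothetical helper.

The real logic lives in `model-manager.ts` and `model-cache.ts`.

```typescript
// Hypothetical sketch; fnv1a replaces Bun.hash, and CacheRow/canReuseCache are illustrative.
type Model = { id: string; [key: string]: unknown };

// 32-bit FNV-1a, used only as a portable stand-in for Bun.hash.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Memoized per array reference: repeated calls with the same array skip JSON + hash.
const fingerprintMemo = new WeakMap<readonly Model[], string>();

function fingerprintStatic(staticModels: readonly Model[]): string {
  const cached = fingerprintMemo.get(staticModels);
  if (cached !== undefined) return cached;
  const fingerprint = fnv1a(JSON.stringify(staticModels)).toString(36);
  fingerprintMemo.set(staticModels, fingerprint);
  return fingerprint;
}

// Rows that survived a version upgrade carry static_fingerprint: "".
type CacheRow = { models: Model[]; fresh: boolean; authoritative: boolean; static_fingerprint: string };

// Reuse cached models verbatim only when every fast-path condition holds.
function canReuseCache(row: CacheRow | undefined, skipFetch: boolean, staticModels: readonly Model[]): boolean {
  return (
    row !== undefined &&
    skipFetch &&
    row.fresh &&
    row.authoritative &&
    row.static_fingerprint === fingerprintStatic(staticModels)
  );
}
```

A base36 rendering of a number is never the empty string. A migrated row with `static_fingerprint: ""` can therefore never match, so it fails closed and the full merge runs.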
Related
- `docs/models.md`: registry, equivalence, compat flags (`supportsStrictMode`, `toolStrictMode`, `disableStrictTools`).
- `docs/provider-streaming-internals.md`: how the normalized schemas are used downstream during the provider stream loop.
- `docs/mcp-server-tool-authoring.md`: MCP `inputSchema` ingestion via `normalizeSchemaForMCP`.
- `packages/ai/src/utils/schema/CONSTRAINTS.md`: operational contract for every normalization rule.