title: Resilience Guide
version: 3.8.40

Resilience Guide

OmniRoute has three distinct but related resilience mechanisms. Each has a different scope and purpose. Keep them separate when debugging routing behavior.

3-layer resilience model

Source: diagrams/resilience-3layers.mmd

1. Provider Circuit Breaker

Scope: entire provider (e.g., glm, openai, anthropic).

Purpose: stop sending traffic to a provider that is repeatedly failing at the upstream/service level.

Implementation:

Core class: src/shared/utils/circuitBreaker.ts
Wiring: src/sse/handlers/chatHelpers.ts, src/sse/handlers/chat.ts
Status API: GET /api/monitoring/health
Reset API: POST /api/resilience/reset
Wrappers: open-sse/services/accountFallback.ts
DB table: domain_circuit_breakers

States:

CLOSED — normal traffic allowed
DEGRADED — traffic still allowed, but elevated provider failures are being tracked
OPEN — provider temporarily blocked; combo routing skips it
HALF_OPEN — reset timeout elapsed; probe request allowed

Configurable defaults (open-sse/config/constants.ts, exposed in Dashboard → Settings → Resilience):

Class	Degraded at	Opens at	Reset timeout
OAuth	5 failures	8 failures	60s
API-key	7 failures	12 failures	30s
Local	derived	2 failures	15s

degradationThreshold controls when a provider enters DEGRADED; failureThreshold controls when it opens and is skipped. Local provider profiles are not exposed on the Resilience settings page yet.

Trip codes: only the provider-level statuses 408, 500, 502, 503, and 504 trip the breaker. Account-level errors (most 401/403/429 responses) do NOT trip it; those belong to connection cooldown or model lockout.

Lazy recovery: once the OPEN reset timeout has elapsed, the next call to getStatus(), canExecute(), or getRetryAfterMs() refreshes the state to HALF_OPEN. No background timer is needed.
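The state machine and lazy recovery described above can be sketched roughly as follows. This is an illustrative sketch, not the contents of src/shared/utils/circuitBreaker.ts: the class name SketchCircuitBreaker, the recordFailure()/recordSuccess() methods, the injected clock, and the choices to reopen on a failed probe and to close on any success are all assumptions. Only the state names, the trip statuses, the two thresholds, and the three read methods come from this guide.

```typescript
type BreakerState = "CLOSED" | "DEGRADED" | "OPEN" | "HALF_OPEN";

interface BreakerOptions {
  degradationThreshold: number; // failures before DEGRADED
  failureThreshold: number; // failures before OPEN
  resetTimeoutMs: number; // how long OPEN lasts before a probe is allowed
}

// Provider-level statuses that count toward tripping (taken from the guide).
const TRIP_STATUSES = new Set([408, 500, 502, 503, 504]);

class SketchCircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  // The clock is injectable (an assumption) so the sketch can be tested.
  constructor(opts: BreakerOptions, now: () => number = () => Date.now()) {
    this.opts = opts;
    this.now = now;
  }

  // Lazy recovery: every read path first promotes an expired OPEN to HALF_OPEN.
  private refresh(): void {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.opts.resetTimeoutMs) {
      this.state = "HALF_OPEN";
    }
  }

  getStatus(): BreakerState {
    this.refresh();
    return this.state;
  }

  canExecute(): boolean {
    this.refresh();
    return this.state !== "OPEN";
  }

  getRetryAfterMs(): number {
    this.refresh();
    if (this.state !== "OPEN") return 0;
    return Math.max(0, this.openedAt + this.opts.resetTimeoutMs - this.now());
  }

  recordFailure(status: number): void {
    if (!TRIP_STATUSES.has(status)) return; // account-level errors are ignored here
    this.refresh();
    this.failures += 1;
    // Assumption: a failed HALF_OPEN probe reopens the breaker immediately.
    if (this.state === "HALF_OPEN" || this.failures >= this.opts.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    } else if (this.failures >= this.opts.degradationThreshold) {
      this.state = "DEGRADED";
    }
  }

  // Assumption: any success fully closes the breaker.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";
  }
}
```

Because the state is refreshed on read, callers that inspect a stored raw state field instead of calling getStatus() or canExecute() would keep seeing OPEN after the timeout has elapsed.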

2. Connection Cooldown

Scope: single provider connection/account/key.

Purpose: skip one bad key while other connections for the same provider keep serving.

Implementation:

Mark unavailable: src/sse/services/auth.ts::markAccountUnavailable()
Selection: getProviderCredentials* in same file
Cooldown calc: open-sse/services/accountFallback.ts::checkFallbackError()
Settings: src/lib/resilience/settings.ts

Fields per connection:

rateLimitedUntil — timestamp until cooldown expires
testStatus: "unavailable"
lastError, lastErrorType, errorCode
backoffLevel — exponential backoff counter

Default cooldowns:

OAuth base: 5s
API-key base: 3s
API-key 429: prefers upstream Retry-After/reset headers/parseable reset text
Backoff: baseCooldownMs * 2 ** failureIndex
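
As a worked example of the backoff formula above, the following sketch computes a cooldown from a base value and a failure index. The function name cooldownMs, the constant names, the zero-based failureIndex, and the way an upstream hint is passed in as retryAfterMs are assumptions made for illustration; only the base values and the formula come from this guide.

```typescript
const OAUTH_BASE_COOLDOWN_MS = 5_000; // OAuth base: 5s
const API_KEY_BASE_COOLDOWN_MS = 3_000; // API-key base: 3s

// Returns the cooldown for the given failure index (assumed to start at 0).
// If the upstream supplied a usable hint (e.g. Retry-After), that hint wins.
function cooldownMs(
  baseCooldownMs: number,
  failureIndex: number,
  retryAfterMs?: number,
): number {
  if (retryAfterMs !== undefined && retryAfterMs > 0) return retryAfterMs;
  return baseCooldownMs * 2 ** failureIndex;
}

// API-key connection: 3s, 6s, 12s, 24s for failure indexes 0..3.
const apiKeySchedule = [0, 1, 2, 3].map((i) => cooldownMs(API_KEY_BASE_COOLDOWN_MS, i));
```

With these inputs, apiKeySchedule is [3000, 6000, 12000, 24000], and an OAuth connection at failure index 2 would cool down for 20000 ms.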

Anti-thundering-herd guard: prevents concurrent failures from over-extending cooldown or double-incrementing backoffLevel.

Terminal states (NOT cooldowns):

banned — set by banned-keyword / account-ban detection (see BAN_DETECTION)
expired
credits_exhausted

These persist until credentials change or an operator resets them. Do not overwrite terminal states with transient cooldown state.

Lazy recovery: once rateLimitedUntil is in the past, the connection becomes eligible again. On successful use, clearAccountError() clears all error fields.
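
The eligibility rules for a connection can be summarized in a short sketch. The type ConnectionSketch, the "active" status value, and the functions isEligible(), applyCooldown(), and clearErrors() are hypothetical names used only for illustration; they are not the signatures in src/sse/services/auth.ts. The field names and the three terminal states are taken from this guide.

```typescript
type TerminalStatus = "banned" | "expired" | "credits_exhausted";
type TestStatus = "active" | "unavailable" | TerminalStatus; // "active" is assumed

interface ConnectionSketch {
  testStatus: TestStatus;
  rateLimitedUntil: number | null; // epoch ms, null when no cooldown is set
  lastError: string | null;
  backoffLevel: number;
}

const TERMINAL: ReadonlySet<string> = new Set(["banned", "expired", "credits_exhausted"]);

// Lazy recovery: no timer flips the state; eligibility is decided at read time.
function isEligible(conn: ConnectionSketch, now: number): boolean {
  if (TERMINAL.has(conn.testStatus)) return false; // needs an operator or new credentials
  return conn.rateLimitedUntil === null || conn.rateLimitedUntil <= now;
}

// A transient cooldown must never overwrite a terminal state.
function applyCooldown(conn: ConnectionSketch, until: number, error: string): ConnectionSketch {
  if (TERMINAL.has(conn.testStatus)) return conn;
  return {
    ...conn,
    testStatus: "unavailable",
    rateLimitedUntil: until,
    lastError: error,
    backoffLevel: conn.backoffLevel + 1,
  };
}

// On successful use, all error fields are cleared.
function clearErrors(conn: ConnectionSketch): ConnectionSketch {
  return { ...conn, testStatus: "active", rateLimitedUntil: null, lastError: null, backoffLevel: 0 };
}
```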

3. Model Lockout

Scope: provider + connection + model triple.

Purpose: avoid disabling a whole connection when only one model is unavailable or quota-limited.

Examples:

Per-model quota providers returning 429
Local providers returning 404 for one missing model
Provider-specific mode/model permission failures (e.g., Grok modes)

Implementation: open-sse/services/accountFallback.ts — lockModel(), clearModelLock(), getAllModelLockouts().

Model Cooldowns Dashboard (v3.8.0)

UI: Settings → Model Cooldowns (src/app/(dashboard)/dashboard/settings/components/ModelCooldownsCard.tsx)

Lists active lockouts with: provider, connection, model, reason, expiresAt. Operators can manually re-enable a model from the card.

REST API:

GET /api/resilience/model-cooldowns — list active lockouts
DELETE /api/resilience/model-cooldowns — manual re-enable. Body: {provider, connection, model}. Auth: management.
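
A client call to the manual re-enable endpoint might be assembled as in the sketch below. The endpoint path, the HTTP method, and the three body fields come from this guide. Everything else is an assumption: the helper name buildReenableRequest, the base URL, the string types of the body fields, and in particular the Bearer token header, since the guide only states that management auth is required and does not say how it is presented.

```typescript
interface LockoutTarget {
  provider: string;
  connection: string;
  model: string;
}

// Builds the URL and fetch options without sending anything.
function buildReenableRequest(baseUrl: string, target: LockoutTarget, token: string) {
  return {
    url: new URL("/api/resilience/model-cooldowns", baseUrl).toString(),
    init: {
      method: "DELETE",
      headers: {
        "Content-Type": "application/json",
        // Assumption: management auth is sent as a Bearer token.
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(target),
    },
  };
}

// Hypothetical values for illustration only.
const reenable = buildReenableRequest(
  "http://localhost:3000",
  { provider: "glm", connection: "conn-1", model: "example-model" },
  "management-token",
);
```

The request could then be sent with fetch(reenable.url, reenable.init), and the list of active lockouts read with a GET to the same path.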

Lockout settings UI + success-decay recovery (v3.8.23)

Model lockout went from always-on hardcoded behavior to a fully configurable, opt-in feature with its own settings card and a self-healing recovery path.

Settings card: Settings → Model Lockout (src/app/(dashboard)/dashboard/settings/components/ModelLockoutCard.tsx). This is distinct from the read-only ModelCooldownsCard above (which only lists active lockouts) — the new card configures the parameters. Defaults live in DEFAULT_MODEL_LOCKOUT_SETTINGS (src/lib/resilience/modelLockoutSettings.ts):

Setting	Default	Meaning
`enabled`	`false`	Master toggle — model lockout is off by default.
`errorCodes`	`[403, 404, 429, 502, 503, 504]`	Upstream statuses that count as a model-scoped failure.
`baseCooldownMs`	`120_000` (120 s)	Initial lockout duration for the first failure.
`maxCooldownMs`	`1_800_000` (30 min)	Cap on the escalated cooldown.
`maxBackoffSteps`	`10`	Max exponential-backoff escalation steps.
`useExponentialBackoff`	`true`	Whether repeated failures escalate the cooldown exponentially.

Settings persist through the normal settings store and validate via the resilience settings schema; the card clamps baseCooldownMs/maxCooldownMs (with maxCooldownMs ≥ baseCooldownMs) and maxBackoffSteps.
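
To make the interaction between these settings concrete, here is a minimal sketch of how a lockout duration could be derived from them. The object mirrors the defaults in the table; the function lockoutDurationMs() and the exact way the settings combine (doubling per failure, step capped by maxBackoffSteps, result capped by maxCooldownMs) are assumptions, not the implementation in open-sse/services/accountFallback.ts.

```typescript
interface ModelLockoutSettings {
  enabled: boolean;
  errorCodes: number[];
  baseCooldownMs: number;
  maxCooldownMs: number;
  maxBackoffSteps: number;
  useExponentialBackoff: boolean;
}

// Values copied from the defaults table above.
const defaultsFromTable: ModelLockoutSettings = {
  enabled: false,
  errorCodes: [403, 404, 429, 502, 503, 504],
  baseCooldownMs: 120_000,
  maxCooldownMs: 1_800_000,
  maxBackoffSteps: 10,
  useExponentialBackoff: true,
};

// Returns the lockout duration, or null when no lockout applies.
// failureCount is assumed to be 1 for the first failure.
function lockoutDurationMs(
  settings: ModelLockoutSettings,
  status: number,
  failureCount: number,
): number | null {
  if (!settings.enabled || !settings.errorCodes.includes(status)) return null;
  if (!settings.useExponentialBackoff) {
    return Math.min(settings.baseCooldownMs, settings.maxCooldownMs);
  }
  const step = Math.min(Math.max(failureCount - 1, 0), settings.maxBackoffSteps);
  return Math.min(settings.baseCooldownMs * 2 ** step, settings.maxCooldownMs);
}

const enabledSettings: ModelLockoutSettings = { ...defaultsFromTable, enabled: true };
```

Under these assumptions the first 429 locks the model for 120 s, the second for 240 s, and the fifth would reach 1,920 s but is capped at the 30-minute maximum. With the shipped default (enabled: false) the function returns null for every status.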

Success-decay recovery: recovery is not purely timer expiry. A healthy response walks the model's failure count back down so a model that recovered mid-window stops escalating (and clears) before its timer would. On a successful combo target, open-sse/services/combo.ts calls decayModelFailureCount() (open-sse/services/accountFallback.ts), which halves the stored failureCount (Math.floor(failureCount / 2)); when it reaches 0 the lockout entry is deleted entirely. The counterpart recordModelLockoutFailure() increments the count (and escalates the cooldown) on failures within the escalation window. This success-decay is in addition to plain timer expiry — either path can re-enable a model.

State: lockouts are held in-memory (per-process Maps of ModelLockoutEntry keyed by provider:connectionId:model), not persisted to the DB — they are lost on restart. The settings are persisted; the active lockout state is ephemeral.
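
The success-decay path and the in-memory state can be illustrated with a small sketch. The key format and the halving rule come from this guide; the entry shape (LockoutEntrySketch), the function names with a Sketch suffix, and the decision to leave lockedUntil untouched while the count is still above zero are assumptions for illustration.

```typescript
interface LockoutEntrySketch {
  failureCount: number;
  lockedUntil: number; // epoch ms
}

// Per-process, in-memory state: lost on restart.
const lockouts = new Map<string, LockoutEntrySketch>();

const lockoutKey = (provider: string, connectionId: string, model: string): string =>
  `${provider}:${connectionId}:${model}`;

// Failure path: increment the count and (re)arm the timer.
function recordFailureSketch(key: string, now: number, cooldownMs: number): number {
  const entry = lockouts.get(key) ?? { failureCount: 0, lockedUntil: 0 };
  entry.failureCount += 1;
  entry.lockedUntil = now + cooldownMs;
  lockouts.set(key, entry);
  return entry.failureCount;
}

// Success path: halve the count; delete the entry once it reaches 0.
function decayFailureCountSketch(key: string): void {
  const entry = lockouts.get(key);
  if (!entry) return;
  const next = Math.floor(entry.failureCount / 2);
  if (next === 0) lockouts.delete(key);
  else entry.failureCount = next;
}

// Timer path: an entry only blocks while its timer is still running.
function isLockedSketch(key: string, now: number): boolean {
  const entry = lockouts.get(key);
  return entry !== undefined && entry.lockedUntil > now;
}
```

Starting from a failure count of 3, one success leaves a count of 1 and a second success removes the entry, which re-enables the model even if its timer has not yet expired.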

4. Quota-Share Concurrency Control (v3.8.36)

Subscription accounts (GLM, MiniMax, etc.) often accept only about 1 to 3 concurrent requests; exceeding that triggers 429s and cooldowns. The problem is acute under quota-share (qtSd/…) combos, where several API keys share one upstream account. Three layers keep a shared account from being flooded.

Per-connection concurrency cap (`max_concurrent`)

Each provider connection can declare a max_concurrent ceiling (provider_connections.max_concurrent, set in the connection modal / API / DB). Leave it empty for no limit. This is the single knob that drives the serialization layer below — set it to the account's real concurrency (e.g. GLM ~1, MiniMax ~2).

Quota-share request serialization

When a quota-share dispatch targets a connection that declares a positive max_concurrent, concurrent requests to that account are serialized through a per-connection semaphore (key qsconn:<connectionId>): excess requests wait in the queue instead of flooding the account. The semaphore is fail-open: a saturated queue or a timeout proceeds without a slot rather than ever rejecting a dispatchable request. Toggle it in Settings → Resilience → Quota-share per-connection concurrency (resilienceSettings.quotaShareConcurrencyLimit.enabled, default on). Without a max_concurrent cap the behavior is unchanged.

The quota-share routing gate (selectQuotaShareTarget, DRR + P2C) is itself fail-open and only deprioritizes an at-cap connection — with a single-connection pool it cannot hard-limit, so this semaphore is what actually contains the flood.
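
A fail-open per-connection semaphore of the kind described above could look roughly like the sketch below. The key format qsconn:<connectionId> and the fail-open behavior come from this guide. The class FailOpenSemaphore, the helper withConnectionSlot(), and the queue size and timeout values are hypothetical and chosen only to keep the example small.

```typescript
class FailOpenSemaphore {
  private active = 0;
  private readonly waiters: Array<(granted: boolean) => void> = [];
  private readonly limit: number;
  private readonly maxQueue: number;
  private readonly timeoutMs: number;

  constructor(limit: number, maxQueue: number, timeoutMs: number) {
    this.limit = limit;
    this.maxQueue = maxQueue;
    this.timeoutMs = timeoutMs;
  }

  // Resolves true when a slot was obtained, false when the caller should
  // proceed without one (saturated queue or timeout). It never rejects.
  acquire(): Promise<boolean> {
    if (this.active < this.limit) {
      this.active += 1;
      return Promise.resolve(true);
    }
    if (this.waiters.length >= this.maxQueue) return Promise.resolve(false);
    return new Promise<boolean>((resolve) => {
      const waiter = (granted: boolean) => {
        clearTimeout(timer);
        resolve(granted);
      };
      const timer = setTimeout(() => {
        const i = this.waiters.indexOf(waiter);
        if (i >= 0) this.waiters.splice(i, 1);
        resolve(false); // timed out: fail open
      }, this.timeoutMs);
      this.waiters.push(waiter);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(true); // hand the slot straight to the next waiter
    else this.active = Math.max(0, this.active - 1);
  }
}

const semaphores = new Map<string, FailOpenSemaphore>();

async function withConnectionSlot<T>(
  connectionId: string,
  maxConcurrent: number | null,
  fn: () => Promise<T>,
): Promise<T> {
  if (!maxConcurrent || maxConcurrent <= 0) return fn(); // no cap: behavior unchanged
  const key = `qsconn:${connectionId}`;
  let sem = semaphores.get(key);
  if (!sem) {
    sem = new FailOpenSemaphore(maxConcurrent, 100, 5_000); // assumed queue size and timeout
    semaphores.set(key, sem);
  }
  const holdsSlot = await sem.acquire();
  try {
    return await fn();
  } finally {
    if (holdsSlot) sem.release(); // only release a slot that was actually taken
  }
}
```

The design choice worth noting is that acquire() reports whether a slot was taken, so a request that proceeded fail-open does not release a slot it never held.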

Combo cooldown-aware retry

For quota-share combos only, a request that would otherwise return a 429 because of a short, transient cooldown waits for the cooldown to expire and re-dispatches instead of returning the 429. The wait is bounded by comboCooldownWait (enabled, maxWaitMs 5s, maxAttempts 2, budgetMs 8s) in Settings → Resilience. It never waits on quota_exhausted (locked until midnight) or on auth/not-found reasons.
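
The wait-or-return decision can be sketched as a pure function. The setting names and values are taken from the paragraph above; the function shouldWaitForCooldown(), the reason strings other than quota_exhausted, the enabled: true value, and the reading of budgetMs as a total wait budget across attempts are assumptions.

```typescript
interface ComboCooldownWait {
  enabled: boolean;
  maxWaitMs: number;
  maxAttempts: number;
  budgetMs: number;
}

const comboCooldownWait: ComboCooldownWait = {
  enabled: true, // assumed for the example
  maxWaitMs: 5_000,
  maxAttempts: 2,
  budgetMs: 8_000,
};

// Reasons that are never worth waiting on. "auth" and "not_found" are
// hypothetical identifiers for the auth/not-found reasons named in the guide.
const NEVER_WAIT = new Set(["quota_exhausted", "auth", "not_found"]);

function shouldWaitForCooldown(
  cfg: ComboCooldownWait,
  reason: string,
  cooldownRemainingMs: number,
  attemptsSoFar: number,
  waitedSoFarMs: number,
): boolean {
  if (!cfg.enabled) return false;
  if (NEVER_WAIT.has(reason)) return false;
  if (attemptsSoFar >= cfg.maxAttempts) return false;
  if (cooldownRemainingMs > cfg.maxWaitMs) return false; // not a short cooldown
  return waitedSoFarMs + cooldownRemainingMs <= cfg.budgetMs;
}
```

Under these assumptions a 2 s rate-limit cooldown on the first attempt is waited out, while a 6 s cooldown, a third attempt, or a wait that would push the total past 8 s returns the 429 to the caller.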

Other Resilience Features

17 routing strategies (priority, weighted, round-robin, context-relay, fill-first, p2c, random, least-used, cost-optimized, reset-aware, reset-window, headroom, strict-random, auto, lkgp, context-optimized, fusion) — see AUTO-COMBO.md.
Reset-aware routing (v3.8.0): prioritizes connections by quota reset time.
Background mode degradation: a Responses API request with background: true is degraded to a synchronous request, with a warning.
Dynamic tool limit detection: backs off providers when tool count limits are hit.
Emergency fallback: controlled by OMNIROUTE_EMERGENCY_FALLBACK; operators can override it from the Feature Flags page without a restart.

Debugging

All keys for a provider skipped → check both circuit breaker state AND each connection's rateLimitedUntil/testStatus.
Provider permanently excluded after reset window → code reading raw state instead of getStatus()/canExecute().
One key fails, others should work → prefer connection cooldown over circuit breaker.
Only one model fails → prefer model lockout over connection cooldown.
State should self-recover but doesn't → check for future timestamp + read path that refreshes expired state. Permanent statuses require manual changes.

TLS Fingerprinting & Stealth

Provider-specific stealth (JA3/JA4, CCH, obfuscation) is separately documented — see STEALTH_GUIDE.md.

Resilience testing (Phase 8 · Block C)

In addition to the unit tests for the resilience logic, three tests exercise the runtime under real stress and failure (all are integration/nightly tests; none blocks a PR):

Test	What	Run
Chaos	A fake-upstream node injects real latency/reset/timeout/503; validates that the circuit breaker opens and recovers and that `checkFallbackError` classifies 503 as a recoverable fallback.	`RUN_CHAOS_INT=1 npm run test:chaos`
Heap-growth	~500 streams via `createSSEStream` under `--expose-gc`; fails if the heap grows beyond the ceiling (OOM guard #3069).	`npm run test:heap`
k6 soak	Sustained load against `/api/monitoring/health`; p95/error thresholds.	`k6 run tests/load/k6-soak.js` (nightly)

Orchestrated by .github/workflows/nightly-resilience.yml (cron + dispatch). In the default test:integration run, the chaos and heap tests skip themselves (no RUN_CHAOS_INT/--expose-gc).
