Release v3.8.28 (#4053)

* chore(release): open v3.8.28 development cycle
* fix(ws): warm SSE auth import on LiveWS startup; relocate boot test to integration (#4063)
  The live dashboard WebSocket sidecar lazily import()-ed the SSE auth module inside the connection handler, only on the API-key path. That cold import pulls in hundreds of transitive modules and takes ~7s under tsx, blocking the single-threaded event loop. The first API-key WebSocket connection therefore stalled the loop long enough that any connection arriving in that window (e.g. a same-origin cookie client) could not complete its handshake and timed out.
  This was deterministic, not an "env flake": the boot test fires an API-key connection immediately followed by a cookie connection, so the cookie connection always raced the cold import and timed out (reproduced 3/3 locally and red on every CI run; proven via instrumented probes: reversing the order or warming the module first makes both connections open in ~20ms).
  Fix:
  - Memoize the auth-module import and warm it once at startup (before listen), so connection handling never pays the cold-import cost. Real improvement: the first API-key client no longer stalls the event loop for concurrent clients.
  - Relocate the boot test from tests/unit/cli to tests/integration. It spawns a real subprocess + WS server + SQLite (~9-11s); under the unit suite's --test-concurrency=20 it contended for CPU and destabilized the shard. The serial integration runner is its correct home; it still guards #4004's cookie-parse fix on every PR via the integration CI job.
  - Bump the test's startup/overall timeouts to absorb the eager auth warm.
  This makes `npm run test:unit` deterministically green (this test was the only remaining unit red).
  Validated: relocated test 3/3 green via the integration runner (was 3/3 red); typecheck:core + eslint clean; confirmed it no longer matches the test:unit glob and does match tests/integration/*.test.ts.
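A minimal sketch of the memoize-and-warm pattern described in #4063 follows. It assumes the auth module is reached through a zero-argument async loader. `memoizeImport` is a hypothetical helper name and is not the sidecar's actual code. The idea is that every caller shares one import promise, so awaiting it once before `listen` keeps the cold import out of every connection handler.

```typescript
// A zero-argument async loader, e.g. () => import("./sseAuth.js") (hypothetical path).
type Loader<T> = () => Promise<T>;

// Wraps a loader so the expensive import runs at most once: every caller,
// including concurrent ones, receives the same promise.
function memoizeImport<T>(load: Loader<T>): Loader<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch((err: unknown) => {
        // Assumption (not described in the PR): forget a failed attempt
        // so that a later call can retry the import.
        cached = undefined;
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical wiring in the sidecar:
// const getAuthModule = memoizeImport(() => import("./sseAuth.js"));
// await getAuthModule();   // warm once at startup, before listen()
// server.listen(port);     // handlers later await getAuthModule() at no extra cost
```

Because the promise is created on the first call, the warm-up at startup absorbs the ~7s cost. A connection that arrives later only awaits an already-settled promise, so the event loop is never blocked on module evaluation while a second client is mid-handshake.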
* fix(ws): start LiveWS sidecar with cwd at package root (#4055) (#4064)
* chore(deps): bump ossf/scorecard-action from 2.4.0 to 2.4.3 (#4045)
  Integrated into release/v3.8.28. SHA patch bump of ossf/scorecard-action (2.4.0→2.4.3); the SHA pin is preserved. The CI reds come only from the pre-existing branch-wide flaky shards (Unit 7/8, Integration, Coverage 7/8, Node 1/2) and are unrelated to the bump (deps-only PR).
* deps: bump electron from 42.4.0 to 42.4.1 in /electron (#4049)
  Integrated into release/v3.8.28. Electron patch bump (42.4.0→42.4.1). CI reds: the pre-existing flaky shards; PR Test Policy, a false positive (a deps-only change under electron/ has no code to test); and Node 26 (2/2) with no step executed (flake/infra). Precedent: #3913/#3914 (electron Dependabot PRs merged under the same conditions).
* fix(auto): resolve built-in auto catalog combos (#4058)
  Integrated into release/v3.8.28. Resolves the built-in `auto/*` catalog IDs (virtual combos), fixing the 400 "No auto combos configured" on auto/best-coding and the other auto IDs. Review adjustment: the AUTO_TEMPLATE_VARIANTS/VALID_AUTO_VARIANTS maps duplicated in chat.ts and chatHelpers.ts were extracted into open-sse/services/autoCombo/builtinCatalog.ts (DRY), bringing chatHelpers.ts back under 800 LOC; the chat.ts file-size baseline was rebaselined 1432→1458 (new logic). Fast QG + semgrep + dast green; 22/22 tests.
* chore(docs): update Discord invite link to a non-expiring one (#4067)
* chore(deps): freeze @huggingface/transformers in dependabot (hard-pin) (#4066)
  Integrated into release/v3.8.28. Freezes @huggingface/transformers in Dependabot (exact pin 3.5.2; load-bearing for LLMLingua + memory embeddings; VPS-validated in #4014). Fast QG + semgrep + dast green.
* ci(quality): flip TIA impacted-unit-tests gate from advisory to blocking (#4069)
  The pre-existing release unit test debt that kept the TIA "Impacted unit tests" step advisory has been cleared:
  - #4030 restored 16 lossless Zod/registry reds (from the oyi77 modularize refactors).
  - #4063 fixed the last red, the LiveWS boot test. It was a real, deterministic event-loop stall in the WS sidecar (a cold ~7s lazy auth import racing a second connection), not an env flake. It was fixed by warming the import at startup and relocated to the integration suite.
  A full workflow_dispatch ci.yml run on release/v3.8.28 then showed all 8 Unit Tests shards green. The remaining Integration Tests / Quality Ratchet reds are pre-existing and unrelated (combo/resilience env flakes; eslint/i18n baseline drift).
  Removing continue-on-error makes PR->release block on unit-test regressions in the TIA-selected impacted set (the fail-safe still runs the full unit suite on hub/unmapped changes). typecheck:core was already blocking. Closes the fast-gates "no tests on PR->release" hole (Quality Gate v2 / Fase 9, P2).
* docs(compression): document LLMLingua optional deps + on-demand install (#4061)
  Integrated into release/v3.8.28. Documents the LLMLingua optional deps + on-demand install (F3.1).
* feat(dashboard): Combo Studio connection-cooldown badge (U1b Slice 2) (#4068)
  Integrated into release/v3.8.28. Combo Studio connection-cooldown badge (U1b Slice 2 / F5.1).
* feat(compression): record Context Editing telemetry (engine: context-editing) (#4062)
  Integrated into release/v3.8.28. Context Editing telemetry (F4.1).
* feat(sse): Context Editing relay coverage + 400-fallback (#4065)
  Integrated into release/v3.8.28. Context Editing relay coverage (cc-*) + 400-fallback (F4.2/F4.3). The file-size-baseline.json conflict (vs #4062) was resolved by union (both justifications + base.ts 1292 + chatCore.ts 5898). Validated locally on the merged tree: typecheck:core ✓, eslint ✓, check:file-size ✓, 4/4 tests ✓; semgrep + semgrep-cloud green. Fast QG was queued (runner saturation), so this was merged on the verified policy gates (precedent #4034/#4020).
* feat(providers): add OrcaRouter (OpenAI-compatible routing gateway) (#4070)
  Integrated into release/v3.8.28.
  Adds the OrcaRouter provider (OpenAI-compatible, API-key auth, DefaultExecutor). Review adjustment: providers.ts file-size rebaseline 3147→3159 (+12 for the OrcaRouter entry). Validated locally on the synced tree: provider-consistency ✓, docs-counts STRICT 227 ✓, typecheck:core ✓, test 3/3 ✓, eslint ✓; semgrep + semgrep-cloud green. Fast QG/dast were queued (runner saturation), so this was merged on the verified policy gates (precedent #4034/#4065).
* test(infra): isolate DATA_DIR per test process; raise Stryker concurrency 1→4 (#4078)
  * test(infra): isolate DATA_DIR per test process; raise Stryker concurrency 1→4
    Every test process resolved DATA_DIR to the same default (~/.omniroute) when the env var was unset (src/lib/dataPaths.ts::resolveDataDir), so concurrent test files opened the SAME on-disk storage.sqlite. node:test spawns a process per file and Stryker spawns one per sandbox, so this shared file caused cross-file state races:
    - SQLite lock contention that hung `npm run test:unit` under high --test-concurrency (the ~95-min local hang), and
    - the non-deterministic baseline that forced stryker.conf.json to concurrency: 1, which in turn could not finish the ~15k-mutant run inside the nightly timeout (the cancelled 2026-06-16/17 nightly-mutation runs), blocking Quality Gate v2 / Fase 9 Onda 2.
    open-sse/utils/setupPolyfill.ts could NOT host the fix: it is imported by production (bin/omniroute.mjs, proxyFetch.ts, proxyDispatcher.ts), where redirecting DATA_DIR would point the live SQLite DB at a throwaway temp dir. So this adds a TEST-ONLY tests/_setup/isolateDataDir.ts that gives each process its own temp DATA_DIR when none is set (tests that set DATA_DIR explicitly still win), wired via --import into the test, mutation and CI invocations.
    Verified:
    - Stryker dry-run A/B at concurrency=4: FAILS without the isolation import (account-fallback-service tap exit 9, a cross-file race) and PASSES with it.
    - Full `npm run test:unit` green with isolation (0 fail; a one-off chatcore-translation-paths timeout flake did not reproduce and passes 3/3 in isolation) and noticeably faster, because the DB lock contention is gone.
    - New tests/unit/isolate-datadir.test.ts guards the contract (unique temp DATA_DIR when unset; explicit DATA_DIR respected).
    Wired the --import into package.json (13 test scripts), stryker.conf.json (tap.nodeArgs, plus concurrency 1→4), .github/workflows/quality.yml (TIA step) and ci.yml (the 5 unit/coverage/integration commands). Also bumped the nightly-mutation.yml timeout 120→180 to cover the first cold run before the incremental cache is seeded.
  * ci(quality): run the TIA gate at CI concurrency (4) to stop oversubscription flakes
    The TIA "Impacted unit tests" step (made blocking in #4069) ran its fail-safe via `npm run test:unit`, i.e. at concurrency=20, which is tuned for multi-core dev machines. On a 4-vCPU CI runner that is 5x oversubscribed, so timing-sensitive tests flake under the load (e.g. `db-backup-extended` "The database connection is not open", `chatcore-translation-paths` upstream-timeout). That intermittently fails a blocking gate on legitimate PRs. This is exactly what surfaced on the DATA_DIR-isolation PR, whose package.json/workflow changes trip the __RUN_ALL__ fail-safe.
    Run both the impacted set and the fail-safe at --test-concurrency=4, matching the stable ci.yml unit job. Adds a `test:unit:ci` script (test:unit at concurrency=4). The DATA_DIR isolation in this PR keeps the parallel run race-free, so the only change here is matching the runner's core count.
    Verified locally: db-backup-extended passes 8/8 runs in isolation (5 with the DATA_DIR isolation import, 3 without).
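Below is a rough sketch of the per-process DATA_DIR isolation from #4078. It is not the contents of tests/_setup/isolateDataDir.ts. The function name, the env-object parameter and the `omniroute-test-` prefix are illustrative assumptions. The sketch follows the two rules the PR states: an explicitly set DATA_DIR wins, and otherwise each process gets a fresh temp directory. With that, concurrent node:test or Stryker processes never open the same storage.sqlite.

```typescript
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Gives the current process its own throwaway DATA_DIR unless one is already
// set, and returns the directory in effect. (Hypothetical helper name.)
function isolateDataDir(env: Env = process.env): string {
  const explicit = env.DATA_DIR;
  if (explicit) {
    // A test that sets DATA_DIR explicitly always wins.
    return explicit;
  }
  // mkdtempSync appends random characters to the prefix, so concurrent
  // processes (one per node:test file or Stryker sandbox) never collide.
  const dir = mkdtempSync(join(tmpdir(), "omniroute-test-"));
  env.DATA_DIR = dir;
  return dir;
}

// As a `--import` preload, the module body would just call it once:
// isolateDataDir();
```

Doing this in a test-only preload, rather than in a module that production also imports, matches the constraint described above. The redirect can never reach the live database.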
* docs(quality-gates): reconcile gate inventory with ci.yml + add ROI rationalization backlog (#4095)
  The "authoritative" gate inventory in QUALITY_GATES.md had drifted from ci.yml. It omitted 9 wired gates: `audit:deps`, `check:tracked-artifacts`, `check:lockfile`, `check:licenses` (lint job); `check:dead-code`, `check:cognitive-complexity`, `check:type-coverage`, `check:codeql-ratchet` (quality-gate job); and `check:pr-evidence` (pr-test-policy job). You can't rationalize an inventory you can't trust, so this reconciles it first.
  Adds those 9 rows to their job tables, plus a "Rationalization Backlog (ROI review)" section capturing the Fase 9 Onda 3 findings:
  - mechanical merge/dedup candidates: CVE scanners audit:deps↔osv, the two complexity ESLint passes, cycles↔circular-deps, the two /api anti-hallucination gates, the doubly-run check:docs-sync, check:node-runtime ×11;
  - operator-only flip/drop decisions: typecheck:noimplicit vs the type-coverage ratchet, test:vitest:ui parked fails, check:secrets frozen FPs, openapi-security-tiers, pr-evidence, the orphaned semgrep baseline.
  Also flags the undocumented advisory docs-lint job and the standalone scanner workflows.
  Docs-only: no gate behavior changes. The merges (CI changes) and flips (policy) are deferred to operator-scoped follow-ups; this PR only makes the map accurate.
* test(dashboard): smoke e2e for the Combo Live Studio page (#4075)
  Integrated into release/v3.8.28
* fix(sse): friendly 413 message for ChatGPT web payload-too-large (#4080)
  Integrated into release/v3.8.28
* feat(sse): port Claude Code quota-probe bypass + command meta-request helpers (#4083)
  Integrated into release/v3.8.28
* feat(api): exact offline token counting for count_tokens fallback via tiktoken (#4087)
  Integrated into release/v3.8.28
* feat(compression): RTK learn/discover (sample source + API + UI) (#4088)
  Integrated into release/v3.8.28
* feat(dashboard): 2026-06-17 free-tier refresh — honest catalog, uncapped + boost tiers, Layout A budget table (#4089)
  Integrated into release/v3.8.28
* feat(mitm): capture-pipeline self-test route (Gap 12) (#4093)
  Integrated into release/v3.8.28
* fix(mitm): crash-safe system-state teardown + socket timeouts (ProxyBridge-inspired hardening) (#4084)
  Integrated into release/v3.8.28 (Fast QG TIA red = 3 pre-existing timing flakes verified passing locally 82/82; the PR's own tests are green)
* feat(mitm): attribute intercepted requests to originating process (Gap 1) (#4085)
  Integrated into release/v3.8.28 (Fast QG TIA red = 3 pre-existing timing flakes verified passing locally 82/82; the PR's own tests are green)
* fix(sse): route image requests only to confirmed-vision combo targets (#4071)
  Integrated into release/v3.8.28
* fix(security): injection guard respects INJECTION_GUARD_MODE DB feature flag (#4077)
  Integrated into release/v3.8.28
* fix(ws): proxy LAN /live-ws upgrades and add unset JWT_SECRET warning (#4079)
  Integrated into release/v3.8.28
* fix(dev): force webpack in custom dev server (Turbopack 16.2.x panics) (#4092)
  Integrated into release/v3.8.28
* ci(quality): dedup the doubly-run check:docs-sync + record validated ROI backlog (#4099)
  Onda 3 (gate ROI review) Phase 2. Two parts, both low-risk:
  1.
Remove the standalone `check:docs-sync` from the `lint` job. It already runs in the `docs-sync-strict` job (via `check:docs-all`) and in the husky pre-commit hook, so the `lint`-job copy was a pure duplicate. No coverage lost.
  2. Update the Rationalization Backlog in QUALITY_GATES.md with trust-but-verify findings. Several "obvious" merges/flips from the ROI review turned out to hide debt and are NOT clean drop-ins:
     - CVE merge (audit:deps→osv): different semantics (hard high/critical vs regression ratchet); keep both.
     - cycles→circular-deps: dpdm reports 91 cycles (so it can't be promoted to blocking) and is broader in scope than the green, curated check:cycles; keep both.
     - openapi-security-tiers flip: blocked by traffic-inspector routes missing the x-loopback-only annotation.
     - complexity + /api merges: valid, but they need real config/script surgery; deferred.
     - node-runtime ×11: ~10s savings vs a cheap guard; low ROI, skip.
  The remaining flips (typecheck:noimplicit, test:vitest:ui, check:secrets, pr-evidence, semgrep) are operator policy decisions, left for the owner.
* chore(deps): bump actions/github-script from 7 to 9 (#4046)
  Integrated into release/v3.8.28 (dependabot GH-Action bump; SHA-pin preserved)
* chore(deps): bump actions/setup-node from 4 to 6 (#4048)
  Integrated into release/v3.8.28 (dependabot GH-Action bump; SHA-pin preserved)
* chore(deps): bump actions/upload-artifact from 4 to 7 (#4044)
  Integrated into release/v3.8.28 (dependabot GH-Action bump; SHA-pin preserved)
* chore(deps): bump actions/cache from 4.3.0 to 5.0.5 (#4047)
  Integrated into release/v3.8.28 (dependabot GH-Action bump; SHA-pin preserved)
* deps: bump the development group with 10 updates (#4051)
  Integrated into release/v3.8.28 (dependabot dev group; cyclonedx 4->5 verified compatible with the SBOM invocation --ignore-npm-errors/--output-format JSON/--output-file)
* fix(dashboard): event-driven fail-open auto-refresh for embedded log views (#4054) (#4103)
  The Request Logger gated each auto-refresh tick on a static document.visibilityState === "visible" read. Some hosts (Docker dashboard wrappers, embedded/proxied webviews) permanently report a non-"visible" state without ever firing a visibilitychange event. On those hosts auto-refresh froze entirely and only the manual Refresh button worked, a regression from 3.8.24's unconditional polling.
  The pause is now event-driven and fail-open. visibleRef starts true and is only flipped to false on a real visibilitychange → hidden transition. A host that never signals a genuine background transition therefore keeps polling, while normal browser tabs still pause when actually backgrounded.
  A regression test reproduces the misreporting-host case (RED), and the perf guard is re-encoded under the event-driven semantics.
* fix(docker): raise build-stage Node heap to stop production-build OOM (#4076) (#4104)
  The Docker builder stage ran `npm run build` with V8's default heap ceiling (~2 GB).
  After #4052 forced the heavier webpack engine (Turbopack panics on this Next.js version), the production optimization pass exceeded that ceiling and the build died with "FATAL ERROR: ... JavaScript heap out of memory" at [builder] npm run build.
  The builder stage now sets NODE_OPTIONS=--max-old-space-size (default 4096 MB, overridable via --build-arg OMNIROUTE_BUILD_MEMORY_MB) before the build. The value propagates to the spawned next build because resolveNextBuildEnv spreads process.env. The change is build-only: the runtime heap on the runner stage is unchanged, and CI/local builds (which invoke npm run build directly) are unaffected.
  Regression guard: tests/unit/dockerfile-build-heap-4076.test.ts asserts that the builder stage sets the heap ceiling before npm run build, at >= 4096 MB.
* feat(agent-bridge): portable JSON import/export of config (Gap 4) (#4094)
  Integrated into release/v3.8.28
* feat(cli): add 'omniroute launch' zero-config Claude Code launcher (#4097)
  Integrated into release/v3.8.28 (Fast QG TIA red = pre-existing env-doc-contract drift [MITM_IDLE_TIMEOUT_MS/TURBOPACK from #4084/#4092] + opencode-plugin-dist env flake; #4097's own test 3/3 green)
* feat(mitm): loop-guard self-check + verbosity control in server.cjs (Gaps 14+15) (#4101)
  Integrated into release/v3.8.28 (rebased onto release; dropped the already-squash-merged #4084 commits, so only the Gaps 14+15 loop-guard/verbosity delta remains)
* feat(sse): generic 400 field-downgrade retry + Groq field stripping (#4096)
  Integrated into release/v3.8.28
* feat(providers): add Wafer AI (Anthropic-compatible, Bearer auth) (#4098)
  Integrated into release/v3.8.28
* chore(docs)
* fix(responses): clear /v1/responses keepalive timer on cancel/abort (timer + CPU leak) (#4105)
  Integrated into release/v3.8.28 (r7).
* perf(gemini): cache reasoning close-tag regex instead of recompiling per token (#4106)
  Integrated into release/v3.8.28 (r7).
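The event-driven, fail-open visibility gate from #4103 can be sketched without React. In the dashboard, visibleRef is a component ref. Here it is a plain object, and `createFailOpenVisibilityGate` and `VisibilitySource` are hypothetical names. The gate ignores the static `visibilityState` at creation time, so only a real `visibilitychange` event can pause polling.

```typescript
// Minimal surface of `document` that the gate needs (hypothetical interface).
interface VisibilitySource {
  readonly visibilityState: string;
  addEventListener(type: "visibilitychange", listener: () => void): void;
  removeEventListener(type: "visibilitychange", listener: () => void): void;
}

interface VisibilityGate {
  /** True when the next auto-refresh tick should poll. */
  shouldPoll(): boolean;
  /** Detaches the visibilitychange listener. */
  dispose(): void;
}

function createFailOpenVisibilityGate(source: VisibilitySource): VisibilityGate {
  // Fail-open: start as visible and deliberately skip reading the initial
  // visibilityState, which some embedded hosts misreport permanently.
  const visibleRef = { current: true };

  // Only a real visibilitychange event can pause (hidden) or resume polling.
  const onVisibilityChange = (): void => {
    visibleRef.current = source.visibilityState !== "hidden";
  };

  source.addEventListener("visibilitychange", onVisibilityChange);
  return {
    shouldPoll: () => visibleRef.current,
    dispose: () => source.removeEventListener("visibilitychange", onVisibilityChange),
  };
}

// Hypothetical use in a polling loop:
// const gate = createFailOpenVisibilityGate(document);
// setInterval(() => { if (gate.shouldPoll()) void refreshLogs(); }, 5000);
```

A host that reports "hidden" forever but never fires the event keeps polling. A normal browser tab still pauses because backgrounding it dispatches `visibilitychange`.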
* fix(usage): reap orphaned pending-request details (unbounded memory leak) (#4107)
  Integrated into release/v3.8.28 (r7).
* perf(stream): use structuredClone instead of JSON round-trip for per-chunk reasoning split (#4108)
  Integrated into release/v3.8.28 (r7).
* fix(dashboard): restore Update Available banner with npm-binary-free version fallback (#4100) (#4112)
  getLatestNpmVersion() derived the latest version only from the npm CLI binary and returned null on any error. As a result, Docker/desktop/locked-down installs without npm on PATH silently hid the home banner even when an update existed. Adds resolveLatestVersion() (npm CLI -> registry HTTP fallback -> logged warning) and hardens version parsing for v-prefixed and pre-release strings. Extracted into a testable src/lib/system/versionCheck.ts with TDD coverage.
* fix(auth): prune expired entries from login brute-force guard map (unbounded growth) (#4111)
  Integrated into release/v3.8.28 (r8)
* fix(logger): hard-cap the error-dedup map to bound memory under unique-message bursts (#4113)
  Integrated into release/v3.8.28 (r8)
* fix(circuit-breaker): enforce MAX_REGISTRY_SIZE (declared but never applied) (#4114)
  Integrated into release/v3.8.28 (r8)
* perf(obfuscation): cache per-word regexes instead of recompiling every request (#4109)
  Integrated into release/v3.8.28 (r8)
* perf(registry): precompute model->provider index in parseModelFromRegistry (#4110)
  Integrated into release/v3.8.28 (r8)
* fix(timers): unref background interval timers so they don't block clean shutdown (#4117)
  Integrated into release/v3.8.28 (r8)
* fix(webhook): clear abort timer in finally to avoid dangling timers on fetch error (#4115)
  Integrated into release/v3.8.28 (r8)
* fix(combo): detach per-target listener from shared hedge abort signal (#4116)
  Integrated into release/v3.8.28 (r8)
* chore(release): finalize v3.8.28 CHANGELOG + reconcile env-doc contract
  - Build the complete [3.8.28] CHANGELOG section (55 bullets) covering every commit since v3.8.27, grouped
    by type with PR back-references and human contributor attribution (artickc's memory-leak/perf cluster, OrcaRouter, Wafer AI, MITM gaps, etc.); move the OrcaRouter bullet out of [Unreleased].
  - Inject the EN [3.8.28] section into all 41 i18n CHANGELOG mirrors (parity).
  - Reconcile the env/docs contract: document MITM_IDLE_TIMEOUT_MS + MITM_VERBOSE in .env.example and ENVIRONMENT.md; allowlist the framework-internal TURBOPACK and the Claude Code ANTHROPIC_AUTH_TOKEN in check-env-doc-sync.
  - Fix 3 broken relative links in docs/providers/AGENTROUTER.md (regressed when the file was relocated this cycle) so docs-sync-strict passes.
* fix(quality): treat test→test renames as relocations, not deletions
  The anti-test-masking gate's subcheck 1 collected deleted AND renamed test files via `--diff-filter=DR --name-only` and flagged every one as "deleted — human review required". That contradicted its own documented contract ("DELETED or renamed-and-NOT-replaced"): a test→test rename IS a substitution (the test moved, coverage preserved). This false-positived on #4063's legitimate relocation of live-ws-startup.test.ts (unit/cli → integration, asserts 2→2). It would have blocked every PR that relocates a test, surfacing only on release day because the Fast QG (PR→release) doesn't run test-masking.
  The gate now parses `--name-status -M`. True deletions and test→non-test renames still flag. A test→test rename is run through the assert-reduction check across the move, so a clean relocation passes while gutting-via-rename (dropped asserts / new tautologies / skips) still fires. Adds partitionDeletedRenamed + 6 regression tests.
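A sketch of the rename-aware partition follows. It assumes the tab-separated output of `git diff --name-status -M`, where each line is a status (`D`, or `R` plus a similarity score such as `R100`) followed by one or two paths. The `isTestFile` pattern is an assumption. This `partitionDeletedRenamed` is a simplified stand-in for the gate's function of the same name, not its actual code. The assert-reduction check that relocated tests then go through is not shown.

```typescript
interface Move {
  from: string;
  to: string;
}

interface TestChangePartition {
  deleted: string[]; // test files removed outright: flag for human review
  renamedAway: Move[]; // test → non-test renames: flag for human review
  relocated: Move[]; // test → test renames: run the assert-reduction check instead
}

// Assumption: the real gate's notion of a test file may differ.
const isTestFile = (path: string): boolean => /\.(test|spec)\.[cm]?[jt]sx?$/.test(path);

// Parses tab-separated `git diff --name-status -M` output. Paths that git
// quotes (special characters, core.quotePath) are not handled in this sketch.
function partitionDeletedRenamed(nameStatus: string): TestChangePartition {
  const result: TestChangePartition = { deleted: [], renamedAway: [], relocated: [] };
  for (const line of nameStatus.split("\n")) {
    if (line.trim() === "") continue;
    const [status, ...paths] = line.split("\t");
    if (status === "D" && paths.length === 1 && isTestFile(paths[0])) {
      result.deleted.push(paths[0]);
    } else if (status.startsWith("R") && paths.length === 2) {
      const [from, to] = paths;
      if (!isTestFile(from)) continue; // renames of non-test files are out of scope
      (isTestFile(to) ? result.relocated : result.renamedAway).push({ from, to });
    }
  }
  return result;
}
```

Given the #4063 move (`R100` from tests/unit/cli/ to tests/integration/), the entry lands in `relocated` rather than `deleted`. A rename that turns a test file into a non-test file still lands in a flagged bucket.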
---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Demiurge The Single <megamen932@gmail.com>
Co-authored-by: jinhaosong-source <jinhao.song@myflashcloud.com>
Co-authored-by: diego-anselmo <contato@diegoanselmo.com.br>
Co-authored-by: Felipe Almeman <4226997+zhiru@users.noreply.github.com>
Co-authored-by: Rahul sharma <sharmaR0810@gmail.com>
Co-authored-by: Chirag Singhal <76880977+chirag127@users.noreply.github.com>
Co-authored-by: NOXX - Commiter <artur1992123@mail.ru>