File | Last commit message | Last updated
fix(preflight): also gate on JSON body bytes, not just tokens (#1451)

A user reported a cryptic `DeepSeek 400: messages[156].content: unexpected end of hex escape at line 1 column 884819` mid-session. Reproducing against real session logs ruled out the client: Node's JSON.stringify can't emit a malformed \u escape, and no lone surrogates exist in any of 10 saved sessions. The gateway is truncating the request body mid-escape because it exceeds the DeepSeek API's body-byte limit, which is independent of the 1 M-token context window the existing preflight checks.

Token preflight stays at 95% of ctxMax. New byte preflight trips when the stringified messages array exceeds MAX_BODY_BYTES (700 KB, ~80% of the observed 884 KB failure point). Either signal triggers the existing mechanicalTruncate path; the truncate now bounds output by tokens AND bytes so byte-triggered folds actually shrink the body. The trigger field on PreflightDecision records which signal fired so follow-up diagnostics (and the user-facing warning, now showing body KB) can distinguish the two paths.

Two tests/loop.test.ts fold cases used 2 MB seed logs and a 1 M-token ctxMax to verify fold tier behavior. With the new 700 KB byte gate the preflight emptied those logs before fold could run; switched both to a 100 K test ctx + proportionally scaled seed sizes so they exercise fold tiers within both windows.

Two reusable tools added under tools/ (analyze-session-body.mjs, scan-all-sessions.mjs) so future "DeepSeek 400 from a long session" reports can be diagnosed from the user's local session files without guessing.

Co-authored-by: reasonix <reasonix@deepseek.com>

8 days ago
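The two-signal gate described in #1451 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `decidePreflightSketch`, `PreflightMessage`, and `bodyBytes` are hypothetical names, and treating 700 KB as `700 * 1024` bytes is an assumption. Only `MAX_BODY_BYTES`, `ctxMax`, the 95% ratio, and the `trigger` field come from the commit message.

```typescript
// Hypothetical message shape, for illustration only.
interface PreflightMessage {
  role: string;
  content: string;
}

type PreflightTrigger = "tokens" | "bytes" | null;

// 700 KB per the commit message; the exact byte count is an assumption.
const MAX_BODY_BYTES = 700 * 1024;
// Token preflight stays at 95% of ctxMax.
const TOKEN_PREFLIGHT_RATIO = 0.95;

// Measure the UTF-8 size of the stringified messages array.
// String length would undercount any non-ASCII content.
function bodyBytes(messages: PreflightMessage[]): number {
  return new TextEncoder().encode(JSON.stringify(messages)).length;
}

// Report which signal (if any) fired, plus the body size for the warning text.
function decidePreflightSketch(
  messages: PreflightMessage[],
  estimatedTokens: number,
  ctxMax: number,
): { trigger: PreflightTrigger; bodyKB: number } {
  const bytes = bodyBytes(messages);
  const bodyKB = Math.round(bytes / 1024);
  if (estimatedTokens > ctxMax * TOKEN_PREFLIGHT_RATIO) {
    return { trigger: "tokens", bodyKB };
  }
  if (bytes > MAX_BODY_BYTES) {
    return { trigger: "bytes", bodyKB };
  }
  return { trigger: null, bodyKB };
}
```

Checking tokens before bytes is an arbitrary choice in this sketch; the commit only says that either signal leads to the same truncate path.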
fix(context): align fold summary prefix with main agent for cache reuse (#1565)

* chore(release): 0.49.0 (static-history TUI, queued steers, Bing default, lifecycle plans)

Headline themes:
- TUI: Static-history renderer is the only path; virtual-viewport layers removed (#1529 stages 1-4)
- Chat: queued mid-turn steer handling so input mid-render doesn't drop or fight the live frame (#1501)
- Web search: default switches to Bing; dashboard engine switcher; Mojeek dropped (#1558)
- Plans: lifecycle evidence summaries surface why a plan is ready to accept (#1500)
- Desktop: native OS notifications for approvals + completion (#1519)
- i18n: CLI command output (/mcp /sessions /prune /theme) + approval-prompt labels translated (#1524, #1560)
- Security: SSRF block in web_fetch (#1544), edit-snapshot path containment (#1454), shell redirect sandbox (#1457), Task integrity guardrail (#1516)
- Tools: per-turn dispatch-rate limit (#1356); run_command discourages shell-based edits (#1514)
- Client: DeepSeek 429 → concurrency-limit hint (#1526); timeoutMs honored with AbortSignal (#1535); --no-proxy opt-out for direct route (#1507)
- Files: read/edit/restore preserves source encoding (GB18030 / UTF-8 BOM) (#1518)
- Context: pinned constraints survive folds + full tail capture (#1515, #1552)
- Refactor: lifecycle risk policy extracted into its own module (#1557)

See CHANGELOG for the full list.

* fix(context): align fold summary prefix with main agent for cache reuse

The summarizer call was sending a bespoke "You compress conversation history" system prompt and no tools, guaranteeing a 0% cache hit against the main agent's just-cached prefix. Reshape the request so system + tools + head bytes mirror the live agent's last call; the only novel bytes are the trailing summarize instruction. Skill-pin handling now collects bodies read-only instead of stubbing mid-head, so the cache prefix stays unbroken. The summarize instruction names pinned skills so the model knows not to paraphrase their bodies (which we append verbatim regardless).

Measured on a real session at 48.7K prompt tokens:

  OLD shape:  0.0% cache hit → $0.145 per fold
  NEW shape: 99.6% cache hit → $0.015 per fold
  saving:    89.6% per fold

* tools: add fold-cache shape + live benchmarks

bench-fold-cache-shape.mjs replays real session jsonls, simulates OLD vs NEW summary-call shapes at the fold point, and reports byte-level shared-prefix with the main agent's preceding request. Pure local; no API required.

bench-fold-cache-live.mjs sends one priming + two summary calls to DeepSeek and reports prompt_cache_hit_tokens / cost for each shape. Used to confirm the shape change actually translates to API-side cache hits.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>

7 days ago
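To make the prefix-alignment idea in #1565 concrete, here is a small sketch of a byte-level shared-prefix measurement in the spirit of what the commit says bench-fold-cache-shape.mjs reports. The `SketchRequest` envelope and both function names are hypothetical, and the real request body and the provider's cache granularity will differ, so this only illustrates why an identical system + tools + head keeps the prefix intact.

```typescript
// Hypothetical request envelope; the real request has more fields.
interface SketchRequest {
  system: string;
  tools: string[];
  messages: { role: string; content: string }[];
}

// Serialize to the bytes that would be compared.
function serializeRequest(req: SketchRequest): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(req));
}

// Count how many leading bytes two serialized requests have in common.
function sharedPrefixBytes(a: Uint8Array, b: Uint8Array): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}
```

With a summary request that reuses the main agent's system prompt, tools, and head messages and only appends a trailing instruction, the shared prefix covers nearly the entire preceding request. A request with a different system prompt diverges within the first few bytes, which matches the 0% hit the commit reports for the old shape.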
perf(ui): index-backed mutateCard + O(n) plan.drop + elide cursor (#1769)

Three hot-path fixes in the cards reducer, measured to cut per-event cost on long sessions:

- mutateCard now does a Map<CardId, number> lookup instead of a full state.cards.findIndex scan. Each streaming/reasoning/tool chunk used to walk the whole transcript; on a 2000-card session that cost ~12.7ms per 1000 chunks, now ~1.7ms (86% faster).
- plan.drop replaces a .map() containing a .slice(i+1).some() with a single backward scan to find the last active plan, then one in-place patch. 1000-card session: ~65ms -> ~0.34ms per drop (99.5% faster).
- elideOldCardContent now starts from a persistent elideCursor that advances past stubbed and immutable-kind cards, so repeated appends no longer re-scan all prior elision-zone cards. 500 appends on a 1000-card session: ~3.1ms -> ~1.6ms (48% faster).

The cardIndex map is mutated in place on append; structural changes (session.fork, session.reset) rebuild it. tools/bench-reducer-hotpath.mjs keeps the inlined old vs new comparison as a regression check. All 3694 existing reducer / memory-budget / hydrate tests pass.

Co-authored-by: reasonix <reasonix@deepseek.com>

4 days ago
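The index-backed lookup from #1769 can be illustrated with a minimal sketch. The `SketchCard` and `SketchState` shapes and the `appendCard` / `rebuildCardIndex` helpers are assumptions made for the example; only `mutateCard`, `cardIndex`, and the Map<CardId, number> idea come from the commit message.

```typescript
type CardId = string;

// Hypothetical card and state shapes, for illustration only.
interface SketchCard {
  id: CardId;
  text: string;
}

interface SketchState {
  cards: SketchCard[];
  cardIndex: Map<CardId, number>; // card id -> position in `cards`
}

// Appending keeps the index in sync in place, with no rescan.
function appendCard(state: SketchState, card: SketchCard): void {
  state.cardIndex.set(card.id, state.cards.length);
  state.cards.push(card);
}

// O(1) lookup via the map instead of an O(n) findIndex over the transcript.
function mutateCard(
  state: SketchState,
  id: CardId,
  patch: (card: SketchCard) => SketchCard,
): boolean {
  const i = state.cardIndex.get(id);
  if (i === undefined) return false;
  const current = state.cards[i];
  if (current === undefined) return false; // stale index entry
  state.cards[i] = patch(current);
  return true;
}

// Structural changes (the commit names session.fork and session.reset)
// invalidate positions, so the index is rebuilt from scratch.
function rebuildCardIndex(state: SketchState): void {
  state.cardIndex = new Map();
  state.cards.forEach((card, i) => state.cardIndex.set(card.id, i));
}
```

The trade-off is that every code path that reorders or removes cards has to remember to rebuild the map, which is presumably why the commit keeps a benchmark and the existing reducer tests as a regression check.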
refactor(context-manager): drop preflight, fold once at turn start (#1646)

Follow-up to #1642. After landing the fold-first preflight, the next question was whether preflight needs to exist at all. On a 1M-context provider it doesn't: post-response decideAfterUsage already folds at 75%, upstream tool-result caps prevent single-message blowups, and the byte ceiling the preflight was originally guarding against is gone. Converge on the Claude-Code-style single compaction path: one fold check per turn, at turn start.

- Delete decidePreflight, mechanicalTruncate, related constants, the PreflightDecision interface, and the per-iter preflight block in step().
- Add a single turn-start check after the user message is appended: if local request estimate > TURN_START_FOLD_THRESHOLD (90%), fold once before the iter loop.
- No mechanical fallback. If fold can't shrink the log, the request goes out and DeepSeek's error surfaces to the user; honest beats silent re-compaction with worse semantics.
- Drop preflight i18n keys + delete tests/preflight.test.ts.

Net -393 lines. E2E (live DeepSeek) green across baseline, high-ratio terminal turn, and turn-start fold scenarios.

5 days ago
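A rough sketch of the single turn-start check from #1646 follows. `runTurnSketch`, `TurnDeps`, and the callback names are hypothetical, and the iteration cap is an assumption; only `TURN_START_FOLD_THRESHOLD` and the 90% figure come from the commit message.

```typescript
// 90% of the context window, per the commit message.
const TURN_START_FOLD_THRESHOLD = 0.9;

// Hypothetical dependencies, injected so the sketch stays self-contained.
interface TurnDeps {
  estimateRequestTokens: () => number; // local estimate after the user message is appended
  fold: () => void;                    // semantic fold (summarize older history)
  step: () => boolean;                 // one model iteration; true when the turn is done
}

// Returns how many folds ran at turn start (0 or 1).
function runTurnSketch(ctxMax: number, deps: TurnDeps, maxIters: number = 8): number {
  let folds = 0;
  // One check per turn, before the iteration loop. There is no mechanical
  // fallback: if folding does not help, the request still goes out.
  if (deps.estimateRequestTokens() > ctxMax * TURN_START_FOLD_THRESHOLD) {
    deps.fold();
    folds += 1;
  }
  for (let i = 0; i < maxIters; i++) {
    if (deps.step()) break;
  }
  return folds;
}
```

Because the check sits outside the loop, a turn with many tool iterations still folds at most once at turn start; later growth is left to the post-response path the commit refers to as decideAfterUsage.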
refactor(context-manager): preflight folds first, drop obsolete byte ceiling (#1642)

Live probe shows DeepSeek's gateway accepts at least 8MB request bodies, so the empirical ~880KB ceiling that MAX_BODY_BYTES guarded against no longer exists. Remove the byte path from decidePreflight + mechanicalTruncate and converge on a single fold-first pipeline: preflight tries semantic fold first, and mechanical truncate is a last resort used only when fold cannot summarize.

fold() per-message token estimate now counts tool_calls JSON so heavy args can't slip past the tail-budget check. New requireTailBoundary option (preflight-only) prevents fold from wiping an active tool turn. Preflight sets _foldedThisTurn to block decideAfterUsage from re-folding.

5 days ago
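The point about counting tool_calls JSON in the per-message estimate can be shown with a short sketch. The message shape loosely follows the OpenAI-compatible chat format, and the four-characters-per-token heuristic and the `estimateMessageTokens` name are assumptions for illustration, not the project's estimator.

```typescript
// Hypothetical shapes, loosely following the OpenAI-compatible chat format.
interface SketchToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface FoldMessage {
  role: string;
  content: string | null;
  tool_calls?: SketchToolCall[];
}

// Assumption: roughly 4 characters per token, a common rough heuristic.
const CHARS_PER_TOKEN = 4;

// Count both the text content and the serialized tool calls, so an assistant
// message with empty content but large arguments is not estimated at zero.
function estimateMessageTokens(message: FoldMessage): number {
  const contentChars = message.content ? message.content.length : 0;
  const toolChars = message.tool_calls ? JSON.stringify(message.tool_calls).length : 0;
  return Math.ceil((contentChars + toolChars) / CHARS_PER_TOKEN);
}
```

An assistant message whose content is null but whose tool call carries several kilobytes of arguments now contributes proportionally to the tail budget instead of being treated as free.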