Reasonix Architecture
Design philosophy
Reasonix is opinionated, not general. Every abstraction is justified by a DeepSeek-specific behavior or economic property. If it's generic, we don't ship it.
The product north star: coding agent that stays cheap enough to leave on. A tool that quietly burns $200/month on a background project is one nobody uses. Every subsystem below is answerable to that goal.
The four pillars
Pillar 1 — Cache-First Loop
Problem. DeepSeek bills cached input at ~10% of the miss rate. Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.
Solution. Partition the context into three regions:
┌─────────────────────────────────────────┐
│ IMMUTABLE PREFIX │ ← fixed for session
│ system + tool_specs + few_shots │ cache hit candidate
├─────────────────────────────────────────┤
│ APPEND-ONLY LOG │ ← grows monotonically
│ [assistant₁][tool₁][assistant₂]... │ preserves prefix of prior turns
├─────────────────────────────────────────┤
│ VOLATILE SCRATCH │ ← reset each turn
│ R1 thought, transient plan state │ never sent upstream
└─────────────────────────────────────────┘
Invariants:
- Prefix is computed once per session, hashed, and pinned.
- Log entries are serialized in append order; no rewrites.
- Scratch is distilled via Pillar 2 before any information from it is folded into the log.
Metric. prompt_cache_hit_tokens / (hit + miss) exposed per-turn and
aggregated per-session. Visible in the TUI's top-bar cache cell.
Parallel tool dispatch
Each tool declares parallelSafe?: boolean (default false). The loop
dispatcher groups consecutive parallel-safe calls into chunks and races
them via Promise.allSettled; the first non-parallel-safe call ends the
chunk and runs alone (serial barrier — read-after-write order
preserved). Tool-result yields and history append still land in declared
order regardless of which call settles first, so the model sees the
same shape it would under a fully serial dispatch.
| Env var | Default | Effect |
|---|---|---|
REASONIX_PARALLEL_MAX |
3 (hard cap 16) |
Max chunk size. |
REASONIX_TOOL_DISPATCH=serial |
unset | Forces serial dispatch — escape hatch. |
Built-in opt-ins: read-only filesystem (read_file, list_directory,
directory_tree, search_files, search_content, get_file_info),
web (web_search, web_fetch), recall_memory, semantic_search,
isolated child loops (run_skill, spawn_subagent), in-memory job
queries (job_output, list_jobs). Mutating / side-effecting tools
stay default. MCP-bridged tools default false — third-party tools
opt in only when the server explicitly declares parallel safety.
Pillar 2 — Tool-Call Repair
Problem. Empirical DeepSeek failure modes:
- Tool-call JSON emitted inside
<think>, missing from the final message. - Arguments dropped when schema has >10 params or deeply nested objects.
- Same tool called repeatedly with identical args (call-storm).
- Truncated JSON due to
max_tokenshit mid-structure.
Solution. Four passes:
flatten— schemas with >10 leaf params or depth >2 are auto-detected onToolRegistry.register()and presented to the model in dot-notation form.dispatch()re-nests the args before calling the user'sfn.scavenge— regex + JSON parser sweepsreasoning_contentfor any tool call the model forgot to emit intool_calls.truncation— detect unbalanced JSON and repair by closing braces or requesting a continuation completion.storm— identical(tool, args)tuple within a sliding window → suppress the call, inject a reflection turn.
Pillar 3 — Cost Control (v0.6)
Problem. Coding agents that default to the frontier model (v4-pro, ~12× flash cost) and accumulate full tool results in context are $150-$250/month for active users. Most turns don't need frontier reasoning; most sessions re-pay for tool results that were only useful once.
Solution. Four complementary mechanisms, none of which require manual tuning in the common case:
4.1 Tiered defaults (flash-first)
The three presets trade model tier and reasoning effort:
| Preset | Model | Effort | Cost |
|---|---|---|---|
flash |
v4-flash |
max |
1× |
auto (default) |
v4-flash → v4-pro on hard turns |
max |
1–3× |
pro |
v4-pro |
max |
~12× |
All auxiliary calls — forceSummaryAfterIterLimit, subagent spawns,
truncation repair retries — hard-code v4-flash + effort=high regardless
of the user's preset. There's no reason to pay pro rates for "paraphrase
these tool results into prose" or for an explore subagent's grep chain.
4.2 Turn-end auto-compaction
Every tool result in the log exceeding TURN_END_RESULT_CAP_TOKENS (3000)
is shrunk to that cap when a turn ends. The model had the full text for
the turn that read it; subsequent turns see a compact summary and can
re-read if needed. One extra read_file call is vastly cheaper than
dragging 12KB through every future prompt.
A proactive 40% context-ratio threshold runs the same shrink pre-emptively inside long multi-iter turns before the 80% emergency threshold fires.
4.3 Model selection (/model)
Users switch between flash and pro via /model flash or /model pro
(persistent — applies to every turn until changed). Model can also be
set in .reasonix/settings.json under the model key. No one-shot
arming; no forgotten revert risk when switching is explicit and sticky.
History. Pre-0.50.0,
/prooffered single-turn arming — type/pro, the next turn ran on v4-pro then auto-disarmed. Removed in 0.50.0 (#1657, #1630) when presets were collapsed into direct model selection.
4.4 Model self-report escalation (<<<NEEDS_PRO>>>)
The model itself decides when a task exceeds its current tier. If a task
clearly needs stronger reasoning, the model emits a <<<NEEDS_PRO>>>
marker as the first line of its response. The system aborts the current
flash call and retries the turn on pro. Two forms:
<<<NEEDS_PRO>>>— bare marker, no rationale.<<<NEEDS_PRO: <reason>>>>— includes a one-sentence rationale the user sees in a warning row.
On the pro tier, the marker is a no-op — pro is the top, so the contract tells the model it can't escalate further. This is purely self-report: there is no failure-counter threshold, no scavenge/storm counting, no automatic escalation based on tool errors.
Cost transparency
Per-turn and session cost are colored in the StatsPanel:
turn $0.003— green <$0.05, yellow $0.05–0.20, red ≥$0.20session $0.12— same scale ×10
Module layout
src/
├── client.ts # DeepSeek client (fetch + SSE)
├── loop.ts # Pillar 1 + 3 — CacheFirstLoop
├── repair/ # Pillar 2 pipeline
│ ├── index.ts
│ ├── scavenge.ts
│ ├── flatten.ts
│ ├── truncation.ts
│ └── storm.ts
├── prompt-fragments.ts # TUI_FORMATTING_RULES, NEGATIVE_CLAIM_RULE —
│ # reused by main + subagent + skill prompts
├── code/prompt.ts # reasonix code main system prompt
├── tools/ # Tool implementations
│ ├── filesystem.ts # read / list / search / edit / write
│ ├── shell.ts # run_command + run_background (JobRegistry)
│ ├── jobs.ts # background-process registry
│ ├── memory.ts # remember / forget / list user memories
│ ├── skills.ts # list + invoke SKILL.md playbooks
│ ├── subagent.ts # spawn_subagent — flash+high by default
│ ├── plan.ts # submit_plan (review gate)
│ └── web.ts # web_search, web_fetch (multi-engine: Bing, Baidu, SearXNG, Metaso and API engines)
├── mcp/ # MCP client + bridge (stdio + SSE)
├── memory.ts # ImmutablePrefix / AppendOnlyLog / VolatileScratch
├── project-memory.ts # REASONIX.md loader
├── user-memory.ts # ~/.reasonix/memory/ store (project + global)
├── skills.ts # built-in explore + research skills
├── session.ts # JSONL session persistence
├── telemetry.ts # cost + cache-hit accounting + SessionSummary
├── tokenizer.ts # DeepSeek V3 tokenizer (ported)
├── usage.ts # ~/.reasonix/usage.jsonl roll-up
├── types.ts # ChatMessage, ToolCall, ToolSpec
├── index.ts # library barrel
└── cli/
├── index.ts # commander entry
├── resolve.ts # config + CLI flag precedence
├── commands/ # chat, code, run, stats, sessions, ...
└── ui/
├── App.tsx # root Ink component (~1984 LOC, was 2931)
├── LiveRows.tsx # spinner rows (OngoingTool / Status / ...)
├── EventLog.tsx # Historical row rendering
├── StatsPanel.tsx # top bar + cost badges
├── PromptInput.tsx # cursor-aware multi-line input
├── PlanConfirm.tsx # submit_plan review modal
├── ShellConfirm.tsx # run_command approval modal
├── EditConfirm.tsx # per-edit review modal
├── markdown.tsx # Ink-native markdown renderer
├── edit-history.ts # EditHistoryEntry + formatters
├── useEditHistory.ts # /undo, /history, /show state machine
├── useCompletionPickers.ts # slash, @, slash-arg pickers
├── useSessionInfo.ts # balance + models + updates fetch
├── useSubagent.ts # subagent sink wiring
└── slash/ # /-command implementation
├── types.ts # SlashContext, SlashResult, ...
├── commands.ts # SLASH_COMMANDS data + parse + suggest
├── helpers.ts # git, memory, token formatters
├── dispatch.ts # registry + handleSlash lookup
└── handlers/ # per-topic: basic, mcp, memory,
# skill, admin, observability, edits,
# jobs, sessions, model (/pro lives here)
Files kept small by design: the largest module under cli/ui/ is 2K
lines (App.tsx), every handler under slash/handlers/ is ≤200 lines,
every hook under cli/ui/ is ≤310 lines. Adding a new slash command
means editing one handler file and one registry line.
Design evolution
- v0.0.x — Pillar 1 end-to-end, repair pipeline complete, Ink TUI scaffold.
- v0.1 — τ-bench numbers published, streaming polish, transcript replay.
- v0.3 — MCP client (stdio + SSE), session persistence.
- v0.4.x —
reasonix codewith SEARCH/REPLACE edits, review/auto gate, background jobs, hooks. - v0.5.x — V4 model support, skills, memory, subagents, actionable error messages.
- v0.6 —
- Cost control (flash-first defaults, auto-compaction,
/proone-shot, failure-triggered escalation, cost badges). deepseek-chat/deepseek-reasonerscheduled for deprecation — all user-facing surfaces updated tov4-flash/v4-pro.- Shared prompt fragments (
TUI_FORMATTING_RULES,NEGATIVE_CLAIM_RULE). - UI refactor: App.tsx split into 6 hooks/components, slash.ts split into 13 per-topic modules.
- Cost control (flash-first defaults, auto-compaction,
- v0.31 (current) —
branch+harvestfeatures removed entirely (the parallel-sample selector and Pillar 2 plan-state extractor); both rarely paid for themselves and bloated the slash surface.
Explicit non-goals
- Multi-agent orchestration as a first-class concept (subagents are a cost-reduction mechanism, not a coordination primitive).
- RAG / vector retrieval.
- Support for non-DeepSeek backends (an OpenAI-compatible shim would
work today via
--modeloverride, but is not tested). - Web UI / SaaS.
- Automatic cost escalation without user-visible announcement. Every pro-tier model call is surfaced; silent escalation was considered and rejected.