Compaction and Branch Summaries
Compaction and branch summaries are the two mechanisms that keep long sessions usable without losing prior work context.
- Compaction rewrites old history into a summary on the current branch.
- Branch summary captures abandoned branch context during
/treenavigation.
Both are persisted as session entries and converted back into user-context messages when rebuilding LLM input.
Key implementation files
packages/agent/src/compaction/compaction.ts(context-full summarization and handoff generation)packages/agent/src/compaction/branch-summarization.tspackages/agent/src/compaction/pruning.tspackages/agent/src/compaction/utils.tspackages/agent/src/compaction/openai.tspackages/coding-agent/src/session/session-manager.tspackages/coding-agent/src/session/agent-session.tspackages/coding-agent/src/session/messages.tspackages/coding-agent/src/extensibility/hooks/types.tspackages/coding-agent/src/config/settings-schema.ts
Session entry model
Compaction and branch summaries are first-class session entries, not plain assistant/user messages.
CompactionEntrytype: "compaction"summary, optionalshortSummaryfirstKeptEntryId(compaction boundary)tokensBefore- optional
details,preserveData,fromExtension
BranchSummaryEntrytype: "branch_summary"fromId,summary- optional
details,fromExtension
When context is rebuilt (buildSessionContext):
- Latest compaction on the active path is converted to one
compactionSummarymessage. - Kept entries from
firstKeptEntryIdto the compaction point are re-included. - Later entries on the path are appended.
branch_summaryentries are converted tobranchSummarymessages.custom_messageentries are converted tocustommessages.
Those custom roles are then transformed into LLM-facing user messages in convertToLlm() using the static templates:
packages/agent/src/compaction/prompts/compaction-summary-context.mdpackages/agent/src/compaction/prompts/branch-summary-context.mdpackages/agent/src/compaction/prompts/handoff-document.md
Compaction pipeline
Triggers
Compaction/context maintenance can run in five ways:
- Manual context compaction:
/compact [instructions]callsAgentSession.compact(...). - Automatic overflow recovery: after a same-model assistant error that matches context overflow.
- Automatic incomplete-output recovery: after a same-model assistant message ends with
stopReason === "length"(OpenAI/Codexresponse.incomplete). - Automatic threshold maintenance: after a successful turn when context exceeds the resolved threshold.
- Idle maintenance:
runIdleCompaction()can invoke the same auto-maintenance path with reason"idle".
Compaction shape (visual)
Before compaction:
entry: 0 1 2 3 4 5 6 7 8 9
┌─────┬─────┬─────┬──────┬─────┬─────┬──────┬──────┬─────┬──────┐
│ hdr │ usr │ ass │ tool │ usr │ ass │ tool │ tool │ ass │ tool │
└─────┴─────┴─────┴──────┴─────┴─────┴──────┴──────┴─────┴──────┘
└────────┬───────┘ └──────────────┬──────────────┘
messagesToSummarize kept messages
↑
firstKeptEntryId (entry 4)
After compaction (new entry appended):
entry: 0 1 2 3 4 5 6 7 8 9 10
┌─────┬─────┬─────┬──────┬─────┬─────┬──────┬──────┬─────┬──────┬─────┐
│ hdr │ usr │ ass │ tool │ usr │ ass │ tool │ tool │ ass │ tool │ cmp │
└─────┴─────┴─────┴──────┴─────┴─────┴──────┴──────┴─────┴──────┴─────┘
└──────────┬──────┘ └──────────────────────┬───────────────────┘
not sent to LLM sent to LLM
↑
starts from firstKeptEntryId
What the LLM sees:
┌────────┬─────────┬─────┬─────┬──────┬──────┬─────┬──────┐
│ system │ summary │ usr │ ass │ tool │ tool │ ass │ tool │
└────────┴─────────┴─────┴─────┴──────┴──────┴─────┴──────┘
↑ ↑ └─────────────────┬────────────────┘
prompt from cmp messages from firstKeptEntryId
Overflow/incomplete recovery vs threshold/idle maintenance
The automatic paths are intentionally different:
-
Overflow recovery
- Trigger: current-model assistant error is detected as context overflow and the error is not older than the latest compaction.
- The failing assistant error message is removed from active agent state before retry.
- Context promotion is tried first; if a configured larger model is available, the agent switches model and retries without compacting.
- If promotion is unavailable and compaction is enabled, context-full compaction runs with
reason: "overflow"andwillRetry: true; handoff strategy is not used for overflow because the handoff request would reuse the overflowing input. - On success,
agent.continue()is scheduled to retry the turn.
-
Incomplete-output recovery
- Trigger: same-model assistant message ends with
stopReason === "length"and the message is not older than the latest compaction. - The incomplete assistant message is removed from active agent state before recovery.
- Context promotion is tried first.
- If promotion is unavailable and compaction is enabled, auto maintenance runs with
reason: "incomplete"andwillRetry: true. - Unlike overflow,
compaction.strategy: "handoff"is allowed for incomplete-output recovery because the input context is still usable. - On context-full success,
agent.continue()is scheduled to retry the turn.
- Trigger: same-model assistant message ends with
-
Threshold maintenance
- Trigger: successful, non-error assistant message whose adjusted context tokens exceed
resolveThresholdTokens(...). - Tool-output pruning can reduce the measured token count before threshold comparison.
- Context promotion is tried before compaction.
- If promotion is unavailable, auto maintenance runs with
reason: "threshold"andwillRetry: false. - With
compaction.strategy: "handoff", threshold maintenance normally schedules a post-prompt auto-handoff task instead of writing a compaction entry; pre-prompt checks run it inline to avoid racing the next turn. If handoff returns no document without aborting, it falls back to context-full compaction. - On success, if
compaction.autoContinue !== false, schedules an agent-authored developer auto-continue prompt fromprompts/system/auto-continue.md.
- Trigger: successful, non-error assistant message whose adjusted context tokens exceed
-
Idle maintenance
- Trigger:
runIdleCompaction()when not streaming or already compacting. - Uses
reason: "idle"and does not auto-continue afterward.
- Trigger:
Pre-compaction pruning
Before compaction checks, tool-result pruning may run (pruneToolOutputs).
Default prune policy:
- Protect newest
40_000tool-output tokens. - Require at least
20_000total estimated savings. - Never prune tool results from
skillorread.
Pruned tool results are replaced with:
[Output truncated - N tokens]
If pruning changes entries, session storage is rewritten and agent message state is refreshed before compaction decisions.
Boundary and cut-point logic
prepareCompaction() only considers entries since the last compaction entry (if any).
- Find previous compaction index.
- Compute
boundaryStart = prevCompactionIndex + 1. - Adapt
keepRecentTokensusing measured usage ratio when available. - Run
findCutPoint()over the boundary window.
Valid cut points include:
- message entries with roles:
user,assistant,bashExecution,hookMessage,branchSummary,compactionSummary custom_messageentriesbranch_summaryentries
Hard rule: never cut at toolResult.
If there are non-message metadata entries immediately before the cut point (model_change, thinking_level_change, labels, etc.), they are pulled into the kept region by moving cut index backward until a message or compaction boundary is hit.
Split-turn handling
If cut point is not at a user-turn start, compaction treats it as a split turn.
Turn start detection treats these as user-turn boundaries:
message.role === "user"message.role === "bashExecution"custom_messageentrybranch_summaryentry
Split-turn compaction generates two summaries:
- History summary (
messagesToSummarize) - Turn-prefix summary (
turnPrefixMessages)
Final stored summary is merged as:
<history summary>
---
**Turn Context (split turn):**
<turn prefix summary>
Summary generation
compact(...) builds summaries from serialized conversation text:
- Convert messages via
convertToLlm(). - Serialize with
serializeConversation(). - Wrap in
<conversation>...</conversation>. - Optionally include
<previous-summary>...</previous-summary>. - Optionally inject extension hook context and active memory-backend compaction context as
<additional-context>entries. - Execute summarization prompt with
SUMMARIZATION_SYSTEM_PROMPT.
Prompt selection:
- first compaction:
compaction-summary.md - iterative compaction with prior summary:
compaction-update-summary.md - split-turn second pass:
compaction-turn-prefix.md - short UI summary:
compaction-short-summary.md - handoff document:
handoff-document.md(used bygenerateHandoff(...), not serialized compaction)
Remote summarization modes:
- If
compaction.remoteEndpointis set and remote compaction is enabled, local summary generation POSTs:{ systemPrompt, prompt }
- Expects JSON containing at least
{ summary }. - For OpenAI/OpenAI Codex models, compaction first tries the provider-native
/responses/compactendpoint when remote compaction is enabled. It preserves provider replacement history inpreserveData.openaiRemoteCompactionand falls back to local summarization if that native request fails.
Handoff generation
packages/agent/src/compaction/compaction.ts also exports generateHandoff(...). Handoff generation uses the same completeSimple(...) oneshot style as summarization, but it preserves the live agent cache prefix by sending the active system prompt, tool array, and real LLM message history, then appending one agent-attributed user message containing the handoff prompt. It forces toolChoice: "none" and returns joined text blocks directly.
Handoff does not write a CompactionEntry. AgentSession.handoff() owns the session transition: it starts a new session, injects the generated document as a visible custom_message with customType: "handoff", and rebuilds agent messages from that new session.
File-operation context in summaries
Compaction tracks cumulative file activity using assistant tool calls:
read(path)→ read setwrite(path)→ modified setedit(path)→ modified set
Cumulative behavior:
- Includes prior compaction details only when prior entry is pi-generated (
fromExtension !== true). - In split turns, includes turn-prefix file ops too.
readFilesexcludes files also modified.
Summary text gets file tags appended via prompt template:
<read-files>
...
</read-files>
<modified-files>
...
</modified-files>
Persist and reload
After summary generation (or hook-provided summary), agent session:
- Appends
CompactionEntrywithappendCompaction(...)for context-full maintenance; handoff strategy creates a new session and injects a handoffcustom_messageinstead. - Rebuilds display context from the active leaf via
buildDisplaySessionContext(). - Replaces live agent messages with rebuilt context.
- Synchronizes active todo phases from the rebuilt branch and closes provider sessions whose history was rewritten.
- Emits
session_compacthook event.
Branch summarization pipeline
Branch summarization is tied to tree navigation, not token overflow.
Trigger
During navigateTree(...):
- Compute abandoned entries from old leaf to common ancestor using
collectEntriesForBranchSummary(...). - If caller requested summary (
options.summarize), generate summary before switching leaf. - If summary exists, attach it at the navigation target using
branchWithSummary(...).
Operationally this is commonly driven by /tree flow when branchSummary.enabled is enabled.
Branch switch shape (visual)
Tree before navigation:
┌─ B ─ C ─ D (old leaf, being abandoned)
A ───┤
└─ E ─ F (target)
Common ancestor: A
Entries to summarize: B, C, D
After navigation with summary:
┌─ B ─ C ─ D ─ [summary of B,C,D]
A ───┤
└─ E ─ F (new leaf)
Preparation and token budget
generateBranchSummary(...) computes budget as:
tokenBudget = model.contextWindow - branchSummary.reserveTokens
prepareBranchEntries(...) then:
- First pass: collect cumulative file ops from all summarized entries, including prior pi-generated
branch_summarydetails. - Second pass: walk newest → oldest, adding messages until token budget is reached.
- Prefer preserving recent context.
- May still include large summary entries near budget edge for continuity.
Compaction entries are included as messages (compactionSummary) during branch summarization input.
Summary generation and persistence
Branch summarization:
- Converts and serializes selected messages.
- Wraps in
<conversation>. - Uses custom instructions if supplied, otherwise
branch-summary.md. - Calls summarization model with
SUMMARIZATION_SYSTEM_PROMPT. - Prepends
branch-summary-preamble.md. - Appends file-operation tags.
Result is stored as BranchSummaryEntry with optional details (readFiles, modifiedFiles).
Extension and hook touchpoints
session_before_compact
Pre-compaction hook.
Can:
- cancel compaction (
{ cancel: true }) - provide full custom compaction payload (
{ compaction: CompactionResult })
session.compacting
Prompt/context customization hook for default compaction.
Can return:
prompt(override base summary prompt)context(extra context lines injected into<additional-context>)preserveData(stored on compaction entry)
session_compact
Post-compaction notification with saved compactionEntry and fromExtension flag.
session_before_tree
Runs on tree navigation before default branch summary generation.
Can:
- cancel navigation
- provide custom
{ summary: { summary, details } }used when user requested summarization
session_tree
Post-navigation event exposing new/old leaf and optional summary entry.
Runtime behavior and failure semantics
- Manual compaction aborts current agent operation first.
abortCompaction()cancels manual compaction, auto-compaction, and handoff generation controllers.- Auto compaction emits start/end session events for UI/state updates.
- Auto compaction can try multiple model candidates and retry transient failures; long retry delays prefer the next candidate when one is available.
- Overflow errors are excluded from generic retry path because they are handled by context promotion/compaction.
- If auto-compaction fails:
- overflow path emits
Context overflow recovery failed: ... - incomplete-output path emits
Incomplete response recovery failed: ... - threshold/idle paths emit
Auto-compaction failed: ...
- overflow path emits
- Branch summarization can be cancelled via abort signal (e.g., Escape), returning canceled/aborted navigation result.
Settings and defaults
From settings-schema.ts:
compaction.enabled=truecompaction.strategy="context-full"("handoff"and"off"are also supported)compaction.reserveTokens=16384compaction.keepRecentTokens=20000compaction.autoContinue=truecompaction.remoteEnabled=truecompaction.remoteEndpoint=undefinedcompaction.thresholdPercent=-1andcompaction.thresholdTokens=-1; when no positive override is set, the threshold iscontextWindow - max(15% of contextWindow, reserveTokens)compaction.idleEnabled=falsecompaction.idleThresholdTokens=200000compaction.idleTimeoutSeconds=300branchSummary.enabled=falsebranchSummary.reserveTokens=16384
These values are consumed at runtime by AgentSession and compaction/branch summarization modules.