Compaction and Branch Summaries

Compaction and branch summaries are the two mechanisms that keep long sessions usable without losing prior work context.

Compaction rewrites old history into a summary on the current branch.
Branch summary captures abandoned branch context during /tree navigation.

Both are persisted as session entries and converted back into user-context messages when rebuilding LLM input.

Key implementation files

packages/agent/src/compaction/compaction.ts (context-full summarization and handoff generation)
packages/agent/src/compaction/branch-summarization.ts
packages/agent/src/compaction/pruning.ts
packages/agent/src/compaction/utils.ts
packages/agent/src/compaction/openai.ts
packages/coding-agent/src/session/session-manager.ts
packages/coding-agent/src/session/agent-session.ts
packages/coding-agent/src/session/messages.ts
packages/coding-agent/src/extensibility/hooks/types.ts
packages/coding-agent/src/config/settings-schema.ts

Session entry model

Compaction and branch summaries are first-class session entries, not plain assistant/user messages.

CompactionEntry
- type: "compaction"
- summary, optional shortSummary
- firstKeptEntryId (compaction boundary)
- tokensBefore
- optional details, preserveData, fromExtension
BranchSummaryEntry
- type: "branch_summary"
- fromId, summary
- optional details, fromExtension

When context is rebuilt (buildSessionContext):

Latest compaction on the active path is converted to one compactionSummary message.
Kept entries from firstKeptEntryId to the compaction point are re-included.
Later entries on the path are appended.
branch_summary entries are converted to branchSummary messages.
custom_message entries are converted to custom messages.

Those custom roles are then transformed into LLM-facing user messages in convertToLlm() using the static templates:

packages/agent/src/compaction/prompts/compaction-summary-context.md
packages/agent/src/compaction/prompts/branch-summary-context.md
packages/agent/src/compaction/prompts/handoff-document.md

Compaction pipeline

Triggers

Compaction/context maintenance can run in five ways:

Manual context compaction: /compact [instructions] calls AgentSession.compact(...).
Automatic overflow recovery: after a same-model assistant error that matches context overflow.
Automatic incomplete-output recovery: after a same-model assistant message ends with stopReason === "length" (OpenAI/Codex response.incomplete).
Automatic threshold maintenance: after a successful turn when context exceeds the resolved threshold.
Idle maintenance: runIdleCompaction() can invoke the same auto-maintenance path with reason "idle".

Compaction shape (visual)

Before compaction:

  entry:  0     1     2     3      4     5     6      7      8     9
        ┌─────┬─────┬─────┬──────┬─────┬─────┬──────┬──────┬─────┬──────┐
        │ hdr │ usr │ ass │ tool │ usr │ ass │ tool │ tool │ ass │ tool │
        └─────┴─────┴─────┴──────┴─────┴─────┴──────┴──────┴─────┴──────┘
                └────────┬───────┘ └──────────────┬──────────────┘
               messagesToSummarize            kept messages
                                   ↑
                          firstKeptEntryId (entry 4)

After compaction (new entry appended):

  entry:  0     1     2     3      4     5     6      7      8     9      10
        ┌─────┬─────┬─────┬──────┬─────┬─────┬──────┬──────┬─────┬──────┬─────┐
        │ hdr │ usr │ ass │ tool │ usr │ ass │ tool │ tool │ ass │ tool │ cmp │
        └─────┴─────┴─────┴──────┴─────┴─────┴──────┴──────┴─────┴──────┴─────┘
               └──────────┬──────┘ └──────────────────────┬───────────────────┘
                 not sent to LLM                    sent to LLM
                                                         ↑
                                              starts from firstKeptEntryId

What the LLM sees:

  ┌────────┬─────────┬─────┬─────┬──────┬──────┬─────┬──────┐
  │ system │ summary │ usr │ ass │ tool │ tool │ ass │ tool │
  └────────┴─────────┴─────┴─────┴──────┴──────┴─────┴──────┘
       ↑         ↑      └─────────────────┬────────────────┘
    prompt   from cmp          messages from firstKeptEntryId

Overflow/incomplete recovery vs threshold/idle maintenance

The automatic paths are intentionally different:

Overflow recovery
- Trigger: current-model assistant error is detected as context overflow and the error is not older than the latest compaction.
- The failing assistant error message is removed from active agent state before retry.
- Context promotion is tried first; if a configured larger model is available, the agent switches model and retries without compacting.
- If promotion is unavailable and compaction is enabled, context-full compaction runs with reason: "overflow" and willRetry: true; handoff strategy is not used for overflow because the handoff request would reuse the overflowing input.
- On success, agent.continue() is scheduled to retry the turn.
Incomplete-output recovery
- Trigger: same-model assistant message ends with stopReason === "length" and the message is not older than the latest compaction.
- The incomplete assistant message is removed from active agent state before recovery.
- Context promotion is tried first.
- If promotion is unavailable and compaction is enabled, auto maintenance runs with reason: "incomplete" and willRetry: true.
- Unlike overflow, compaction.strategy: "handoff" is allowed for incomplete-output recovery because the input context is still usable.
- On context-full success, agent.continue() is scheduled to retry the turn.
Threshold maintenance
- Trigger: successful, non-error assistant message whose adjusted context tokens exceed resolveThresholdTokens(...).
- Tool-output pruning can reduce the measured token count before threshold comparison.
- Context promotion is tried before compaction.
- If promotion is unavailable, auto maintenance runs with reason: "threshold" and willRetry: false.
- With compaction.strategy: "handoff", threshold maintenance normally schedules a post-prompt auto-handoff task instead of writing a compaction entry; pre-prompt checks run it inline to avoid racing the next turn. If handoff returns no document without aborting, it falls back to context-full compaction.
- On success, if compaction.autoContinue !== false, schedules an agent-authored developer auto-continue prompt from prompts/system/auto-continue.md.
Idle maintenance
- Trigger: runIdleCompaction() when not streaming or already compacting.
- Uses reason: "idle" and does not auto-continue afterward.

Pre-compaction pruning

Before compaction checks, tool-result pruning may run (pruneToolOutputs).

Default prune policy:

Protect newest 40_000 tool-output tokens.
Require at least 20_000 total estimated savings.
Never prune tool results from skill or read.

Pruned tool results are replaced with:

[Output truncated - N tokens]

If pruning changes entries, session storage is rewritten and agent message state is refreshed before compaction decisions.

Boundary and cut-point logic

prepareCompaction() only considers entries since the last compaction entry (if any).

Find previous compaction index.
Compute boundaryStart = prevCompactionIndex + 1.
Adapt keepRecentTokens using measured usage ratio when available.
Run findCutPoint() over the boundary window.

Valid cut points include:

message entries with roles: user, assistant, bashExecution, hookMessage, branchSummary, compactionSummary
custom_message entries
branch_summary entries

Hard rule: never cut at toolResult.

If there are non-message metadata entries immediately before the cut point (model_change, thinking_level_change, labels, etc.), they are pulled into the kept region by moving cut index backward until a message or compaction boundary is hit.

Split-turn handling

If cut point is not at a user-turn start, compaction treats it as a split turn.

Turn start detection treats these as user-turn boundaries:

message.role === "user"
message.role === "bashExecution"
custom_message entry
branch_summary entry

Split-turn compaction generates two summaries:

History summary (messagesToSummarize)
Turn-prefix summary (turnPrefixMessages)

Final stored summary is merged as:

<history summary>

---

**Turn Context (split turn):**

<turn prefix summary>

Summary generation

compact(...) builds summaries from serialized conversation text:

Convert messages via convertToLlm().
Serialize with serializeConversation().
Wrap in <conversation>...</conversation>.
Optionally include <previous-summary>...</previous-summary>.
Optionally inject extension hook context and active memory-backend compaction context as <additional-context> entries.
Execute summarization prompt with SUMMARIZATION_SYSTEM_PROMPT.

Prompt selection:

first compaction: compaction-summary.md
iterative compaction with prior summary: compaction-update-summary.md
split-turn second pass: compaction-turn-prefix.md
short UI summary: compaction-short-summary.md
handoff document: handoff-document.md (used by generateHandoff(...), not serialized compaction)

Remote summarization modes:

If compaction.remoteEndpoint is set and remote compaction is enabled, local summary generation POSTs:
- { systemPrompt, prompt }
Expects JSON containing at least { summary }.
For OpenAI/OpenAI Codex models, compaction first tries the provider-native /responses/compact endpoint when remote compaction is enabled. It preserves provider replacement history in preserveData.openaiRemoteCompaction and falls back to local summarization if that native request fails.

Handoff generation

packages/agent/src/compaction/compaction.ts also exports generateHandoff(...). Handoff generation uses the same completeSimple(...) oneshot style as summarization, but it preserves the live agent cache prefix by sending the active system prompt, tool array, and real LLM message history, then appending one agent-attributed user message containing the handoff prompt. It forces toolChoice: "none" and returns joined text blocks directly.

Handoff does not write a CompactionEntry. AgentSession.handoff() owns the session transition: it starts a new session, injects the generated document as a visible custom_message with customType: "handoff", and rebuilds agent messages from that new session.

File-operation context in summaries

Compaction tracks cumulative file activity using assistant tool calls:

read(path) → read set
write(path) → modified set
edit(path) → modified set

Cumulative behavior:

Includes prior compaction details only when prior entry is pi-generated (fromExtension !== true).
In split turns, includes turn-prefix file ops too.
readFiles excludes files also modified.

Summary text gets file tags appended via prompt template:

<read-files>
...
</read-files>
<modified-files>
...
</modified-files>

Persist and reload

After summary generation (or hook-provided summary), agent session:

Appends CompactionEntry with appendCompaction(...) for context-full maintenance; handoff strategy creates a new session and injects a handoff custom_message instead.
Rebuilds display context from the active leaf via buildDisplaySessionContext().
Replaces live agent messages with rebuilt context.
Synchronizes active todo phases from the rebuilt branch and closes provider sessions whose history was rewritten.
Emits session_compact hook event.

Branch summarization pipeline

Branch summarization is tied to tree navigation, not token overflow.

Trigger

During navigateTree(...):

Compute abandoned entries from old leaf to common ancestor using collectEntriesForBranchSummary(...).
If caller requested summary (options.summarize), generate summary before switching leaf.
If summary exists, attach it at the navigation target using branchWithSummary(...).

Operationally this is commonly driven by /tree flow when branchSummary.enabled is enabled.

Branch switch shape (visual)

Tree before navigation:

         ┌─ B ─ C ─ D (old leaf, being abandoned)
    A ───┤
         └─ E ─ F (target)

Common ancestor: A
Entries to summarize: B, C, D

After navigation with summary:

         ┌─ B ─ C ─ D ─ [summary of B,C,D]
    A ───┤
         └─ E ─ F (new leaf)

Preparation and token budget

generateBranchSummary(...) computes budget as:

tokenBudget = model.contextWindow - branchSummary.reserveTokens

prepareBranchEntries(...) then:

First pass: collect cumulative file ops from all summarized entries, including prior pi-generated branch_summary details.
Second pass: walk newest → oldest, adding messages until token budget is reached.
Prefer preserving recent context.
May still include large summary entries near budget edge for continuity.

Compaction entries are included as messages (compactionSummary) during branch summarization input.

Summary generation and persistence

Branch summarization:

Converts and serializes selected messages.
Wraps in <conversation>.
Uses custom instructions if supplied, otherwise branch-summary.md.
Calls summarization model with SUMMARIZATION_SYSTEM_PROMPT.
Prepends branch-summary-preamble.md.
Appends file-operation tags.

Result is stored as BranchSummaryEntry with optional details (readFiles, modifiedFiles).

Extension and hook touchpoints

`session_before_compact`

Pre-compaction hook.

Can:

cancel compaction ({ cancel: true })
provide full custom compaction payload ({ compaction: CompactionResult })

`session.compacting`

Prompt/context customization hook for default compaction.

Can return:

prompt (override base summary prompt)
context (extra context lines injected into <additional-context>)
preserveData (stored on compaction entry)

`session_compact`

Post-compaction notification with saved compactionEntry and fromExtension flag.

`session_before_tree`

Runs on tree navigation before default branch summary generation.

Can:

cancel navigation
provide custom { summary: { summary, details } } used when user requested summarization

`session_tree`

Post-navigation event exposing new/old leaf and optional summary entry.

Runtime behavior and failure semantics

Manual compaction aborts current agent operation first.
abortCompaction() cancels manual compaction, auto-compaction, and handoff generation controllers.
Auto compaction emits start/end session events for UI/state updates.
Auto compaction can try multiple model candidates and retry transient failures; long retry delays prefer the next candidate when one is available.
Overflow errors are excluded from generic retry path because they are handled by context promotion/compaction.
If auto-compaction fails:
- overflow path emits Context overflow recovery failed: ...
- incomplete-output path emits Incomplete response recovery failed: ...
- threshold/idle paths emit Auto-compaction failed: ...
Branch summarization can be cancelled via abort signal (e.g., Escape), returning canceled/aborted navigation result.

Settings and defaults

From settings-schema.ts:

compaction.enabled = true
compaction.strategy = "context-full" ("handoff" and "off" are also supported)
compaction.reserveTokens = 16384
compaction.keepRecentTokens = 20000
compaction.autoContinue = true
compaction.remoteEnabled = true
compaction.remoteEndpoint = undefined
compaction.thresholdPercent = -1 and compaction.thresholdTokens = -1; when no positive override is set, the threshold is contextWindow - max(15% of contextWindow, reserveTokens)
compaction.idleEnabled = false
compaction.idleThresholdTokens = 200000
compaction.idleTimeoutSeconds = 300
branchSummary.enabled = false
branchSummary.reserveTokens = 16384

These values are consumed at runtime by AgentSession and compaction/branch summarization modules.