ContextEngine: Context Lifecycle Engine for CLI Agents

Across every phase of the Agent loop, manage what goes in, what stays, what gets evicted, and what can be recalled. Seven context types. Three-level cache hierarchy. One virtual filesystem. Cut token cost. Preserve signal density. Survive session boundaries.



The Problem: Token Budget is the Bottleneck

Every CLI Agent runs on a fixed token budget. The context window is both the most expensive and the most scarce resource.

| Symptom | Root Cause | Missing Lifecycle Stage |
|---|---|---|
| Agent forgets earlier conversation | Context window fills up, old content is evicted with no persistence | afterTurn: no extraction + persist mechanism |
| Same mistakes repeated across sessions | No mechanism to carry lessons from one session to the next | session_end: no archival + ② bootstrap: no cold-start injection |
| A query that should cost $0.05 costs $0.50 | Flat retrieval loads entire documents when an abstract would suffice | assemble: no L0/L1/L2 hierarchical retrieval |
| Multi-agent coordination failures | Agents cannot see each other's working context | afterTurn: no cross-session sharing + multi-tenant isolation |
| Context bloat in long sessions | No systematic compression; the Agent drowns in its own history | compact: no signal scoring + summary chain |

This isn't a model problem. It's an infrastructure problem.

ContextEngine provides the missing infrastructure: a full-stack context lifecycle management system that intercepts the Agent loop at six defined stages, treating the context window as a managed resource — not an unbounded buffer.


Design Philosophy

Core Insight: Context Has a Lifecycle

Current RAG systems treat retrieval as a single operation — embed a query, search a vector store, return results. This misses the fundamental reality: context in an Agent system has a complete lifecycle, just like data in a database.

  Born          Structured       Stored           Indexed          Recalled        Compressed       Archived
  (extracted    (categorized     (written         (embedded +      (vector         (summarized,     (session end,
   from          by type,        atomically       upserted         search,         deduplicated,    archived,
   conversation) routed by       with ordering    to L0/L1/L2      hierarchical    compressed)      checkpointed)
                  policy)         guarantees)      IndexRecords)    expansion)

Each phase has different constraints. Extraction must be incremental (don't reprocess old messages). Storage must be atomic (incomplete writes are detectable). Indexing must be asynchronous (don't block the Agent's response). Retrieval must be budget-aware (load only what fits). Compression must preserve signal (protect important context).

A system that handles only one or two of these phases — say, just retrieval — leaves the rest to chance. ContextEngine covers the full lifecycle.

The Six Interception Points

The Agent loop isn't a black box. It has well-defined execution phases. Each phase presents a different opportunity to read, write, or transform context.

  ┌──────────────────────────────────────────────────────────────────────┐
  │                         Agent Loop (infinite)                         │
  │                                                                       │
  │   ┌─────┐     ┌──────────┐     ┌──────────┐     ┌─────────┐         │
  │   │ ①   │     │    ②     │     │    ③     │     │   ④     │         │
  │   │ MSG │────▶│ REASON   │────▶│ TOOL     │────▶│ TURN    │         │
  │   │ IN  │     │ PREP     │     │ CALL     │     │ END     │         │
  │   └──┬──┘     └──────────┘     └──────────┘     └────┬────┘         │
  │      ▲                                              │               │
  │      │         ┌──────────┐         ┌─────────┐     │               │
  │      │         │    ⑤     │         │   ⑥     │     │               │
  │      └─────────│ COMPRESS │◀────────│ SHUTDOWN │◀────┘               │
  │                └──────────┘         └─────────┘                      │
  │                                                                       │
  └──────────────────────────────────────────────────────────────────────┘

The key design choice: intercept at the loop boundary, not inside model inference. Stages ①–⑥ are all outside the LLM call. This means zero latency impact on model inference itself — all context operations happen before or after, never during.

Not All Context Is Equal

Seven types of context, each with fundamentally different lifecycle behavior. This isn't arbitrary categorization — it's driven by the semantics of the information itself.

| Type | Why this lifecycle? | Write Policy |
|---|---|---|
| Profile | User state changes: "I live in London" may become "I moved to Tokyo" | Merge: new overwrites old on conflict |
| Preference | Preferences accumulate per topic but each topic has one current view | Aggregate by slug: merge within topic |
| Entity | Entities accumulate facts, but "Project Alpha" is still "Project Alpha" | Aggregate by slug: merge within entity |
| Event | History is immutable: "completed migration on March 15" never changes | Append only: never overwrite |
| Case | Problem-solving traces are historical records | Append only: never overwrite |
| Pattern | Patterns emerge from repeated observations and evolve over time | Aggregate by slug: refine over time |
| Skill | Tool expertise grows cumulatively: more experience = better knowledge | Cumulative append: knowledge accumulates |

Four distinct write policies, each justified by the information's lifecycle semantics. A single "store everything the same way" approach would either lose mutable state (if append-only) or corrupt immutable history (if overwriting).
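The four policies can be illustrated with a minimal in-memory sketch. The `Store` class and the function names below are hypothetical and exist only for illustration; the real system writes through ContextFS, and the duplicate check in `cumulative_append` is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory store (hypothetical); stands in for persistent storage."""
    profile: dict = field(default_factory=dict)   # one mutable record
    by_slug: dict = field(default_factory=dict)   # slug -> merged facts
    log: list = field(default_factory=list)       # immutable history
    skills: dict = field(default_factory=dict)    # skill -> accumulated notes


def merge(store: Store, update: dict) -> None:
    """Merge: new values overwrite old ones on conflict (Profile)."""
    store.profile.update(update)


def aggregate(store: Store, slug: str, facts: dict) -> None:
    """Aggregate by slug: merge within one topic or entity."""
    store.by_slug.setdefault(slug, {}).update(facts)


def append(store: Store, record: dict) -> None:
    """Append only: existing records are never touched (Event, Case)."""
    store.log.append(dict(record))


def cumulative_append(store: Store, skill: str, note: str) -> None:
    """Cumulative append: notes accumulate per skill (Skill)."""
    notes = store.skills.setdefault(skill, [])
    if note not in notes:  # assumption: exact duplicates are skipped
        notes.append(note)


store = Store()
merge(store, {"city": "London"})
merge(store, {"city": "Tokyo"})                        # overwrites London
aggregate(store, "project-alpha", {"lang": "Go"})
aggregate(store, "project-alpha", {"owner": "Alice"})  # same slug, facts merge
append(store, {"event": "completed migration", "date": "March 15"})
cumulative_append(store, "git", "prefer rebase for local branches")
```

Running an append-only policy on the profile would keep both "London" and "Tokyo" with no indication of which is current, which is the failure mode the paragraph above describes.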

In addition to the 7 user-facing types, 3 system-internal schemas exist (tool, session_archive, session_summary) — these are created programmatically by the chunking and compression pipelines, not extracted by the LLM ReAct loop.


Key Architectural Decisions

Why YAML-driven schemas?

The problem: Hardcoding extraction schemas in Python means every new context type requires code changes, testing, and deployment.

The solution: YAML schemas declare what to extract and how to categorize it. The SchemaRegistry, PolicyRouter, and URIResolver all consume these schemas at runtime. Adding a new type — say "Decision" or "Handoff" — means adding a YAML file, not modifying the extraction pipeline.

What breaks without it: Every new context type becomes a code change that touches extraction, commit routing, and URI resolution — three packages that must be co-released. In practice, teams just skip adding new types, and everything gets shoved into generic "notes."
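A rough sketch of schema-driven routing follows. The dicts stand in for parsed YAML files, and the field names (`write_policy`, `fields`) as well as the methods on this toy `SchemaRegistry` are assumptions made for illustration, not the project's actual schema format or API.

```python
class SchemaRegistry:
    """Toy registry: holds parsed schema definitions keyed by context type."""

    def __init__(self):
        self._schemas = {}

    def register(self, name: str, schema: dict) -> None:
        self._schemas[name] = schema

    def write_policy(self, name: str) -> str:
        if name not in self._schemas:
            raise KeyError(f"no schema registered for context type {name!r}")
        return self._schemas[name]["write_policy"]


def route(registry: SchemaRegistry, candidate: dict) -> str:
    """Pick a write policy from the schema alone; no per-type branches."""
    return registry.write_policy(candidate["category"])


registry = SchemaRegistry()
# Each dict stands in for one parsed YAML file (field names are hypothetical).
registry.register("profile", {"write_policy": "merge", "fields": ["name", "role"]})
registry.register("event", {"write_policy": "append", "fields": ["what", "when"]})

# Adding a new type is a data change: one more schema, same routing code.
registry.register("decision", {"write_policy": "append", "fields": ["choice", "reason"]})

routed = route(registry, {"category": "decision", "choice": "use pgvector"})
```

The point of the design is that `route` never changes when a type is added; only the registered data does.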

Why the Outbox pattern for async indexing?

The problem: Embedding + upserting to a vector index takes 100–500ms. If synchronous, every afterTurn blocks for that duration, adding latency to the Agent's response.

The solution: The Outbox pattern decouples writing (fast, ~50ms) from indexing (slow, async). Each write deposits an OutboxEvent. A background worker consumes events with at-least-once delivery and a DLQ for permanent failures. The write path returns immediately; the index catches up within seconds.

What breaks without it: The Agent's perceived response time balloons by 2–10x. Worse, if the vector index is temporarily down, writes fail and context is lost — the system becomes fragile to infrastructure transients.
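The write/index split can be sketched with an in-memory queue. This is a simplified illustration under stated assumptions: the `Outbox` class, its `max_attempts` parameter, and the example URIs are hypothetical, and a durable implementation would persist events and delete them only after indexing succeeds.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class OutboxEvent:
    uri: str
    attempts: int = 0


class Outbox:
    def __init__(self, max_attempts: int = 3):
        self.pending = deque()
        self.dead_letter = []          # DLQ for permanent failures
        self.max_attempts = max_attempts

    def write(self, uri: str) -> None:
        """Fast path: record the event and return; no embedding happens here."""
        self.pending.append(OutboxEvent(uri))

    def drain(self, index_fn) -> list:
        """One background worker pass; each pending event is tried once."""
        indexed = []
        for _ in range(len(self.pending)):
            event = self.pending.popleft()
            try:
                index_fn(event.uri)    # slow step: embed + upsert
                indexed.append(event.uri)
            except Exception:
                event.attempts += 1
                if event.attempts >= self.max_attempts:
                    self.dead_letter.append(event)
                else:
                    self.pending.append(event)   # retried on a later pass
        return indexed


def flaky_index(uri: str) -> None:
    """Stand-in indexer that fails for one URI."""
    if uri.endswith("broken"):
        raise RuntimeError("vector index unavailable")


outbox = Outbox(max_attempts=2)
outbox.write("ctx://acct/users/u1/memories/profile")
outbox.write("ctx://acct/users/u1/memories/broken")
first_pass = outbox.drain(flaky_index)
second_pass = outbox.drain(flaky_index)
```

Because a failed event is retried, `index_fn` may run more than once for the same URI, so the indexing step has to be idempotent (an upsert rather than an insert).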

Why L0/L1/L2 three-level retrieval?

The problem: Longer content doesn't mean better vector similarity — in fact, the opposite. An embedding of a 5000-token document must represent every concept it contains, diluting the signal for any single topic.

The solution: Each context node is indexed at three granularities. L0 abstracts (~100 tokens) produce focused embeddings — they're accurate topic signposts. L1 overviews (~500 tokens) are mid-grain. L2 full content (~5000 tokens) has the detail but diffuse embeddings. The retrieval engine does a single vector search across all levels, then uses L0/L1 hits as directory entry points for recursive tree expansion: when an L0 abstract matches, the searcher expands into its children to discover L2 content that flat search would miss. Score propagation (final = α·child + (1-α)·parent) boosts borderline L2 hits under strongly-matching parents. Hotness blending (blended = (1-α_hotness)·semantic + α_hotness·h_score) is applied during tree expansion.

  ┌─────────────┐    ┌─────────────┐    ┌──────────────┐
  │  L0 Abstract │    │  L1 Overview │    │  L2 Content   │
  │  ~100 tokens │    │  ~500 tokens │    │  ~5000 tokens │
  │  Focused     │    │  Balanced    │    │  Comprehensive│
  │  embedding   │    │  signal      │    │  but diffuse  │
  │  → signpost  │    │  → decision  │    │  → full detail│
  └──────┬───────┘    └──────┬───────┘    └──────┬────────┘
         │                   │                    │
         ▼                   ▼                    ▼
    .abstract.md        .overview.md          content.md

Recall benefit: Flat vector search on L2 content alone misses relevant chunks whose embeddings got diluted by surrounding detail. The L0/L1 signposts guide the searcher to the right neighborhood, then tree expansion discovers the full content underneath — including chunks whose raw vector score was below threshold but whose parent topic was a strong match.
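The two scoring formulas above translate directly into code. The weights (`alpha=0.7`, `alpha_hotness=0.1`) and the 0.5 threshold are illustrative assumptions, not the project's defaults.

```python
def propagate(child: float, parent: float, alpha: float = 0.7) -> float:
    """Score propagation: final = alpha * child + (1 - alpha) * parent."""
    return alpha * child + (1 - alpha) * parent


def blend_hotness(semantic: float, h_score: float, alpha_hotness: float = 0.1) -> float:
    """Hotness blending: (1 - alpha_hotness) * semantic + alpha_hotness * h_score."""
    return (1 - alpha_hotness) * semantic + alpha_hotness * h_score


threshold = 0.5        # assumed relevance cutoff
weak_child = 0.45      # raw L2 score, below the cutoff on its own
strong_parent = 0.90   # L0 abstract that matched the query well

# 0.7 * 0.45 + 0.3 * 0.90 = 0.585, which now clears the cutoff
final = propagate(weak_child, strong_parent)

# A frequently accessed node (h_score = 1.0) gets a further small lift
blended = blend_hotness(final, h_score=1.0)
```

This is the mechanism behind the recall benefit: the L2 chunk would be dropped by a flat search at this threshold, but survives because its parent matched strongly.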

Why a filesystem abstraction?

The problem: Context operations — create, read, update, link, delete — need to be composable, atomic, and multi-tenant-safe. Building each as a bespoke API means reimplementing concurrency, permissions, and consistency for every operation.

The solution: ContextFS maps every context operation to a file-like operation on ctx:// URIs. Multi-tenant isolation is enforced at the filesystem level (via account_id + owner_space in every path), making it impossible to bypass even by buggy callers. The actual ContextFS protocol provides these operations:

| Protocol Method | Context Operation |
|---|---|
| write_node(node, ctx) | Atomic write with 4-step order (content → relations → abstract/overview → meta.json) |
| read_node(uri, ctx) | Load full node (content + meta + relations) |
| delete_node(uri, ctx) | Permanent removal |
| archive_node(uri, ctx) | Soft-delete (status → ARCHIVED) |
| move_node(from, to, ctx) | Relocate node |
| list_children(uri, ctx) | Enumerate child URIs |
| exists(uri, ctx) | True only if ACTIVE (PENDING not visible) |

Relations are managed through a separate RelationStore protocol, not part of ContextFS itself. Two backends implement ContextFS: the AGFS adapter (Go file server, default) and the SQL adapter (PostgreSQL).

What breaks without it: Tenant isolation becomes "each caller must remember to check permissions." Atomic writes become "each caller must implement the 4-step write protocol correctly." Every new feature re-solves the same infrastructure problems.

Why optimistic locking for concurrent writes?

The problem: Profile nodes are written by every session a user has. Two sessions can extract conflicting profile updates simultaneously.

The solution: Optimistic locking — read the current .meta.json version, write only if unchanged. This handles the common case (no contention) with zero infrastructure overhead. On the rare case of concurrent writes, the second writer retries with fresh state.

What breaks without it: With no locking at all, last-writer-wins silently discards the first writer's updates. The heavier alternative, distributed locks, requires a coordination service (etcd/ZooKeeper), which adds infrastructure complexity for a problem that occurs rarely.
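A minimal sketch of the version-check-and-retry pattern follows. `MetaStore`, `VersionConflict`, and `update_with_retry` are hypothetical names; in a real backend the compare and the write must be a single atomic step (for example, a conditional SQL update on the version column).

```python
class VersionConflict(Exception):
    pass


class MetaStore:
    """Toy stand-in for versioned .meta.json storage, keyed by URI."""

    def __init__(self):
        self._rows = {}   # uri -> (version, data)

    def read(self, uri):
        return self._rows.get(uri, (0, {}))

    def write_if_version(self, uri, expected_version, data):
        current_version, _ = self.read(uri)
        if current_version != expected_version:
            raise VersionConflict(uri)
        self._rows[uri] = (current_version + 1, dict(data))


def update_with_retry(store, uri, mutate, max_retries=3):
    """Read, apply the change, write only if the version is unchanged."""
    for _ in range(max_retries):
        version, data = store.read(uri)
        new_data = mutate(dict(data))
        try:
            store.write_if_version(uri, version, new_data)
            return new_data
        except VersionConflict:
            continue   # another writer won; re-read and reapply
    raise VersionConflict(uri)


store = MetaStore()
uri = "profile"
stale_version, _ = store.read(uri)   # session A reads version 0

# Session B commits first, bumping the version to 1.
update_with_retry(store, uri, lambda d: {**d, "city": "Tokyo"})

# Session A's write against the stale version is rejected, not silently applied.
try:
    store.write_if_version(uri, stale_version, {"role": "engineer"})
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True

# Retrying with fresh state keeps both updates.
result = update_with_retry(store, uri, lambda d: {**d, "role": "engineer"})
```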

Why the ReAct loop for extraction?

The problem: Extraction quality depends on knowing what already exists. If the system already knows "Alice is a backend engineer," extracting that again wastes LLM tokens and creates duplicate entries that must be deduplicated later.

The solution: The LLM is given tool access — read(uri), list(uri), get_relations(uri), get_access_stats(uri), plus extract_* actions — and runs in a loop. Each iteration: it reads existing memory nodes (Reason), decides what's genuinely new, then calls the appropriate extract_* tool (Act). If unsure, it reads more nodes and loops. This is true ReAct: interleaved reasoning and tool use, not single-shot extraction.

  Iteration 1:  read(profile_uri) → "Alice, backend engineer, London"
                 → Nothing new to extract. Skip.

  Iteration 2:  read(entities/go) → "Go expert, prefers error handling pattern X"
                 → New: "Alice now also uses Rust for side projects"
                 → extract_entity(slug="rust", ...)

What breaks without it: Every turn re-extracts the same facts. Over 100 turns, "Alice is a backend engineer" gets extracted 100 times, creating 100 Profile merge operations — each one burning LLM tokens on content that was already stored. Extraction cost grows linearly with conversation length instead of with information density.


Architecture

Overall Architecture

         CLI Agent (Claude Code / OpenClaw / SDK)
              │  HTTP REST (port 8090) or Python SDK
              ▼
    ┌─ HTTP Layer ──── Flask REST · Auth/RBAC · Sessions ─┐
    └──────────────────────┬───────────────────────────────┘
    ┌─ Service Layer ── MemoryWriteAPI · MemoryService ────┐
    └──────────────────────┬───────────────────────────────┘
        ┌──────────────────┴──────────────────┐
        │  Write Path          │  Read Path    │
        │  ReAct Extract Loop  │  QueryPlanner │
        │  PolicyRouter        │  SeedRetriever│
        │  ContextWriter       │  (+BM25 fuse) │
        │  OutboxStore         │  HierSearcher │
        │                      │  ResultRanker │
        └──────────────────────┬───────────────┘
    ┌─ ContextFS ── AGFS Adapter · SQL Adapter ───────────┐
    └──────────────────────┬───────────────────────────────┘
    ┌─ Async Index ── OutboxWorker · DirSummarizer · RepairJob ─┐
    └──────────────────────┬────────────────────────────────────┘
              ┌─────────────▼─────────────┐
              │  AGFS · PostgreSQL · ChromaDB/pgvector│
              └───────────────────────────┘

  • Dev mode: ChromaDB vectors + AGFS file storage, zero external dependencies (default).
  • SQL mode: ChromaDB/pgvector + PostgreSQL; install postgresql and the pgvector extension.
  • Production: pgvector + PostgreSQL RLS, horizontally scalable with row-level tenant isolation.

Write Path: Conversation to Persistent Memory

  Agent Turn completes
        │
        ▼
  ┌─────────────────────────────────────────────────┐
  │  ReAct Extraction Loop                          │
  │                                                 │
  │  LLM has tools: read(uri), list(uri),           │
  │  get_relations(uri), get_access_stats(uri),      │
  │  extract_*()                                    │
  └────────────────────┬────────────────────────────┘
                       │  CandidateMemory[]
                       ▼
  ┌──────────────────┐     ┌──────────────────┐
  │  PolicyRouter     │     │  ContextWriter    │
  │  (schema-driven)  │────▶│                  │
  │  Profile→Merge    │     │  Plan → Build     │
  │  Entity→Aggregate │     │  → Write (4-step) │
  │  Event→Append     │     │  → Outbox         │
  │  Skill→SkillTool  │     │  → DirSummary     │
  └──────────────────┘     └────────┬─────────┘
                                    │
                          ┌─────────▼─────────┐
                          │  Outbox Event     │
                          │  (async, durable) │
                          └─────────┬─────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │  Index Worker (background)    │
                    │  embed(abstract) → L0 upsert  │
                    │  embed(overview)  → L1 upsert  │
                    │  embed(content)   → L2 upsert  │
                    │  DirectorySummarizer (L0/L1    │
                    │  for parent nodes)             │
                    └───────────────────────────────┘

The 4-step atomic write (content.md → .relations → abstract+overview → .meta.json) is the AGFS adapter's physical file protocol. ContextWriter orchestrates at a higher level: Plan → Build → Write → Outbox → Directory Summary (5 logical steps).
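The ordering matters because the last file acts as a commit marker. The sketch below illustrates that idea on a local directory; the function names and the JSON layout of `.meta.json` are assumptions, and the real AGFS adapter may differ in detail.

```python
import json
import tempfile
from pathlib import Path


def write_node(node_dir: Path, content: str, relations: list,
               abstract: str, overview: str) -> None:
    """Write files in the fixed order; .meta.json goes last."""
    node_dir.mkdir(parents=True, exist_ok=True)
    (node_dir / "content.md").write_text(content)                  # step 1
    (node_dir / ".relations").write_text(json.dumps(relations))    # step 2
    (node_dir / ".abstract.md").write_text(abstract)               # step 3
    (node_dir / ".overview.md").write_text(overview)
    (node_dir / ".meta.json").write_text(                          # step 4
        json.dumps({"status": "ACTIVE"}))


def exists(node_dir: Path) -> bool:
    """A node is visible only once its final step has been written."""
    meta = node_dir / ".meta.json"
    if not meta.is_file():
        return False   # the write never reached step 4: incomplete
    return json.loads(meta.read_text()).get("status") == "ACTIVE"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Simulate a crash after step 1: content exists, metadata does not.
    partial = root / "partial"
    partial.mkdir()
    (partial / "content.md").write_text("half-written")
    partial_visible = exists(partial)

    complete = root / "complete"
    write_node(complete, "Alice is a backend engineer", [],
               "Alice: engineer", "Alice, backend engineer, London")
    complete_visible = exists(complete)
```

Any directory that has content but no metadata is a detectable incomplete write, which a repair job can then clean up or finish.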

Read Path: Query to Assembled Context

  User Message arrives
        │
        ▼
  ┌───────────────────┐     ┌──────────────────────────────────────────┐
  │  QueryPlanner      │     │  SeedRetriever                          │
  │                    │     │                                          │
  │  Type classify:    │────▶│  Vector search across L0+L1+L2          │
  │  regex → MEMORY/   │     │  + BM25 keyword fusion (alpha-weighted)  │
  │  SKILL/RESOURCE    │     │  → L0/L1 hits = directory signposts      │
  │  Intent classify:  │     │  → L2 hits = direct content matches      │
  │  RetrievalIntent   │     └────────────────────┬─────────────────────┘
  └───────────────────┘                          │
                                          ┌──────▼──────┐
                                          │  L0/L1 hit? │
                                          └──────┬──────┘
                                           Yes   │   No
                                          ┌──────▼──────▼──────┐
                                          │  HierarchicalSearch │  Use L2 hits
                                          │                    │  directly
                                          │  search_children() │
                                          │  per directory node│
                                          │  Score propagation │
                                          │  + hotness blend   │
                                          └────────┬───────────┘
                                                   │
                                          ┌────────▼─────────┐
                                          │  ResultRanker     │
                                          │  Dedup by URI     │
                                          │  Sort by score    │
                                          │  Truncate to top_k│
                                          │  Fill content     │
                                          └──────────────────┘

One vector search, not three sequential passes. BM25 fusion is performed inside SeedRetriever (not a separate pipeline stage) using Vector-Anchored Fusion: final = α·vec + (1-α)·sat_bm25. Hotness blending happens during HierarchicalSearcher expansion, not in ResultRanker. Deduplication is by exact URI, not cosine similarity.
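The fusion formula and the ranking steps can be sketched as below. The saturation function `s / (s + k)` is an assumption (the source only names `sat_bm25`), as are `alpha=0.8`, the `Hit` type, and the choice to keep the best score when a URI appears twice.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    uri: str
    vec: float    # vector similarity in [0, 1]
    bm25: float   # raw BM25 score, unbounded


def saturate(bm25: float, k: float = 5.0) -> float:
    """Squash raw BM25 into [0, 1). The s / (s + k) form is an assumption."""
    return bm25 / (bm25 + k)


def fuse(hit: Hit, alpha: float = 0.8) -> float:
    """Vector-Anchored Fusion: final = alpha * vec + (1 - alpha) * sat_bm25."""
    return alpha * hit.vec + (1 - alpha) * saturate(hit.bm25)


def rank(scored: list, top_k: int) -> list:
    """Dedup by exact URI, sort by score, truncate to top_k."""
    best = {}
    for uri, score in scored:
        if uri not in best or score > best[uri]:
            best[uri] = score
    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]


hits = [
    Hit("ctx://a", vec=0.80, bm25=5.0),    # fused: 0.64 + 0.2 * 0.50 = 0.74
    Hit("ctx://b", vec=0.60, bm25=0.0),    # fused: 0.48
    Hit("ctx://a", vec=0.50, bm25=0.0),    # duplicate URI, lower score
    Hit("ctx://c", vec=0.70, bm25=15.0),   # fused: 0.56 + 0.2 * 0.75 = 0.71
]
seed_scores = [(h.uri, fuse(h)) for h in hits]   # fusion happens at seed time
ranked = rank(seed_scores, top_k=2)
```

Keeping the vector score as the dominant term means a keyword match can reorder close candidates but cannot promote a semantically unrelated document on its own.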

Namespace isolation: Queries are scoped by intent type. MEMORY queries search both users/{user}/memories/ and agents/{agent}/memories/. SKILL queries search only agents/{agent}/skills/. The owner_space filter is set by the QueryPlanner based on context_type and visible_owner_spaces, enforced at the vector index level — callers cannot override it.
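The scoping rule can be pictured as a prefix filter. This is a simplification: the source enforces the restriction through an owner_space filter inside the vector index, whereas the hypothetical helpers below filter URI paths after the fact, purely to show which namespaces each query type can see.

```python
def visible_prefixes(query_type: str, user: str, agent: str) -> list:
    """Namespaces a query may search, by type (MEMORY and SKILL only)."""
    if query_type == "MEMORY":
        return [f"users/{user}/memories/", f"agents/{agent}/memories/"]
    if query_type == "SKILL":
        return [f"agents/{agent}/skills/"]
    raise ValueError(f"no scoping rule sketched for {query_type!r}")


def scope(paths: list, prefixes: list) -> list:
    """Keep only the paths that fall under an allowed namespace."""
    return [p for p in paths if any(p.startswith(prefix) for prefix in prefixes)]


paths = [
    "users/alice/memories/profile",
    "users/bob/memories/profile",      # another user's data
    "agents/main/memories/patterns",
    "agents/main/skills/git",
]
memory_view = scope(paths, visible_prefixes("MEMORY", "alice", "main"))
skill_view = scope(paths, visible_prefixes("SKILL", "alice", "main"))
```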


The Agent Context Lifecycle

The six stages form a pipeline where each stage's output feeds the next. The design principle: never block model inference. All context operations happen at loop boundaries — before the LLM thinks or after it acts, never during.

| Stage | When | What happens | Key invariant |
|---|---|---|---|
| ① message_received | Before inference | Classify intent, prefetch candidates from L0 | Never block; fail silently if timeout |
| ② bootstrap + ingest + assemble | Before prompt build | Cold-start profile injection, budget-aware L0→L1→L2 loading, dedup, skill injection | Never exceed token budget |
| ③ before_tool_call + tool_result_persist | Around tool execution | Inject known failure patterns, compress oversized results, extract immediate facts | Tool params informed by history |
| ④ afterTurn | After Agent responds | Incremental extraction of delta, policy-routed writes, relation building, async index | Only extract what's new |
| ⑤ before_compaction + compact | When context fills | Score signal, protect critical nodes, compress redundancy | Never lose Profile or active task |
| ⑥ session_end + dispose | Session closes | Archive completed tasks, snapshot state for next session, audit integrity | Next session picks up seamlessly |

The lifecycle is circular: Stage ⑥'s task snapshot becomes Stage ②'s Handoff injection. Context born in Stage ④ gets recalled in Stage ①. What persists across sessions is what makes the system learn, not just remember.
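One way to read "budget-aware L0→L1→L2 loading" in Stage ② is as a greedy upgrade: load every abstract that fits, then upgrade the best-scoring nodes to richer levels while budget remains. The sketch below is one such interpretation, not the project's algorithm; `Candidate` and `assemble` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    uri: str
    score: float
    tokens: dict   # level -> token cost of loading the node at that level


def assemble(candidates: list, budget: int):
    """Return (uri -> chosen level, tokens spent) without exceeding budget."""
    chosen = {}
    spent = 0
    ordered = sorted(candidates, key=lambda c: c.score, reverse=True)
    for level in ("L0", "L1", "L2"):
        for cand in ordered:
            current = chosen.get(cand.uri)
            if level != "L0" and current is None:
                continue   # only upgrade nodes whose abstract already fit
            already_paid = cand.tokens[current] if current else 0
            extra = cand.tokens[level] - already_paid   # upgrade replaces the lower level
            if spent + extra <= budget:
                chosen[cand.uri] = level
                spent += extra
    return chosen, spent


costs = {"L0": 100, "L1": 500, "L2": 5000}
candidates = [
    Candidate("a", score=0.9, tokens=costs),
    Candidate("b", score=0.6, tokens=costs),
]
# L0 pass: a and b load (200). L1 pass: a upgrades (600); b would reach 1000.
chosen, spent = assemble(candidates, budget=800)
```

With an 800-token budget the higher-scoring node gets its overview while the other stays at abstract level, and the full 5000-token content of either node is never loaded.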


Data Flow: End-to-End Walkthrough

  Session 1 — Write Path: "I'm Alice, a backend engineer based in London"
  ────────────────────────────────────────────────────────────────────────
    User Message → Stage ④ afterTurn
        → Incremental Extraction (name, role, location)
        → CandidateMemory(category="profile")
        → PolicyRouter → ProfilePolicy (merge)
        → ContextWriter (Plan → Build → Write → Outbox → DirSummary)
        → OutboxEvent → async IndexRecordBuilder (L0 + L1 + L2 embed + upsert)

  Session 2 — Read Path: "What does Alice do for a living?"
  ────────────────────────────────────────────────────────────────────────
    User Message → Stage ① message_received
        → QueryPlanner → type=MEMORY, intent=BACKGROUND_SUPPLEMENT
        → SeedRetriever → vector search + BM25 → L0 hit (profile matches)
        → HierarchicalSearcher → expand from L0 → discover L2 content
        → ResultRanker → dedup by URI, sort, truncate
        → Inject "Alice is a backend engineer"
        → Agent responds correctly

Write costs ~50ms on the synchronous path, plus 100–500ms of asynchronous indexing (embedding + upsert) that never blocks the Agent. Read costs ~50ms. The cost is paid once at write time and amortized over many reads.


How It Differs

ContextEngine is not "vector RAG with extra steps." The fundamental differences are in lifecycle coverage, write policies, and retrieval granularity.

| Dimension | ContextEngine | Standard Vector RAG | Mem0 |
|---|---|---|---|
| Lifecycle coverage | 6 stages: extract → store → index → recall → compress → archive | 1 stage: retrieve | 2 stages: write + retrieve |
| Write policies | 4 policies (merge, aggregate, append, cumulative) matched to information semantics | Single: upsert | Single: upsert with memory_id |
| Retrieval granularity | L0/L1 directory signposts → hierarchical tree expansion → L2 content discovery, BM25 fusion, score propagation + hotness | Flat top-k similarity search | Flat top-k with graph expansion |
| Context types | 7 types with distinct lifecycle behavior per type (+ 3 system-internal schemas) | 1 type: "document chunk" | 1 type: "memory" with scope labels |
| Multi-tenant isolation | Enforced at filesystem level (account_id + owner_space in every path), impossible to bypass | Application-level filtering (depends on caller) | Application-level filtering |
| Concurrent writes | Optimistic locking with version check | Last-writer-wins (or external lock) | Last-writer-wins |
| Atomicity | 4-step ordered write with detectable incomplete state | Best-effort | Best-effort |
| Compression | Signal scoring + protected nodes + summary chain (Phase 2, in progress) | None | None |

The core distinction: ContextEngine treats context as a managed lifecycle, not a store-and-retrieve problem. Write policies are not configuration — they're architectural decisions derived from information semantics. Retrieval is not a single operation — it's a multi-stage decision process with budget awareness. And the system is designed to run continuously across sessions, not just serve queries.


Roadmap

Completed

| Milestone | Key Deliverables |
|---|---|
| Core Foundation | Domain models, ContextFS abstraction (AGFS + SQL adapters), ctx:// URI scheme, 4-step atomic write |
| Extraction Pipeline | YAML SchemaRegistry (10 schemas: 7 user-facing + 3 system), ReAct extraction loop, schema-driven PolicyRouter, 4 merge policies |
| Hierarchical Retrieval | L0/L1/L2 IndexRecords, SeedRetriever with BM25 fusion, HierarchicalSearcher (tree expansion + score propagation + hotness), QueryPlanner, IntentClassifier |
| L0 Structured Summary | Dual-template extraction (L0 abstract + L1 overview per node), overview-first retrieval, session summary generation |
| Session Lifecycle | SessionManager, TopicBuffer, session commit (archive + extract), session context assembly, RollingCompressor |
| Multi-Tenant Auth | RBAC (ROOT/ADMIN/MEMBER), API key auth, IP allowlist with proxy trust, agent sharing (off/whitelist/all), audit logging |
| Agent Integration | Claude Code hooks plugin, OpenClaw TypeScript bridge, ogmem unified CLI (onboard/start/stop/check/config/status/logs/eval) |
| Multi-Agent Handoff | Sub-agent spawn/ended lifecycle, context handoff between agents, result merge back to parent |
| Boundary Detection | LLM-based conversation chunking, message boundary detection, configurable segment sizing |

Benchmark Progression (LoCoMo10)

| Run | Accuracy | Key Improvement |
|---|---|---|
| Run76 | 88.2% (1358/1540) | L0 structured summary + BM25 hybrid + extraction prompt optimization |
| Run69 | 88.2% | L0 summary injection + prompt optimization (+6.6% over run68) |
| Run68 | 81.6% | session_time fix (baseline) |

Evaluated on the LoCoMo long-context conversational memory benchmark (10 sessions, 4 categories, 1540 questions).

In Progress

| Milestone | Description |
|---|---|
| Context Compression (⑤ compact) | Signal scoring, protected nodes (Profile, active task), summary chain, budget-aware truncation |
| Graph Retrieval | Entity-relation graph traversal for multi-hop queries, relation-weighted expansion |

Planned

| Milestone | Description |
|---|---|
| Tool Context Injection (③) | Before/after tool call hooks: inject known failure patterns, compress oversized tool results |
| Session Restore (⑥) | Cross-session state handoff: snapshot at session end, cold-start injection at session begin |
| Graph Clustering | Community detection for entity grouping: auto-cluster related entities into topics |
| Adaptive Planning | Query-aware retrieval strategy selection: route complex queries through deeper pipelines |
| Observability | OpenTelemetry tracing, token usage dashboards, retrieval quality metrics |
| AI Functions | Tool-augmented retrieval actions: context-aware tool selection and parameter suggestion |

Quick Start

Prerequisites

  • Python 3.11+
  • PostgreSQL 14+ with pgvector extension (for PostgreSQL mode)
  • Docker (optional, for containerized deployment)

Install

git clone https://gitcode.com/opengauss/oGMemory.git
cd oGMemory
python3 -m venv .venv && source .venv/bin/activate

Option A: AGFS mode (default)

pip install -e .

# Interactive setup wizard (guides LLM + Embedding + Vector DB + storage config)
ogmem onboard

# One-command start (AGFS + ContextEngine)
ogmem start local

Non-interactive mode (CI / automation)

ogmem onboard --non-interactive --mode headless \
  --provider openai --api-key sk-xxx \
  --embedding-model text-embedding-ada-002 --vector-db chroma \
  --storage-backend sql

Option B: PostgreSQL mode (direct SQL storage)

1. Install & configure PostgreSQL

# Ubuntu / Debian
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install postgresql-16-pgvector   # adjust version to match your PG

# macOS
brew install postgresql@16
brew install pgvector

# Start PostgreSQL
sudo service postgresql start   # Ubuntu
brew services start postgresql  # macOS

# Create database & enable pgvector
sudo -u postgres createdb ogmemory
sudo -u postgres psql -d ogmemory -c "CREATE EXTENSION IF NOT EXISTS vector;"

2. Install ContextEngine with SQL extras

pip install -e ".[dev,sql]"

3. Configure connection

cp config/ogmem.reference.yaml config/ogmem.yaml
# Edit storage.connection_string to point at your PostgreSQL instance

Use It

HTTP Server (recommended)

AGFS mode:

ogmem start local    # Start AGFS + ContextEngine, ports 1833 + 8090

PostgreSQL mode:

cp config/ogmem.reference.yaml config/ogmem.yaml
# Edit ogmem.yaml: set storage.backend to sql and storage.connection_string to your PostgreSQL DSN
python server/app.py

Ingest a conversation turn

curl -X POST http://localhost:8090/api/v1/after_turn \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-1","sessionId":"session-1", "messages":[{"role":"user","content":"I am Alice, a backend engineer"}, {"role":"assistant","content":"Nice to meet you!"}]}'

Search memory

curl -X POST http://localhost:8090/api/v1/compose \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-1","sessionId":"session-2","query":"what is alice job"}'


Python SDK

```python
from service.api import MemoryWriteAPI
from core.models import RequestContext
from fs.sql_adapter import SQLContextFS
from providers.config import ProviderConfig

# Load provider settings from environment variables.
config = ProviderConfig.from_env()

# Example DSN only; replace the credentials with your own.
fs = SQLContextFS(connection_string="host=127.0.0.1 port=5432 dbname=ogmemory user=postgres password=postgres")
write_api = MemoryWriteAPI(fs=fs, llm=config.create_llm())

# Every call carries the account, user, agent, and session identity.
ctx = RequestContext(account_id="acct", user_id="u1", agent_id="a1", session_id="s1", trace_id="t1")

# Commit the session: archive it and extract memories from the exchange.
result = write_api.commit_session([
    {"role": "user", "content": "I'm Alice, backend engineer, London"},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
], ctx)
```
Docker / HTTP API Reference

docker compose up   # Server on :8090, PostgreSQL on :5432

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/compose | POST | Search memory, return context for current turn |
| /api/v1/after_turn | POST | Extract + persist memories from conversation |
| /api/v1/ingest | POST | Single message ingest |
| /api/v1/ingest_batch | POST | Batch message ingest |
| /api/v1/bootstrap | POST | Cold-start session with profile + preferences |
| /api/v1/compact | POST | Trigger context compression |
| /api/v1/prepare_compaction | POST | Prepare compaction token (pre-compaction planning) |
| /api/v1/dispose | POST | Session disposal: archive + cleanup |
| /api/v1/prepare_subagent_spawn | POST | Sub-agent context handoff (multi-agent) |
| /api/v1/on_subagent_ended | POST | Sub-agent result merge back to parent |
| /api/v1/token_stats | GET/POST | LLM & embedding token usage (POST with reset to clear) |
| /api/v1/sessions/{id}/messages | POST | Add message to session buffer |
| /api/v1/sessions/{id} | GET | Get session metadata + pending tokens |
| /api/v1/sessions/{id}/commit | POST | Commit session: archive + extract |
| /api/v1/sessions/{id}/context | GET | Get assembled session context |
| /api/v1/call/<method> | POST | Generic method dispatch (forward-compat) |
| /api/v1/health | GET | Health check (storage + LLM + vector DB) |
| /api/v1/admin/accounts | GET | List accounts |
| /api/v1/admin/accounts/{id} | GET | Get account |
| /api/v1/admin/accounts/{id}/users | GET/POST | List / create users |
| /api/v1/admin/accounts/{id}/users/{uid} | DELETE | Delete user |
| /api/v1/admin/accounts/{id}/users/{uid}/role | PATCH | Set user role (ROOT/ADMIN/MEMBER) |
| /api/v1/admin/accounts/{id}/roles | GET | List roles |
| /api/v1/admin/accounts/{id}/agents | GET/POST | List / register agents |
| /api/v1/admin/accounts/{id}/agents/{aid} | GET/PATCH | Get / update agent |
| /api/v1/admin/accounts/{id}/audit-logs | GET | List audit logs |
| /api/v1/admin/accounts/{id}/audit-logs/{log_id} | GET | Get single audit log |
| /api/v1/admin/config/agent-sharing | GET | Agent sharing configuration |

Configuration

Key environment variables (see ENV.md for full reference). Config priority: YAML value > environment variable > hard-coded default.
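The priority rule can be sketched as a three-step lookup. This is a simplification: `resolve` is a hypothetical helper, and it assumes the YAML file and the environment use the same key names, which the real configuration may not.

```python
def resolve(key: str, yaml_cfg: dict, env: dict, defaults: dict):
    """Config priority: YAML value > environment variable > hard-coded default."""
    if yaml_cfg.get(key) is not None:
        return yaml_cfg[key]
    if env.get(key) is not None:
        return env[key]
    return defaults.get(key)


defaults = {"STORAGE_BACKEND": "agfs", "VECTOR_DB_TYPE": "chroma", "OGMEM_HTTP_PORT": "8090"}
env = {"STORAGE_BACKEND": "agfs", "OGMEM_HTTP_PORT": "9000"}
yaml_cfg = {"STORAGE_BACKEND": "sql"}

backend = resolve("STORAGE_BACKEND", yaml_cfg, env, defaults)   # YAML wins
port = resolve("OGMEM_HTTP_PORT", yaml_cfg, env, defaults)      # env beats default
vector_db = resolve("VECTOR_DB_TYPE", yaml_cfg, env, defaults)  # falls back to default
```

In practice `env` would be `os.environ`; a plain dict is used here so the precedence is easy to see.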

| Variable | Default | Description |
|---|---|---|
| OGMEM_API_KEY | | LLM API key (OpenAI-compatible) |
| OGMEM_BASE_URL | | Custom LLM API base URL |
| OGMEM_LLM_MODEL | gpt-4o-mini | LLM model for extraction + classification |
| OGMEM_EMBEDDING_MODEL | text-embedding-ada-002 | Embedding model for vector indexing |
| OGMEM_EMBEDDING_API_KEY | | Separate key for embedding API (falls back to OGMEM_API_KEY) |
| EMBEDDING_PROVIDER | | Separate embedding provider (openai / volcengine / st / mock) |
| VECTOR_DB_TYPE | chroma | Vector backend: chroma / opengauss (pgvector) / memory |
| STORAGE_BACKEND | agfs | Storage backend: agfs (AGFS file server) / sql (PostgreSQL) |
| SQL_CONNECTION_STRING | | PostgreSQL DSN (required when STORAGE_BACKEND=sql) |
| AGFS_BASE_URL | http://127.0.0.1:1833 | AGFS server URL (used when STORAGE_BACKEND=agfs) |
| OGMEM_HTTP_PORT | 8090 | HTTP server listen port |
| OGMEM_CONFIG | config/ogmem.yaml | Path to YAML config file |
| OG_ACCOUNT_ID | acct-demo | Default account ID |
| OG_USER_ID | u-alice | Default user ID |
| OG_AGENT_ID | main | Default agent ID |
| OG_ROLE_CONTROL_ENABLED | false | Enable RBAC auth |
| OG_ROOT_API_KEY | | Root API key for admin access |
| OG_AGENT_SHARED_MODE | off | Agent memory sharing: off / whitelist / all |
| OGMEM_AFTER_TURN_THRESHOLD | 200 | Min message length to trigger extraction |
| OGMEM_CACHE_ENABLED | true | Enable retrieval cache |
| CHUNKING_ENABLED | false | Enable boundary detection + chunking |
| INDEX_INTERVAL | 30 | Index worker polling interval (seconds) |
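
The stated priority (YAML value, then environment variable, then hard-coded default) can be sketched as a small lookup function. This is a minimal illustration, not the project's actual loader: the names `resolve` and `check_storage` are hypothetical, and it assumes, for simplicity, that the parsed YAML is a flat mapping keyed by the same names as the environment variables, which may not match the real layout of `config/ogmem.yaml`.

```python
import os
from typing import Any, Mapping, Optional

# A subset of the hard-coded defaults listed in the table above.
DEFAULTS = {
    "OGMEM_LLM_MODEL": "gpt-4o-mini",
    "VECTOR_DB_TYPE": "chroma",
    "STORAGE_BACKEND": "agfs",
    "AGFS_BASE_URL": "http://127.0.0.1:1833",
    "OGMEM_HTTP_PORT": "8090",
}


def resolve(
    name: str,
    yaml_values: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the effective setting: YAML > environment > default."""
    env = os.environ if env is None else env
    # Assumption: YAML keys mirror the variable names (hypothetical layout).
    if yaml_values.get(name) is not None:
        return str(yaml_values[name])
    if name in env:
        return env[name]
    return DEFAULTS.get(name)  # None when the table lists no default


def check_storage(
    yaml_values: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Enforce the rule that STORAGE_BACKEND=sql needs a PostgreSQL DSN."""
    backend = resolve("STORAGE_BACKEND", yaml_values, env)
    if backend == "sql" and not resolve("SQL_CONNECTION_STRING", yaml_values, env):
        raise ValueError("SQL_CONNECTION_STRING is required when STORAGE_BACKEND=sql")
    return backend


# Usage: the YAML value wins over the environment, which wins over the default.
yaml_cfg = {"VECTOR_DB_TYPE": "opengauss"}
fake_env = {"VECTOR_DB_TYPE": "memory", "OGMEM_HTTP_PORT": "9000"}
print(resolve("VECTOR_DB_TYPE", yaml_cfg, fake_env))
print(resolve("OGMEM_HTTP_PORT", yaml_cfg, fake_env))
print(resolve("STORAGE_BACKEND", yaml_cfg, fake_env))
```

Passing `env` explicitly keeps the function easy to test; omitting it falls back to the real process environment.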

Repository Layout

ContextEngine/
├── core/                   # Domain models, Protocol interfaces, enums, errors
├── fs/                     # ContextFS abstraction (filesystem metaphor for context ops)
│   ├── agfs_adapter/       #   AGFS-backed ContextFS (Go file server, default backend)
│   └── sql_adapter/        #   PostgreSQL-backed ContextFS (atomic upsert, RLS tenant isolation)
├── extraction/             # CandidateExtractor (ReAct loop + YAML SchemaRegistry)
│   ├── prompts/            #   LLM prompt templates (Jinja2)
│   └── schemas/            #   Schema registry + YAML definitions per category
│       └── definitions/    #     10 YAML files: profile, entity, event, skill, tool, etc.
├── commit/                 # Write chain: PolicyRouter → MergePolicy → ContextWriter → OutboxStore
├── index/                  # Async indexing: OutboxWorker → IndexRecordBuilder → DirectorySummarizer
├── retrieval/              # Read chain: QueryPlanner → IntentClassifier → SeedRetriever (+BM25)
│                          #   → HierarchicalSearcher → ResultRanker
├── providers/              # External adapters: LLM, Embedder, VectorIndex, RelationStore
│   ├── llm/                #   OpenAI-compatible LLM (works with OpenAI/Volcengine/DashScope/Zhipu)
│   ├── embedder/           #   4 backends (OpenAI, Volcengine, SentenceTransformers, Mock)
│   ├── vector_index/       #   InMemory / ChromaDB / pgvector (OpenGauss)
│   └── relation_store/     #   SQL + AGFS relation stores
├── service/                # API layer: MemoryWriteAPI, MemoryService, IndexService
├── server/                 # HTTP REST server (Flask), auth/RBAC, IP allowlist, sessions
├── session/                # SessionManager, TopicBuffer, RollingCompressor, ArchiveStore
├── tests/
│   ├── contract/           # Cross-team contract tests (invariants)
│   ├── unit/               # Per-package unit tests
│   ├── integration/        # End-to-end integration tests
│   ├── e2e/                # LoCoMo benchmark evaluation framework
│   ├── benchmark/          # Performance and quality benchmarks
│   ├── ab/                 # A/B comparison tests
│   └── fixtures/           # Shared test data
├── docs/                   # Architecture, deployment, quickstart guides
├── examples/               # Usage examples (SDK, agent integration)
├── cli/                    # Unified management CLI (ogmem command)
│   └── commands/           #   onboard, start, stop, check, config, status, logs, eval
├── claude-plugin/          # Claude Code hooks integration (hooks, scripts, skills)
├── openclaw_context_engine_plugin/  # OpenClaw plugin (TypeScript bridge)
├── agfs/                   # AGFS Go server source (cmd/, pkg/, sdk/)
├── config/                 # Configuration files (ogmem.yaml, .env, AGFS configs)
├── deploy/                 # Docker deployment scripts and configs
├── docker/                 # Docker Compose files
└── scripts/                # Auxiliary scripts (index service, etc.)

Agent Integration

Claude Code

ContextEngine provides native Claude Code hooks for zero-config memory:

ogmem onboard                           # Interactive: choose "Agent Plugin" → "Claude Code"
ogmem start plugin                      # Start CE server + install hooks

OpenClaw

cd openclaw_context_engine_plugin && openclaw plugins install -l .

| Tool | Stage | Operation |
|---|---|---|
| og_memory_write | ④ Turn End | Extract + persist new context |
| og_memory_search | ① Message | Prefetch relevant context |
| og_memory_read | ② Reasoning Prep | Load full context by URI |

Automatic behaviors (no Agent code changes): new message triggers prefetch, turn end triggers extraction, context fill triggers compression, session close triggers archival.
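
One way to picture the automatic behaviors listed above is as a fixed mapping from loop events to handlers that the integration layer invokes on the Agent's behalf. The sketch below is purely illustrative: the `HookDispatcher` class, the event names, and the payload shape are hypothetical and do not describe the plugins' actual hook API.

```python
from typing import Callable, Dict, List

# The four trigger -> behavior pairs from the paragraph above.
# Event and behavior names are hypothetical labels for this sketch.
AUTOMATIC_BEHAVIORS: Dict[str, str] = {
    "new_message": "prefetch",
    "turn_end": "extraction",
    "context_fill": "compression",
    "session_close": "archival",
}

Handler = Callable[[dict], None]


class HookDispatcher:
    """Routes loop events to registered behaviors so Agent code stays unchanged."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, behavior: str, handler: Handler) -> None:
        if behavior not in AUTOMATIC_BEHAVIORS.values():
            raise ValueError(f"unknown behavior: {behavior}")
        self._handlers[behavior] = handler

    def emit(self, event: str, payload: dict) -> str:
        """Run the handler mapped to this event and return the behavior name."""
        behavior = AUTOMATIC_BEHAVIORS[event]
        handler = self._handlers.get(behavior)
        if handler is not None:
            handler(payload)
        return behavior


# Usage: record which behavior fires for each event in a short session.
log: List[str] = []
dispatcher = HookDispatcher()
for name in AUTOMATIC_BEHAVIORS.values():
    # Bind the current name via a default argument to avoid late binding.
    dispatcher.register(name, lambda payload, n=name: log.append(n))

for event in ("new_message", "turn_end", "session_close"):
    dispatcher.emit(event, {"session_id": "demo"})
print(log)
```

The point of the indirection is that the Agent only emits events it already produces; which context operation runs in response is decided by the integration layer.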


Implementation Status

Phase 0 + 1 (production quality): See Roadmap for detailed milestone breakdown.

Core models, ContextFS abstraction (AGFS + PostgreSQL adapters), extraction pipeline (YAML SchemaRegistry + ReAct loop), write chain (4 merge policies + OutboxStore), hierarchical retrieval (L0/L1/L2) with BM25 hybrid search, async indexing with DLQ, providers (OpenAI/Volcengine/DashScope/Zhipu LLM, 4 embedders, InMemory/ChromaDB/pgvector), HTTP REST API with RBAC auth, session management, L0 structured summary generation, multi-agent handoff, boundary detection/chunking. 100+ test files (contract + unit + integration + e2e + benchmark).

Lifecycle stages covered: ① message_received, ② bootstrap + ingest + compose, ④ afterTurn, ⑤ prepare_compaction + compact, and the session-state persistence portion of ⑥ session_end + dispose.

Phase 2 (in progress): Context compression pipeline, graph retrieval. Phase 3 (planned): Tool context injection, session restore, graph clustering, adaptive planning, observability, AI functions.

Lifecycle stages to cover: ③ before_tool_call + tool_result_persist and the remaining operational hardening for ⑥ session_end + dispose.

Documentation: CLAUDE.md (full technical spec) | ENV.md (environment setup) | Quickstart | Deployment | Claude Plugin | OpenClaw Plugin | Benchmark (LoCoMo evaluation)

Testing: pytest tests/contract/ -v (core invariants) | pytest tests/ -v (all tests) | pytest tests/ --cov=core --cov=fs --cov=service --cov-report=html


License

Apache License 2.0