ContextEngine: Context Lifecycle Engine for CLI Agents
Across every phase of the Agent loop, manage what goes in, what stays, what gets evicted, and what can be recalled. Seven context types. Three-level (L0/L1/L2) retrieval hierarchy. One virtual filesystem. Cut token cost. Preserve signal density. Survive session boundaries.
The Problem: Token Budget is the Bottleneck
Every CLI Agent runs on a fixed token budget. The context window is both its most expensive and its scarcest resource.
| Symptom | Root Cause | Missing Lifecycle Stage |
|---|---|---|
| Agent forgets earlier conversation | Context window fills up, old content is evicted with no persistence | ④ afterTurn — no extraction + persist mechanism |
| Same mistakes repeated across sessions | No mechanism to carry lessons from one session to the next | ⑥ session_end — no archival + ② bootstrap — no cold-start injection |
| A query that should cost $0.05 costs $0.50 | Flat retrieval loads entire documents when an abstract would suffice | ② assemble — no L0/L1/L2 hierarchical retrieval |
| Multi-agent coordination failures | Agents cannot see each other's working context | ④ afterTurn — no cross-session sharing + multi-tenant isolation |
| Context bloat in long sessions | No systematic compression — the Agent drowns in its own history | ⑤ compact — no signal scoring + summary chain |
This isn't a model problem. It's an infrastructure problem.
ContextEngine provides the missing infrastructure: a full-stack context lifecycle management system that intercepts the Agent loop at six defined stages, treating the context window as a managed resource — not an unbounded buffer.
Design Philosophy
Core Insight: Context Has a Lifecycle
Current RAG systems treat retrieval as a single operation — embed a query, search a vector store, return results. This misses the fundamental reality: context in an Agent system has a complete lifecycle, just like data in a database.
Born → Structured → Stored → Indexed → Recalled → Compressed → Archived

- Born: extracted from conversation
- Structured: categorized by type, routed by policy
- Stored: written atomically with ordering guarantees
- Indexed: embedded and upserted to L0/L1/L2 IndexRecords
- Recalled: vector search, hierarchical expansion
- Compressed: summarized, deduplicated, compressed
- Archived: at session end, archived and checkpointed
Each phase has different constraints. Extraction must be incremental (don't reprocess old messages). Storage must be atomic (incomplete writes are detectable). Indexing must be asynchronous (don't block the Agent's response). Retrieval must be budget-aware (load only what fits). Compression must preserve signal (protect important context).
A system that handles only one or two of these phases — say, just retrieval — leaves the rest to chance. ContextEngine covers the full lifecycle.
The Six Interception Points
The Agent loop isn't a black box. It has well-defined execution phases. Each phase presents a different opportunity to read, write, or transform context.
┌──────────────────────────────────────────────────────────────────────┐
│ Agent Loop (infinite) │
│ │
│ ┌─────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │ ① │ │ ② │ │ ③ │ │ ④ │ │
│ │ MSG │────▶│ REASON │────▶│ TOOL │────▶│ TURN │ │
│ │ IN │ │ PREP │ │ CALL │ │ END │ │
│ └──┬──┘ └──────────┘ └──────────┘ └────┬────┘ │
│ ▲ │ │
│ │ ┌──────────┐ ┌─────────┐ │ │
│ │ │ ⑤ │ │ ⑥ │ │ │
│ └─────────│ COMPRESS │◀────────│ SHUTDOWN │◀────┘ │
│ └──────────┘ └─────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
The key design choice: intercept at the loop boundary, not inside model inference. Stages ①–⑥ are all outside the LLM call. This means zero latency impact on model inference itself — all context operations happen before or after, never during.
Not All Context Is Equal
Seven types of context, each with fundamentally different lifecycle behavior. This isn't arbitrary categorization — it's driven by the semantics of the information itself.
| Type | Why this lifecycle? | Write Policy |
|---|---|---|
| Profile | User state changes — "I live in London" may become "I moved to Tokyo" | Merge — new overwrites old on conflict |
| Preference | Preferences accumulate per topic but each topic has one current view | Aggregate by slug — merge within topic |
| Entity | Entities accumulate facts, but "Project Alpha" is still "Project Alpha" | Aggregate by slug — merge within entity |
| Event | History is immutable — "completed migration on March 15" never changes | Append only — never overwrite |
| Case | Problem-solving traces are historical records | Append only — never overwrite |
| Pattern | Patterns emerge from repeated observations and evolve over time | Aggregate by slug — refine over time |
| Skill | Tool expertise grows cumulatively — more experience = better knowledge | Cumulative append — knowledge accumulates |
Four distinct write policies, each justified by the information's lifecycle semantics. A single "store everything the same way" approach would either lose mutable state (if append-only) or corrupt immutable history (if overwriting).
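To make the four policies concrete, here is a minimal in-memory sketch. `WritePolicy`, `Store`, and `apply` are hypothetical names for illustration, not ContextEngine's PolicyRouter API; only the category-to-policy mapping is taken from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class WritePolicy(Enum):
    MERGE = "merge"            # Profile: new value overwrites old on conflict
    AGGREGATE = "aggregate"    # Preference / Entity / Pattern: merge within one slug
    APPEND = "append"          # Event / Case: immutable history, never overwritten
    CUMULATIVE = "cumulative"  # Skill: every new learning is kept


# Mapping copied from the table above (illustrative, not the real router).
POLICY_BY_CATEGORY = {
    "profile": WritePolicy.MERGE,
    "preference": WritePolicy.AGGREGATE,
    "entity": WritePolicy.AGGREGATE,
    "pattern": WritePolicy.AGGREGATE,
    "event": WritePolicy.APPEND,
    "case": WritePolicy.APPEND,
    "skill": WritePolicy.CUMULATIVE,
}


@dataclass
class Store:
    profile: dict = field(default_factory=dict)  # one mutable record per user
    by_slug: dict = field(default_factory=dict)  # (category, slug) -> merged fields
    history: list = field(default_factory=list)  # append-only log of past facts
    skills: dict = field(default_factory=dict)   # slug -> list of accumulated learnings


def apply(store: Store, category: str, facts: dict, slug: str | None = None) -> None:
    policy = POLICY_BY_CATEGORY[category]
    if policy is WritePolicy.MERGE:
        store.profile.update(facts)  # conflicting keys: the newer value wins
    elif policy is WritePolicy.AGGREGATE:
        store.by_slug.setdefault((category, slug), {}).update(facts)
    elif policy is WritePolicy.APPEND:
        store.history.append((category, dict(facts)))  # copy; past entries never change
    else:  # CUMULATIVE
        store.skills.setdefault(slug, []).append(dict(facts))
```

Running the same "I moved to Tokyo" update through MERGE replaces the city, while the same fact sent through APPEND simply adds a second history entry, which is exactly the difference the table is drawing.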
In addition to the 7 user-facing types, 3 system-internal schemas exist (tool, session_archive, session_summary) — these are created programmatically by the chunking and compression pipelines, not extracted by the LLM ReAct loop.
Key Architectural Decisions
Why YAML-driven schemas?
The problem: Hardcoding extraction schemas in Python means every new context type requires code changes, testing, and deployment.
The solution: YAML schemas declare what to extract and how to categorize it. The SchemaRegistry, PolicyRouter, and URIResolver all consume these schemas at runtime. Adding a new type — say "Decision" or "Handoff" — means adding a YAML file, not modifying the extraction pipeline.
What breaks without it: Every new context type becomes a code change that touches extraction, commit routing, and URI resolution — three packages that must be co-released. In practice, teams just skip adding new types, and everything gets shoved into generic "notes."
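A toy illustration of the idea, assuming each YAML definition parses into a plain dict. The field names (`name`, `policy`, `owner`, `uri_template`) and the URI layout are hypothetical, and this `SchemaRegistry` is a stand-in, not the project's real interface. The point is that policy routing and URI resolution both read from data, so a new type such as "decision" is one more definition rather than new code.

```python
from dataclasses import dataclass

# Each dict stands in for one parsed YAML file (hypothetical fields).
DEFINITIONS = [
    {"name": "profile", "policy": "merge", "owner": "user",
     "uri_template": "ctx://{account_id}/users/{owner_id}/memories/profile"},
    {"name": "entity", "policy": "aggregate", "owner": "user",
     "uri_template": "ctx://{account_id}/users/{owner_id}/memories/entities/{slug}"},
    {"name": "skill", "policy": "cumulative", "owner": "agent",
     "uri_template": "ctx://{account_id}/agents/{owner_id}/skills/{slug}"},
]


@dataclass(frozen=True)
class Schema:
    name: str
    policy: str
    owner: str
    uri_template: str


class SchemaRegistry:
    def __init__(self, definitions):
        self._schemas = {d["name"]: Schema(**d) for d in definitions}

    def policy_for(self, category: str) -> str:
        # PolicyRouter-style lookup: no per-type branches in code.
        return self._schemas[category].policy

    def resolve_uri(self, category: str, *, account_id: str, owner_id: str, slug: str = "") -> str:
        # URIResolver-style lookup: the path layout also comes from the definition.
        # str.format ignores keyword arguments a template does not use.
        return self._schemas[category].uri_template.format(
            account_id=account_id, owner_id=owner_id, slug=slug)
```

Adding a "decision" type is then a matter of appending one more definition, for example `{"name": "decision", "policy": "append", ...}`, with no change to the lookup code.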
Why the Outbox pattern for async indexing?
The problem: Embedding + upserting to a vector index takes 100–500ms. If synchronous, every afterTurn blocks for that duration, adding latency to the Agent's response.
The solution: The Outbox pattern decouples writing (fast, ~50ms) from indexing (slow, async). Each write deposits an OutboxEvent. A background worker consumes events with at-least-once delivery and a DLQ for permanent failures. The write path returns immediately; the index catches up within seconds.
What breaks without it: The Agent's perceived response time balloons by 2–10x. Worse, if the vector index is temporarily down, writes fail and context is lost — the system becomes fragile to infrastructure transients.
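A minimal in-memory sketch of the Outbox flow, assuming a `deque` stands in for the durable outbox table and a callable stands in for embed + upsert. The names (`Outbox`, `drain`, `max_attempts`) are hypothetical. An event is removed only after indexing succeeds, which is what makes delivery at-least-once; an event that keeps failing moves to the DLQ instead of blocking the queue.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class OutboxEvent:
    uri: str
    attempts: int = 0


class Outbox:
    """In-memory stand-in for a durable outbox table (hypothetical)."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.pending: deque[OutboxEvent] = deque()
        self.dlq: list[OutboxEvent] = []
        self.max_attempts = max_attempts

    def write(self, store: dict, uri: str, content: str) -> None:
        # Fast path: persist the content and record an index event, then return.
        # A real implementation would commit both in one transaction.
        store[uri] = content
        self.pending.append(OutboxEvent(uri))

    def drain(self, index_fn: Callable[[str], None]) -> None:
        # One pass of the background worker over the events queued right now.
        for _ in range(len(self.pending)):
            event = self.pending[0]
            try:
                index_fn(event.uri)  # embed + upsert; may raise
            except Exception:
                self.pending.popleft()
                event.attempts += 1
                if event.attempts >= self.max_attempts:
                    self.dlq.append(event)  # permanent failure: park it in the DLQ
                else:
                    self.pending.append(event)  # retry on a later pass
            else:
                # Acknowledge only after success, so a crash inside index_fn
                # leaves the event queued for redelivery (at-least-once).
                self.pending.popleft()
```

Note that the write path never touches the index: if the vector store is down, `write` still succeeds and the failure only shows up as retries or DLQ entries.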
Why L0/L1/L2 three-level retrieval?
The problem: Longer content doesn't mean better vector similarity — in fact, the opposite. An embedding of a 5000-token document must represent every concept it contains, diluting the signal for any single topic.
The solution: Each context node is indexed at three granularities. L0 abstracts (~100 tokens) produce focused embeddings — they're accurate topic signposts. L1 overviews (~500 tokens) are mid-grain. L2 full content (~5000 tokens) has the detail but diffuse embeddings. The retrieval engine does a single vector search across all levels, then uses L0/L1 hits as directory entry points for recursive tree expansion: when an L0 abstract matches, the searcher expands into its children to discover L2 content that flat search would miss. Score propagation (final = α·child + (1-α)·parent) boosts borderline L2 hits under strongly-matching parents. Hotness blending (blended = (1-α_hotness)·semantic + α_hotness·h_score) is applied during tree expansion.
┌─────────────┐ ┌─────────────┐ ┌──────────────┐
│ L0 Abstract │ │ L1 Overview │ │ L2 Content │
│ ~100 tokens │ │ ~500 tokens │ │ ~5000 tokens │
│ Focused │ │ Balanced │ │ Comprehensive│
│ embedding │ │ signal │ │ but diffuse │
│ → signpost │ │ → decision │ │ → full detail│
└──────┬───────┘ └──────┬───────┘ └──────┬────────┘
│ │ │
▼ ▼ ▼
.abstract.md .overview.md content.md
Recall benefit: Flat vector search on L2 content alone misses relevant chunks whose embeddings got diluted by surrounding detail. The L0/L1 signposts guide the searcher to the right neighborhood, then tree expansion discovers the full content underneath — including chunks whose raw vector score was below threshold but whose parent topic was a strong match.
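The two scoring formulas above, written out as a small sketch of one expansion step. The default weights (`alpha=0.5`, `alpha_hot=0.2`), the `threshold`, and the input shapes are assumptions for illustration, and `h_score` is taken as a precomputed value in [0, 1] because its formula is not specified here.

```python
def propagate(child: float, parent: float, alpha: float = 0.5) -> float:
    # final = α·child + (1-α)·parent
    return alpha * child + (1 - alpha) * parent


def blend_hotness(semantic: float, h_score: float, alpha_hot: float = 0.2) -> float:
    # blended = (1-α_hotness)·semantic + α_hotness·h_score
    return (1 - alpha_hot) * semantic + alpha_hot * h_score


def expand(parent_hits, children, threshold=0.6, alpha=0.5, alpha_hot=0.2):
    """parent_hits: {dir_uri: score} from L0/L1 seed hits.
    children: {dir_uri: [(child_uri, raw_score, h_score), ...]}.
    Returns {child_uri: score}, best score first."""
    results = {}
    for dir_uri, parent_score in parent_hits.items():
        for uri, raw, hot in children.get(dir_uri, []):
            score = blend_hotness(propagate(raw, parent_score, alpha), hot, alpha_hot)
            if score >= threshold:
                # A child reachable from several parents keeps its best score.
                results[uri] = max(score, results.get(uri, 0.0))
    return dict(sorted(results.items(), key=lambda kv: kv[1], reverse=True))
```

With a parent score of 0.9, a child whose raw score is 0.5 (below the 0.6 threshold on its own) propagates to 0.70 and, blended with `h_score=0.5`, lands at 0.66, so tree expansion keeps it where flat search would have dropped it.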
Why a filesystem abstraction?
The problem: Context operations — create, read, update, link, delete — need to be composable, atomic, and multi-tenant-safe. Building each as a bespoke API means reimplementing concurrency, permissions, and consistency for every operation.
The solution: ContextFS maps every context operation to a file-like operation on ctx:// URIs. Multi-tenant isolation is enforced at the filesystem level (via account_id + owner_space in every path), so even a buggy caller cannot bypass it. The ContextFS protocol provides these operations:
| Protocol Method | Context Operation |
|---|---|
| `write_node(node, ctx)` | Atomic write with 4-step order (content → relations → abstract/overview → meta.json) |
| `read_node(uri, ctx)` | Load full node (content + meta + relations) |
| `delete_node(uri, ctx)` | Permanent removal |
| `archive_node(uri, ctx)` | Soft-delete (status → ARCHIVED) |
| `move_node(from, to, ctx)` | Relocate node |
| `list_children(uri, ctx)` | Enumerate child URIs |
| `exists(uri, ctx)` | True only if ACTIVE (PENDING not visible) |
Relations are managed through a separate RelationStore protocol, not part of ContextFS itself. Two backends implement ContextFS: the AGFS adapter (Go file server, default) and the SQL adapter (PostgreSQL).
What breaks without it: Tenant isolation becomes "each caller must remember to check permissions." Atomic writes become "each caller must implement the 4-step write protocol correctly." Every new feature re-solves the same infrastructure problems.
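A toy in-memory ContextFS that shows why the write order matters, assuming `.meta.json` acts as the commit marker written last. `InMemoryContextFS`, `Node`, and `incomplete()` are hypothetical; the real protocol methods also take a `ctx` argument for tenant scoping, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass
class Node:
    uri: str
    content: str
    abstract: str
    overview: str
    relations: list = field(default_factory=list)


class InMemoryContextFS:
    """Toy ContextFS: each node is a set of files keyed by (uri, filename)."""

    def __init__(self) -> None:
        self.files: dict = {}

    def write_node(self, node: Node) -> None:
        prev = self.files.get((node.uri, ".meta.json"))
        version = prev["version"] + 1 if prev else 1
        # Steps 1-3: payload files. A crash here during a first write leaves
        # payload files with no .meta.json, which incomplete() reports.
        self.files[(node.uri, "content.md")] = node.content
        self.files[(node.uri, ".relations")] = list(node.relations)
        self.files[(node.uri, ".abstract.md")] = node.abstract
        self.files[(node.uri, ".overview.md")] = node.overview
        # Step 4: meta.json last, acting as the commit marker.
        self.files[(node.uri, ".meta.json")] = {"status": Status.ACTIVE, "version": version}

    def exists(self, uri: str) -> bool:
        # Visible only when committed and ACTIVE; partial, PENDING, ARCHIVED are not.
        meta = self.files.get((uri, ".meta.json"))
        return meta is not None and meta["status"] is Status.ACTIVE

    def archive_node(self, uri: str) -> None:
        # Soft delete: files stay, status flips to ARCHIVED.
        self.files[(uri, ".meta.json")]["status"] = Status.ARCHIVED

    def incomplete(self) -> set:
        # URIs that have payload files but no commit marker.
        uris = {uri for uri, _ in self.files}
        return {uri for uri in uris if (uri, ".meta.json") not in self.files}
```

Because every caller goes through `write_node`, no caller has to remember the ordering; a repair job only needs to scan for URIs without a commit marker.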
Why optimistic locking for concurrent writes?
The problem: Profile nodes are written by every session a user has. Two sessions can extract conflicting profile updates simultaneously.
The solution: optimistic locking. Read the current .meta.json version and write only if it is unchanged. This handles the common case (no contention) with zero infrastructure overhead. In the rare case of concurrent writes, the second writer retries with fresh state.
What breaks without it: Distributed locks require a coordination service (etcd/ZooKeeper) — adding infrastructure complexity for a problem that occurs rarely. Without any locking, last-writer-wins silently loses the first writer's updates.
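A minimal sketch of the read-check-write cycle. `MetaStore`, `write_if_version`, and `merge_profile` are hypothetical names. The internal lock only models the storage layer's atomic conditional write (in SQL terms, something like `UPDATE ... WHERE version = ?`); callers never hold a lock across their read-merge-write.

```python
import threading


class VersionConflict(Exception):
    pass


class MetaStore:
    """Stand-in for per-node .meta.json with a version counter (hypothetical)."""

    def __init__(self):
        self._data = {}  # uri -> (version, profile dict)
        self._cas_lock = threading.Lock()  # models one atomic compare-and-swap only

    def read(self, uri):
        return self._data.get(uri, (0, {}))

    def write_if_version(self, uri, expected, value):
        with self._cas_lock:
            current, _ = self._data.get(uri, (0, {}))
            if current != expected:
                raise VersionConflict(uri)
            self._data[uri] = (current + 1, value)


def merge_profile(store, uri, updates, max_retries=5):
    for _ in range(max_retries):
        version, profile = store.read(uri)
        merged = {**profile, **updates}  # Profile policy: new overwrites old
        try:
            store.write_if_version(uri, version, merged)
            return merged
        except VersionConflict:
            continue  # another session wrote first: re-read fresh state and retry
    raise VersionConflict(f"gave up after {max_retries} retries: {uri}")
```

A writer holding a stale version gets a `VersionConflict` instead of silently overwriting, and `merge_profile` turns that into a retry on fresh state.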
Why the ReAct loop for extraction?
The problem: Extraction quality depends on knowing what already exists. If the system already knows "Alice is a backend engineer," extracting that again wastes LLM tokens and creates duplicate entries that must be deduplicated later.
The solution: The LLM is given tool access — read(uri), list(uri), get_relations(uri), get_access_stats(uri), plus extract_* actions — and runs in a loop. Each iteration: it reads existing memory nodes (Reason), decides what's genuinely new, then calls the appropriate extract_* tool (Act). If unsure, it reads more nodes and loops. This is true ReAct: interleaved reasoning and tool use, not single-shot extraction.
Iteration 1: read(profile_uri) → "Alice, backend engineer, London"
→ Nothing new to extract. Skip.
Iteration 2: read(entities/go) → "Go expert, prefers error handling pattern X"
→ New: "Alice now also uses Rust for side projects"
→ extract_entity(slug="rust", ...)
What breaks without it: Every turn re-extracts the same facts. Over 100 turns, "Alice is a backend engineer" gets extracted 100 times, creating 100 Profile merge operations — each one burning LLM tokens on content that was already stored. Extraction cost grows linearly with conversation length instead of with information density.
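A deliberately simplified sketch of the read-before-extract idea. In ContextEngine the LLM chooses which tool to call next; here a fixed rule stands in for the model's reasoning, and `react_extract`, the `(uri, fact)` candidate shape, and the set-of-facts memory are all hypothetical.

```python
def react_extract(candidates, memory, max_steps=20):
    """candidates: [(uri, fact), ...] proposed from the latest turn.
    memory: {uri: set of known facts}, updated in place.
    Returns (extract calls made, tool-call log)."""
    tool_log, extracted = [], []
    for uri, fact in candidates:
        if len(tool_log) >= max_steps:
            break  # step budget, as an LLM-driven loop would have
        # Reason: read what is already stored before writing anything.
        known = memory.get(uri, set())
        tool_log.append(("read", uri))
        if fact in known:
            continue  # nothing new: no extract_* call, no write
        # Act: only genuinely new facts reach an extract_* call.
        extracted.append(("extract", uri, fact))
        memory.setdefault(uri, set()).add(fact)
    return extracted, tool_log
```

Replaying the same turn a second time produces no extract calls, which is the property that keeps extraction cost proportional to new information rather than to conversation length.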
Architecture
Overall Architecture
CLI Agent (Claude Code / OpenClaw / SDK)
│ HTTP REST (port 8090) or Python SDK
▼
┌─ HTTP Layer ──── Flask REST · Auth/RBAC · Sessions ─┐
└──────────────────────┬───────────────────────────────┘
┌─ Service Layer ── MemoryWriteAPI · MemoryService ────┐
└──────────────────────┬───────────────────────────────┘
┌──────────────────┴──────────────────┐
│ Write Path │ Read Path │
│ ReAct Extract Loop │ QueryPlanner │
│ PolicyRouter │ SeedRetriever│
│ ContextWriter │ (+BM25 fuse) │
│ OutboxStore │ HierSearcher │
│ │ ResultRanker │
└──────────────────────┬───────────────┘
┌─ ContextFS ── AGFS Adapter · SQL Adapter ───────────┐
└──────────────────────┬───────────────────────────────┘
┌─ Async Index ── OutboxWorker · DirSummarizer · RepairJob ─┐
└──────────────────────┬────────────────────────────────────┘
┌─────────────▼─────────────┐
│ AGFS · PostgreSQL · ChromaDB/pgvector│
└───────────────────────────┘
Dev mode: ChromaDB vectors + AGFS file storage — zero external dependencies (default).
SQL mode: ChromaDB/pgvector + PostgreSQL — install postgresql and pgvector extension.
Production: pgvector + PostgreSQL RLS — horizontally scalable with row-level tenant isolation.
Write Path: Conversation to Persistent Memory
Agent Turn completes
│
▼
┌─────────────────────────────────────────────────┐
│ ReAct Extraction Loop │
│ │
│ LLM has tools: read(uri), list(uri), │
│ get_relations(uri), get_access_stats(uri), │
│ extract_*() │
└────────────────────┬────────────────────────────┘
│ CandidateMemory[]
▼
┌──────────────────┐ ┌──────────────────┐
│ PolicyRouter │ │ ContextWriter │
│ (schema-driven) │────▶│ │
│ Profile→Merge │ │ Plan → Build │
│ Entity→Aggregate │ │ → Write (4-step) │
│ Event→Append │ │ → Outbox │
│ Skill→SkillTool │ │ → DirSummary │
└──────────────────┘ └────────┬─────────┘
│
┌─────────▼─────────┐
│ Outbox Event │
│ (async, durable) │
└─────────┬─────────┘
│
┌───────────────▼───────────────┐
│ Index Worker (background) │
│ embed(abstract) → L0 upsert │
│ embed(overview) → L1 upsert │
│ embed(content) → L2 upsert │
│ DirectorySummarizer (L0/L1 │
│ for parent nodes) │
└───────────────────────────────┘
The 4-step atomic write (content.md → .relations → abstract+overview → .meta.json) is the AGFS adapter's physical file protocol. ContextWriter orchestrates at a higher level: Plan → Build → Write → Outbox → Directory Summary (5 logical steps).
Read Path: Query to Assembled Context
User Message arrives
│
▼
┌───────────────────┐ ┌──────────────────────────────────────────┐
│ QueryPlanner │ │ SeedRetriever │
│ │ │ │
│ Type classify: │────▶│ Vector search across L0+L1+L2 │
│ regex → MEMORY/ │ │ + BM25 keyword fusion (alpha-weighted) │
│ SKILL/RESOURCE │ │ → L0/L1 hits = directory signposts │
│ Intent classify: │ │ → L2 hits = direct content matches │
│ RetrievalIntent │ └────────────────────┬─────────────────────┘
└───────────────────┘ │
┌──────▼──────┐
│ L0/L1 hit? │
└──────┬──────┘
Yes │ No
┌──────▼──────▼──────┐
│ HierarchicalSearch │ Use L2 hits
│ │ directly
│ search_children() │
│ per directory node│
│ Score propagation │
│ + hotness blend │
└────────┬───────────┘
│
┌────────▼─────────┐
│ ResultRanker │
│ Dedup by URI │
│ Sort by score │
│ Truncate to top_k│
│ Fill content │
└──────────────────┘
One vector search, not three sequential passes. BM25 fusion is performed inside SeedRetriever (not a separate pipeline stage) using Vector-Anchored Fusion: final = α·vec + (1-α)·sat_bm25. Hotness blending happens during HierarchicalSearcher expansion, not in ResultRanker. Deduplication is by exact URI, not cosine similarity.
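A sketch of the fusion formula and the ResultRanker steps. The saturation function `sat(x) = x / (x + k)`, the default `alpha=0.7`, and the rule that only vector-search candidates are scored (one reading of "vector-anchored") are assumptions for illustration, not the documented implementation.

```python
def sat(bm25: float, k: float = 1.0) -> float:
    # Hypothetical saturation: maps a raw BM25 score in [0, inf) into [0, 1).
    return bm25 / (bm25 + k)


def fuse(vector_hits, bm25_scores, alpha=0.7, k=1.0):
    """vector_hits: [(uri, cosine)]; bm25_scores: {uri: raw BM25}.
    Vector-anchored: BM25 only re-weights URIs the vector search returned."""
    return [(uri, alpha * vec + (1 - alpha) * sat(bm25_scores.get(uri, 0.0), k))
            for uri, vec in vector_hits]


def rank(hits, top_k):
    """ResultRanker-style: dedup by exact URI (keep best), sort, truncate."""
    best = {}
    for uri, score in hits:
        if score > best.get(uri, float("-inf")):
            best[uri] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

For example, with `fuse([("a", 0.8), ("b", 0.6)], {"b": 3.0})`, `a` scores 0.56 while `b` scores 0.42 + 0.3 × 0.75 = 0.645, so a strong keyword match lifts `b` above `a`; a URI found only by BM25 never enters the result.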
Namespace isolation: Queries are scoped by intent type. MEMORY queries search both users/{user}/memories/ and agents/{agent}/memories/. SKILL queries search only agents/{agent}/skills/. The owner_space filter is set by the QueryPlanner based on context_type and visible_owner_spaces, enforced at the vector index level — callers cannot override it.
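A small sketch of the scoping rule, with simplified paths (real URIs also carry `account_id`). `Ctx`, `_owner_prefixes`, and `search` are hypothetical; the design point is that `search` has no filter parameter at all, so the scope can only come from the request context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ctx:
    user_id: str
    agent_id: str


def _owner_prefixes(context_type: str, ctx: Ctx) -> tuple:
    # Mirrors the rule in the text; other types are left unspecified here.
    if context_type == "MEMORY":
        return (f"users/{ctx.user_id}/memories/", f"agents/{ctx.agent_id}/memories/")
    if context_type == "SKILL":
        return (f"agents/{ctx.agent_id}/skills/",)
    raise ValueError(f"no scope rule for {context_type!r} in this sketch")


def search(index: dict, context_type: str, ctx: Ctx, top_k: int = 5):
    # No caller-supplied filter: the scope is derived from ctx only.
    prefixes = _owner_prefixes(context_type, ctx)
    hits = [(uri, s) for uri, s in index.items() if uri.startswith(prefixes)]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)[:top_k]
```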
The Agent Context Lifecycle
The six stages form a pipeline where each stage's output feeds the next. The design principle: never block model inference. All context operations happen at loop boundaries — before the LLM thinks or after it acts, never during.
| Stage | When | What happens | Key invariant |
|---|---|---|---|
| ① message_received | Before inference | Classify intent, prefetch candidates from L0 | Never block; fail silently if timeout |
| ② bootstrap + ingest + assemble | Before prompt build | Cold-start profile injection, budget-aware L0→L1→L2 loading, dedup, skill injection | Never exceed token budget |
| ③ before_tool_call + tool_result_persist | Around tool execution | Inject known failure patterns, compress oversized results, extract immediate facts | Tool params informed by history |
| ④ afterTurn | After Agent responds | Incremental extraction of delta, policy-routed writes, relation building, async index | Only extract what's new |
| ⑤ before_compaction + compact | When context fills | Score signal, protect critical nodes, compress redundancy | Never lose Profile or active task |
| ⑥ session_end + dispose | Session closes | Archive completed tasks, snapshot state for next session, audit integrity | Next session picks up seamlessly |
The lifecycle is circular: Stage ⑥'s task snapshot becomes Stage ②'s Handoff injection. Context born in Stage ④ gets recalled in Stage ①. What persists across sessions is what makes the system learn, not just remember.
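Stage ②'s "budget-aware L0→L1→L2 loading" can be illustrated with a greedy sketch: every ranked node first gets its cheap L0 abstract, then the best-ranked nodes are upgraded to L1 and L2 while the budget allows. The `Ranked` shape, precomputed token counts, and the greedy order are assumptions, not the documented algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ranked:
    uri: str
    levels: list  # [(text, tokens)] for L0, L1, L2, in that order


def assemble(nodes, budget):
    """nodes are in rank order. Returns ([(uri, level, text)], tokens used)."""
    chosen = {}  # uri -> level index currently selected
    used = 0
    for n in nodes:  # pass 1: L0 signposts for as many nodes as fit
        cost = n.levels[0][1]
        if used + cost <= budget:
            chosen[n.uri] = 0
            used += cost
    for level in (1, 2):  # passes 2-3: upgrade best-ranked nodes first
        for n in nodes:
            if n.uri not in chosen or len(n.levels) <= level:
                continue
            delta = n.levels[level][1] - n.levels[chosen[n.uri]][1]
            if used + delta <= budget:  # invariant: used never exceeds budget
                chosen[n.uri] = level
                used += delta
    out = [(n.uri, chosen[n.uri], n.levels[chosen[n.uri]][0]) for n in nodes if n.uri in chosen]
    return out, used
```

With two nodes costing 10 / 50 / 500 tokens per level, a 120-token budget yields both at L1 (100 tokens), while a 75-token budget upgrades only the top-ranked node and leaves the second at L0.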
Data Flow: End-to-End Walkthrough
Session 1 — Write Path: "I'm Alice, a backend engineer based in London"
────────────────────────────────────────────────────────────────────────
User Message → Stage ④ afterTurn
→ Incremental Extraction (name, role, location)
→ CandidateMemory(category="profile")
→ PolicyRouter → ProfilePolicy (merge)
→ ContextWriter (Plan → Build → Write → Outbox → DirSummary)
→ OutboxEvent → async IndexRecordBuilder (L0 + L1 + L2 embed + upsert)
Session 2 — Read Path: "What does Alice do for a living?"
────────────────────────────────────────────────────────────────────────
User Message → Stage ① message_received
→ QueryPlanner → type=MEMORY, intent=BACKGROUND_SUPPLEMENT
→ SeedRetriever → vector search + BM25 → L0 hit (profile matches)
→ HierarchicalSearcher → expand from L0 → discover L2 content
→ ResultRanker → dedup by URI, sort, truncate
→ Inject "Alice is a backend engineer"
→ Agent responds correctly
Write costs ~50ms (synchronous) + ~100ms (async indexing). Read costs ~50ms. The cost is paid once at write time, amortized over many reads.
How It Differs
ContextEngine is not "vector RAG with extra steps." The fundamental differences are in lifecycle coverage, write policies, and retrieval granularity.
| Dimension | ContextEngine | Standard Vector RAG | Mem0 |
|---|---|---|---|
| Lifecycle coverage | 6 stages: extract → store → index → recall → compress → archive | 1 stage: retrieve | 2 stages: write + retrieve |
| Write policies | 4 policies (merge, aggregate, append, cumulative) matched to information semantics | Single: upsert | Single: upsert with memory_id |
| Retrieval granularity | L0/L1 directory signposts → hierarchical tree expansion → L2 content discovery, BM25 fusion, score propagation + hotness | Flat top-k similarity search | Flat top-k with graph expansion |
| Context types | 7 types with distinct lifecycle behavior per type (+ 3 system-internal schemas) | 1 type: "document chunk" | 1 type: "memory" with scope labels |
| Multi-tenant isolation | Enforced at filesystem level (account_id + owner_space in every path), impossible to bypass | Application-level filtering (depends on caller) | Application-level filtering |
| Concurrent writes | Optimistic locking with version check | Last-writer-wins (or external lock) | Last-writer-wins |
| Atomicity | 4-step ordered write with detectable incomplete state | Best-effort | Best-effort |
| Compression | Signal scoring + protected nodes + summary chain (Phase 2, in progress) | None | None |
The core distinction: ContextEngine treats context as a managed lifecycle, not a store-and-retrieve problem. Write policies are not configuration — they're architectural decisions derived from information semantics. Retrieval is not a single operation — it's a multi-stage decision process with budget awareness. And the system is designed to run continuously across sessions, not just serve queries.
Roadmap
Completed
| Milestone | Key Deliverables |
|---|---|
| Core Foundation | Domain models, ContextFS abstraction (AGFS + SQL adapters), ctx:// URI scheme, 4-step atomic write |
| Extraction Pipeline | YAML SchemaRegistry (10 schemas: 7 user-facing + 3 system), ReAct extraction loop, schema-driven PolicyRouter, 4 merge policies |
| Hierarchical Retrieval | L0/L1/L2 IndexRecords, SeedRetriever with BM25 fusion, HierarchicalSearcher (tree expansion + score propagation + hotness), QueryPlanner, IntentClassifier |
| L0 Structured Summary | Dual-template extraction (L0 abstract + L1 overview per node), overview-first retrieval, session summary generation |
| Session Lifecycle | SessionManager, TopicBuffer, session commit (archive + extract), session context assembly, RollingCompressor |
| Multi-Tenant Auth | RBAC (ROOT/ADMIN/MEMBER), API key auth, IP allowlist with proxy trust, agent sharing (off/whitelist/all), audit logging |
| Agent Integration | Claude Code hooks plugin, OpenClaw TypeScript bridge, ogmem unified CLI (onboard/start/stop/check/config/status/logs/eval) |
| Multi-Agent Handoff | Sub-agent spawn/ended lifecycle, context handoff between agents, result merge back to parent |
| Boundary Detection | LLM-based conversation chunking, message boundary detection, configurable segment sizing |
Benchmark Progression (LoCoMo10)
| Run | Accuracy | Key Improvement |
|---|---|---|
| Run76 | 88.2% (1358/1540) | L0 structured summary + BM25 hybrid + extraction prompt optimization |
| Run69 | 88.2% | L0 summary injection + prompt optimization (+6.6 percentage points over Run68) |
| Run68 | 81.6% | session_time fix (baseline) |
Evaluated on the LoCoMo long-context conversational memory benchmark (10 conversations, 4 question categories, 1540 questions).
In Progress
| Milestone | Description |
|---|---|
| Context Compression (⑤ compact) | Signal scoring, protected nodes (Profile, active task), summary chain, budget-aware truncation |
| Graph Retrieval | Entity-relation graph traversal for multi-hop queries, relation-weighted expansion |
Planned
| Milestone | Description |
|---|---|
| Tool Context Injection (③) | Before/after tool call hooks — inject known failure patterns, compress oversized tool results |
| Session Restore (⑥) | Cross-session state handoff — snapshot at session end, cold-start injection at session begin |
| Graph Clustering | Community detection for entity grouping — auto-cluster related entities into topics |
| Adaptive Planning | Query-aware retrieval strategy selection — route complex queries through deeper pipelines |
| Observability | OpenTelemetry tracing, token usage dashboards, retrieval quality metrics |
| AI Functions | Tool-augmented retrieval actions — context-aware tool selection and parameter suggestion |
Quick Start
Prerequisites
- Python 3.11+
- PostgreSQL 14+ with pgvector extension (for PostgreSQL mode)
- Docker (optional, for containerized deployment)
Install
git clone https://gitcode.com/opengauss/oGMemory.git
cd oGMemory
python3 -m venv .venv && source .venv/bin/activate
Option A: AGFS mode (default)
pip install -e .
# Interactive setup wizard (guides LLM + Embedding + Vector DB + storage config)
ogmem onboard
# One-command start (AGFS + ContextEngine)
ogmem start local
Non-interactive mode (CI / automation)
ogmem onboard --non-interactive --mode headless \
--provider openai --api-key sk-xxx \
--embedding-model text-embedding-ada-002 --vector-db chroma \
--storage-backend sql
Option B: PostgreSQL mode (direct SQL storage)
1. Install & configure PostgreSQL
# Ubuntu / Debian
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install postgresql-16-pgvector # adjust version to match your PG
# macOS
brew install postgresql@16
brew install pgvector
# Start PostgreSQL
sudo service postgresql start # Ubuntu
brew services start postgresql@16 # macOS
# Create database & enable pgvector
sudo -u postgres createdb ogmemory
sudo -u postgres psql -d ogmemory -c "CREATE EXTENSION IF NOT EXISTS vector;"
2. Install ContextEngine with SQL extras
pip install -e ".[dev,sql]"
3. Configure connection
cp config/ogmem.reference.yaml config/ogmem.yaml
# Edit storage.connection_string to point at your PostgreSQL instance
Use It
HTTP Server (recommended)
AGFS mode:
ogmem start local # Start AGFS + ContextEngine, ports 1833 + 8090
PostgreSQL mode:
cp config/ogmem.reference.yaml config/ogmem.yaml
# Edit ogmem.yaml: set storage.backend to sql and storage.connection_string to your PostgreSQL DSN
python server/app.py
Ingest a conversation turn
curl -X POST http://localhost:8090/api/v1/after_turn \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-1","sessionId":"session-1",
       "messages":[{"role":"user","content":"I am Alice, a backend engineer"},
                   {"role":"assistant","content":"Nice to meet you!"}]}'
Search memory
curl -X POST http://localhost:8090/api/v1/compose \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-1","sessionId":"session-2","query":"what is alice job"}'
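The same two calls from Python using only the standard library, as a sketch. The endpoint paths and payload fields are copied from the curl examples; the assumption that the server replies with a JSON body is ours, since the response schema is not documented here. `build_request` and `post_json` are illustrative helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8090"  # default OGMEM_HTTP_PORT


def build_request(path: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 10.0):
    # Assumes a JSON response body; requires a running server.
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


after_turn_payload = {
    "userId": "user-1", "sessionId": "session-1",
    "messages": [
        {"role": "user", "content": "I am Alice, a backend engineer"},
        {"role": "assistant", "content": "Nice to meet you!"},
    ],
}
compose_payload = {"userId": "user-1", "sessionId": "session-2", "query": "what is alice job"}

# post_json("/api/v1/after_turn", after_turn_payload)
# post_json("/api/v1/compose", compose_payload)
```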
</details>
<details>
<summary><strong>Python SDK</strong></summary>
```python
from service.api import MemoryWriteAPI
from core.models import RequestContext
from fs.sql_adapter import SQLContextFS
from providers.config import ProviderConfig

# Load LLM / embedding provider settings from environment variables (see Configuration)
config = ProviderConfig.from_env()

# PostgreSQL-backed ContextFS; adjust the DSN to match your instance
fs = SQLContextFS(connection_string="host=127.0.0.1 port=5432 dbname=ogmemory user=postgres password=postgres")
write_api = MemoryWriteAPI(fs=fs, llm=config.create_llm())

# Account / user / agent / session scope for this request
ctx = RequestContext(account_id="acct", user_id="u1", agent_id="a1", session_id="s1", trace_id="t1")

# Commit a finished conversation: archive it and extract memories
result = write_api.commit_session([
    {"role": "user", "content": "I'm Alice, backend engineer, London"},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
], ctx)
```
</details>
Docker / HTTP API Reference
docker compose up # Server on :8090, PostgreSQL on :5432
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/compose` | POST | Search memory, return context for current turn |
| `/api/v1/after_turn` | POST | Extract + persist memories from conversation |
| `/api/v1/ingest` | POST | Single message ingest |
| `/api/v1/ingest_batch` | POST | Batch message ingest |
| `/api/v1/bootstrap` | POST | Cold-start session with profile + preferences |
| `/api/v1/compact` | POST | Trigger context compression |
| `/api/v1/prepare_compaction` | POST | Prepare compaction token (pre-compaction planning) |
| `/api/v1/dispose` | POST | Session disposal: archive + cleanup |
| `/api/v1/prepare_subagent_spawn` | POST | Sub-agent context handoff (multi-agent) |
| `/api/v1/on_subagent_ended` | POST | Sub-agent result merge back to parent |
| `/api/v1/token_stats` | GET/POST | LLM & embedding token usage (POST with reset to clear) |
| `/api/v1/sessions/{id}/messages` | POST | Add message to session buffer |
| `/api/v1/sessions/{id}` | GET | Get session metadata + pending tokens |
| `/api/v1/sessions/{id}/commit` | POST | Commit session: archive + extract |
| `/api/v1/sessions/{id}/context` | GET | Get assembled session context |
| `/api/v1/call/<method>` | POST | Generic method dispatch (forward-compat) |
| `/api/v1/health` | GET | Health check (storage + LLM + vector DB) |
| `/api/v1/admin/accounts` | GET | List accounts |
| `/api/v1/admin/accounts/{id}` | GET | Get account |
| `/api/v1/admin/accounts/{id}/users` | GET/POST | List / create users |
| `/api/v1/admin/accounts/{id}/users/{uid}` | DELETE | Delete user |
| `/api/v1/admin/accounts/{id}/users/{uid}/role` | PATCH | Set user role (ROOT/ADMIN/MEMBER) |
| `/api/v1/admin/accounts/{id}/roles` | GET | List roles |
| `/api/v1/admin/accounts/{id}/agents` | GET/POST | List / register agents |
| `/api/v1/admin/accounts/{id}/agents/{aid}` | GET/PATCH | Get / update agent |
| `/api/v1/admin/accounts/{id}/audit-logs` | GET | List audit logs |
| `/api/v1/admin/accounts/{id}/audit-logs/{log_id}` | GET | Get single audit log |
| `/api/v1/admin/config/agent-sharing` | GET | Agent sharing configuration |
Configuration
Key environment variables (see ENV.md for full reference). Config priority: YAML value > environment variable > hard-coded default.
| Variable | Default | Description |
|---|---|---|
| `OGMEM_API_KEY` | — | LLM API key (OpenAI-compatible) |
| `OGMEM_BASE_URL` | — | Custom LLM API base URL |
| `OGMEM_LLM_MODEL` | `gpt-4o-mini` | LLM model for extraction + classification |
| `OGMEM_EMBEDDING_MODEL` | `text-embedding-ada-002` | Embedding model for vector indexing |
| `OGMEM_EMBEDDING_API_KEY` | — | Separate key for embedding API (falls back to `OGMEM_API_KEY`) |
| `EMBEDDING_PROVIDER` | — | Separate embedding provider (openai / volcengine / st / mock) |
| `VECTOR_DB_TYPE` | `chroma` | Vector backend: chroma / opengauss (pgvector) / memory |
| `STORAGE_BACKEND` | `agfs` | Storage backend: agfs (AGFS file server) / sql (PostgreSQL) |
| `SQL_CONNECTION_STRING` | — | PostgreSQL DSN (required when `STORAGE_BACKEND=sql`) |
| `AGFS_BASE_URL` | `http://127.0.0.1:1833` | AGFS server URL (used when `STORAGE_BACKEND=agfs`) |
| `OGMEM_HTTP_PORT` | `8090` | HTTP server listen port |
| `OGMEM_CONFIG` | `config/ogmem.yaml` | Path to YAML config file |
| `OG_ACCOUNT_ID` | `acct-demo` | Default account ID |
| `OG_USER_ID` | `u-alice` | Default user ID |
| `OG_AGENT_ID` | `main` | Default agent ID |
| `OG_ROLE_CONTROL_ENABLED` | `false` | Enable RBAC auth |
| `OG_ROOT_API_KEY` | — | Root API key for admin access |
| `OG_AGENT_SHARED_MODE` | `off` | Agent memory sharing: off / whitelist / all |
| `OGMEM_AFTER_TURN_THRESHOLD` | `200` | Min message length to trigger extraction |
| `OGMEM_CACHE_ENABLED` | `true` | Enable retrieval cache |
| `CHUNKING_ENABLED` | `false` | Enable boundary detection + chunking |
| `INDEX_INTERVAL` | `30` | Index worker polling interval (seconds) |
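The stated priority (YAML value > environment variable > hard-coded default) can be sketched as a single lookup. The `resolve` helper and the pairing of a dotted YAML key with an environment variable (e.g. `storage.backend` with `STORAGE_BACKEND`) are assumptions for illustration; environment values come back as strings, so a real loader would also coerce types.

```python
import os


def resolve(yaml_cfg: dict, dotted_key: str, env_var: str, default, env=os.environ):
    # 1. YAML value, if the dotted path exists and is not null
    node = yaml_cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            node = None
            break
        node = node[part]
    if node is not None:
        return node
    # 2. Environment variable
    if env_var in env:
        return env[env_var]
    # 3. Hard-coded default
    return default
```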
Repository Layout
ContextEngine/
├── core/ # Domain models, Protocol interfaces, enums, errors
├── fs/ # ContextFS abstraction (filesystem metaphor for context ops)
│ ├── agfs_adapter/ # AGFS-backed ContextFS (Go file server, default backend)
│ └── sql_adapter/ # PostgreSQL-backed ContextFS (atomic upsert, RLS tenant isolation)
├── extraction/ # CandidateExtractor (ReAct loop + YAML SchemaRegistry)
│ ├── prompts/ # LLM prompt templates (Jinja2)
│ └── schemas/ # Schema registry + YAML definitions per category
│ └── definitions/ # 10 YAML files: profile, entity, event, skill, tool, etc.
├── commit/ # Write chain: PolicyRouter → MergePolicy → ContextWriter → OutboxStore
├── index/ # Async indexing: OutboxWorker → IndexRecordBuilder → DirectorySummarizer
├── retrieval/ # Read chain: QueryPlanner → IntentClassifier → SeedRetriever (+BM25)
│ # → HierarchicalSearcher → ResultRanker
├── providers/ # External adapters: LLM, Embedder, VectorIndex, RelationStore
│ ├── llm/ # OpenAI-compatible LLM (works with OpenAI/Volcengine/DashScope/Zhipu)
│ ├── embedder/ # 4 backends (OpenAI, Volcengine, SentenceTransformers, Mock)
│ ├── vector_index/ # InMemory / ChromaDB / pgvector (OpenGauss)
│ └── relation_store/ # SQL + AGFS relation stores
├── service/ # API layer: MemoryWriteAPI, MemoryService, IndexService
├── server/ # HTTP REST server (Flask), auth/RBAC, IP allowlist, sessions
├── session/ # SessionManager, TopicBuffer, RollingCompressor, ArchiveStore
├── tests/
│ ├── contract/ # Cross-team contract tests (invariants)
│ ├── unit/ # Per-package unit tests
│ ├── integration/ # End-to-end integration tests
│ ├── e2e/ # LoCoMo benchmark evaluation framework
│ ├── benchmark/ # Performance and quality benchmarks
│ ├── ab/ # A/B comparison tests
│ └── fixtures/ # Shared test data
├── docs/ # Architecture, deployment, quickstart guides
├── examples/ # Usage examples (SDK, agent integration)
├── cli/ # Unified management CLI (ogmem command)
│ └── commands/ # onboard, start, stop, check, config, status, logs, eval
├── claude-plugin/ # Claude Code hooks integration (hooks, scripts, skills)
├── openclaw_context_engine_plugin/ # OpenClaw plugin (TypeScript bridge)
├── agfs/ # AGFS Go server source (cmd/, pkg/, sdk/)
├── config/ # Configuration files (ogmem.yaml, .env, AGFS configs)
├── deploy/ # Docker deployment scripts and configs
├── docker/ # Docker Compose files
└── scripts/ # Auxiliary scripts (index service, etc.)
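The tree describes a write chain that ends at an OutboxStore, with `index/` draining it asynchronously. Phase 0 + 1 also lists async indexing with a dead-letter queue (DLQ). Below is a minimal sketch of that transactional-outbox pattern. It assumes SQLite in place of PostgreSQL and uses hypothetical table and function names (`contexts`, `outbox`, `dlq`, `write_context`, `drain_outbox`). It is not ContextEngine's actual code.

```python
import sqlite3

# Hypothetical schema (SQLite stands in for PostgreSQL): context records,
# an outbox of pending index jobs, and a dead-letter queue (DLQ) for jobs
# that keep failing.
SCHEMA = """
CREATE TABLE contexts (uri TEXT PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uri TEXT NOT NULL,
    attempts INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE dlq (id INTEGER PRIMARY KEY, uri TEXT NOT NULL, error TEXT);
"""


def write_context(conn, uri, body):
    """Upsert a context record and enqueue an index job in one transaction."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute(
            "INSERT INTO contexts (uri, body) VALUES (?, ?) "
            "ON CONFLICT(uri) DO UPDATE SET body = excluded.body",
            (uri, body),
        )
        conn.execute("INSERT INTO outbox (uri) VALUES (?)", (uri,))


def drain_outbox(conn, index_fn, max_attempts=3):
    """Run each pending job once; park it in the DLQ after max_attempts failures."""
    jobs = conn.execute("SELECT id, uri, attempts FROM outbox ORDER BY id").fetchall()
    for job_id, uri, attempts in jobs:
        (body,) = conn.execute(
            "SELECT body FROM contexts WHERE uri = ?", (uri,)
        ).fetchone()
        try:
            index_fn(uri, body)  # e.g. embed + upsert into a vector index
        except Exception as exc:
            with conn:
                if attempts + 1 >= max_attempts:
                    conn.execute(
                        "INSERT INTO dlq (id, uri, error) VALUES (?, ?, ?)",
                        (job_id, uri, str(exc)),
                    )
                    conn.execute("DELETE FROM outbox WHERE id = ?", (job_id,))
                else:
                    conn.execute(
                        "UPDATE outbox SET attempts = attempts + 1 WHERE id = ?",
                        (job_id,),
                    )
        else:
            with conn:  # success: the job is done, remove it from the outbox
                conn.execute("DELETE FROM outbox WHERE id = ?", (job_id,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    write_context(conn, "ctx://user/profile", "prefers concise answers")
    drain_outbox(conn, lambda uri, body: print("indexed", uri))
```

The record and its index job commit atomically. An indexer outage therefore delays search visibility but never loses a write, and a job that keeps failing lands in the DLQ instead of blocking the queue.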
Agent Integration
Claude Code
ContextEngine provides native Claude Code hooks for zero-config memory:
ogmem onboard # Interactive: choose "Agent Plugin" → "Claude Code"
ogmem start plugin # Start CE server + install hooks
OpenClaw
cd openclaw_context_engine_plugin && openclaw plugins install -l .
| Tool | Stage | Operation |
|---|---|---|
| og_memory_write | ④ Turn End | Extract + persist new context |
| og_memory_search | ① Message | Prefetch relevant context |
| og_memory_read | ② Reasoning Prep | Load full context by URI |
Automatic behaviors (no Agent code changes): new message triggers prefetch, turn end triggers extraction, context fill triggers compression, session close triggers archival.
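Behind these automatic behaviors is an event-to-operation mapping. Below is a minimal sketch of that mapping. It assumes hypothetical names (`MemoryHooks`, `on_message`, `on_turn_end`, `on_session_end`) and an assumed 80% fill threshold for compaction. It only records which operation would run. It does not call the og_memory_* tools.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryHooks:
    """Hypothetical hook layer mapping agent events to memory operations.

    The class, method names, and the 0.8 fill threshold are illustrative only.
    """

    token_budget: int
    compact_ratio: float = 0.8  # assumed "context fill" threshold
    actions: list = field(default_factory=list)

    def on_message(self, text: str) -> None:
        # ① New message: prefetch relevant context (og_memory_search).
        self.actions.append(("search", text))

    def on_turn_end(self, turn_id: int, tokens_used: int) -> None:
        # ④ Turn end: extract and persist new context (og_memory_write).
        self.actions.append(("write", turn_id))
        # ⑤ Context fill: compress once usage crosses the threshold.
        if tokens_used >= self.compact_ratio * self.token_budget:
            self.actions.append(("compact", tokens_used))

    def on_session_end(self) -> None:
        # ⑥ Session close: archive session state.
        self.actions.append(("archive", None))


if __name__ == "__main__":
    hooks = MemoryHooks(token_budget=1000)
    hooks.on_message("how do I deploy?")
    hooks.on_turn_end(1, tokens_used=850)
    hooks.on_session_end()
    print(hooks.actions)
```

Checking the fill level at turn end is also an assumption. A host could equally check it before each model call.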
Implementation Status
Phase 0 + 1 (production quality). See the Roadmap for a detailed milestone breakdown.
- Core models and the ContextFS abstraction (AGFS + PostgreSQL adapters)
- Extraction pipeline (YAML SchemaRegistry + ReAct loop)
- Write chain (4 merge policies + OutboxStore)
- Hierarchical retrieval (L0/L1/L2) with BM25 hybrid search
- Async indexing with a dead-letter queue (DLQ)
- Providers: OpenAI/Volcengine/DashScope/Zhipu LLMs, 4 embedders, and InMemory/ChromaDB/pgvector vector indexes
- HTTP REST API with RBAC auth
- Session management
- L0 structured summary generation
- Multi-agent handoff
- Boundary detection and chunking
- 100+ test files (contract, unit, integration, e2e, benchmark)
Lifecycle stages covered: ① message_received, ② bootstrap + ingest + compose, ④ afterTurn, ⑤ prepare_compaction + compact, and the session-state persistence portion of ⑥ session_end + dispose.
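Phase 0 + 1 includes hierarchical retrieval (L0/L1/L2) with BM25 hybrid search. Below is one common way to fuse lexical and vector relevance over L0 abstracts. It normalizes Okapi BM25 scores by the top score, then blends them linearly with cosine similarity. The fusion rule, the `alpha` weight, and the entry fields (`uri`, `l0`, `vec`) are assumptions, not ContextEngine's ResultRanker. The function returns URIs rather than full text to reflect the idea that deeper L1/L2 content is loaded only on demand, the way `og_memory_read` loads full context by URI.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against each doc (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid_search(query, query_vec, entries, alpha=0.5, top_k=2):
    """Rank entries by alpha * normalized BM25 + (1 - alpha) * cosine over L0 abstracts.

    Each entry is a dict with 'uri', 'l0' (abstract text) and 'vec' (embedding).
    Returns (uri, score, l0) tuples; deeper L1/L2 text is fetched later by URI.
    """
    lexical = bm25_scores(query, [e["l0"] for e in entries])
    peak = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    ranked = []
    for entry, lex in zip(entries, lexical):
        score = alpha * lex / peak + (1 - alpha) * cosine(query_vec, entry["vec"])
        ranked.append((score, entry["uri"], entry["l0"]))
    ranked.sort(reverse=True)
    return [(uri, round(score, 3), l0) for score, uri, l0 in ranked[:top_k]]


if __name__ == "__main__":
    entries = [
        {"uri": "ctx://a", "l0": "postgres deployment uses pgvector", "vec": [1.0, 0.0]},
        {"uri": "ctx://b", "l0": "user prefers concise answers", "vec": [0.0, 1.0]},
        {"uri": "ctx://c", "l0": "chroma index for local dev", "vec": [0.7, 0.7]},
    ]
    print(hybrid_search("pgvector deployment", [1.0, 0.0], entries))
```

An `alpha` near 1.0 favors exact keyword matches, which helps with identifiers such as file paths. An `alpha` near 0.0 favors semantic similarity.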
Phase 2 (in progress): Context compression pipeline, graph retrieval.
Phase 3 (planned): Tool context injection, session restore, graph clustering, adaptive planning, observability, AI functions.
Lifecycle stages to cover: ③ before_tool_call + tool_result_persist and the remaining operational hardening for ⑥ session_end + dispose.
Documentation: CLAUDE.md (full technical spec) | ENV.md (environment setup) | Quickstart | Deployment | Claude Plugin | OpenClaw Plugin | Benchmark (LoCoMo evaluation)
Testing: pytest tests/contract/ -v (core invariants) | pytest tests/ -v (all tests) | pytest tests/ --cov=core --cov=fs --cov=service --cov-report=html (HTML coverage report)
References
Benchmark & Evaluation
- LoCoMo: Evaluating Long-Context Conversational Memory — Maharana et al., 2024 — long-context conversational memory benchmark used in our evaluation
Memory Architectures for LLM Agents
- MemoryBank: Enhancing LLMs with Long-Term Memory — Zhong et al., 2023 — sentiment-centric memory processing for persistent conversational memory
- A-MEM: Agentic Memory for LLM Agents — 2025 — dynamic, agentic memory organization with dynamic indexing
- Mem0: Production-Ready AI Agents with Scalable Long-Term Memory — 2025 — user/session/agent scoped memory with vector store + knowledge graph
- SeCom: Memory Construction and Retrieval for Personalized Conversational Agents — 2025 — segment-level memory bank with conversation segmentation (related to our two-phase extraction)
Hierarchical Retrieval & Multi-Granularity
- ReadAgent: Gist Memory of Very Long Contexts — Lee et al., 2024 — human-inspired reading agent with gist memory, 20x effective context length (related to our L0/L1/L2 hierarchy)
- MemoRAG: Memory-Augmented Retrieval — Qian et al., 2024 — dual-system architecture with global memory + retrieval-augmented generation
- LATTICE: LLM-guided Hierarchical Retrieval — Gupta et al., 2025 — hierarchical retrieval framework for large document collections
Context Compression
- LLMLingua-2: Task-Agnostic Prompt Compression — Pan et al., 2024 — data distillation for efficient prompt compression up to 20x
- LongLLMLingua: Question-Aware Compression for Long Context — Jiang et al., 2024, ACL — coarse-to-fine compression with 17.1% improvement at 4x compression (related to our compression pipeline design)
- Prompt Compression for Large Language Models: A Survey — 2024 — comprehensive survey of prompt and context compression techniques
Reasoning & Acting
- ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al., 2023, ICLR — interleaved reasoning traces and task-specific actions (our lazy extraction mode is based on this paradigm)
Multi-Agent & Governance
- Governed Memory — Taheri, 2026 — memory governance and access control
- Collaborative Memory — multi-user memory sharing + dynamic ACL (related to our multi-tenant isolation)
- Multi-Agent Memory Systems for Production — Mem0, 2026
- Cemri et al. — multi-agent coordination failure analysis
Infrastructure
- OpenViking — AGFS file storage layer and core design inspiration
- AI Agent Memory Architectures — Zylos Research, 2026