
ContextEngine: Context Lifecycle Engine
for CLI Agents

Across every phase of the Agent loop, manage what goes in, what stays, what gets evicted, and what can be recalled. Seven context types. Three-level cache hierarchy. One virtual filesystem. Cut token cost. Preserve signal density. Survive session boundaries.

English | 中文

The Problem: Token Budget is the Bottleneck

Every CLI Agent runs on a fixed token budget. The context window is both its most expensive and its scarcest resource.

Symptom	Root Cause	Missing Lifecycle Stage
Agent forgets earlier conversation	Context window fills up, old content is evicted with no persistence	④ afterTurn — no extraction + persist mechanism
Same mistakes repeated across sessions	No mechanism to carry lessons from one session to the next	⑥ session_end — no archival + ② bootstrap — no cold-start injection
A query that should cost $0.05 costs $0.50	Flat retrieval loads entire documents when an abstract would suffice	② assemble — no L0/L1/L2 hierarchical retrieval
Multi-agent coordination failures	Agents cannot see each other's working context	④ afterTurn — no cross-session sharing + multi-tenant isolation
Context bloat in long sessions	No systematic compression — the Agent drowns in its own history	⑤ compact — no signal scoring + summary chain

This isn't a model problem. It's an infrastructure problem.

ContextEngine provides the missing infrastructure: a full-stack context lifecycle management system that intercepts the Agent loop at six defined stages, treating the context window as a managed resource — not an unbounded buffer.

Design Philosophy

Core Insight: Context Has a Lifecycle

Current RAG systems treat retrieval as a single operation — embed a query, search a vector store, return results. This misses the fundamental reality: context in an Agent system has a complete lifecycle, just like data in a database.

  Born          Structured       Stored           Indexed          Recalled        Compressed       Archived
  (extracted    (categorized     (written         (embedded +      (vector         (summarized,     (session end,
   from          by type,        atomically       upserted         search,         deduplicated,    archived,
   conversation) routed by       with ordering    to L0/L1/L2      hierarchical    compressed)      checkpointed)
                  policy)         guarantees)      IndexRecords)    expansion)

Each phase has different constraints. Extraction must be incremental (don't reprocess old messages). Storage must be atomic (incomplete writes are detectable). Indexing must be asynchronous (don't block the Agent's response). Retrieval must be budget-aware (load only what fits). Compression must preserve signal (protect important context).

A system that handles only one or two of these phases — say, just retrieval — leaves the rest to chance. ContextEngine covers the full lifecycle.

The Six Interception Points

The Agent loop isn't a black box. It has well-defined execution phases. Each phase presents a different opportunity to read, write, or transform context.

  ┌──────────────────────────────────────────────────────────────────────┐
  │                         Agent Loop (infinite)                         │
  │                                                                       │
  │   ┌─────┐     ┌──────────┐     ┌──────────┐     ┌─────────┐         │
  │   │ ①   │     │    ②     │     │    ③     │     │   ④     │         │
  │   │ MSG │────▶│ REASON   │────▶│ TOOL     │────▶│ TURN    │         │
  │   │ IN  │     │ PREP     │     │ CALL     │     │ END     │         │
  │   └──┬──┘     └──────────┘     └──────────┘     └────┬────┘         │
  │      ▲                                              │               │
  │      │         ┌──────────┐         ┌─────────┐     │               │
  │      │         │    ⑤     │         │   ⑥     │     │               │
  │      └─────────│ COMPRESS │◀────────│ SHUTDOWN │◀────┘               │
  │                └──────────┘         └─────────┘                      │
  │                                                                       │
  └──────────────────────────────────────────────────────────────────────┘

The key design choice: intercept at the loop boundary, not inside model inference. Stages ①–⑥ are all outside the LLM call. This means zero latency impact on model inference itself — all context operations happen before or after, never during.

Not All Context Is Equal

Seven types of context, each with fundamentally different lifecycle behavior. This isn't arbitrary categorization — it's driven by the semantics of the information itself.

Type	Why this lifecycle?	Write Policy
Profile	User state changes — "I live in London" may become "I moved to Tokyo"	Merge — new overwrites old on conflict
Preference	Preferences accumulate per topic but each topic has one current view	Aggregate by slug — merge within topic
Entity	Entities accumulate facts, but "Project Alpha" is still "Project Alpha"	Aggregate by slug — merge within entity
Event	History is immutable — "completed migration on March 15" never changes	Append only — never overwrite
Case	Problem-solving traces are historical records	Append only — never overwrite
Pattern	Patterns emerge from repeated observations and evolve over time	Aggregate by slug — refine over time
Skill	Tool expertise grows cumulatively — more experience = better knowledge	Cumulative append — knowledge accumulates

Four distinct write policies, each justified by the information's lifecycle semantics. A single "store everything the same way" approach would either lose mutable state (if append-only) or corrupt immutable history (if overwriting).
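The four policies in the table can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `Node` class, the function names, and the choice to skip exact repeats in the cumulative policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stored context node; not the real domain model."""
    fields: dict = field(default_factory=dict)   # mutable key/value state
    entries: list = field(default_factory=list)  # ordered history


def merge(node: Node, update: dict) -> None:
    """Merge policy (Profile): new values overwrite old ones on conflict."""
    node.fields.update(update)


def aggregate_by_slug(store: dict, slug: str, update: dict) -> None:
    """Aggregate policy (Preference, Entity, Pattern): merge within one slug."""
    merge(store.setdefault(slug, Node()), update)


def append_only(node: Node, entry: str) -> None:
    """Append policy (Event, Case): existing history is never rewritten."""
    node.entries.append(entry)


def cumulative_append(node: Node, entry: str) -> None:
    """Cumulative policy (Skill): add knowledge; skipping repeats is assumed."""
    if entry not in node.entries:
        node.entries.append(entry)


profile = Node()
merge(profile, {"city": "London"})
merge(profile, {"city": "Tokyo"})  # the newer value replaces the older one

events = Node()
append_only(events, "completed migration on March 15")
append_only(events, "completed migration on March 15")  # both records are kept

entities: dict = {}
aggregate_by_slug(entities, "project-alpha", {"lang": "Go"})
aggregate_by_slug(entities, "project-alpha", {"owner": "Alice"})
```

After this runs, the profile holds only "Tokyo", the event log holds two entries, and "project-alpha" holds both facts under one slug.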

In addition to the 7 user-facing types, 3 system-internal schemas exist (tool, session_archive, session_summary) — these are created programmatically by the chunking and compression pipelines, not extracted by the LLM ReAct loop.

Key Architectural Decisions

Why YAML-driven schemas?

The problem: Hardcoding extraction schemas in Python means every new context type requires code changes, testing, and deployment.

The solution: YAML schemas declare what to extract and how to categorize it. The SchemaRegistry, PolicyRouter, and URIResolver all consume these schemas at runtime. Adding a new type — say "Decision" or "Handoff" — means adding a YAML file, not modifying the extraction pipeline.

What breaks without it: Every new context type becomes a code change that touches extraction, commit routing, and URI resolution — three packages that must be co-released. In practice, teams just skip adding new types, and everything gets shoved into generic "notes."
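To make the schema-driven idea concrete, here is a small sketch in which plain dictionaries stand in for parsed YAML files. The schema keys (`policy`, `uri`), the URI templates, and the `route` and `resolve_uri` helpers are hypothetical and are not the real SchemaRegistry, PolicyRouter, or URIResolver interfaces.

```python
from typing import Callable

# Stand-in for parsed YAML definitions; keys and templates are assumptions.
SCHEMAS = {
    "profile": {"policy": "merge", "uri": "users/{user}/memories/profile"},
    "event": {"policy": "append", "uri": "users/{user}/memories/events/{slug}"},
    # Adding a new type is a data change only, no pipeline code is touched:
    "decision": {"policy": "append", "uri": "users/{user}/memories/decisions/{slug}"},
}

POLICIES: dict[str, Callable[[list, str], list]] = {
    "merge": lambda old, new: [new],         # keep only the latest value
    "append": lambda old, new: old + [new],  # keep the full history
}


def route(category: str, old: list, new: str) -> list:
    """Look up the write policy in the schema instead of branching in code."""
    return POLICIES[SCHEMAS[category]["policy"]](old, new)


def resolve_uri(category: str, **parts: str) -> str:
    """Build a ctx:// URI from the schema's template."""
    return "ctx://" + SCHEMAS[category]["uri"].format(**parts)
```

With this shape, `route("decision", ...)` and `resolve_uri("decision", ...)` work as soon as the new schema entry exists.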

Why the Outbox pattern for async indexing?

The problem: Embedding + upserting to a vector index takes 100–500ms. If synchronous, every afterTurn blocks for that duration, adding latency to the Agent's response.

The solution: The Outbox pattern decouples writing (fast, ~50ms) from indexing (slow, async). Each write deposits an OutboxEvent. A background worker consumes events with at-least-once delivery and a DLQ for permanent failures. The write path returns immediately; the index catches up within seconds.

What breaks without it: The Agent's perceived response time balloons by 2–10x. Worse, if the vector index is temporarily down, writes fail and context is lost — the system becomes fragile to infrastructure transients.
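The write/index decoupling described above can be sketched with an in-memory queue. The `Outbox` class, the `MAX_ATTEMPTS` limit, and the event shape are assumptions for illustration; a real outbox would be durable rather than held in process memory.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit before an event is dead-lettered


class Outbox:
    """Hypothetical in-memory outbox with a dead-letter queue (DLQ)."""

    def __init__(self):
        self.pending = deque()  # events waiting to be indexed
        self.dlq = []           # events that failed permanently

    def write(self, uri: str) -> None:
        """Fast path: record the event and return without indexing."""
        self.pending.append({"uri": uri, "attempts": 0})

    def drain(self, index_fn) -> None:
        """Worker pass: an event is dropped only after index_fn succeeds."""
        for _ in range(len(self.pending)):
            event = self.pending.popleft()
            try:
                index_fn(event["uri"])
            except Exception:
                event["attempts"] += 1
                failed_for_good = event["attempts"] >= MAX_ATTEMPTS
                (self.dlq if failed_for_good else self.pending).append(event)
```

Because a failed event goes back on the queue, the indexer may see the same event more than once, which is why the index operation should be an idempotent upsert.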

Why L0/L1/L2 three-level retrieval?

The problem: Longer content doesn't mean better vector similarity — in fact, the opposite. An embedding of a 5000-token document must represent every concept it contains, diluting the signal for any single topic.

The solution: Each context node is indexed at three granularities. L0 abstracts (~100 tokens) produce focused embeddings — they're accurate topic signposts. L1 overviews (~500 tokens) are mid-grain. L2 full content (~5000 tokens) has the detail but diffuse embeddings. The retrieval engine does a single vector search across all levels, then uses L0/L1 hits as directory entry points for recursive tree expansion: when an L0 abstract matches, the searcher expands into its children to discover L2 content that flat search would miss. Score propagation (final = α·child + (1-α)·parent) boosts borderline L2 hits under strongly-matching parents. Hotness blending (blended = (1-α_hotness)·semantic + α_hotness·h_score) is applied during tree expansion.

  ┌─────────────┐    ┌─────────────┐    ┌──────────────┐
  │  L0 Abstract │    │  L1 Overview │    │  L2 Content   │
  │  ~100 tokens │    │  ~500 tokens │    │  ~5000 tokens │
  │  Focused     │    │  Balanced    │    │  Comprehensive│
  │  embedding   │    │  signal      │    │  but diffuse  │
  │  → signpost  │    │  → decision  │    │  → full detail│
  └──────┬───────┘    └──────┬───────┘    └──────┬────────┘
         │                   │                    │
         ▼                   ▼                    ▼
    .abstract.md        .overview.md          content.md

Recall benefit: Flat vector search on L2 content alone misses relevant chunks whose embeddings got diluted by surrounding detail. The L0/L1 signposts guide the searcher to the right neighborhood, then tree expansion discovers the full content underneath — including chunks whose raw vector score was below threshold but whose parent topic was a strong match.
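A worked example of the two formulas shows how a borderline L2 hit can be rescued by a strong parent. The weights (`alpha=0.7`, `alpha_hotness=0.1`), the threshold, and the order in which the two steps are applied are assumptions chosen for illustration, not values taken from the project.

```python
def propagate(child: float, parent: float, alpha: float = 0.7) -> float:
    """Score propagation: final = alpha * child + (1 - alpha) * parent."""
    return alpha * child + (1 - alpha) * parent


def blend_hotness(semantic: float, h_score: float, alpha_hotness: float = 0.1) -> float:
    """Hotness blending: (1 - alpha_hotness) * semantic + alpha_hotness * h_score."""
    return (1 - alpha_hotness) * semantic + alpha_hotness * h_score


THRESHOLD = 0.5                               # assumed relevance cutoff
raw_l2 = 0.45                                 # below the cutoff on its own
boosted = propagate(raw_l2, parent=0.9)       # 0.7*0.45 + 0.3*0.9 = 0.585
final = blend_hotness(boosted, h_score=0.8)   # 0.9*0.585 + 0.1*0.8 = 0.6065
```

The chunk scores 0.45 in flat search and would be dropped, but ends at roughly 0.61 once its parent's score and its access hotness are taken into account.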

Why a filesystem abstraction?

The problem: Context operations — create, read, update, link, delete — need to be composable, atomic, and multi-tenant-safe. Building each as a bespoke API means reimplementing concurrency, permissions, and consistency for every operation.

The solution: ContextFS maps every context operation to a file-like operation on ctx:// URIs. Multi-tenant isolation is enforced at the filesystem level (via account_id + owner_space in every path), making it impossible to bypass even by buggy callers. The actual ContextFS protocol provides these operations:

Protocol Method	Context Operation
`write_node(node, ctx)`	Atomic write with 4-step order (content → relations → abstract/overview → meta.json)
`read_node(uri, ctx)`	Load full node (content + meta + relations)
`delete_node(uri, ctx)`	Permanent removal
`archive_node(uri, ctx)`	Soft-delete (status → ARCHIVED)
`move_node(from, to, ctx)`	Relocate node
`list_children(uri, ctx)`	Enumerate child URIs
`exists(uri, ctx)`	True only if ACTIVE (PENDING not visible)

Relations are managed through a separate RelationStore protocol, not part of ContextFS itself. Two backends implement ContextFS: the AGFS adapter (Go file server, default) and the SQL adapter (PostgreSQL).

What breaks without it: Tenant isolation becomes "each caller must remember to check permissions." Atomic writes become "each caller must implement the 4-step write protocol correctly." Every new feature re-solves the same infrastructure problems.
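The ordered write and the "visible only if ACTIVE" rule can be illustrated on a local directory. This is a sketch under assumptions: the JSON encoding of `.relations` and `.meta.json`, and the function signatures, are invented for the example and do not reflect the AGFS adapter's actual code.

```python
import json
import tempfile
from pathlib import Path


def write_node(root: Path, content: str, relations: list, abstract: str, overview: str) -> None:
    """Write files in the documented order; the meta file acts as commit marker."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "content.md").write_text(content)                           # step 1
    (root / ".relations").write_text(json.dumps(relations))             # step 2
    (root / ".abstract.md").write_text(abstract)                        # step 3
    (root / ".overview.md").write_text(overview)
    (root / ".meta.json").write_text(json.dumps({"status": "ACTIVE"}))  # step 4


def exists(root: Path) -> bool:
    """A node is visible only once its meta file reports ACTIVE."""
    meta = root / ".meta.json"
    return meta.is_file() and json.loads(meta.read_text()).get("status") == "ACTIVE"


with tempfile.TemporaryDirectory() as tmp:
    complete, partial = Path(tmp) / "complete", Path(tmp) / "partial"
    write_node(complete, "Alice is a backend engineer", [], "Alice", "Alice, engineer")
    partial.mkdir()
    (partial / "content.md").write_text("crashed after step 1")  # simulated crash
    visible = (exists(complete), exists(partial))
```

Since the meta file is written last, a write that stops partway leaves a node that readers never see, which is what makes incomplete writes detectable.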

Why optimistic locking for concurrent writes?

The problem: Profile nodes are written by every session a user has. Two sessions can extract conflicting profile updates simultaneously.

The solution: Optimistic locking — read the current .meta.json version, write only if unchanged. This handles the common case (no contention) with zero infrastructure overhead. On the rare case of concurrent writes, the second writer retries with fresh state.

What breaks without it: Distributed locks require a coordination service (etcd/ZooKeeper) — adding infrastructure complexity for a problem that occurs rarely. Without any locking, last-writer-wins silently loses the first writer's updates.
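The read, check, write, retry cycle looks roughly like this. `MetaStore`, `VersionConflict`, and `update_with_retry` are hypothetical names, and the in-memory version check stands in for what a real backend would do with an atomic compare-and-set.

```python
class VersionConflict(Exception):
    """Raised when the stored version changed since it was read."""


class MetaStore:
    """In-memory stand-in for a node's .meta.json version plus its fields."""

    def __init__(self):
        self.version, self.fields = 0, {}

    def read(self):
        return self.version, dict(self.fields)

    def write_if_unchanged(self, expected_version: int, fields: dict) -> None:
        # A real backend must perform this check and write atomically.
        if self.version != expected_version:
            raise VersionConflict()
        self.fields, self.version = fields, self.version + 1


def update_with_retry(store: MetaStore, change: dict, max_retries: int = 3) -> int:
    """Read, merge, write; on conflict re-read fresh state and try again."""
    for attempt in range(1, max_retries + 1):
        version, fields = store.read()
        fields.update(change)
        try:
            store.write_if_unchanged(version, fields)
            return attempt
        except VersionConflict:
            continue
    raise VersionConflict()
```

The retry merges onto the fresh state, so the first writer's fields survive instead of being silently overwritten.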

Why the ReAct loop for extraction?

The problem: Extraction quality depends on knowing what already exists. If the system already knows "Alice is a backend engineer," extracting that again wastes LLM tokens and creates duplicate entries that must be deduplicated later.

The solution: The LLM is given tool access — read(uri), list(uri), get_relations(uri), get_access_stats(uri), plus extract_* actions — and runs in a loop. Each iteration: it reads existing memory nodes (Reason), decides what's genuinely new, then calls the appropriate extract_* tool (Act). If unsure, it reads more nodes and loops. This is true ReAct: interleaved reasoning and tool use, not single-shot extraction.

  Iteration 1:  read(profile_uri) → "Alice, backend engineer, London"
                 → Nothing new to extract. Skip.

  Iteration 2:  read(entities/go) → "Go expert, prefers error handling pattern X"
                 → New: "Alice now also uses Rust for side projects"
                 → extract_entity(slug="rust", ...)

What breaks without it: Every turn re-extracts the same facts. Over 100 turns, "Alice is a backend engineer" gets extracted 100 times, creating 100 Profile merge operations — each one burning LLM tokens on content that was already stored. Extraction cost grows linearly with conversation length instead of with information density.
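The read-before-extract behaviour can be mimicked without an LLM. In the sketch below a deterministic membership check replaces the model's reasoning, so it only illustrates the control flow; `react_extract`, the `memory` layout, and the action tuples are assumptions.

```python
def react_extract(candidate_facts, memory, max_iterations=10):
    """Stubbed loop: read existing memory before each extract action."""
    actions = []
    for slug, fact in candidate_facts[:max_iterations]:
        existing = memory.get(slug, set())        # Reason: read(uri)
        if fact in existing:
            actions.append(("skip", slug))        # nothing new, no write
            continue
        memory.setdefault(slug, set()).add(fact)  # Act: extract_*(...)
        actions.append(("extract", slug))
    return actions


memory = {"profile": {"Alice, backend engineer, London"}}
facts = [
    ("profile", "Alice, backend engineer, London"),
    ("rust", "Alice uses Rust for side projects"),
]
actions = react_extract(facts, memory)
```

Only the Rust fact triggers a write; the already-known profile fact is skipped, so write volume tracks new information rather than conversation length.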

Architecture

Overall Architecture

         CLI Agent (Claude Code / OpenClaw / SDK)
              │  HTTP REST (port 8090) or Python SDK
              ▼
    ┌─ HTTP Layer ──── Flask REST · Auth/RBAC · Sessions ─┐
    └──────────────────────┬───────────────────────────────┘
    ┌─ Service Layer ── MemoryWriteAPI · MemoryService ────┐
    └──────────────────────┬───────────────────────────────┘
        ┌──────────────────┴──────────────────┐
        │  Write Path          │  Read Path    │
        │  ReAct Extract Loop  │  QueryPlanner │
        │  PolicyRouter        │  SeedRetriever│
        │  ContextWriter       │  (+BM25 fuse) │
        │  OutboxStore         │  HierSearcher │
        │                      │  ResultRanker │
        └──────────────────────┬───────────────┘
    ┌─ ContextFS ── AGFS Adapter · SQL Adapter ───────────┐
    └──────────────────────┬───────────────────────────────┘
    ┌─ Async Index ── OutboxWorker · DirSummarizer · RepairJob ─┐
    └──────────────────────┬────────────────────────────────────┘
              ┌────────────▼───────────────────────────┐
              │  AGFS · PostgreSQL · ChromaDB/pgvector │
              └────────────────────────────────────────┘

Dev mode (default): ChromaDB vectors + AGFS file storage, with zero external dependencies.
SQL mode: ChromaDB or pgvector + PostgreSQL. Requires installing PostgreSQL and the pgvector extension.
Production: pgvector + PostgreSQL RLS, horizontally scalable with row-level tenant isolation.

Write Path: Conversation to Persistent Memory

  Agent Turn completes
        │
        ▼
  ┌─────────────────────────────────────────────────┐
  │  ReAct Extraction Loop                          │
  │                                                 │
  │  LLM has tools: read(uri), list(uri),           │
  │  get_relations(uri), get_access_stats(uri),      │
  │  extract_*()                                    │
  └────────────────────┬────────────────────────────┘
                       │  CandidateMemory[]
                       ▼
  ┌──────────────────┐     ┌──────────────────┐
  │  PolicyRouter     │     │  ContextWriter    │
  │  (schema-driven)  │────▶│                  │
  │  Profile→Merge    │     │  Plan → Build     │
  │  Entity→Aggregate │     │  → Write (4-step) │
  │  Event→Append     │     │  → Outbox         │
  │  Skill→SkillTool  │     │  → DirSummary     │
  └──────────────────┘     └────────┬─────────┘
                                    │
                          ┌─────────▼─────────┐
                          │  Outbox Event     │
                          │  (async, durable) │
                          └─────────┬─────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │  Index Worker (background)    │
                    │  embed(abstract) → L0 upsert  │
                    │  embed(overview)  → L1 upsert  │
                    │  embed(content)   → L2 upsert  │
                    │  DirectorySummarizer (L0/L1    │
                    │  for parent nodes)             │
                    └───────────────────────────────┘

The 4-step atomic write (content.md → .relations → abstract+overview → .meta.json) is the AGFS adapter's physical file protocol. ContextWriter orchestrates at a higher level: Plan → Build → Write → Outbox → Directory Summary (5 logical steps).

Read Path: Query to Assembled Context

  User Message arrives
        │
        ▼
  ┌───────────────────┐     ┌──────────────────────────────────────────┐
  │  QueryPlanner      │     │  SeedRetriever                          │
  │                    │     │                                          │
  │  Type classify:    │────▶│  Vector search across L0+L1+L2          │
  │  regex → MEMORY/   │     │  + BM25 keyword fusion (alpha-weighted)  │
  │  SKILL/RESOURCE    │     │  → L0/L1 hits = directory signposts      │
  │  Intent classify:  │     │  → L2 hits = direct content matches      │
  │  RetrievalIntent   │     └────────────────────┬─────────────────────┘
  └───────────────────┘                          │
                                          ┌──────▼──────┐
                                          │  L0/L1 hit? │
                                          └──────┬──────┘
                                           Yes   │   No
                                          ┌──────▼──────▼──────┐
                                          │  HierarchicalSearch │  Use L2 hits
                                          │                    │  directly
                                          │  search_children() │
                                          │  per directory node│
                                          │  Score propagation │
                                          │  + hotness blend   │
                                          └────────┬───────────┘
                                                   │
                                          ┌────────▼─────────┐
                                          │  ResultRanker     │
                                          │  Dedup by URI     │
                                          │  Sort by score    │
                                          │  Truncate to top_k│
                                          │  Fill content     │
                                          └──────────────────┘

One vector search, not three sequential passes. BM25 fusion is performed inside SeedRetriever (not a separate pipeline stage) using Vector-Anchored Fusion: final = α·vec + (1-α)·sat_bm25. Hotness blending happens during HierarchicalSearcher expansion, not in ResultRanker. Deduplication is by exact URI, not cosine similarity.
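The fusion formula and the URI-based dedup can be sketched as follows. The source does not define `sat_bm25`, so the saturation function `s / (s + k)` used here is an assumption, as are `alpha=0.8`, `k=5.0`, and the rule of keeping the best score per URI.

```python
def saturate(bm25: float, k: float = 5.0) -> float:
    """Assumed saturation: maps an unbounded BM25 score into [0, 1)."""
    return bm25 / (bm25 + k)


def fuse(vec: float, bm25: float, alpha: float = 0.8) -> float:
    """final = alpha * vec + (1 - alpha) * sat_bm25."""
    return alpha * vec + (1 - alpha) * saturate(bm25)


def rank(hits, top_k=5):
    """Dedup by exact URI (best score wins), sort descending, truncate."""
    best = {}
    for uri, score in hits:
        best[uri] = max(score, best.get(uri, float("-inf")))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


fused = fuse(vec=0.6, bm25=5.0)  # 0.8*0.6 + 0.2*0.5 = 0.58
ranked = rank([("ctx://a", 0.5), ("ctx://b", 0.7), ("ctx://a", 0.9)], top_k=2)
```

Keeping the vector score as the dominant term means a keyword match can lift a hit but cannot outrank a clearly better semantic match on its own.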

Namespace isolation: Queries are scoped by intent type. MEMORY queries search both users/{user}/memories/ and agents/{agent}/memories/. SKILL queries search only agents/{agent}/skills/. The owner_space filter is set by the QueryPlanner based on context_type and visible_owner_spaces, enforced at the vector index level — callers cannot override it.

The Agent Context Lifecycle

The six stages form a pipeline where each stage's output feeds the next. The design principle: never block model inference. All context operations happen at loop boundaries — before the LLM thinks or after it acts, never during.

Stage	When	What happens	Key invariant
① message_received	Before inference	Classify intent, prefetch candidates from L0	Never block; fail silently if timeout
② bootstrap + ingest + assemble	Before prompt build	Cold-start profile injection, budget-aware L0→L1→L2 loading, dedup, skill injection	Never exceed token budget
③ before_tool_call + tool_result_persist	Around tool execution	Inject known failure patterns, compress oversized results, extract immediate facts	Tool params informed by history
④ afterTurn	After Agent responds	Incremental extraction of delta, policy-routed writes, relation building, async index	Only extract what's new
⑤ before_compaction + compact	When context fills	Score signal, protect critical nodes, compress redundancy	Never lose Profile or active task
⑥ session_end + dispose	Session closes	Archive completed tasks, snapshot state for next session, audit integrity	Next session picks up seamlessly

The lifecycle is circular: Stage ⑥'s task snapshot becomes Stage ②'s Handoff injection. Context born in Stage ④ gets recalled in Stage ①. What persists across sessions is what makes the system learn, not just remember.
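The budget-aware L0→L1→L2 loading in Stage ② can be approximated with a greedy pass. This sketch assumes fixed per-level token costs taken from the approximate sizes quoted earlier, and assumes that upgrading a hit replaces its lower level; the actual assembly strategy may differ.

```python
LEVEL_COST = {"L0": 100, "L1": 500, "L2": 5000}  # approximate token sizes


def assemble(ranked_uris: list, budget: int) -> dict:
    """Give every hit its L0 abstract first, then upgrade the best hits."""
    chosen = {}
    for uri in ranked_uris:                      # pass 1: cheap signposts
        if budget >= LEVEL_COST["L0"]:
            chosen[uri] = "L0"
            budget -= LEVEL_COST["L0"]
    for level, prev in (("L1", "L0"), ("L2", "L1")):  # passes 2 and 3: upgrades
        extra = LEVEL_COST[level] - LEVEL_COST[prev]
        for uri in ranked_uris:
            if chosen.get(uri) == prev and budget >= extra:
                chosen[uri] = level
                budget -= extra
    return chosen


plan = assemble(["ctx://a", "ctx://b", "ctx://c"], budget=6000)
```

With a 6000-token budget, the top hit is loaded in full while the other two stay at overview level, and the total never exceeds the budget.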

Data Flow: End-to-End Walkthrough

  Session 1 — Write Path: "I'm Alice, a backend engineer based in London"
  ────────────────────────────────────────────────────────────────────────
    User Message → Stage ④ afterTurn
        → Incremental Extraction (name, role, location)
        → CandidateMemory(category="profile")
        → PolicyRouter → ProfilePolicy (merge)
        → ContextWriter (Plan → Build → Write → Outbox → DirSummary)
        → OutboxEvent → async IndexRecordBuilder (L0 + L1 + L2 embed + upsert)

  Session 2 — Read Path: "What does Alice do for a living?"
  ────────────────────────────────────────────────────────────────────────
    User Message → Stage ① message_received
        → QueryPlanner → type=MEMORY, intent=BACKGROUND_SUPPLEMENT
        → SeedRetriever → vector search + BM25 → L0 hit (profile matches)
        → HierarchicalSearcher → expand from L0 → discover L2 content
        → ResultRanker → dedup by URI, sort, truncate
        → Inject "Alice is a backend engineer"
        → Agent responds correctly

Write costs ~50ms (synchronous) + ~100ms (async indexing). Read costs ~50ms. The cost is paid once at write time, amortized over many reads.

How It Differs

ContextEngine is not "vector RAG with extra steps." The fundamental differences are in lifecycle coverage, write policies, and retrieval granularity.

Dimension	ContextEngine	Standard Vector RAG	Mem0
Lifecycle coverage	6 stages: extract → store → index → recall → compress → archive	1 stage: retrieve	2 stages: write + retrieve
Write policies	4 policies (merge, aggregate, append, cumulative) matched to information semantics	Single: upsert	Single: upsert with memory_id
Retrieval granularity	L0/L1 directory signposts → hierarchical tree expansion → L2 content discovery, BM25 fusion, score propagation + hotness	Flat top-k similarity search	Flat top-k with graph expansion
Context types	7 types with distinct lifecycle behavior per type (+ 3 system-internal schemas)	1 type: "document chunk"	1 type: "memory" with scope labels
Multi-tenant isolation	Enforced at filesystem level (account_id + owner_space in every path), impossible to bypass	Application-level filtering (depends on caller)	Application-level filtering
Concurrent writes	Optimistic locking with version check	Last-writer-wins (or external lock)	Last-writer-wins
Atomicity	4-step ordered write with detectable incomplete state	Best-effort	Best-effort
Compression	Signal scoring + protected nodes + summary chain (Phase 2, in progress)	None	None

The core distinction: ContextEngine treats context as a managed lifecycle, not a store-and-retrieve problem. Write policies are not configuration — they're architectural decisions derived from information semantics. Retrieval is not a single operation — it's a multi-stage decision process with budget awareness. And the system is designed to run continuously across sessions, not just serve queries.

Roadmap

Completed

Milestone	Key Deliverables
Core Foundation	Domain models, ContextFS abstraction (AGFS + SQL adapters), `ctx://` URI scheme, 4-step atomic write
Extraction Pipeline	YAML SchemaRegistry (10 schemas: 7 user-facing + 3 system), ReAct extraction loop, schema-driven PolicyRouter, 4 merge policies
Hierarchical Retrieval	L0/L1/L2 IndexRecords, SeedRetriever with BM25 fusion, HierarchicalSearcher (tree expansion + score propagation + hotness), QueryPlanner, IntentClassifier
L0 Structured Summary	Dual-template extraction (L0 abstract + L1 overview per node), overview-first retrieval, session summary generation
Session Lifecycle	SessionManager, TopicBuffer, session commit (archive + extract), session context assembly, RollingCompressor
Multi-Tenant Auth	RBAC (ROOT/ADMIN/MEMBER), API key auth, IP allowlist with proxy trust, agent sharing (off/whitelist/all), audit logging
Agent Integration	Claude Code hooks plugin, OpenClaw TypeScript bridge, `ogmem` unified CLI (onboard/start/stop/check/config/status/logs/eval)
Multi-Agent Handoff	Sub-agent spawn/ended lifecycle, context handoff between agents, result merge back to parent
Boundary Detection	LLM-based conversation chunking, message boundary detection, configurable segment sizing

Benchmark Progression (LoCoMo10)

Run	Accuracy	Key Improvement
Run76	88.2% (1358/1540)	L0 structured summary + BM25 hybrid + extraction prompt optimization
Run69	88.2%	L0 summary injection + prompt optimization (+6.6 percentage points over Run68)
Run68	81.6%	session_time fix (baseline)

Evaluated on LoCoMo long-context conversational memory benchmark (10 sessions, 4 categories, 1540 questions).

In Progress

Milestone	Description
Context Compression (⑤ compact)	Signal scoring, protected nodes (Profile, active task), summary chain, budget-aware truncation
Graph Retrieval	Entity-relation graph traversal for multi-hop queries, relation-weighted expansion

Planned

Milestone	Description
Tool Context Injection (③)	Before/after tool call hooks — inject known failure patterns, compress oversized tool results
Session Restore (⑥)	Cross-session state handoff — snapshot at session end, cold-start injection at session begin
Graph Clustering	Community detection for entity grouping — auto-cluster related entities into topics
Adaptive Planning	Query-aware retrieval strategy selection — route complex queries through deeper pipelines
Observability	OpenTelemetry tracing, token usage dashboards, retrieval quality metrics
AI Functions	Tool-augmented retrieval actions — context-aware tool selection and parameter suggestion

Quick Start

Prerequisites

Python 3.11+
PostgreSQL 14+ with pgvector extension (for PostgreSQL mode)
Docker (optional, for containerized deployment)

Install

git clone https://gitcode.com/opengauss/oGMemory.git
cd oGMemory
python3 -m venv .venv && source .venv/bin/activate

Option A: AGFS mode (default)

pip install -e .

# Interactive setup wizard (guides LLM + Embedding + Vector DB + storage config)
ogmem onboard

# One-command start (AGFS + ContextEngine)
ogmem start local

Non-interactive mode (CI / automation)

ogmem onboard --non-interactive --mode headless \
  --provider openai --api-key sk-xxx \
  --embedding-model text-embedding-ada-002 --vector-db chroma \
  --storage-backend sql

Option B: PostgreSQL mode (direct SQL storage)

1. Install & configure PostgreSQL

# Ubuntu / Debian
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install postgresql-16-pgvector   # adjust version to match your PG

# macOS
brew install postgresql@16
brew install pgvector

# Start PostgreSQL
sudo service postgresql start   # Ubuntu
brew services start postgresql@16  # macOS

# Create database & enable pgvector
sudo -u postgres createdb ogmemory
sudo -u postgres psql -d ogmemory -c "CREATE EXTENSION IF NOT EXISTS vector;"

2. Install ContextEngine with SQL extras

pip install -e ".[dev,sql]"

3. Configure connection

cp config/ogmem.reference.yaml config/ogmem.yaml
# Edit storage.connection_string to point at your PostgreSQL instance

Use It

HTTP Server (recommended)

AGFS mode:

ogmem start local    # Start AGFS + ContextEngine, ports 1833 + 8090

PostgreSQL mode:

cp config/ogmem.reference.yaml config/ogmem.yaml
# Edit ogmem.yaml: set storage.backend to sql and storage.connection_string to your PostgreSQL DSN
python server/app.py

Ingest a conversation turn

curl -X POST http://localhost:8090/api/v1/after_turn \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-1","sessionId":"session-1", "messages":[{"role":"user","content":"I am Alice, a backend engineer"}, {"role":"assistant","content":"Nice to meet you!"}]}'

Search memory

curl -X POST http://localhost:8090/api/v1/compose \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-1","sessionId":"session-2","query":"what is alice job"}'
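The same two calls can be made from Python with only the standard library. The endpoints and payload fields come from the curl examples above; the `build_request` helper is a name introduced for this sketch, and the response format is not shown because it is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8090/api/v1"


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the ContextEngine endpoints."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


after_turn = build_request("after_turn", {
    "userId": "user-1",
    "sessionId": "session-1",
    "messages": [
        {"role": "user", "content": "I am Alice, a backend engineer"},
        {"role": "assistant", "content": "Nice to meet you!"},
    ],
})
compose = build_request("compose", {
    "userId": "user-1",
    "sessionId": "session-2",
    "query": "what is alice job",
})

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(compose) as resp:
#     print(json.load(resp))
```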


Python SDK

```python
from service.api import MemoryWriteAPI
from core.models import RequestContext
from fs.sql_adapter import SQLContextFS
from providers.config import ProviderConfig

# Load LLM/embedding provider settings from environment variables.
config = ProviderConfig.from_env()

# PostgreSQL-backed ContextFS; adjust the DSN to match your instance.
fs = SQLContextFS(connection_string="host=127.0.0.1 port=5432 dbname=ogmemory user=postgres password=postgres")
write_api = MemoryWriteAPI(fs=fs, llm=config.create_llm())

# Every call carries the account, user, agent, and session it acts for.
ctx = RequestContext(account_id="acct", user_id="u1", agent_id="a1", session_id="s1", trace_id="t1")

# Commit the session's messages so memories are extracted and persisted.
result = write_api.commit_session([
    {"role": "user", "content": "I'm Alice, backend engineer, London"},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
], ctx)
```

Docker / HTTP API Reference

docker compose up   # Server on :8090, PostgreSQL on :5432

Endpoint	Method	Description
`/api/v1/compose`	POST	Search memory, return context for current turn
`/api/v1/after_turn`	POST	Extract + persist memories from conversation
`/api/v1/ingest`	POST	Single message ingest
`/api/v1/ingest_batch`	POST	Batch message ingest
`/api/v1/bootstrap`	POST	Cold-start session with profile + preferences
`/api/v1/compact`	POST	Trigger context compression
`/api/v1/prepare_compaction`	POST	Prepare compaction token (pre-compaction planning)
`/api/v1/dispose`	POST	Session disposal — archive + cleanup
`/api/v1/prepare_subagent_spawn`	POST	Sub-agent context handoff (multi-agent)
`/api/v1/on_subagent_ended`	POST	Sub-agent result merge back to parent
`/api/v1/token_stats`	GET/POST	LLM & embedding token usage (POST with `reset` to clear)
`/api/v1/sessions/{id}/messages`	POST	Add message to session buffer
`/api/v1/sessions/{id}`	GET	Get session metadata + pending tokens
`/api/v1/sessions/{id}/commit`	POST	Commit session: archive + extract
`/api/v1/sessions/{id}/context`	GET	Get assembled session context
`/api/v1/call/<method>`	POST	Generic method dispatch (forward-compat)
`/api/v1/health`	GET	Health check (storage + LLM + vector DB)
`/api/v1/admin/accounts`	GET	List accounts
`/api/v1/admin/accounts/{id}`	GET	Get account
`/api/v1/admin/accounts/{id}/users`	GET/POST	List / create users
`/api/v1/admin/accounts/{id}/users/{uid}`	DELETE	Delete user
`/api/v1/admin/accounts/{id}/users/{uid}/role`	PATCH	Set user role (ROOT/ADMIN/MEMBER)
`/api/v1/admin/accounts/{id}/roles`	GET	List roles
`/api/v1/admin/accounts/{id}/agents`	GET/POST	List / register agents
`/api/v1/admin/accounts/{id}/agents/{aid}`	GET/PATCH	Get / update agent
`/api/v1/admin/accounts/{id}/audit-logs`	GET	List audit logs
`/api/v1/admin/accounts/{id}/audit-logs/{log_id}`	GET	Get single audit log
`/api/v1/admin/config/agent-sharing`	GET	Agent sharing configuration

Configuration

Key environment variables (see ENV.md for full reference). Config priority: YAML value > environment variable > hard-coded default.

Variable	Default	Description
`OGMEM_API_KEY`	—	LLM API key (OpenAI-compatible)
`OGMEM_BASE_URL`	—	Custom LLM API base URL
`OGMEM_LLM_MODEL`	`gpt-4o-mini`	LLM model for extraction + classification
`OGMEM_EMBEDDING_MODEL`	`text-embedding-ada-002`	Embedding model for vector indexing
`OGMEM_EMBEDDING_API_KEY`	—	Separate key for embedding API (falls back to `OGMEM_API_KEY`)
`EMBEDDING_PROVIDER`	—	Separate embedding provider (`openai` / `volcengine` / `st` / `mock`)
`VECTOR_DB_TYPE`	`chroma`	Vector backend: `chroma` / `opengauss` (pgvector) / `memory`
`STORAGE_BACKEND`	`agfs`	Storage backend: `agfs` (AGFS file server) / `sql` (PostgreSQL)
`SQL_CONNECTION_STRING`	—	PostgreSQL DSN (required when `STORAGE_BACKEND=sql`)
`AGFS_BASE_URL`	`http://127.0.0.1:1833`	AGFS server URL (used when `STORAGE_BACKEND=agfs`)
`OGMEM_HTTP_PORT`	`8090`	HTTP server listen port
`OGMEM_CONFIG`	`config/ogmem.yaml`	Path to YAML config file
`OG_ACCOUNT_ID`	`acct-demo`	Default account ID
`OG_USER_ID`	`u-alice`	Default user ID
`OG_AGENT_ID`	`main`	Default agent ID
`OG_ROLE_CONTROL_ENABLED`	`false`	Enable RBAC auth
`OG_ROOT_API_KEY`	—	Root API key for admin access
`OG_AGENT_SHARED_MODE`	`off`	Agent memory sharing: `off` / `whitelist` / `all`
`OGMEM_AFTER_TURN_THRESHOLD`	`200`	Min message length to trigger extraction
`OGMEM_CACHE_ENABLED`	`true`	Enable retrieval cache
`CHUNKING_ENABLED`	`false`	Enable boundary detection + chunking
`INDEX_INTERVAL`	`30`	Index worker polling interval (seconds)
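
The priority rule above (YAML value, then environment variable, then hard-coded default) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual loader: `resolve_setting` and `embedding_api_key` are hypothetical helpers, and the YAML config is modeled as a flat dict keyed by the variable name, which may not match the real layout of `config/ogmem.yaml`.

```python
import os


def resolve_setting(name, yaml_config, env=None, default=None):
    """Resolve one setting: YAML value > environment variable > default.

    `yaml_config` is assumed to be an already-parsed, flat mapping keyed by
    the variable name (an assumption made for this sketch).
    """
    env = os.environ if env is None else env
    # 1. A value set in the YAML config wins.
    if yaml_config.get(name) is not None:
        return yaml_config[name]
    # 2. Otherwise use a non-empty environment variable.
    if env.get(name):
        return env[name]
    # 3. Otherwise fall back to the hard-coded default.
    return default


def embedding_api_key(yaml_config, env=None):
    """OGMEM_EMBEDDING_API_KEY falls back to OGMEM_API_KEY when unset."""
    return (
        resolve_setting("OGMEM_EMBEDDING_API_KEY", yaml_config, env)
        or resolve_setting("OGMEM_API_KEY", yaml_config, env)
    )


if __name__ == "__main__":
    yaml_cfg = {"OGMEM_HTTP_PORT": 9000}
    env = {"OGMEM_HTTP_PORT": "8091", "OGMEM_API_KEY": "sk-example"}
    print(resolve_setting("OGMEM_HTTP_PORT", yaml_cfg, env, 8090))  # YAML wins
    print(resolve_setting("INDEX_INTERVAL", yaml_cfg, env, 30))     # default
    print(embedding_api_key(yaml_cfg, env))                         # fallback key
```

Environment values arrive as strings, so a real loader would also need to coerce types (for example, ports and intervals to integers); that step is left out here.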

Repository Layout

ContextEngine/
├── core/                   # Domain models, Protocol interfaces, enums, errors
├── fs/                     # ContextFS abstraction (filesystem metaphor for context ops)
│   ├── agfs_adapter/       #   AGFS-backed ContextFS (Go file server, default backend)
│   └── sql_adapter/        #   PostgreSQL-backed ContextFS (atomic upsert, RLS tenant isolation)
├── extraction/             # CandidateExtractor (ReAct loop + YAML SchemaRegistry)
│   ├── prompts/            #   LLM prompt templates (Jinja2)
│   └── schemas/            #   Schema registry + YAML definitions per category
│       └── definitions/    #     10 YAML files: profile, entity, event, skill, tool, etc.
├── commit/                 # Write chain: PolicyRouter → MergePolicy → ContextWriter → OutboxStore
├── index/                  # Async indexing: OutboxWorker → IndexRecordBuilder → DirectorySummarizer
├── retrieval/              # Read chain: QueryPlanner → IntentClassifier → SeedRetriever (+BM25)
│                          #   → HierarchicalSearcher → ResultRanker
├── providers/              # External adapters: LLM, Embedder, VectorIndex, RelationStore
│   ├── llm/                #   OpenAI-compatible LLM (works with OpenAI/Volcengine/DashScope/Zhipu)
│   ├── embedder/           #   4 backends (OpenAI, Volcengine, SentenceTransformers, Mock)
│   ├── vector_index/       #   InMemory / ChromaDB / pgvector (OpenGauss)
│   └── relation_store/     #   SQL + AGFS relation stores
├── service/                # API layer: MemoryWriteAPI, MemoryService, IndexService
├── server/                 # HTTP REST server (Flask), auth/RBAC, IP allowlist, sessions
├── session/                # SessionManager, TopicBuffer, RollingCompressor, ArchiveStore
├── tests/
│   ├── contract/           # Cross-team contract tests (invariants)
│   ├── unit/               # Per-package unit tests
│   ├── integration/        # End-to-end integration tests
│   ├── e2e/                # LoCoMo benchmark evaluation framework
│   ├── benchmark/          # Performance and quality benchmarks
│   ├── ab/                 # A/B comparison tests
│   └── fixtures/           # Shared test data
├── docs/                   # Architecture, deployment, quickstart guides
├── examples/               # Usage examples (SDK, agent integration)
├── cli/                    # Unified management CLI (ogmem command)
│   └── commands/           #   onboard, start, stop, check, config, status, logs, eval
├── claude-plugin/          # Claude Code hooks integration (hooks, scripts, skills)
├── openclaw_context_engine_plugin/  # OpenClaw plugin (TypeScript bridge)
├── agfs/                   # AGFS Go server source (cmd/, pkg/, sdk/)
├── config/                 # Configuration files (ogmem.yaml, .env, AGFS configs)
├── deploy/                 # Docker deployment scripts and configs
├── docker/                 # Docker Compose files
└── scripts/                # Auxiliary scripts (index service, etc.)

Agent Integration

Claude Code

ContextEngine provides native Claude Code hooks for zero-config memory:

ogmem onboard                           # Interactive: choose "Agent Plugin" → "Claude Code"
ogmem start plugin                      # Start CE server + install hooks

OpenClaw

cd openclaw_context_engine_plugin && openclaw plugins install -l .

Tool	Stage	Operation
`og_memory_write`	④ Turn End	Extract + persist new context
`og_memory_search`	① Message	Prefetch relevant context
`og_memory_read`	② Reasoning Prep	Load full context by URI

Automatic behaviors (no Agent code changes required): a new message triggers prefetch, turn end triggers extraction, a filling context window triggers compression, and session close triggers archival.

Implementation Status

Phase 0 + 1 (production quality): See Roadmap for detailed milestone breakdown.

Core models, ContextFS abstraction (AGFS + PostgreSQL adapters), extraction pipeline (YAML SchemaRegistry + ReAct loop), write chain (4 merge policies + OutboxStore), hierarchical retrieval (L0/L1/L2) with BM25 hybrid search, async indexing with DLQ, providers (OpenAI/Volcengine/DashScope/Zhipu LLM, 4 embedders, InMemory/ChromaDB/pgvector), HTTP REST API with RBAC auth, session management, L0 structured summary generation, multi-agent handoff, boundary detection/chunking. 100+ test files (contract + unit + integration + e2e + benchmark).

Lifecycle stages covered: ① message_received, ② bootstrap + ingest + compose, ④ afterTurn, ⑤ prepare_compaction + compact, and the session-state persistence portion of ⑥ session_end + dispose.

Phase 2 (in progress): Context compression pipeline, graph retrieval.

Phase 3 (planned): Tool context injection, session restore, graph clustering, adaptive planning, observability, AI functions.

Lifecycle stages to cover: ③ before_tool_call + tool_result_persist and the remaining operational hardening for ⑥ session_end + dispose.

Testing:

pytest tests/contract/ -v               # Core invariants
pytest tests/ -v                        # All tests
pytest tests/ --cov=core --cov=fs --cov=service --cov-report=html   # Coverage report (HTML)

References

Benchmark & Evaluation

LoCoMo: Evaluating Very Long-Term Conversational Memory of LLM Agents — Maharana et al., 2024 — very long-term conversational memory benchmark used in our evaluation

Memory Architectures for LLM Agents

MemoryBank: Enhancing LLMs with Long-Term Memory — Zhong et al., 2023 — sentiment-centric memory processing for persistent conversational memory
A-MEM: Agentic Memory for LLM Agents — 2025 — agentic memory organization with dynamic indexing
Mem0: Production-Ready AI Agents with Scalable Long-Term Memory — 2025 — user/session/agent scoped memory with vector store + knowledge graph
SeCom: Memory Construction and Retrieval for Personalized Conversational Agents — 2025 — segment-level memory bank with conversation segmentation (related to our two-phase extraction)

Hierarchical Retrieval & Multi-Granularity

ReadAgent: Gist Memory of Very Long Contexts — Lee et al., 2024 — human-inspired reading agent with gist memory, 20x effective context length (related to our L0/L1/L2 hierarchy)
MemoRAG: Memory-Augmented Retrieval — Qian et al., 2024 — dual-system architecture with global memory + retrieval-augmented generation
LATTICE: LLM-guided Hierarchical Retrieval — Gupta et al., 2025 — hierarchical retrieval framework for large document collections

Context Compression

LLMLingua-2: Task-Agnostic Prompt Compression — Pan et al., 2024 — data distillation for efficient prompt compression up to 20x
LongLLMLingua: Question-Aware Compression for Long Context — Jiang et al., 2024, ACL — coarse-to-fine compression with 17.1% improvement at 4x compression (related to our compression pipeline design)
Prompt Compression for Large Language Models: A Survey — 2024 — comprehensive survey of prompt and context compression techniques

Reasoning & Acting

ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al., 2023, ICLR — interleaved reasoning traces and task-specific actions (our lazy extraction mode is based on this paradigm)

Multi-Agent & Governance

Governed Memory — Taheri, 2026 — memory governance and access control
Collaborative Memory — multi-user memory sharing + dynamic ACL (related to our multi-tenant isolation)
Multi-Agent Memory Systems for Production — Mem0, 2026
Cemri et al. — multi-agent coordination failure analysis

Infrastructure

OpenViking — AGFS file storage layer and core design inspiration
AI Agent Memory Architectures — Zylos Research, 2026

License

Apache License 2.0
