| refactor: dead code removal and embedding model migration to bge-m3
Remove unused code across the codebase and migrate embedding model
from bge-large-en-v1.5 (dim 1536) to bge-m3 (dim 1024).
Dead code removed:
- lifecycle/aging_job.py (entire module)
- OwnerScope enum, unused VectorIndex methods
- _flush_session, poll_task/TaskStatus, _normalize_path
- GPT-3.5 constants, get_st_embedder, UserInfo dataclass
Cleanup: unused imports, datetime.utc modernization, import sorting.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| 1 month ago |
| add ogmem search logic
| 2 months ago |
| feat(pgdirect): SQL-backed storage backend replacing AGFS
Direct PostgreSQL storage implementation:
- SQLContextFS: full ContextFS implementation with RLS tenant isolation
- SQLOutboxStore: listen-notify based async indexing
- SQLRelationStore, SQLArchiveStore: relation and session persistence
- SQLControlPlaneStore: multi-instance auth state sharing
- Shared connection pool (PoolAdapterMixin) with health checks
- Atomic write+outbox in single transaction
- Archive chain support (soft delete + outbox event)
- Owner-level ACL with visible_owner_spaces
- PostgreSQL schema with idempotent ensure_schema()
- Updated docs/setup scripts to replace AGFS with PostgreSQL
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| 26 days ago |
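The "atomic write+outbox in single transaction" item above can be illustrated with a minimal sketch. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs anywhere. The table names, column names, and the `write_with_outbox` function are hypothetical and are not taken from the repository.

```python
import json
import sqlite3

# Hypothetical schema: one table for content, one for pending index events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (uri TEXT PRIMARY KEY, content TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "event_type TEXT NOT NULL, "
    "payload TEXT NOT NULL)"
)


def write_with_outbox(conn, uri, content, event_type="node_written"):
    """Store a node and its outbox event so that both persist or neither does."""
    # Using the connection as a context manager commits on success and
    # rolls back if either statement raises.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO nodes (uri, content) VALUES (?, ?)",
            (uri, content),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            (event_type, json.dumps({"uri": uri})),
        )


write_with_outbox(conn, "mem://a", "hello")
```

Because the event row is written in the same transaction as the content, an indexer that reads the outbox should never see an event for content that failed to persist, and a failed event insert leaves no orphaned content row.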
| refactor: shared detect_language, compressor prompt, abstract length alignment
- Extract detect_language() to core/language.py (supports zh-CN/ja/ko/ru/en)
- Replace duplicate implementations in extraction/tools.py and index/directory_summarizer.py
- Enhance RollingCompressor prompt: preserve proper nouns, output language, 150-word summary limit
- Fix abstract length mismatch: DirectorySummarizer truncates to 100 chars (was 200, prompt says 100)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| 1 month ago |
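As a rough illustration of what a shared `detect_language()` covering zh-CN/ja/ko/ru/en might look like, here is a sketch based on Unicode script ranges. The commit does not show the real implementation in core/language.py, so the heuristic, the check order, and the `default` parameter are all assumptions.

```python
import re

# Assumed heuristic: test scripts in a fixed order. Kana is checked before
# Han ideographs because Japanese text usually mixes both.
_SCRIPT_PATTERNS = [
    ("ja", re.compile(r"[\u3040-\u30ff]")),     # hiragana and katakana
    ("ko", re.compile(r"[\uac00-\ud7af]")),     # hangul syllables
    ("ru", re.compile(r"[\u0400-\u04ff]")),     # cyrillic
    ("zh-CN", re.compile(r"[\u4e00-\u9fff]")),  # CJK unified ideographs
]


def detect_language(text: str, default: str = "en") -> str:
    """Return the code of the first script found in text, else the default."""
    for code, pattern in _SCRIPT_PATTERNS:
        if pattern.search(text):
            return code
    return default
```

A script-range check like this cannot tell apart languages that share a script (for example Russian and Ukrainian), which is acceptable only if the caller needs a coarse hint such as the output language for a prompt.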
| feat: Docker deployment — openclaw + ogmem plugin Dockerfile
| 1 month ago |
| feat(provenance): preserve message IDs and pre-generate archive_id
- Add message.id to extraction state in _build_incremental_extraction_state
- Extend _Span with message_ids field, extract from messages[span_start:span_end+1]
- Add provenance_ids field to CandidateMemory
- Pass archive_id through extraction pipeline (Extractor -> CandidatePipeline -> MemoryWriteAPI)
- SessionManager.commit_snapshot accepts optional archive_id parameter
- Consolidate archive_id generation into generate_archive_id() in session_manager
- ArchiveBuilder writes provenance_ids to ContextNode.metadata
Co-Authored-By: Claude [glm-5] <noreply@anthropic.com>
| 15 days ago |
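The span extraction described in this commit (`messages[span_start:span_end+1]`) treats `span_end` as inclusive. A small sketch of that slicing follows, assuming messages are dicts with an optional `"id"` key. The message shape and the helper name `span_message_ids` are illustrative and not taken from the codebase.

```python
from typing import Any, Dict, List


def span_message_ids(
    messages: List[Dict[str, Any]], span_start: int, span_end: int
) -> List[str]:
    """Collect IDs of messages in the inclusive range [span_start, span_end]."""
    # The +1 makes the end index inclusive; messages without an ID are skipped.
    return [m["id"] for m in messages[span_start:span_end + 1] if m.get("id")]
```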
| fix(provenance): merge provenance_ids with dedup, accept list detail, append not overwrite
Three fixes to the provenance tracking system:
1. merge_policies: when an existing memory node is updated, combine
existing and incoming provenance_ids with order-preserving dedup
via _merge_provenance_ids. ArchiveBuilder prefers the merged list
from plan.merged_fields over the raw candidate field.
2. ProvenanceResolver: build_id accepts list[str] as detail for
archive source_type (resolver handles comma join internally).
parse_id returns list[str] for archive, str for others.
Validation centralized in validate_input — only archive allows
list detail.
3. extraction/tools: provenance_ids assignment changed from overwrite
(= [prov_id]) to append (.append(prov_id)) so that existing IDs
from prior pipeline steps are preserved.
Co-Authored-By: Claude [glm-5] <noreply@anthropic.com>
| 15 days ago |
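The order-preserving dedup in fix 1 and the append-versus-overwrite change in fix 3 can be sketched as follows. The function below is a hypothetical stand-in for `_merge_provenance_ids`. The real signature is not shown in the commit message, so the parameter names are assumptions.

```python
from typing import Iterable, List


def merge_provenance_ids_sketch(
    existing: Iterable[str], incoming: Iterable[str]
) -> List[str]:
    """Combine two ID sequences, keeping the first occurrence of each ID."""
    # dict keys preserve insertion order, so existing IDs stay in front
    # and duplicates from the incoming list are dropped.
    return list(dict.fromkeys([*existing, *incoming]))


# Fix 3 in miniature: appending keeps IDs from earlier pipeline steps,
# whereas assigning a fresh one-element list would discard them.
provenance_ids = ["archive:1"]
provenance_ids.append("archive:2")
```

The ID strings used here are placeholders. The commit does not specify the actual ID format.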
| test: cover schema registry integration
| 11 days ago |
| !60 merge feat/speaker-attribution into dev
feat(extraction): add speaker attribution for profile disambiguation (closes #32)
Created-by: akushonkamen
Commit-by: akushonkamen
Merged-by: opengauss_bot
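Description: [Title] Speaker attribution: profile/entity downgrade mechanism for group chats
[Implementation]:
- Add an IDENTITY ANCHOR block to the extraction prompt to tell the LLM who the user is
- Add 3 required attribution fields to the profile schema: evidence_quote, attributed_speaker, attribution_basis
- Add validate_attribution(): automatically downgrades profile→entity when attribution_basis is other_named
- Update the extraction.yaml template with group-chat, forwarded-message, and third-person examples
- Integrate attribution validation into the commit pipeline
- 330 lines of unit tests covering a range of attribution scenarios
[Root cause analysis]:
- In group chats, the LLM struggles to distinguish the user's own words from other people's
- As a result, extracted profiles are wrongly attributed to the current user
[Approach]:
- Inject a user identity anchor into the prompt
- Require the LLM to label the attribution basis for every profile
- Attribution validation: downgrade to entity when the basis is neither first-person nor self_named
[Related requirement or issue]: Closes #32
[Developer self-verification report]:
1. All unit tests pass (330 lines of new tests) ✅
2. LoCoMo benchmark: 81.6% accuracy ✅
3. Profile misattribution in group chats significantly reduced ✅
See merge request: opengauss/oG-Memory!60 | 1 month ago |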
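The downgrade rule in this merge request can be sketched roughly as below. Only the field names `evidence_quote`, `attributed_speaker`, `attribution_basis` and the values `self_named` and `other_named` come from the description. The `Candidate` class, the `category` field, and the `"first_person"` label are assumptions made for illustration, and the real `validate_attribution()` may differ.

```python
from dataclasses import dataclass, replace

# Bases under which a statement is treated as being about the user.
# "first_person" is an assumed label; "self_named" appears in the description.
KEEP_AS_PROFILE = {"first_person", "self_named"}


@dataclass(frozen=True)
class Candidate:
    category: str            # "profile" or "entity"
    evidence_quote: str
    attributed_speaker: str
    attribution_basis: str


def validate_attribution(candidate: Candidate) -> Candidate:
    """Downgrade a profile to an entity when it is not attributed to the user."""
    if (
        candidate.category == "profile"
        and candidate.attribution_basis not in KEEP_AS_PROFILE
    ):
        return replace(candidate, category="entity")
    return candidate
```

For example, a profile candidate quoted from another group-chat participant with `attribution_basis="other_named"` would come back with `category="entity"`, while one marked `self_named` would pass through unchanged.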