CompassMem: Memory & Query Optimization
- CompassMem is a family of memory architectures that segments events and employs in-memory statistical sketches to support efficient reasoning and structured data retrieval.
- It integrates LLM-driven segmentation, active planning, and graph-based logical navigation to enhance multi-hop and temporal question answering performance.
- CompassMem delivers robust context management in multi-agent systems and optimized query processing in databases by balancing fine-grained accuracy with memory efficiency.
CompassMem encompasses a family of memory architectures rigorously designed to support efficient reasoning, long-horizon context management, and query optimization across both agentic and database systems. In LLM agents, CompassMem frameworks address the critical bottleneck of organizing, structuring, and retrieving relevant past experiences or evidence, particularly for tasks with extended temporal or logical dependencies. In database query optimization, CompassMem denotes an in-memory statistical sketch system that delivers high-fidelity cardinality estimates with fine-grained memory–accuracy trade-offs. Although differing in application scope, all CompassMem variants prioritize compositional memory structuring, active selection, and efficiency guarantees over naive, flat memory approaches.
1. Event-Centric Memory: Agent Reasoning with Logic Maps
CompassMem, as introduced for LLM agents, is an event-centric memory architecture inspired by Event Segmentation Theory. It segments streaming experiences into discrete events, constructs an explicit Event Graph (nodes = events; edges = logical relations), and supports active, multi-step graph traversal for memory retrieval and reasoning (Hu et al., 8 Jan 2026). Each event is represented as a 4-tuple comprising the original observation span, a temporal anchor, a semantic summary, and the event participants. The framework incrementally updates the Event Graph through LLM-prompted segmentation and relation extraction, organizing experiences hierarchically with logical edges (e.g., causal, temporal, motivational, part-of).
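The event 4-tuple and typed-edge graph described above can be sketched as a minimal data structure. This is an illustrative reconstruction, not the paper's implementation; the class and field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One segmented event: the 4-tuple described above."""
    span: str            # original observation span
    time: str            # temporal anchor
    summary: str         # semantic summary
    participants: list   # event participants

@dataclass
class EventGraph:
    """Nodes are events; typed edges encode logical relations."""
    nodes: dict = field(default_factory=dict)   # event_id -> Event
    edges: dict = field(default_factory=dict)   # event_id -> [(relation, event_id)]

    def add_event(self, eid, event):
        self.nodes[eid] = event
        self.edges.setdefault(eid, [])

    def add_relation(self, src, relation, dst):
        # relation is one of the logical edge types, e.g.
        # "causal", "temporal", "motivational", "part-of"
        self.edges[src].append((relation, dst))

    def neighbors(self, eid, relation=None):
        """Neighbors of an event, optionally filtered by edge type."""
        return [d for r, d in self.edges.get(eid, [])
                if relation is None or r == relation]
```

In the paper, segmentation and relation extraction that populate such a graph are performed by LLM prompting; here they are left to the caller.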
Query retrieval within CompassMem is realized by a three-component agent pipeline (Planner, Explorers, Responder). The Planner decomposes the query into subgoals, Explorer agents navigate the Event Graph combining embedding similarity and edge structure, and the Responder synthesizes answers based on the aggregated evidence nodes. Crucially, retrieval is active and goal-directed, relying on logic-aware navigation rather than simple nearest-neighbor lookups. This architecture demonstrates marked improvements in multi-hop and temporal QA over baselines, with F1 scores exceeding alternatives such as CAM and HippoRAG by up to 9 points on temporal questions (Hu et al., 8 Jan 2026).
2. Context Management in Multi-Agent Frameworks
Within hierarchical agent systems such as COMPASS, CompassMem denotes a lightweight, bullet-point memory buffer and retrieval-summarization module specifically tasked with evolving context management for the Main Agent (Wan et al., 9 Oct 2025). The persistent NoteStore is an append-only log of structured memory records, where each record summarizes verified facts, active constraints, and unresolved questions at each reasoning turn.
At each outer iteration, CompassMem synthesizes a concise context brief:
- Inputs: the current query, the NoteStore, the Main Agent’s recent trajectory, and the Meta-Thinker’s strategic signals.
- Algorithm: retrieves the top-k relevant records via embedding similarity, then invokes a summarizer (e.g., Gemini 2.5 Pro or a distilled 12B model) to compress the retrieved notes plus the current trace into a context brief (≤300 tokens).
- Output: delivers the brief to the Main Agent and appends the extracted content as a new record in the NoteStore.
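The retrieve-then-summarize loop above can be sketched as follows. This is a hedged reconstruction: the `NoteStore` interface and the pluggable `summarize` callable (standing in for an LLM call) are assumptions, not the paper's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

class NoteStore:
    """Append-only log of (embedding, note) memory records."""
    def __init__(self):
        self.records = []

    def append(self, embedding, note):
        self.records.append((embedding, note))

    def top_k(self, query_embedding, k):
        """Top-k records by embedding similarity to the query."""
        scored = sorted(self.records,
                        key=lambda r: cosine(query_embedding, r[0]),
                        reverse=True)
        return [note for _, note in scored[:k]]

def synthesize_brief(query, store, query_embedding, k, summarize):
    """Retrieve top-k notes, compress them into a brief, persist the brief."""
    notes = store.top_k(query_embedding, k)
    brief = summarize(query, notes)       # LLM summarizer, capped at ~300 tokens
    store.append(query_embedding, brief)  # new record in the append-only log
    return brief
```

Usage: the Main Agent receives the returned brief each outer iteration, while the NoteStore grows by one record per turn.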
Empirical ablation indicates removing CompassMem drops BrowseComp Pass@1 from 35.4% to 26.4% and degrades strategic stability. Substituting the summarizer with a compact model (Context-12B) trebles token efficiency at negligible accuracy cost, supporting both effectiveness and scalability (Wan et al., 9 Oct 2025).
3. In-Memory Statistical Sketching for Query Optimization
In in-memory database contexts, CompassMem refers to the Fast-AGMS sketch infrastructure underpinning online query optimization (Izenov et al., 2021). Each sketch is a small matrix of 32-bit counters (≈45 KB per sketch in the default configuration), updated row-wise via hash functions for each incoming tuple. The per-query memory budget is strictly bounded: up to 56 sketches (≈2.5 MB) for the most complex JOB benchmark queries, scaling linearly with the number of join attributes. For GPU deployments, full replication amounts to ≈65 MB but remains minute compared to intermediate data sizes in classic systems.
Unlike sampling or histogram-based techniques, CompassMem’s sketches are constructed only over relevant (WHERE-passing) tuples, piggyback on the scans used for exact selectivities, and are merged incrementally during plan enumeration, never rebuilt from scratch, ensuring zero extra I/O and pure in-memory computation. Standard AGMS-style guarantees apply: the sketch width and row count can be chosen as functions of the target error ε and confidence 1−δ to bound the estimator’s variance. Empirical validation shows estimation errors below a factor of 10 even for high-way joins, a significant gain over baselines, all within a restricted memory envelope (Izenov et al., 2021).
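The core Fast-AGMS mechanics can be sketched compactly: each row pairs a bucket hash with a ±1 sign hash, and a join size is estimated as the median over rows of the inner product of the two operands' corresponding rows. This is a simplified illustration (Python's built-in `hash` stands in for the pairwise/4-wise independent hash families a real implementation would use), not the paper's code.

```python
import random
import statistics

class FastAGMS:
    """Toy Fast-AGMS sketch: depth rows x width counters per row."""
    def __init__(self, depth=8, width=256, seed=0):
        rng = random.Random(seed)
        self.depth, self.width = depth, width
        self.rows = [[0] * width for _ in range(depth)]
        # Per-row seeds; stand-ins for independent hash families.
        self.seeds = [(rng.randrange(1 << 30), rng.randrange(1 << 30))
                      for _ in range(depth)]

    def update(self, key, count=1):
        """Fold one (WHERE-passing) tuple's join key into every row."""
        for r, (b_seed, s_seed) in enumerate(self.seeds):
            bucket = hash((b_seed, key)) % self.width
            sign = 1 if hash((s_seed, key)) & 1 else -1
            self.rows[r][bucket] += sign * count

    def join_estimate(self, other):
        """Estimate |R join S|; both sketches must share depth/width/seeds."""
        per_row = [sum(a * b for a, b in zip(ra, rb))
                   for ra, rb in zip(self.rows, other.rows)]
        return statistics.median(per_row)
```

Merging for plan enumeration exploits the fact that these counters are linear in the input, so sketches over disjoint partitions can simply be added elementwise.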
Table: Memory Usage by System (per-query, approximate)
| System | Per-sketch | Max # sketches | Total RAM |
|---|---|---|---|
| CompassMem | 45 KB | up to 56 | ≃ 2.5 MB |
| CompassMem (GPU-replicated) | 45 KB | 56 × 26 | ≃ 65 MB |
| PostgreSQL/MonetDB/DBMS A | – | – | < 1 MB |
| MapD | – | – | 0 MB |
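Plugging the table's numbers into a quick check confirms the totals (the 26× factor is the GPU replication count shown in the table):

```python
KB = 1024
MB = 1024 * KB
per_sketch = 45 * KB            # 45 KB per Fast-AGMS sketch (table above)
cpu_total = 56 * per_sketch     # up to 56 sketches per query
gpu_total = cpu_total * 26      # 26x replication for GPU deployment (table above)
print(round(cpu_total / MB, 1), round(gpu_total / MB, 1))  # → 2.5 64.0
```

The computed 64 MB matches the table's ≃65 MB figure to rounding.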
4. Retrieval, Reasoning, and Navigation Algorithms
Agentic CompassMem features algorithmic pipelines for active evidence collection:
- Subgoal decomposition: the Planner decomposes the query into an ordered set of subgoals.
- Initial candidate localization: retrieves the top-k events per subgoal/topic cluster.
- Parallel explorer navigation: each explorer traverses nodes by edge type, using a global subgoal-satisfaction vector and an evidence set, making stepwise decisions to SKIP, EXPAND, or ANSWER.
- Scheduling: a global priority queue assigns per-node priorities and focuses search on unsatisfied subgoals.
- Query refinement: if not all subgoals are satisfied, queries are dynamically refined and reissued to direct the search adaptively.
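The steps above can be sketched as a single best-first traversal. This is an illustrative skeleton under stated assumptions: `score` and `relevant` are plain functions here, whereas in the paper both the priority judgments and the SKIP/EXPAND/ANSWER decisions are made by LLM prompting.

```python
import heapq

def explore(graph, start_nodes, subgoals, score, relevant, max_steps=50):
    """Best-first traversal of an event graph until all subgoals are satisfied.

    graph:    dict node -> list of (relation, neighbor)
    subgoals: list of subgoal ids; satisfied[i] flips when evidence is found
    score:    (node, satisfied) -> priority, higher explored first
    relevant: node -> set of subgoal indices the node satisfies
    """
    satisfied = [False] * len(subgoals)
    evidence, seen, frontier = [], set(), []
    for n in start_nodes:                       # initial candidate localization
        heapq.heappush(frontier, (-score(n, satisfied), n))
    steps = 0
    while frontier and not all(satisfied) and steps < max_steps:
        _, node = heapq.heappop(frontier)       # global priority queue
        if node in seen:
            continue                            # SKIP already-visited nodes
        seen.add(node)
        steps += 1
        hits = relevant(node)
        if hits:                                # ANSWER: node supports subgoals
            evidence.append(node)
            for i in hits:
                satisfied[i] = True
        for _, nb in graph.get(node, []):       # EXPAND along logical edges
            if nb not in seen:
                heapq.heappush(frontier, (-score(nb, satisfied), nb))
    return evidence, satisfied
```

When the loop exits with unsatisfied subgoals, the pipeline's query-refinement step would reformulate the subgoals and reissue the search.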
All operator calls, segmentation, and relation extraction within CompassMem are implemented by LLM prompting, leveraging specialized templates. The architecture ensures retrieval performance is tightly linked to the logic-aware graph structure, not mere embedding proximity (Hu et al., 8 Jan 2026).
Within context-management settings, retrieval algorithms rely on fixed-dimensional embedding similarity scores and softmax-based selection over memory records. Compression of the active context brief is achieved via retrieval plus summarization, keeping the reasoning context within token budgets for efficiency at scale (Wan et al., 9 Oct 2025).
5. Experimental Findings and Efficiency
On QA and reasoning benchmarks (LoCoMo, NarrativeQA), CompassMem yields consistent improvements over flat and tree-based memory baselines. Representative results include:
- LoCoMo (GPT-4o-mini): Avg F1 = 52.18% vs. 47.92% for HippoRAG; Temporal F1 = 57.96% vs. 48.93%.
- NarrativeQA: F1 improves by 5.49–8.03 points over CAM, HippoRAG, and similar baselines (Hu et al., 8 Jan 2026).
CompassMem’s memory construction latency is lower than that of Mem0, A-Mem, and MemoryOS, with comparable or lower overall latency and token costs relative to graph-based alternatives. Ablation studies confirm that each architectural component (topic clustering, event partitioning, logical edges, active planning) is necessary for multi-hop and temporal QA performance; removing any of them causes degradation, particularly for complex queries.
Efficiency measures in context management indicate constant-time per-turn insertion into the append-only NoteStore, retrieval in time linear in the number of records and the embedding dimension, and end-to-end context synthesis within ~300 tokens per agent turn. Deploying compact specialized summarizers (Context-12B) retains accuracy while reducing token budgets by up to 30% (Wan et al., 9 Oct 2025). In database workloads, the entire sketching subsystem’s live memory usage remains several orders of magnitude below that of intermediate or materialized join outputs (Izenov et al., 2021).
6. Limitations and Prospective Advances
CompassMem’s core limitations derive from reliance on LLM-based segmentation and relation extraction: inaccuracies at this stage propagate through the Event Graph or context notes. Current evaluations primarily target QA and agentic planning, with less coverage of domains such as multimodal reasoning. Event segmentation presently leverages LLM prompting rather than explicit statistical detectors; robustness could increase by adopting supervised or hybrid event/edge labellers or by learning edge importance/retrieval strategies via reinforcement learning.
For in-memory sketch systems, accuracy is ultimately constrained by the sketch width and row count. Increasing coverage of rare join keys or higher-way merges may further benefit from dynamic reallocation of sketch capacity or hybridization with on-demand sampling. Extending CompassMem to multimodal agent memory, or integrating explicit temporal reasoning and timeline inference modules, is an active research direction (Hu et al., 8 Jan 2026).
A plausible implication is that further automation and scale-up of both evidence structuring and active retrieval—especially with lightweight, specialized components—will be central to robust long-horizon reasoning agents and high-throughput, memory-efficient query systems.