
WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation

Published 20 Apr 2026 in cs.AI and cs.CL | (2604.18478v1)

Abstract: Persistent memory is the bottleneck separating stateless chatbots from long-running agentic systems. Retrieval-augmented generation (RAG) over flat vector stores fragments facts into chunks, loses cross-session identity, and has no first-class notion of supersession or contradiction. Recent bitemporal knowledge-graph systems (Graphiti, Memento, Hydra DB) add typed edges and valid-time metadata, but the graph itself remains flat: no recursive composition, no content-addressed invariants on nodes, and edge types carry no behavior beyond a label. We present WorldDB, a memory engine built on three commitments: (i) every node is a world -- a container with its own interior subgraph, ontology scope, and composed embedding, recursive to arbitrary depth; (ii) nodes are content-addressed and immutable, so any edit produces a new hash at the node and every ancestor, giving a Merkle-style audit trail for free; (iii) edges are write-time programs -- each edge type ships on_insert/on_delete/on_query_rewrite handlers (supersession closes validity, contradicts preserves both sides, same_as stages a merge proposal), so no raw append path exists. On LongMemEval-s (500 questions, ~115k-token conversational stacks), WorldDB with Claude Opus 4.7 as answerer achieves 96.40% overall / 97.11% task-averaged accuracy, a +5.61pp improvement over the previously reported Hydra DB state-of-the-art (90.79%) and +11.20pp over Supermemory (85.20%), with perfect single-session-assistant recall and robust performance on temporal reasoning (96.24%), knowledge update (98.72%), and preference synthesis (96.67%). Ablations show that the engine's graph layer -- resolver-unified entities and typed refers_to edges -- contributes +7.0pp task-averaged independently of the underlying answerer.

Summary

  • The paper introduces WorldDB, a memory engine that leverages recursive worlds, content-addressed immutability, and write-time programs to enforce structural correctness.
  • It employs a four-stage reconciliation protocol (extraction, tiered entity resolution, handler-driven reconciliation, and atomic commit) so that entities are matched across sessions without silent conflation.
  • Empirical evaluation on LongMemEval-s shows significant accuracy gains over prior memory systems such as Hydra DB and Supermemory, along with efficient read/write performance.

Motivation and Problem Setting

WorldDB addresses the systemic limitations of RAG-based memory for persistent agent systems. Flat vector stores, the standard substrate for retrieval-augmented generation, suffer from fragmentation (naive chunking splits facts apart), temporal ambiguity (there is no notion of valid time), and identity drift (embeddings of the same entity stay disconnected across sessions). Bitemporal knowledge-graph systems such as Hydra DB, Memento, and Zep mitigate some of these issues by adding typed edges and valid-time metadata, but their graphs remain flat, with no recursive composition, and edge types carry no behavior beyond a label.

The paper argues that robust memory for agentic LLMs requires a structurally coherent, fully auditable substrate in which compositional recursion, content-addressed immutability, and edge-program semantics are foundational. WorldDB builds these properties into the memory layer itself, revising standard assumptions about how memory for long-running agents should be organized.

Core Architectural Commitments

WorldDB’s architecture rests on three core principles:

  1. Recursive Worlds: Each node is a ‘world’—a container with its own subgraph, ontology scope, and recursively composed embedding. This hierarchy enables explicit context scoping, preventing leakage between worlds and supporting deep containment semantics. Queries re-scope to active worlds, enforcing compositional boundaries.
  2. Content-Addressed Immutability: Nodes are identified by BLAKE3 hashes over their content, children, edges, and creation timestamp. Any modification propagates as a new hash up the ancestry chain, yielding Merkle-style auditability and deduplication. Validity intervals ($t_{\text{valid}}$) are mutable but excluded from content addressing, which allows temporal supersession without violating immutability.
  3. Edges as Write-Time Programs: All edge types (e.g., supersedes, contradicts, same_as, refers_to) are paired with executable handlers—on_insert, on_delete, on_query_rewrite—that enforce semantics, disallowing raw appends. Supersession programmatically closes validity, contradictions are surfaced but preserved, merges are staged and explicit.
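
A minimal Python sketch of content addressing, under stated assumptions: `hashlib.blake2b` stands in for BLAKE3 (which is not in the standard library), and the JSON payload layout and the `Node`, `edit`, and `valid_to` names are hypothetical. Validity is kept in a side table to mirror the rule that validity intervals are mutable but excluded from the hash. Editing one leaf yields a new hash at that leaf and at every ancestor, while untouched siblings are reused as-is.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable world node; only these fields feed the content hash."""
    content: str
    children: Tuple["Node", ...] = ()
    edges: Tuple[Tuple[str, str], ...] = ()  # (edge_type, target_hash)
    created_at: str = "2026-01-01T00:00:00Z"

    @property
    def hash(self) -> str:
        # Assumption: blake2b stands in for BLAKE3, which is not in Python's stdlib.
        payload = json.dumps(
            {
                "content": self.content,
                "children": [child.hash for child in self.children],
                "edges": [list(edge) for edge in self.edges],
                "created_at": self.created_at,
            },
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.blake2b(payload, digest_size=32).hexdigest()


# Validity lives outside the hashed payload, so closing it never changes a hash.
valid_to: Dict[str, Optional[str]] = {}


def edit(root: Node, path: Tuple[int, ...], new_content: str) -> Node:
    """Return a new root; the edited node and every ancestor get new hashes."""
    if not path:
        return replace(root, content=new_content)
    kids = list(root.children)
    kids[path[0]] = edit(kids[path[0]], path[1:], new_content)
    return replace(root, children=tuple(kids))


leaf_a = Node("Alice lives in Berlin")
leaf_b = Node("Alice likes tea")
root = Node("Alice", children=(leaf_a, leaf_b))

new_root = edit(root, (0,), "Alice lives in Lisbon")
print(new_root.hash != root.hash)      # True: the ancestor was rehashed
print(new_root.children[1] is leaf_b)  # True: the untouched sibling is shared
valid_to[leaf_a.hash] = "2026-03-01"   # close the old fact's validity
print(leaf_a.hash == Node("Alice lives in Berlin").hash)  # True: hash unaffected
```

Because the old root is never mutated, both `root` and `new_root` remain addressable, which is what makes the hash chain usable as an audit trail.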

These commitments enforce structural correctness and consistency: because every write is routed through an edge handler, there is no raw append path that could bypass the rules, which eliminates the silent corruption common in traditional stores.
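
The edge-program idea can be sketched as a registry of per-type `on_insert` handlers behind a single write function. This is an illustrative assumption, not WorldDB's API: the `Store`, `ON_INSERT`, and `insert_edge` names are hypothetical, the no-op `refers_to` handler is a simplification, and `on_delete` and `on_query_rewrite` are omitted for brevity. What it shows is that no raw append exists: an edge type without a registered program is rejected before anything is written.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Store:
    valid_to: Dict[str, Optional[str]] = field(default_factory=dict)
    edges: List[Tuple[str, str, str, str]] = field(default_factory=list)
    contradictions: List[Tuple[str, str]] = field(default_factory=list)
    merge_proposals: List[Tuple[str, str]] = field(default_factory=list)


def on_insert_supersedes(s: Store, src: str, dst: str, at: str) -> None:
    s.valid_to[dst] = at                  # the newer fact closes the older one's validity


def on_insert_contradicts(s: Store, src: str, dst: str, at: str) -> None:
    s.contradictions.append((src, dst))   # both sides stay valid and visible


def on_insert_same_as(s: Store, src: str, dst: str, at: str) -> None:
    s.merge_proposals.append((src, dst))  # staged for review, never auto-merged


def on_insert_refers_to(s: Store, src: str, dst: str, at: str) -> None:
    pass                                  # plain reference: no side effects in this sketch


ON_INSERT: Dict[str, Callable[[Store, str, str, str], None]] = {
    "supersedes": on_insert_supersedes,
    "contradicts": on_insert_contradicts,
    "same_as": on_insert_same_as,
    "refers_to": on_insert_refers_to,
}


def insert_edge(s: Store, etype: str, src: str, dst: str, at: str) -> None:
    """The only write path: run the edge type's program, then record the edge."""
    handler = ON_INSERT.get(etype)
    if handler is None:
        raise ValueError(f"no program registered for edge type {etype!r}")
    handler(s, src, dst, at)
    s.edges.append((etype, src, dst, at))


s = Store()
insert_edge(s, "supersedes", "lives_in_lisbon", "lives_in_berlin", "2026-03-01")
insert_edge(s, "contradicts", "likes_tea", "hates_tea", "2026-03-02")
insert_edge(s, "same_as", "alice", "a_johnson", "2026-03-03")
print(s.valid_to)  # {'lives_in_berlin': '2026-03-01'}
```

Running the handler before recording the edge means a failing handler leaves no half-written edge behind.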

Reconciliation and Entity Resolution Pipeline

WorldDB incorporates a four-stage reconciliation protocol:

  • Extraction: Candidate nodes and edges are produced by a model-backed extractor.
  • Resolution: Entities are matched by a tiered resolver (exact, fuzzy, phonetic, and embedding-similarity matching, followed by a tiebreaker). Confirmed matches stage merge proposals; unmatched mentions become new entities instead of being silently conflated with existing ones.
  • Reconciliation: Edge handlers atomically process candidate edges, closing validity intervals, recording contradictions, and staging merges.
  • Commit: Content-addressed writes are persisted, and ANN indexes are incrementally updated.
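
The resolution stage can be sketched as a resolver that tries each tier in order and stops at the first tier that produces a match. Everything concrete here is a stand-in chosen for illustration: the thresholds (0.85 fuzzy ratio, 0.9 cosine), `difflib.SequenceMatcher` for the fuzzy tier, classic Soundex for the phonetic tier, toy three-dimensional embeddings, and a closest-string tiebreaker. The paper's actual tier implementations are not described in this summary.

```python
import difflib
import math
from typing import Dict, List, Optional, Tuple

SOUNDEX_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for c in letters}


def soundex(text: str) -> str:
    """Classic American Soundex (letter + 3 digits); a stand-in phonetic key."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return ""
    out, prev = letters[0].upper(), SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h/w do not separate equal codes; vowels do
            prev = code
    return (out + "000")[:4]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ratio(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(mention: str, vec: List[float], entities: Dict[str, List[float]],
            fuzzy_min: float = 0.85, cos_min: float = 0.9) -> Tuple[Optional[str], str]:
    """Try tiers in order; return (entity, tier), or (None, "new") if nothing matches."""
    tiers = [
        ("exact", lambda name: name.lower() == mention.lower()),
        ("fuzzy", lambda name: ratio(name, mention) >= fuzzy_min),
        ("phonetic", lambda name: soundex(name) == soundex(mention)),
        ("embedding", lambda name: cosine(entities[name], vec) >= cos_min),
    ]
    for tier, matches in tiers:
        hits = [name for name in entities if matches(name)]
        if len(hits) == 1:
            return hits[0], tier
        if hits:  # hypothetical tiebreaker: the closest string wins
            return max(hits, key=lambda name: ratio(name, mention)), tier + "+tiebreak"
    return None, "new"


entities = {"Alice Johnson": [1.0, 0.0, 0.0], "Robert": [0.0, 1.0, 0.0],
            "Project Orion": [0.0, 0.0, 1.0]}
print(resolve("Alice Jonson", [0.0, 0.0, 0.0], entities))  # ('Alice Johnson', 'fuzzy')
```

Ordering the tiers from cheapest and strictest to most expensive and most permissive keeps embedding lookups for the mentions that string matching cannot settle.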

Incremental reclustering after each ingest limits entity drift as new facts accumulate. Combining the resolver tiers is what allows WorldDB to unify an entity's identity across sessions.

Retrieval Layer and Composed Embeddings

Reads in WorldDB are deterministic and compositional, with no LLMs invoked on the query path. Retrieval merges three lanes—BM25, HNSW-based semantic matching, and entity-graph traversal—via reciprocal rank fusion. Entity traversal exploits cross-session identity, substantially improving recall.
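
Reciprocal rank fusion scores each candidate by summing $1/(k + \text{rank})$ over every lane that returns it. The sketch below assumes the conventional constant k = 60 and uses hypothetical lane names and node ids; the fusion constant WorldDB uses is not stated here. Ties are broken by id so that, like the read path described above, the output is deterministic.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(lanes: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: score(d) = sum over lanes of 1 / (k + rank of d in that lane)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in lanes.values():
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))


lanes = {
    "bm25": ["n3", "n1", "n7"],
    "hnsw": ["n1", "n4", "n3"],
    "entity_graph": ["n1", "n9"],
}
print(reciprocal_rank_fusion(lanes))  # ['n1', 'n3', 'n4', 'n9', 'n7']
```

A node only has to rank well in one lane to surface: here `n9`, found only by the entity-graph lane, still outranks `n7`, which BM25 placed third.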

World embeddings are composed in one of two modes: mean pooling or parameter-free scaled dot-product attention. The attention mode, inspired by HAKG aggregation but without learned parameters, substantially improves top-1 retrieval accuracy on synthetic benchmarks. Each node keeps its content embedding separate from its effective (composed) embedding, which prevents silent embedding drift and keeps world semantics stable under incremental composition.
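
A pure-Python sketch of the two composition modes. For the attention mode it assumes that the mean of the children serves as the query and that the child embeddings serve as both keys and values, with weights $\mathrm{softmax}(q \cdot c_i / \sqrt{d})$; that choice of query is an assumption, since the exact formulation is not given in this summary. No learned matrices appear anywhere.

```python
import math
from typing import List

Vec = List[float]


def mean_pool(children: List[Vec]) -> Vec:
    """Element-wise average of the child embeddings."""
    n = len(children)
    return [sum(column) / n for column in zip(*children)]


def attention_pool(children: List[Vec]) -> Vec:
    """Parameter-free scaled dot-product attention over the children.

    Assumption: the mean-pooled vector is the query; keys and values are the
    raw child embeddings, so there are no learned weights.
    """
    d = len(children[0])
    query = mean_pool(children)
    logits = [sum(q * c for q, c in zip(query, child)) / math.sqrt(d) for child in children]
    top = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * child[j] for w, child in zip(weights, children)) for j in range(d)]


children = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mean_pool(children))       # roughly [0.667, 0.333]
print(attention_pool(children))  # roughly [0.717, 0.283]
```

Compared with mean pooling, attention shifts weight toward children that agree with the overall direction of the world, so a single outlier child pulls the composed embedding less.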

Background Consolidator and Summarization

The consolidator periodically generates exhaustive summary nodes, computes transitive closures over causal and type edges, and sweeps for structural contradictions. Summary-first queries run 6.5x faster than full-detail traversal, and because every leaf node is preserved, queries can choose how deep to descend. The architecture also leaves room for future bio-mimetic decay engines that would model retention and forgetting.
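
The transitive-closure pass can be illustrated with a breadth-first search from every source node. The function name and the `is_a` example edges are hypothetical; treating the derived pairs as candidate inferred edges is one plausible use, not a documented WorldDB step.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Iterable[Edge]) -> Set[Edge]:
    """All (a, c) pairs such that c is reachable from a along directed edges."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    closure: Set[Edge] = set()
    for start in list(graph):
        seen: Set[str] = set()
        queue = deque(graph[start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            queue.extend(graph.get(node, ()))
    return closure


is_a = [("poodle", "dog"), ("dog", "mammal"), ("mammal", "animal")]
inferred = transitive_closure(is_a) - set(is_a)
print(sorted(inferred))
# [('dog', 'animal'), ('poodle', 'animal'), ('poodle', 'mammal')]
```

The `seen` set keeps the search finite even when causal edges form a cycle.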

Empirical Evaluation

WorldDB demonstrates strong empirical performance on LongMemEval-s, a benchmark of 500 questions, each posed over a conversational history averaging about 115k tokens:

  • Overall Accuracy: 96.40%, outperforming Hydra DB (90.79%) and Supermemory (85.20%) by 5.61 and 11.20 percentage points, respectively.
  • Task-Averaged Accuracy: 97.11% (vs. Hydra DB’s 93.66%).
  • Per-Category Accuracy (single-session, multi-session, temporal reasoning, knowledge update): at least 92.48% in every category, with the largest gains in multi-session reasoning (+15.79pp) and temporal reasoning (+5.27pp).
  • Ablation: The graph layer (resolver-unified entities and typed refers_to edges) contributes +7.0pp task-averaged, independently of the answerer model.
  • Engineering Benchmarks: 1M nodes and 2.5M edges loaded at >5,400 writes/s; P95 read latencies stay under 100 ms for every query shape tested. Fuzz testing over 4,000 random operations found zero violations of the structural invariants.
  • Cross-Model Generalization: Claude Opus 4.7, Sonnet 4.6, and GPT-4o all benefit substantially from the engine; the architecture contributes more to accuracy than the choice of answerer model.
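
The headline comparisons reduce to simple arithmetic. This sketch reproduces the percentage-point gaps from the reported numbers and derives figures the source does not state: the relative reduction in error rate versus Hydra DB, and the count of correctly answered questions, assuming overall accuracy is computed over all 500 questions. The load-time estimate further assumes that every node and every edge counts as one write, which makes it an upper bound given the reported >5,400 writes/s.

```python
# Reported LongMemEval-s overall accuracies (percent).
worlddb, hydra_db, supermemory = 96.40, 90.79, 85.20

gap_vs_hydra = round(worlddb - hydra_db, 2)  # percentage points
gap_vs_super = round(worlddb - supermemory, 2)

# Derived, not stated in the source: relative error-rate reduction vs. Hydra DB.
err_worlddb, err_hydra = 100 - worlddb, 100 - hydra_db
rel_error_reduction = (err_hydra - err_worlddb) / err_hydra

# Assumption: overall accuracy is computed over all 500 questions.
correct = round(worlddb / 100 * 500)

# Assumption: each node and each edge is one write, so this is an upper bound.
load_seconds = (1_000_000 + 2_500_000) / 5_400

print(f"+{gap_vs_hydra:.2f} pp vs Hydra DB, +{gap_vs_super:.2f} pp vs Supermemory")
print(f"error rate {err_hydra:.2f}% -> {err_worlddb:.2f}% ({rel_error_reduction:.1%} fewer errors)")
print(f"{correct}/500 correct; bulk load under {load_seconds:.0f} s")
```

Framed as error rates, the 5.61pp gain means WorldDB makes roughly 61% fewer errors than Hydra DB (3.60% versus 9.21%), which conveys the size of the improvement more directly near the top of the accuracy scale.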

Model Context Protocol and Tooling

WorldDB’s MCP surface exposes nine memory tools for writing, recalling, listing, and amending memories. Scopes are represented as containment nodes and edges, so cross-scope recall follows graph structure rather than filtering flat records by tag. Support for stdio and streamable HTTP transports, pluggable extractors and summarizers, and deterministic hashing enables cross-agent and cross-app deployments.
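
Scope-aware containment can be sketched as `contains` edges from scope nodes to memory nodes, with recall walking the containment graph. The `contains` table, the `app:`/`mem:` naming convention, and the `add_contains` and `recall` functions are hypothetical; they are not the names of WorldDB's nine MCP tools. A memory shared by two scopes is a single node with two incoming edges rather than a duplicated record carrying two tags.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical containment graph: scope nodes contain memories or sub-scopes.
contains: Dict[str, List[str]] = defaultdict(list)


def add_contains(parent: str, child: str) -> None:
    contains[parent].append(child)


def recall(scope: str) -> List[str]:
    """Memories reachable from `scope` via contains edges (depth-first, no duplicates)."""
    found: List[str] = []
    seen: Set[str] = set()
    stack = [scope]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node.startswith("mem:"):
            found.append(node)
        stack.extend(reversed(contains.get(node, [])))  # visit children in insertion order
    return found


add_contains("org", "app:chat")
add_contains("org", "app:ide")
add_contains("app:chat", "mem:chat-history")
add_contains("app:ide", "mem:build-notes")
# One shared memory node with two incoming edges instead of a record with two tags.
add_contains("app:chat", "mem:prefers-dark-mode")
add_contains("app:ide", "mem:prefers-dark-mode")

print(recall("app:chat"))  # ['mem:chat-history', 'mem:prefers-dark-mode']
print(recall("org"))       # each memory is listed once
```

Because sharing is expressed as edges, amending the shared preference updates what both scopes recall, with no tag bookkeeping to keep in sync.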

Discussion, Limitations, and Implications

WorldDB advances persistent memory for agents by enforcing provenance, structural correctness, and compositional retrieval. Because no raw append path exists, every fact is traceable and every contradiction stays visible. This costs a modest amount of extra work at ingest time but eliminates silent memory corruption. The parameter-free attention aggregator sets a baseline for future learned world embeddings; bio-mimetic decay techniques are architecturally compatible but not yet implemented.

Practically, WorldDB is a substrate for agentic systems requiring multistage, recursively scoped, and temporally coherent memory. Theoretical implications include establishing Merkle-tree state witnesses for persistent agent memory, robust provenance chains, and generalized graph composition mechanisms. The architecture is conducive to further advances in self-supervised world aggregation, decay/reinforcement models, and temporal event modeling.

Future developments may include learned embedding aggregators via contrastive supervision, retention/retrieval-frequency-driven decay, and comprehensive evaluation on DMR-style benchmarks.

Conclusion

WorldDB delivers a structurally coherent, immutable, and ontology-enforced memory substrate for long-running agents, with substantial gains in recall, consistency, and reasoning accuracy on LongMemEval-s. Its ablation and cross-model results indicate that the graph-layer commitments contribute more to accuracy than answerer capacity, and the system outperforms prior store designs such as Hydra DB and Supermemory. It sets a new bar for persistent, queryable memory in agentic AI frameworks, with broad implications for agent state management, provenance, and compositional reasoning (2604.18478).
