Structured Episodic Event Memory (SEEM)
- Structured Episodic Event Memory is a hierarchical framework that organizes experiential data into semantically and temporally structured event frames.
- The architecture employs two-phase extraction and fusion algorithms to consolidate related conversational turns and reduce memory fragmentation.
- Empirical evaluations show enhanced coherence, multi-hop reasoning, and up to a 4.4-point accuracy gain on benchmark tests.
Structured Episodic Event Memory (SEEM) is a hierarchical memory architecture for LLMs and autonomous agents, engineered to promote narrative coherence, temporal logic, and tractable event provenance in long-term interaction. Unlike statically chunked memory strategies such as standard Retrieval-Augmented Generation (RAG), SEEM organizes experiential data streams into semantically and temporally structured Episodic Event Frames (EEFs), tightly coupled with a static graph memory of relational tuples. SEEM leverages cognitive frame theory, provenance tracking, and agentic fusion and expansion mechanisms to address the scattered retrieval and flat recall that limit the reasoning depth of traditional architectures (Lu et al., 10 Jan 2026).
1. Formalism and Definition of Episodic Event Frames
Within SEEM, the canonical memory unit is the Episodic Event Frame, defined as a semantically structured tuple anchored by provenance:

$$e = \langle \rho, s, \mathcal{E} \rangle$$

where:
- $\rho$: Pointer to the original text passage $p$ (provenance).
- $s$: Abstractive summary (1–2 sentences) of $p$.
- $\mathcal{E} = \{\varepsilon_1, \dots, \varepsilon_k\}$: Each role tuple $\varepsilon_i$ comprises Participants, Action, Time, Location, Causality, Manner for subevents in $p$.
This representation explicitly encodes “who did what, when, where, why, and how,” directly reflecting cognitive frame theory’s emphasis on multiple semantic roles and traceable context. The provenance pointer enables downstream context reconstruction, supporting full explainability and error tracing (Lu et al., 10 Jan 2026).
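This structure maps directly onto a simple record type. The following Python sketch is illustrative only; the class and field names are ours, not the paper's:

```python
from dataclasses import dataclass, field

@dataclass
class RoleTuple:
    """One semantic role tuple for a subevent: who did what, when, where, why, how."""
    participants: list[str]
    action: str
    time: str | None = None
    location: str | None = None
    causality: str | None = None
    manner: str | None = None

@dataclass
class EpisodicEventFrame:
    """An EEF: provenance pointers, an abstractive summary, and role tuples."""
    provenance: set[str]                           # pointers (e.g., passage IDs) to source text
    summary: str                                   # 1-2 sentence abstractive summary
    roles: list[RoleTuple] = field(default_factory=list)
```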
2. Construction and Fusion Algorithms
Transformation from raw text passages to EEFs is achieved via a two-phase algorithm:
- Extraction ($F_{\text{ext}}$): Parses the passage $p$ into a summary and role tuple(s).
- Fusion ($F_{\text{fuse}}$, gated by $F_{\text{judge}}$): Optionally merges a new frame with the most semantically similar existing frame if the two are judged part of the same event.
Pseudocode (as formalized in (Lu et al., 10 Jan 2026)):
```python
# Streaming EEF construction (paper pseudocode, restructured as Python).
# F_ext, F_judge, and F_fuse are the paper's LLM-backed operators.
EpisodicStore = []
for t in range(1, T + 1):                              # passages are 1-indexed in the paper
    p = passage[t]
    e_new = F_ext(p)                                   # extract an EEF from passage p
    e_prev = retrieve_most_similar(EpisodicStore, e_new)   # None when the store is empty
    delta = F_judge(e_new, e_prev) if e_prev is not None else 0
    if delta == 1:                                     # judged to be the same underlying event
        e_fused = F_fuse(e_prev, e_new)                # merge roles and provenance
        EpisodicStore.remove(e_prev)
        EpisodicStore.append(e_fused)
    else:
        EpisodicStore.append(e_new)
```
The fusion step aggregates role tuples and expands provenance to cover all constituent source passages. This streaming approach continually consolidates semantically overlapping events, reducing fragmentation and compressing narratives into minimal, self-contained event units (Lu et al., 10 Jan 2026).
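As a concrete illustration, here is a minimal sketch of the fusion step built on the dataclasses above. The paper treats $F_{\text{judge}}$ and $F_{\text{fuse}}$ as agentic (model-driven) operations; the cosine-similarity threshold and toy embedding below are stand-ins for exposition only:

```python
import numpy as np

SIM_THRESHOLD = 0.8  # hypothetical cutoff; the paper's F_judge is an agentic decision

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding, for illustration only."""
    v = np.zeros(256)
    for w in text.lower().split():
        v[hash(w) % 256] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def judge_same_event(e_new: EpisodicEventFrame, e_prev: EpisodicEventFrame) -> bool:
    """Stand-in for F_judge: summary similarity against a threshold."""
    return float(embed(e_new.summary) @ embed(e_prev.summary)) >= SIM_THRESHOLD

def fuse(e_prev: EpisodicEventFrame, e_new: EpisodicEventFrame) -> EpisodicEventFrame:
    """Stand-in for F_fuse: union provenance, concatenate role tuples."""
    return EpisodicEventFrame(
        provenance=e_prev.provenance | e_new.provenance,  # provenance now covers all sources
        summary=f"{e_prev.summary} {e_new.summary}",      # a real system would re-summarize
        roles=e_prev.roles + e_new.roles,
    )
```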
3. Two-Layer Memory Hierarchy: Graph and Episodic Layers
SEEM’s memory architecture is explicitly two-layered:
- Graph Memory Layer (GML): Stores static, time-anchored relational facts as tuples encoding subject, relation, object, and timestamp.
- Episodic Memory Layer (EML): Maintains the chronological sequence of EEFs, capturing the full unfolding of the narrative stream.
The interface between these layers proceeds as follows for a query $q$:
- GML retrieval seeds a set of potentially relevant passages.
- These are mapped to corresponding EEFs.
- Reverse Provenance Expansion (RPE) recursively incorporates all raw passages associated with partially matched EEFs.
- The resulting union is serialized with graph tuples and summaries into a prompt.
- The answer is generated conditioned on this serialized context.
Graph-episodic synergy is achieved through concatenation of EEF abstractions and relational facts within the prompt, without explicit cross-attention (Lu et al., 10 Jan 2026).
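The retrieval path above can be made concrete with a short sketch. All interfaces here (`gml.search`, the `passage_to_eef` index, the `llm` callable) are illustrative assumptions layered on the dataclasses from Section 1, not the paper's implementation:

```python
def answer_query(q: str, gml, passage_to_eef: dict, passages: dict, llm) -> str:
    """Sketch of SEEM's query-time path (all interfaces are hypothetical)."""
    # 1. GML retrieval seeds candidate passages via relational tuples.
    seed_tuples = gml.search(q)            # e.g., [(subj, rel, obj, timestamp, passage_id), ...]
    seed_ids = {t[-1] for t in seed_tuples}

    # 2. Map the seed passages to their EEFs (deduplicated).
    eefs, seen = [], set()
    for pid in seed_ids:
        e = passage_to_eef.get(pid)
        if e is not None and id(e) not in seen:
            seen.add(id(e))
            eefs.append(e)

    # 3. Reverse Provenance Expansion pulls in every passage each matched EEF
    #    covers (the recursive closure is sketched in Section 4).
    expanded = set(seed_ids)
    for e in eefs:
        expanded |= e.provenance

    # 4. Serialize graph tuples, EEF summaries, and raw passages into one prompt.
    parts = [str(t) for t in seed_tuples]
    parts += [e.summary for e in eefs]
    parts += [passages[pid] for pid in sorted(expanded)]
    context = "\n".join(parts)

    # 5. Generate the answer conditioned on the serialized context.
    return llm(f"Context:\n{context}\n\nQuestion: {q}\nAnswer:")
```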
4. Agentic Associative Fusion and Reverse Provenance Expansion
Two core mechanisms structure long-horizon episodic memory, reducing drift and loss of coherence:
- Agentic Associative Fusion: Prevents fragmentation by merging new EEFs with prior ones deemed semantically identical. The fusion operation unifies both the role attributes and all provenance pointers, ensuring events spanning multiple dialogue turns or passages are represented as single, semantically rich frames.
$$e_{\text{fused}} = F_{\text{fuse}}(e_{\text{prev}}, e_{\text{new}}), \quad \text{with} \quad \rho_{\text{fused}} = \rho_{\text{prev}} \cup \rho_{\text{new}}, \;\; \mathcal{E}_{\text{fused}} = \mathcal{E}_{\text{prev}} \cup \mathcal{E}_{\text{new}}$$
- Reverse Provenance Expansion (RPE): Following an initial passage retrieval, RPE recursively surfaces all passages linked via $\rho$ to any EEF returned, thereby reuniting fragmented evidence of multi-turn or discontinuous events. This process guarantees that partially retrieved events yield their complete supporting context, restoring narrative continuity and referential integrity.
These mechanisms greatly reduce cases where logically related event elements are missed by query-first or window-based retrieval (Lu et al., 10 Jan 2026).
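RPE can be sketched as a fixed-point closure over provenance pointers, reusing the illustrative `passage_to_eef` index from the previous sketch. Reading "recursively incorporates" as iteration to a fixed point is our interpretation:

```python
def reverse_provenance_expand(seed_ids: set, passage_to_eef: dict) -> set:
    """Close a passage set under EEF provenance (sketch of RPE).

    Any EEF touched by a retrieved passage contributes all of its source
    passages, which may in turn touch further EEFs; iterate until stable.
    """
    expanded = set(seed_ids)
    frontier = set(seed_ids)
    while frontier:
        nxt = set()
        for pid in frontier:
            eef = passage_to_eef.get(pid)
            if eef is not None:
                nxt |= eef.provenance - expanded   # only genuinely new passages
        expanded |= nxt
        frontier = nxt
    return expanded
```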
5. Example: Multi-Turn Dialog Event Fusion
A representative example, adapted from (Lu et al., 10 Jan 2026), demonstrates the consolidation of related conversational turns into a unified EEF:
- Summary:
"On January 23, 2022 at 2:01 pm, Joanna asked Nate how long he had owned ‘them,’ and Nate replied that he’d had them for three years (since January 2019) and that they brought him joy."
- Events:
| Participant | Action | Time | Reason | Method |
|-------------|--------|------|--------|--------|
| Joanna | Asked about Nate’s ownership | 2:01 pm, 23 Jan 2022 | Affectionate curiosity | Verbal inquiry |
| Nate | Replied ownership and experience | Jan 2019 – Jan 2022 | In response to Joanna | Verbal response |
Roles are grounded in specific turns via provenance. The fusion process links both turns through one frame, preserving the causal and referential relationship.
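Using the illustrative dataclasses from Section 1, the fused frame for this example could be populated as below. The turn identifiers are hypothetical, and the table's Reason/Method columns map onto the Causality/Manner roles:

```python
fused = EpisodicEventFrame(
    provenance={"turn_17", "turn_18"},  # hypothetical IDs for the two dialogue turns
    summary=("On January 23, 2022 at 2:01 pm, Joanna asked Nate how long he had "
             "owned 'them,' and Nate replied that he'd had them for three years "
             "(since January 2019) and that they brought him joy."),
    roles=[
        RoleTuple(participants=["Joanna"], action="Asked about Nate's ownership",
                  time="2:01 pm, 23 Jan 2022", causality="Affectionate curiosity",
                  manner="Verbal inquiry"),
        RoleTuple(participants=["Nate"], action="Replied ownership and experience",
                  time="Jan 2019 - Jan 2022", causality="In response to Joanna",
                  manner="Verbal response"),
    ],
)
```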
6. Empirical Evaluation and Quantitative Impact
Empirical evaluation on LoCoMo and LongMemEval benchmarks demonstrates substantial improvements attributable to EEF structuring. Key results:
- LoCoMo: BLEU-1 = 56.1, F1 = 61.1, LLM-Judge = 78.0 (improvements of +2.8 F1 and +1.5 LLM-Judge compared to HippoRAG 2).
- LongMemEval: Accuracy increases from 60.6 to 65.0 (+4.4 points).
An ablation that removes EEFs drops LoCoMo to BLEU-1 = 53.5, F1 = 58.5, and LLM-Judge = 75.0, erasing nearly half of the method's gain in coherence and long-term QA accuracy. EEFs specifically enhance multi-hop and temporal reasoning, with temporal-category accuracy rising from 54.6 to 63.4 (Lu et al., 10 Jan 2026).
7. Significance and Broader Context
SEEM’s formalization of episodic memory establishes a rigorous foundation for interpretability, narrative preservation, and robust reasoning in long-horizon LLM and agentic architectures. The definition, extraction, and fusion of EEFs represent a convergence of cognitive frame theory, structured provenance, and hierarchical memory. Mechanisms such as agentic associative fusion and RPE address key challenges in fragmentation and context assembly that are not resolved by static retrieval methods. The architecture’s empirical gains highlight the benefits of structured event-centric memory for applications demanding temporal, causal, and entity coherence over extended interactions (Lu et al., 10 Jan 2026).