
Nemori: Adaptive Memory for LLM Agents

Updated 3 July 2026
  • Nemori is a self-organizing agent memory architecture designed to overcome the 'agent amnesia' of LLM-based systems by integrating episodic segmentation and proactive semantic memory distillation.
  • It employs a Two-Step Alignment Principle and a Predict-Calibrate Principle to structure and refine both episodic and semantic memories in real-time conversational environments.
  • Nemori demonstrates superior long-range contextual understanding and token efficiency on benchmarks like LoCoMo and LongMemEval, outperforming traditional static memory systems.

Nemori is a self-organizing agent memory architecture for LLM-based agents, designed to address the persistent memory limitations inherent in current systems. Drawing inspiration from cognitive science, Nemori integrates episodic segmentation and proactive knowledge distillation through two foundational principles: the Two-Step Alignment Principle, informed by Event Segmentation Theory, and the Predict-Calibrate Principle, grounded in the Free-Energy Principle. Nemori equips autonomous agents with both persistent and adaptively evolving memory, suitable for long-term, streaming conversational settings, and demonstrates state-of-the-art empirical performance on benchmarks demanding long-range contextual understanding (Nan et al., 5 Aug 2025).

1. Motivation and Problem Setting

LLM agents, despite strong performance in isolated sessions, exhibit "agent amnesia": all conversational history is lost between sessions. This limitation is rooted in two technical factors: quadratic $O(n^2)$ attention scaling, which restricts feasible context length, and the "Lost in the Middle" phenomenon, which degrades retrieval and reasoning over lengthy contexts. Retrieval-Augmented Generation (RAG) and early Memory-Augmented Generation (MAG) systems have made partial progress, but are fundamentally limited:

  • They operate on static, pre-indexed knowledge rather than true online conversational streams.
  • They segment data using heuristics or fixed-size chunks, not semantically coherent units.
  • Their fact extraction is passive and rule-based, precluding any adaptation or evolution of agent knowledge.

For truly autonomous, long-term interactive agents, memory requires both cross-session persistence and the ability to evolve through interaction, paralleling human episodic and semantic memory formation processes. Nemori is designed to realize these dual cognitive processes in LLM-based agents (Nan et al., 5 Aug 2025).

2. Core Architectural Modules

Nemori consists of three interacting modules, maintained by a unified vector-based retrieval engine:

  • Topic Segmentation (Boundary Alignment): Buffers incoming messages, applying an LLM-based boundary detector to determine episodic boundaries.
  • Episodic Memory Generation (Representation Alignment): Upon boundary detection, converts a segment into a structured episodic memory—comprising a concise title and a third-person narrative—supporting event chunking.
  • Semantic Memory Generation (Predict-Calibrate Cycle): Employs the episode title to retrieve related semantic memories, predict episode content, identify prediction gaps, and distill novel knowledge, fostering a continually adapting semantic memory base.

Both episodic and semantic stores are indexed for high-efficiency vector retrieval, facilitating rapid, contextually relevant access during subsequent reasoning tasks (Nan et al., 5 Aug 2025).
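As a rough illustration of what vector retrieval over a memory store can look like, the following is a minimal sketch, not Nemori's actual engine. The names `cosine` and `top_k`, the use of cosine similarity, and the in-memory list of `(text, embedding)` pairs are all assumptions made for the example; the source only states that both stores are indexed for vector retrieval.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(
    query: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the texts of the k stored memories most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical toy store: (memory text, embedding) pairs with 2-d embeddings.
memories = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
closest = top_k([1.0, 0.1], memories, k=2)
```

The same function could serve both the episodic and the semantic store, which is one way to read the "unified" retrieval engine described above.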

3. Two-Step Alignment Principle

The Two-Step Alignment Principle operationalizes cognitive Event Segmentation Theory for agent memory systems:

  • Boundary Alignment: Formally, for message buffer $M = \{m_1,\dots,m_t\}$, a boundary detector $f_\theta$ computes a boolean indicator and confidence $(b_{\text{boundary}}, c_{\text{boundary}})$. Segmentation is triggered if a high-confidence event boundary is detected or the buffer reaches a predefined maximum length:

$$T = (b_{\text{boundary}} \land c_{\text{boundary}} > \sigma_{\text{boundary}}) \lor (|M| \geq \beta_{\max})$$

Upon $T = \text{True}$, $M$ is emitted as an episode and the buffer is reset.

  • Representation Alignment: The raw segment $M$ is passed to an episode generator $g_\phi$ to produce $e = (\xi, \zeta)$, where $\xi$ is a title summarizing the segment and $\zeta$ a third-person narrative, both inserted into the Episodic Memory Database. The title $\xi$ serves as a key for triggering semantic memory learning (Nan et al., 5 Aug 2025).
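The segmentation trigger described in the Boundary Alignment step can be sketched as follows. This is a minimal illustration under stated assumptions: `EpisodeBuffer`, `toy_detector`, and the default values for `sigma_boundary` and `beta_max` are hypothetical, and the detector here is a keyword rule standing in for the LLM-based detector.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical detector signature: buffer -> (b_boundary, c_boundary).
BoundaryDetector = Callable[[List[str]], Tuple[bool, float]]


@dataclass
class EpisodeBuffer:
    detector: BoundaryDetector
    sigma_boundary: float = 0.7  # assumed confidence threshold
    beta_max: int = 25           # assumed maximum buffer length
    messages: List[str] = field(default_factory=list)

    def push(self, message: str) -> Optional[List[str]]:
        """Append a message; return the finished segment when the trigger fires."""
        self.messages.append(message)
        b, c = self.detector(self.messages)
        # T = (b AND c > sigma_boundary) OR (|M| >= beta_max)
        trigger = (b and c > self.sigma_boundary) or len(self.messages) >= self.beta_max
        if not trigger:
            return None
        segment, self.messages = self.messages, []  # emit M and reset the buffer
        return segment


def toy_detector(buffer: List[str]) -> Tuple[bool, float]:
    """Stand-in for the LLM detector: flags a topic shift on a fixed phrase."""
    shifted = buffer[-1].lower().startswith("by the way")
    return shifted, 0.9 if shifted else 0.1


buf = EpisodeBuffer(detector=toy_detector, beta_max=3)
first = buf.push("hi")
second = buf.push("By the way, how was the trip?")
```

Here the whole buffer, including the message that triggered the boundary, is emitted, which follows the description above literally; an implementation could instead carry the triggering message over into the next segment.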

4. Predict-Calibrate Principle

Inspired by the Free-Energy Principle, Nemori's Predict-Calibrate Principle comprises a multistage process:

  • Prediction Stage: Given a new episode $e = (\xi, \zeta)$, a dense vector search retrieves the relevant semantic memories from the Semantic Memory Database. An LLM-based predictor then forecasts the episode content from the episode title $\xi$ and the retrieved memories.

  • Calibration Stage: The predicted episode content is compared not to the narrative $\zeta$, but to the original message buffer $M$. A semantic knowledge distiller extracts the novel facts that represent the prediction gap, that is, information present in $M$ but absent from the prediction.

  • Integration Stage: Newly distilled facts are merged into the Semantic Memory Database. While Nemori does not directly use gradient descent to update LLM parameters, this external knowledge refinement mirrors variational free-energy minimization strategies in cognitive science (Nan et al., 5 Aug 2025).
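The three stages above can be sketched as a single function that wires together a retriever, a predictor, and a distiller. This is a minimal sketch, not the paper's implementation: `predict_calibrate` and the three `toy_*` functions are hypothetical names, and the toy stand-ins use word overlap and substring checks where the real system would use vector search and LLM calls.

```python
from typing import Callable, List


def predict_calibrate(
    title: str,
    messages: List[str],
    semantic_db: List[str],
    retrieve: Callable[[str, List[str]], List[str]],
    predict: Callable[[str, List[str]], str],
    distill: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Run one predict-calibrate cycle and return the newly integrated facts."""
    relevant = retrieve(title, semantic_db)   # prediction stage: retrieval
    predicted = predict(title, relevant)      # prediction stage: forecast
    gap = distill(predicted, messages)        # calibration: compare to raw messages
    new_facts = [fact for fact in gap if fact not in semantic_db]
    semantic_db.extend(new_facts)             # integration stage
    return new_facts


# Toy stand-ins (assumptions for illustration only).
def toy_retrieve(title: str, db: List[str]) -> List[str]:
    words = set(title.lower().split())
    return [fact for fact in db if words & set(fact.lower().split())]


def toy_predict(title: str, relevant: List[str]) -> str:
    return " ".join(relevant)


def toy_distill(predicted: str, messages: List[str]) -> List[str]:
    return [m for m in messages if m not in predicted]


db = ["Alice owns a dog"]
learned = predict_calibrate(
    title="Alice dog walk",
    messages=["Alice owns a dog", "Alice walks the dog at 7am"],
    semantic_db=db,
    retrieve=toy_retrieve,
    predict=toy_predict,
    distill=toy_distill,
)
```

In this toy run the already-known fact is predicted correctly and therefore not re-learned; only the unpredicted message is added, which is the intended effect of distilling from the prediction gap rather than from the whole episode.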

5. Empirical Evaluation

Nemori's efficacy is evaluated on two primary benchmarks:

  • LoCoMo: Consists of 10 dialogues averaging 24K tokens with 1,540 questions over four reasoning types.
  • LongMemEval$_S$: Consists of 500 conversations averaging 105K tokens, focused on scalability testing.

Methods compared include FullContext (upper-bound LLM context), RAG-4096 (static chunking), and MAG systems such as LangMem, Zep, and Mem0. Key metrics are LLM-Score (GPT-4 judges), F1, and BLEU-1. The backbone LLMs used are GPT-4o-mini and GPT-4.1-mini, with retrieval of the top-ranked episodic and semantic memories.
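For readers unfamiliar with the F1 metric in question-answering evaluation, the sketch below shows the common token-overlap definition. It is an assumption that the benchmark uses this exact variant; the function name `token_f1` and the whitespace tokenization are choices made for the example.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("the cat sat", "the cat")
```

In this example two of the three predicted tokens match, so precision is 2/3, recall is 1, and the F1 score is 0.8.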

Nemori achieves:

  • On LoCoMo (GPT-4o-mini), Nemori scores 0.744 LLM-Score, surpassing FullContext (0.723), with only 12% of the token usage.
  • On LongMemEval$_S$, average accuracy is 64.2% vs. 55.0% (FullContext), requiring only 3.7–4.8K tokens, a 95% reduction (Nan et al., 5 Aug 2025).

LoCoMo results (GPT-4o-mini backbone):

Method        LLM-Score   F1      BLEU-1
FullContext   0.723       0.462   0.378
LangMem       0.513       0.358   0.294
Mem0          0.613       0.415   0.342
RAG-4096      0.302       0.208   0.164
Zep           0.585       0.375   0.309
Nemori        0.744       0.495   0.385
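The reported token reduction can be checked with a short calculation. This sketch assumes that FullContext consumes roughly the 105K-token average conversation length quoted for the long-conversation benchmark; that baseline figure and the helper name `reduction` are assumptions for illustration.

```python
def reduction(used_tokens: float, full_tokens: float) -> float:
    """Fraction of tokens saved relative to the full-context baseline."""
    return 1.0 - used_tokens / full_tokens


FULL_CONTEXT_TOKENS = 105_000  # assumed: average conversation length from the text

best_case = reduction(3_700, FULL_CONTEXT_TOKENS)   # lower end of the 3.7-4.8K range
worst_case = reduction(4_800, FULL_CONTEXT_TOKENS)  # upper end of the range

print(f"{worst_case:.1%} to {best_case:.1%}")
```

Under that assumption the savings fall between roughly 95.4% and 96.5%, consistent with the stated 95% reduction.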

Nemori's advantage is especially pronounced in temporal reasoning, where accurate episode-level organization and semantic memory retrieval are essential.

6. Ablation, Analysis, and Implications

Ablation studies on LoCoMo demonstrate:

  • Removing both episodic and semantic memory ("w/o Nemori") leads to near-zero performance.
  • Direct semantic extraction (Nemori-s) is inferior (0.518 LLM-Score) to the full Predict-Calibrate approach (0.615 for "w/o e"), confirming that proactive distillation outperforms passive methods.
  • Dropping episodic memory ("w/o e") degrades performance (to 0.615) more than removing semantic memory ("w/o s": 0.705), indicating their complementarity.

Performance plateaus once the number of retrieved episodic memories grows past a certain point, suggesting diminishing returns for larger retrieval sets. Qualitative examples show Nemori's episodic segmentation successfully groups semantically related conversational turns that baseline chunking methods fragment, yielding better factual coherence on retrieval and reasoning tasks. On the more challenging LongMemEval$_S$, Nemori maintains a lead over FullContext, particularly for user-preference questions. Slight accuracy declines on assistant-focused queries suggest challenges in preserving highly granular information over extremely long spans, a candidate for future investigation (Nan et al., 5 Aug 2025).

7. Future Directions

Planned developments for Nemori include end-to-end fine-tuning of Predict-Calibrate components and further integration with memory-efficient LLM architectures. Extension to multimodal episodic memory—incorporating sensory or visual data—remains an area for future research. The architectural framework established by Nemori lays groundwork for truly autonomous, self-evolving agents with human-like, long-term memory organization and adaptive knowledge evolution (Nan et al., 5 Aug 2025).

References (1)
