Papers
Topics
Authors
Recent
Search
2000 character limit reached

Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science

Published 5 Aug 2025 in cs.AI | (2508.03341v3)

Abstract: LLMs demonstrate remarkable capabilities, yet their inability to maintain persistent memory in long contexts limits their effectiveness as autonomous agents in long-term interactions. While existing memory systems have made progress, their reliance on arbitrary granularity for defining the basic memory unit and passive, rule-based mechanisms for knowledge extraction limits their capacity for genuine learning and evolution. To address these foundational limitations, we present Nemori, a novel self-organizing memory architecture inspired by human cognitive principles. Nemori's core innovation is twofold: First, its Two-Step Alignment Principle, inspired by Event Segmentation Theory, provides a principled, top-down method for autonomously organizing the raw conversational stream into semantically coherent episodes, solving the critical issue of memory granularity. Second, its Predict-Calibrate Principle, inspired by the Free-energy Principle, enables the agent to proactively learn from prediction gaps, moving beyond pre-defined heuristics to achieve adaptive knowledge evolution. This offers a viable path toward handling the long-term, dynamic workflows of autonomous agents. Extensive experiments on the LoCoMo and LongMemEval benchmarks demonstrate that Nemori significantly outperforms prior state-of-the-art systems, with its advantage being particularly pronounced in longer contexts.

Summary

  • The paper introduces Nemori’s dual-memory system using the Two-Step Alignment and Predict-Calibrate principles to optimize long-term memory retention.
  • The approach transforms linear conversation processing into structured narrative episodes, achieving superior results in temporal reasoning benchmarks.
  • Experimental results demonstrate Nemori's efficiency and scalability, effectively handling dialogues with over 105K tokens and outperforming existing models.

Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science

Introduction

The limitations of LLMs in maintaining continuity across long interactions present a significant obstacle for their application in autonomous agents. Nemori addresses this limitation by introducing a novel memory architecture refined through human cognitive principles, facilitating effective longer-term interactions. The architecture relies on two main frameworks: the Two-Step Alignment Principle and the Predict-Calibrate Principle. Through these frameworks, Nemori transforms the linear processing of conversations into a dynamic memory system capable of adaptive learning.

Two-Step Alignment Principle

The Two-Step Alignment Principle focuses on solving complexities in memory granularity by demarcating conversation into meaningful, coherent episodes. This principle is operationalized via the concepts of Boundary Alignment and Representation Alignment. For Boundary Alignment, Nemori employs a semantic boundary detector to segment conversational flows at semantically significant junctures: Figure 1

Figure 1: The conceptual framework of Nemori, illustrating the mapping from problem to principle to computation.

The Representation Alignment follows, translating these segments into structured episodes maintained in Episodic Memory. This transition into narrative structures preserves semantic integrity and contextual information, akin to episodic memory in humans. Through intelligent boundary detection and narrative generation, Nemori optimizes the input chunking process without the arbitrary segmentation pitfalls observed in existing systems (Figure 2). Figure 2

Figure 2: An illustration of different conversation segmentation methods, highlighting the superiority of episodic segmentation.

Predict-Calibrate Principle

Nemori extends its capability for memory organization via the Predict-Calibrate Principle, inspired by the Free-energy Principle. This principle enables proactive learning by addressing prediction gaps, akin to error-driven learning mechanisms in human cognition. It involves three stages: Prediction, Calibration, and Integration.

  • Prediction: Relevant memories are retrieved to predict new episode content.
  • Calibration: Discrepancies between predicted and actual conversations are analyzed, and new semantic knowledge is distilled from these gaps.
  • Integration: New insights are integrated into the semantic memory, refining the system's knowledge base iteratively.

This synergistic framework of predictive adaptation ensures that Nemori continuously refines its semantic memory, learning from each interaction. Figure 3

Figure 3: The Nemori system features modules that collaboratively enhance episodic and semantic memory generation.

Experimental Results

Nemori significantly outperformed existing systems in benchmarks such as LoCoMo and LongMemEvalS_\text{S}, particularly in tasks requiring temporal reasoning. Detailed comparisons reveal substantial advantages in long-term memory retention and query understanding:

  • Efficiency: Despite complex tasks, Nemori requires less input data to outpace the Full Context model by optimizing important memories.
  • Robustness in Large Contexts: Nemori manages dialogues with 105K tokens effectively, demonstrating scalability for extensive interactions.

Nemori's dual-memory system was found to synchronize memory organization effectively, outperforming comparison systems by leveraging its dual-pillar framework. The evidence points to a transformative approach to LLM memory management.

Conclusion

Nemori's contribution to memory systems for LLMs offers a scalable solution to long-term memory retention issues in autonomous agents. By incorporating principles reflective of human cognitive processes, it provides a versatile template for future advancements in AI memory systems, presenting a dynamic, self-optimizing memory capable of supporting complex, temporally extended interactions without excessive computational burden.

The principles and framework established by Nemori bring forth a methodology with broad implications for future developments in autonomous LLM applications, guiding more nuanced, context-sensitive memory architectures. Figure 4

Figure 4: Impact of top-k episodes on LLM score across different models, ensuring optimal performance with minimal data.

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Explain it Like I'm 14

What is this paper about?

This paper tackles a big problem in today’s chat AIs: they’re great at responding in the moment, but they forget across long or repeated conversations. The authors introduce Nemori, a new “memory system” that helps an AI remember, organize, and learn from past chats—more like a person would.

Nemori is built on two simple, brain-inspired ideas:

  • Two-Step Alignment: split a long chat into meaningful “episodes” (like scenes in a movie), then turn each episode into a clear, short story.
  • Predict-Calibrate: have the AI guess what an episode should contain based on what it already knows, then compare that guess to the real conversation to learn new facts it missed.

Together, these let the AI keep useful memories, forget clutter, and get smarter over time.

What questions are the researchers asking?

They focus on four easy-to-understand questions:

  • Can Nemori help an AI remember and use important information in long chats better than other systems?
  • Which parts of Nemori matter most for good performance?
  • How many past memories does the AI actually need to retrieve to answer well?
  • Does Nemori still work when conversations become extremely long and realistic?

How does Nemori work?

Think of Nemori as a neat combination of a diary and a fact notebook, plus a smart study routine.

Step 1: Split the chat into episodes (like movie scenes)

Long chats jump between topics. Nemori uses an AI “boundary detector” to spot when the conversation changes subject or purpose. When it sees a clear shift (or when a temporary buffer gets full), it cuts the chat there. Each chunk is an episode that stays semantically coherent—like keeping an entire “apple discussion” together instead of breaking it in the middle.

Why this helps: If you slice conversations at random places, you lose context. Grouping by meaning keeps the storyline intact.

Step 2: Turn each episode into a short, readable story

For each episode, Nemori creates:

  • a short title (what this episode is about), and
  • a third-person narrative summary (what happened and why it matters).

This is called Representation Alignment. It preserves the important details and context in a human-friendly way—like writing a journal entry you can quickly scan later.

Why this helps: Clear titles and summaries make retrieval faster and reduce confusion.

Step 3: Learn facts by “guessing, then checking” (Predict-Calibrate)

This is Nemori’s “study routine,” inspired by how people learn better by trying first, then comparing with the answer key.

  • Prediction: Given a new episode’s title and what the AI already knows, Nemori predicts what the content will be. It first pulls related memories (using meaning-based search), then tries to “fill in” the episode.
  • Calibration: It compares the prediction to the actual conversation for that episode and looks for differences—these are “gaps,” or new things the AI didn’t know.
  • Integration: It turns those new bits into clean, reusable facts and adds them to the semantic memory (the fact notebook).

Over time, this loop builds a high-quality knowledge base from lived conversations—without just copying everything.

Finding the right memories quickly

Nemori uses meaning-based search (vector retrieval) to find relevant episodes and facts. Instead of exact keyword matches, it looks for information that “means the same thing,” even if the words differ.

Why this helps: When answering a new question, the AI can pull just the most relevant pieces, staying accurate without wading through thousands of tokens.

What did they find?

The team tested Nemori on two challenging benchmarks:

  • LoCoMo: long chats with various question types.
  • LongMemEvalₛ: much longer, more realistic conversations (around 105,000 tokens on average).

Here’s what stood out:

  • Higher accuracy than other systems: Nemori beat popular baselines (like standard RAG and other memory systems) and, in some cases, even outperformed giving the model the entire conversation history. It was especially strong at time-related questions (temporal reasoning), because its episodes keep events in order and its facts turn relative times (like “yesterday”) into exact dates.
  • Much more efficient: On LoCoMo, Nemori used about 88% fewer tokens than feeding the full chat, while still answering better. On the very long LongMemEvalₛ, it used about 95–96% less context and still improved average accuracy.
  • Both memory types matter: Removing episodic memories (stories) or semantic memories (facts) made the system worse. They’re complementary—stories preserve context and flow; facts make reasoning quicker.
  • “Guess-and-check” learning works: The Predict-Calibrate approach produced better knowledge than simply extracting facts directly from the chat. Learning from prediction gaps makes the AI’s knowledge cleaner and more useful.
  • You don’t need to retrieve everything: Performance improved quickly as the system retrieved more episodes, then flattened after about 10. That means you can keep retrieval small and still do great.

Why does this matter?

Nemori shows a practical path to AI that:

  • remembers meaningful things about you and past interactions,
  • learns over time from real conversations,
  • stays efficient, so it’s cheaper and faster to run,
  • and handles very long, dynamic workflows without drowning in text.

This could make virtual assistants more personal and reliable, research copilots more consistent across sessions, and autonomous agents more capable of long-term planning and adaptation. In short, Nemori moves AI memory from “just storing stuff” to “actively understanding and evolving”—a key step toward AI that can truly grow with its users.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.