
MemoryBank: Enhancing Large Language Models with Long-Term Memory (2305.10250v3)

Published 17 May 2023 in cs.CL and cs.AI

Abstract: Revolutionary advancements in LLMs have drastically reshaped our interactions with artificial intelligence systems. Despite this, a notable hindrance remains: the deficiency of a long-term memory mechanism within these models. This shortfall becomes increasingly evident in situations demanding sustained interaction, such as personal companion systems and psychological counseling. Therefore, we propose MemoryBank, a novel memory mechanism tailored for LLMs. MemoryBank enables the models to summon relevant memories, continually evolve through continuous memory updates, and comprehend and adapt to a user's personality by synthesizing information from past interactions. To mimic anthropomorphic behaviors and selectively preserve memory, MemoryBank incorporates a memory updating mechanism inspired by the Ebbinghaus Forgetting Curve theory, which permits the AI to forget and reinforce memories based on time elapsed and the relative significance of the memory, thereby offering a human-like memory mechanism. MemoryBank is versatile in accommodating both closed-source models like ChatGPT and open-source models like ChatGLM. We exemplify the application of MemoryBank through the creation of an LLM-based chatbot named SiliconFriend in a long-term AI companion scenario. Further tuned with psychological dialogs, SiliconFriend displays heightened empathy in its interactions. Experiments involve both qualitative analysis with real-world user dialogs and quantitative analysis with simulated dialogs. In the latter, ChatGPT acts as users with diverse characteristics and generates long-term dialog contexts covering a wide array of topics. The results of our analysis reveal that SiliconFriend, equipped with MemoryBank, exhibits a strong capability for long-term companionship, as it can provide empathetic responses, recall relevant memories, and understand user personality.

Overview

The paper "MemoryBank: Enhancing LLMs with Long-Term Memory" (Zhong et al., 2023) addresses inherent limitations in contemporary LLMs, particularly their inability to persist and leverage long-term memory within extended dialogues. By explicitly incorporating a memory mechanism that mirrors human cognitive decay and reinforcement, the proposed methodology extends LLM applicability to tasks requiring sustained interaction, such as psychological counseling and companion systems.

Main Contributions

The work introduces a compositional memory module termed MemoryBank, which systematically augments LLMs with the following capabilities:

  • Dynamic Memory Storage and Retrieval: MemoryBank implements a hierarchical event-based memory storage system, where conversation logs are segmented into daily event summaries and overarching global profiles. The module leverages dense representations with dual-tower encoding models and efficient retrieval via FAISS, thereby optimizing context matching amid long-term interactions.
  • Memory Updating Based on Cognitive Decay: Drawing on the exponential decay principle encapsulated in the Ebbinghaus Forgetting Curve, MemoryBank modulates memory retention by adjusting the discrete memory strength upon recall events. The module reinforces frequently accessed memories and attenuates less significant data over time, thereby ensuring a balance between retention and computational efficiency.
  • Prototype Deployment in SiliconFriend: The integration of MemoryBank into SiliconFriend showcases the practical potential of sustained, contextually enriched dialogue. Fine-tuning on approximately 38k psychologically oriented dialogs has led to a system capable of adaptive empathic responses, memory recall, and nuanced personality modeling over long-term interactions.

Technical Architecture

MemoryBank is built upon a tripartite architecture:

1. Memory Storage

  • Hierarchical Summarization: Conversations are recorded with accurate timestamps and are subsequently distilled into multi-level summaries. Daily event summaries aggregate short-term interactions, which are in turn compacted into global user profiles.
  • Efficient Data Structures: The memory items are stored with associated metadata (e.g., timestamps, significance scores) to facilitate rapid retrieval and dynamic updating. This approach supports both the storage of raw dialogue context and distilled semantic representations.
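As a concrete illustration, a memory item carrying the metadata described above (timestamp, memory strength, raw or distilled text) might be represented as follows. The field and class names are hypothetical, chosen to match the roles in the text rather than the paper's actual code:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    """One stored memory with the metadata needed for retrieval and decay.

    Field names are illustrative, not taken from the paper's implementation.
    """
    text: str                # raw dialogue turn or distilled summary
    timestamp: float         # when the memory was recorded
    strength: float = 1.0    # subjective memory strength S (reinforced on recall)
    last_recall_time: float = field(default_factory=time.time)

# Daily event summaries and the global user profile can reuse the same record type:
daily_summary = MemoryItem(text="User discussed exam anxiety.", timestamp=time.time())
```

Keeping raw context and distilled summaries in one uniform record makes the later retrieval and decay steps apply to both levels of the hierarchy without special cases.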

2. Memory Retrieval

  • Dense Retrieval Framework: Similar to DPR, both query and memory pieces are embedded into a shared semantic vector space. Dual-tower encoding architectures generate these embeddings, which are then indexed via FAISS to enable scalable similarity search.
  • Contextual Relevance: During inference, the system retrieves contextually relevant memories based on the ongoing conversation, ensuring that the LLM leverages historic interaction data to produce coherent outputs.
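A minimal sketch of the dual-tower retrieval step follows, using deterministic random placeholder embeddings and a plain NumPy dot-product search in place of the trained encoder and FAISS index that the paper uses in practice:

```python
import numpy as np

def embed(texts, dim=8, seed=0):
    """Placeholder encoder: deterministic random unit vectors per text.
    A real system would call a dual-tower encoder here."""
    rng = np.random.default_rng(seed)
    table = {t: rng.standard_normal(dim) for t in sorted(set(texts))}
    vecs = np.stack([table[t] for t in texts])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def retrieve(query, memories, k=2):
    """Return the k stored memories most similar to the query (cosine similarity),
    mirroring a FAISS inner-product search over normalized embeddings."""
    all_vecs = embed(memories + [query])
    mem_vecs, q_vec = all_vecs[:-1], all_vecs[-1]
    scores = mem_vecs @ q_vec
    top = np.argsort(-scores)[:k]
    return [memories[i] for i in top]

memories = ["User likes hiking.", "User has an exam on Friday.", "User owns a cat."]
print(retrieve("What pets does the user have?", memories, k=1))
```

With normalized vectors, inner product equals cosine similarity, which is exactly what a `faiss.IndexFlatIP` over L2-normalized embeddings would compute at scale.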

3. Memory Updating Mechanism

  • Ebbinghaus-Inspired Decay Model: Memory retention is modelled according to an exponential decay function:

R(t) = e^{-t/S}

where R(t) represents the retention score, t denotes elapsed time, and S is the subjective memory strength, which is incremented upon recall. The algorithm selectively reinforces memories at each retrieval, effectively resetting or elevating their decay curve parameter.
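A small worked example of the decay function shows how reinforcement (a larger S) flattens the forgetting curve; the time units and strength values are illustrative only:

```python
import math

def retention(t, S):
    """R(t) = exp(-t / S): retention after t time units at memory strength S."""
    return math.exp(-t / S)

# After 5 time units, a weak memory (S = 1) has nearly vanished,
# while a reinforced one (S = 10) is still largely retained.
print(round(retention(5, 1), 3))   # 0.007
print(round(retention(5, 10), 3))  # 0.607
```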

  • Discrete Update Protocol: The memory updating module employs a discrete time-step update rule. Upon each retrieval:

for each memory_item in retrieved_memories:
    if recall_event(memory_item):
        memory_item.strength ← memory_item.strength + Δ        # reinforce on recall
        memory_item.last_recall_time ← current_time
    else:
        time_elapsed ← current_time - memory_item.last_recall_time
        memory_item.retention ← exp(-time_elapsed / memory_item.strength)  # R(t) = e^{-t/S}

This mechanism ensures that frequently retrieved and significant memory items persist, whereas outdated or less pertinent memories naturally fade.
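The discrete update protocol can be sketched in runnable Python; Δ (the reinforcement increment) and the retention threshold below are illustrative choices, not values taken from the paper:

```python
import math

DELTA = 1.0             # reinforcement increment Δ (illustrative)
FORGET_THRESHOLD = 0.1  # items below this retention become pruning candidates

def update_memories(memories, recalled_ids, current_time):
    """Apply the discrete update rule: reinforce recalled items, decay the rest.

    `memories` is a list of dicts with 'id', 'strength', 'last_recall_time'.
    Returns the ids whose retention R(t) = exp(-t/S) fell below the threshold.
    """
    faded = []
    for item in memories:
        if item["id"] in recalled_ids:
            item["strength"] += DELTA
            item["last_recall_time"] = current_time
        else:
            elapsed = current_time - item["last_recall_time"]
            if math.exp(-elapsed / item["strength"]) < FORGET_THRESHOLD:
                faded.append(item["id"])
    return faded

mems = [
    {"id": "m1", "strength": 1.0, "last_recall_time": 0.0},
    {"id": "m2", "strength": 1.0, "last_recall_time": 0.0},
]
print(update_memories(mems, recalled_ids={"m1"}, current_time=5.0))  # ['m2']
```

Here "m1" is recalled and reinforced (its strength rises to 2.0), while "m2" decays past the threshold and is flagged for forgetting.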

Experimental Validation

The experimental evaluation is two-fold:

Qualitative Analysis

  • Real-world User Interactions: SiliconFriend has been deployed in controlled environments where qualitative metrics such as empathic accuracy, contextual coherence, and user profiling were meticulously analyzed. Empirical observations indicate that users perceive enhanced continuity in dialogue flow due to memory recall capabilities.
  • Psychological Consistency: Through the integration of psychological dialogue data, the system exhibits strong empathic responses and contextual sensitivity, aligning with the desired companion characteristics.

Quantitative Analysis

  • Longitudinal Simulations: Simulated dialogues spanning 10 days with approximately 15 diverse virtual users (generated by ChatGPT) were executed. The system’s performance was evaluated on 194 memory probing queries, with metrics focusing on:
    • Memory retrieval accuracy
    • Response correctness and contextual coherence
    • Ranking score of retrieved memories
  • Numerical Findings: The quantitative results demonstrate that the MemoryBank-enhanced LLM, particularly when integrated with ChatGPT, achieved superior retention metrics and retrieval precision relative to baseline models. These findings substantiate the claim that the memory updating mechanism significantly contributes to coherent long-term dialogue.

Practical Implementation Considerations

  • Integration Flexibility: MemoryBank is designed to be agnostic to LLM architectures, supporting both closed-source models like ChatGPT and open-source variants like ChatGLM. This flexibility allows for iterative implementation and facilitates adaptation across heterogeneous platforms.
  • Computational Overhead: The memory indexing and retrieval phases, while efficient owing to FAISS, introduce additional latency. System designers must balance the resolution of dense embeddings and the frequency of memory updates with available computational resources.
  • Scalability and Maintenance: Deploying MemoryBank in a production environment necessitates robust mechanisms for memory pruning and periodic re-indexing. Ensuring that memory storage and selective forgetting align with user-specific data retention policies is critical for maintaining system performance over extended operational periods.
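One way such pruning might be scheduled is sketched below, with a plain list standing in for the vector index; the retention threshold is hypothetical, and in a deployed system the caller would rebuild the FAISS index over the survivors:

```python
def prune(memories, retentions, threshold=0.1):
    """Keep only memories whose current retention meets the threshold.

    Returns the survivors; the caller then re-indexes their embeddings.
    """
    return [m for m, r in zip(memories, retentions) if r >= threshold]

survivors = prune(["m1", "m2", "m3"], [0.9, 0.05, 0.4], threshold=0.1)
print(survivors)  # ['m1', 'm3']
```

Running this periodically (rather than on every turn) amortizes the re-indexing cost while keeping the searchable store bounded.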

Conclusion

The MemoryBank paper outlines a robust framework for augmenting LLMs with a human-like long-term memory mechanism. By structuring memory storage into hierarchical levels, employing dense retrieval systems, and applying an Ebbinghaus-inspired updating mechanism, the work addresses key deficiencies in traditional LLMs. The practical deployment in SiliconFriend validates the approach, with both qualitative and quantitative analyses supporting the efficacy of the method in sustained interactive scenarios. The work provides a viable pathway for future research and practical systems requiring persistent contextual awareness and adaptive memory management.

Authors (5)
  1. Wanjun Zhong
  2. Lianghong Guo
  3. Qiqi Gao
  4. He Ye
  5. Yanlin Wang