Long-Term Memory Module

Updated 1 July 2025
  • Long-Term Memory Modules are computational systems that extend AI models by storing and recalling information beyond immediate input contexts.
  • They overcome traditional neural network constraints through internal and external architectures that support multi-step reasoning over long sequences.
  • These modules enhance applications in NLP, computer vision, robotics, and continual learning by maintaining context, facilitating memory consolidation, and enabling efficient retrieval.

A long-term memory module in artificial intelligence refers to a computational mechanism designed to store, retrieve, and manage information over extended periods, often spanning beyond the immediate input context or a model's typical processing window. These modules are developed to overcome limitations in traditional neural network architectures, such as the fixed context window size of Transformers or the vanishing/exploding gradient problem in Recurrent Neural Networks (RNNs), which hinder their ability to effectively process and reason over long sequences or maintain state across numerous interactions. Long-term memory modules aim to enable AI systems to retain knowledge, track states, recall relevant past information, adapt to user preferences over time, and perform reasoning that requires integrating facts or events from distant history. Implementations range from explicit external databases and structured data stores to internal network architectures with specialized memory components or parameters optimized for persistence.

Architectural Approaches for Long-Term Memory

Architectures incorporating long-term memory can broadly be categorized based on whether the memory is primarily internal to the neural network or maintained externally.

Internal memory modules are integrated directly within the network's processing layers. The Tree Memory Network (TMN) (Fernando et al., 2017) uses a recursive binary tree structure composed of Tree-LSTM cells to store and aggregate historical states hierarchically. This structure allows for multi-level abstraction, enabling the model to capture both short-term dependencies (at the leaves) and longer-range relationships (higher in the tree). The memory is queried via an attention mechanism and updated through hierarchical Tree-LSTM operations. The Large Memory Model (LM2) (Kang et al., 9 Feb 2025) introduces an auxiliary memory bank within each Transformer decoder block. This explicit memory bank interacts with the input sequence via cross-attention for retrieval and is updated through input and forget gating mechanisms. The gated update rule is given by $\mathbf{M}_{t+1} = g_\text{in} \cdot \tanh(\mathbf{E}_\text{mem}) + g_\text{forget} \cdot \mathbf{M}_t$, where $\mathbf{M}_t$ is the memory bank, $\mathbf{E}_\text{mem}$ is the memory output from cross-attention, and $g_\text{in}$ and $g_\text{forget}$ are learned gates. This internal design supports multi-step reasoning and information synthesis over long contexts. The Long Term Memory network (LTM) (Nugaliyadde, 2023), based on an RNN-like structure, uses an additive memory update rule without a forget gate, combined with sigmoid scaling to maintain stable gradients and retain information from very long sequences. Its core cell state update is $C_t = \sigma(W_4 \cdot (L'_t + C_{t-1}))$, where $C_t$ is the cell state and $L'_t$ is a combination of the current input and previous output.
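
For illustration, the sketch below applies an LM2-style gated update to a toy memory bank in NumPy. The gate parameterization, slot count, and the random stand-in for the cross-attention output $\mathbf{E}_\text{mem}$ are assumptions made for the example, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_memory_update(M_t, E_mem, W_in, W_forget):
    """LM2-style update: M_{t+1} = g_in * tanh(E_mem) + g_forget * M_t.
    The gate parameterization used here is an illustrative assumption."""
    g_in = sigmoid(E_mem @ W_in)          # input gate per memory slot
    g_forget = sigmoid(E_mem @ W_forget)  # forget gate per memory slot
    return g_in * np.tanh(E_mem) + g_forget * M_t

# Toy example: a memory bank of 8 slots with dimension 16.
rng = np.random.default_rng(0)
M_t = rng.normal(size=(8, 16))
E_mem = rng.normal(size=(8, 16))          # stand-in for cross-attention output
W_in = 0.1 * rng.normal(size=(16, 16))
W_forget = 0.1 * rng.normal(size=(16, 16))
M_next = gated_memory_update(M_t, E_mem, W_in, W_forget)
print(M_next.shape)                       # (8, 16)
```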

External memory modules utilize separate data structures or databases managed outside the core neural network parameters. The Long-Term Memory Network (LTMN) for question answering (Ma et al., 2017) combines an external memory module storing input sentences with an LSTM for answer generation. The external memory is accessed via an attention mechanism, computing matching probabilities $p_i = \text{softmax}(u^T m_i)$ between the question embedding $u$ and sentence embeddings $m_i$, and returning a weighted sum $o = \sum_i p_i m_i$ as context. MemoryBank (Zhong et al., 2023) employs a hierarchical external storage system for user-AI interactions, including raw logs, daily summaries, and personality profiles. This memory is retrieved using dense vector similarity search (e.g., FAISS) on embeddings $h_m = E(m)$ and fused with the current context. The LLMs Augmented with Long-Term Memory (LongMem) framework (Wang et al., 2023) uses a decoupled architecture where a frozen backbone LLM encodes past contexts into key-value pairs stored in a cached memory bank. A separate, trainable residual side-network acts as the retriever and reader, fusing retrieved memory with current representations using joint attention. SuMem (M+) (Wang et al., 1 Feb 2025) extends this by offloading older latent-space memory tokens from GPU to CPU, managing a flexible long-term memory pool per layer with a co-trained retriever for dynamic access.
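
A minimal sketch of the LTMN-style attention read follows, assuming precomputed question and sentence embeddings (the toy random vectors stand in for a real encoder):

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def retrieve_context(u, memory):
    """LTMN-style read: p_i = softmax(u^T m_i), o = sum_i p_i * m_i.
    `u` is the question embedding; rows of `memory` are sentence embeddings."""
    scores = memory @ u       # u^T m_i for each stored sentence
    p = softmax(scores)       # matching probabilities
    o = p @ memory            # weighted sum of sentence embeddings
    return o, p

rng = np.random.default_rng(1)
memory = rng.normal(size=(5, 32))   # 5 stored sentence embeddings (toy values)
u = rng.normal(size=32)             # question embedding (toy values)
o, p = retrieve_context(u, memory)
print(o.shape, p.round(2))
```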

Hybrid approaches combine internal processing with external storage or complex parameter dynamics. Cognitive Memory in LLMs (Shan et al., 3 Apr 2025) outlines a cognitive architecture with sensory, short-term (context window), and long-term memory (external databases like vector stores, graphs). It also discusses parameter-based methods (LoRA, TTT, MoE) and KV cache strategies for extending effective context. The position paper on episodic memory for LLM agents (Pink et al., 10 Feb 2025) proposes a framework integrating in-context (short-term), external episodic (fast-learning, instance-specific), and parametric (slow-learning, generalized) memory, mediated by encoding, retrieval, and consolidation processes.

Memory Structures and Representations

The choice of memory structure significantly influences how information is stored, retrieved, and managed. Different structures are suited for various data types and tasks.

Vector stores are commonly used for semantic memory. MemoryBank (Zhong et al., 2023), RecaLLM (Kynoch et al., 2023), and Cognitive Memory frameworks (Shan et al., 3 Apr 2025) store embeddings of facts, events, or conversation snippets in vector databases, enabling retrieval based on semantic similarity using metrics like cosine similarity.

$$\text{Sim}(v_q, v_m) = \frac{v_q \cdot v_m}{\|v_q\| \, \|v_m\|}$$

This approach is effective for retrieving semantically related information but may struggle with structured or temporally ordered knowledge.
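
As a concrete illustration of vector-store retrieval, the sketch below ranks stored embeddings by cosine similarity to a query. The random embeddings and brute-force search are stand-ins for a real embedding model and an index such as FAISS.

```python
import numpy as np

def cosine_top_k(v_q, memory_vectors, k=3):
    """Rank stored memory embeddings by cosine similarity to a query and
    return the top-k indices; a brute-force stand-in for a vector index."""
    sims = (memory_vectors @ v_q) / (
        np.linalg.norm(memory_vectors, axis=1) * np.linalg.norm(v_q) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return top, sims[top]

rng = np.random.default_rng(2)
store = rng.normal(size=(100, 64))  # 100 stored memory embeddings (toy values)
query = rng.normal(size=64)         # query embedding (toy values)
idx, scores = cosine_top_k(query, store)
print(idx, scores.round(3))
```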

Structured memory representations, such as trees or graphs, offer advantages for representing relationships and hierarchies. TMN (Fernando et al., 2017) utilizes a recursive binary tree of LSTM cells, where higher nodes abstract lower-level representations, capturing hierarchical temporal dependencies. Cognitive Memory (Shan et al., 3 Apr 2025) and the modular network model of association cortex (Mari, 2021) discuss knowledge graphs (entities and relations) for reasoning over structured information. The modular network model (Mari, 2021) also uses a modular structure where each module stores local "features" and global patterns emerge from combinations of features across interconnected modules, modeled as a random graph. Hebbian learning principles are applied within and between modules for storage and retrieval.

Parameter-based memory methods encode long-term knowledge directly into the model's weights or auxiliary parameters. Cognitive Memory (Shan et al., 3 Apr 2025) describes techniques like LoRA (Low-Rank Adaptation), Test-Time Training (TTT), and Mixture of Experts (MoE) as ways to incorporate learned knowledge into model parameters. This can be seen as a form of memory consolidation where explicit memories are internalized as implicit model knowledge. The Task-Core Memory Management strategy in Long-CL (Huai et al., 15 May 2025) specifically identifies a subset of "task-core" parameters critical for retaining knowledge from sequential tasks and adaptively updates them as $\theta_t = \beta_t \cdot \varphi_t + (1 - \beta_t) \cdot \theta_{t-1}$.
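
The adaptive update can be illustrated with a short sketch. The blending schedule and toy parameter vectors are assumptions made for the example; Long-CL derives the coefficients $\beta_t$ from task similarity.

```python
import numpy as np

def adaptive_memory_update(theta_prev, phi_t, beta_t):
    """Blend task-specific parameters phi_t into the retained parameters:
    theta_t = beta_t * phi_t + (1 - beta_t) * theta_prev."""
    return beta_t * phi_t + (1.0 - beta_t) * theta_prev

theta = np.zeros(4)                              # parameters carried across tasks
for task_id, beta in enumerate([0.9, 0.5, 0.2]): # illustrative blending schedule
    phi = np.full(4, float(task_id + 1))         # parameters fit on the new task
    theta = adaptive_memory_update(theta, phi, beta)
print(theta)
```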

KV cache-based strategies manage the Transformer's internal Key-Value cache to extend the effective context window. Papers like Cognitive Memory (Shan et al., 3 Apr 2025) and MELODI (Chen et al., 4 Oct 2024) discuss techniques for selecting, pruning, or compressing KV pairs (e.g., using attention scores, learned importance, low-rank approximations, chunking) to maintain relevant history within computational constraints. MELODI (Chen et al., 4 Oct 2024) proposes a hierarchical compression scheme with multi-layer recurrent compression for short-term memory and single-layer compressed KV storage for long-term memory, significantly reducing memory footprint while retaining performance.
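
A simplified sketch of score-based KV selection is shown below, keeping a fixed budget of cached pairs ranked by accumulated attention mass; the specific scoring and compression schemes used by the cited methods differ.

```python
import numpy as np

def prune_kv_cache(keys, values, attn_scores, budget):
    """Keep only the `budget` cached key/value pairs with the highest
    accumulated attention mass, preserving their positional order."""
    keep = np.sort(np.argsort(-attn_scores)[:budget])
    return keys[keep], values[keep]

rng = np.random.default_rng(3)
T, d = 1024, 64
keys, values = rng.normal(size=(T, d)), rng.normal(size=(T, d))
attn_scores = rng.random(T)          # e.g., attention weights summed over queries
k_small, v_small = prune_kv_cache(keys, values, attn_scores, budget=256)
print(k_small.shape)                 # (256, 64)
```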

Embodied AI applications utilize specialized spatial-temporal memory structures. 3DLLM-Mem (Hu et al., 28 May 2025) employs a dual memory system: working memory for current observations and episodic memory realized as a feature bank of dense 3D features from past observations. The long-term spatial memory is maintained as an explicit static point map, updated incrementally using methods like TSDF fusion, filtering out dynamic objects to preserve persistent scene geometry.

Memory Management Mechanisms

Effective long-term memory modules require sophisticated mechanisms for writing new information (acquisition/encoding), reading relevant information (retrieval/access), updating or altering stored memories, and managing capacity (forgetting/consolidation).

Acquisition involves deciding what information from the input or internal state to store in memory. This can be selective based on saliency, relevance, or explicit identification (e.g., persona information in PLATO-LTM (Xu et al., 2022)). MemoryBank (Zhong et al., 2023) uses LLM prompting to summarize raw conversations into daily and global event summaries. Structured Memory (Xing et al., 28 May 2025) uses a learned gated writing mechanism $g_w = \sigma(W_w h_t + b_w)$ to control which semantic representations are written to memory.
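
A minimal sketch of such a gated write follows; the scalar gate and the hard write threshold are simplifications of the learned mechanism.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_write(h_t, W_w, b_w, threshold=0.5):
    """Write gate g_w = sigmoid(W_w h_t + b_w); the scalar gate and the
    hard threshold are simplifications of the learned mechanism."""
    g_w = sigmoid(W_w @ h_t + b_w)
    return g_w, bool(g_w > threshold)

rng = np.random.default_rng(4)
h_t = rng.normal(size=32)            # current semantic representation (toy values)
W_w, b_w = 0.1 * rng.normal(size=32), 0.0
gate, should_write = gated_write(h_t, W_w, b_w)
print(round(float(gate), 3), should_write)
```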

Retrieval mechanisms fetch relevant memories based on the current context or query. Attention is a common method, used in TMN (Fernando et al., 2017), LTMN (Ma et al., 2017), and LM2 (Kang et al., 9 Feb 2025) to weigh memory contents based on their relevance to the current input. Dense vector search is used in MemoryBank (Zhong et al., 2023), RecaLLM (Kynoch et al., 2023), and LongMem (Wang et al., 2023) to find semantically similar memories. LongMem employs token-to-chunk retrieval for efficiency. SuMem (Wang et al., 1 Feb 2025) uses a co-trained retriever for similarity search in the latent space of long-term memory tokens. 3DLLM-Mem (Hu et al., 28 May 2025) uses working memory tokens as queries to attend to and fuse relevant spatial-temporal features from episodic memory.

Updating and forgetting are crucial for managing memory content over time, ensuring accuracy and preventing capacity exhaustion. RecaLLM (Kynoch et al., 2023) utilizes explicit temporal ordering and an overwrite-by-recency rule for belief updating, ensuring that the most recent "truth statement" for a fact is prioritized. MemoryBank (Zhong et al., 2023) incorporates a memory updating mechanism inspired by the Ebbinghaus Forgetting Curve $R = e^{-t/S}$, where retention $R$ depends on the time $t$ since last recall and the memory strength $S$. This allows for selective forgetting and reinforcement. Structured Memory (Xing et al., 28 May 2025) includes a forgetting function $m_i(t+1) = (1 - g_f) \cdot m_i(t) + g_w \cdot \tilde{m}_i$ with a learned forget gate $g_f$. Long-CL (Huai et al., 15 May 2025) uses Adaptive Memory Updating, $\theta_t = \beta_t \cdot \varphi_t + (1 - \beta_t) \cdot \theta_{t-1}$, with $\beta_t$ based on task similarity. Memory consolidation, discussed in the episodic memory paper (Pink et al., 10 Feb 2025) and Long-CL (Huai et al., 15 May 2025), refers to processes that integrate memories into more stable, generalized forms, often involving periodic updates to model parameters or selection of key experiences for rehearsal (MemCon in Long-CL selects hard and differential samples).
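
The decay-based update can be sketched as follows. The time units, retention threshold, and dictionary-based store are illustrative assumptions rather than MemoryBank's implementation.

```python
import numpy as np

def retention(t_since_recall, strength):
    """Ebbinghaus-style retention R = exp(-t / S): stronger memories
    (larger S) decay more slowly."""
    return np.exp(-t_since_recall / strength)

def prune_memories(memories, now, threshold=0.3):
    """Drop entries whose estimated retention falls below a threshold;
    recalling an entry would reset last_recall and increase strength."""
    return [m for m in memories
            if retention(now - m["last_recall"], m["strength"]) >= threshold]

memories = [
    {"text": "user enjoys hiking", "last_recall": 0.0, "strength": 5.0},
    {"text": "meeting at 3pm",     "last_recall": 1.0, "strength": 1.0},
]
print([m["text"] for m in prune_memories(memories, now=4.0)])  # hiking entry kept
```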

Conflict resolution mechanisms handle inconsistencies in memory. RecaLLM (Kynoch et al., 2023) resolves factual conflicts by prioritizing the most recent information. Cognitive Memory (Shan et al., 3 Apr 2025) discusses merging conflicting memories, disambiguation based on context, and even retaining contradictory memories in some cases to enhance realism, drawing inspiration from human cognition.
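
A toy sketch of overwrite-by-recency belief updating in the spirit of RecaLLM is shown below, with a plain dictionary standing in for its concept store.

```python
from datetime import datetime

def update_belief(store, concept, statement, timestamp):
    """Overwrite-by-recency: the most recent truth statement for a concept
    replaces older, potentially conflicting ones."""
    current = store.get(concept)
    if current is None or timestamp >= current["timestamp"]:
        store[concept] = {"statement": statement, "timestamp": timestamp}
    return store

beliefs = {}
update_belief(beliefs, "CEO of Acme", "Alice is the CEO", datetime(2024, 1, 1))
update_belief(beliefs, "CEO of Acme", "Bob is the CEO", datetime(2025, 3, 1))
print(beliefs["CEO of Acme"]["statement"])   # "Bob is the CEO" wins by recency
```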

Applications Across Domains

Long-term memory modules are being applied across a range of AI domains where maintaining state, remembering history, or reasoning over extended contexts is critical.

In NLP, long-term memory addresses limitations in handling long documents and multi-turn dialogue. LTMN (Ma et al., 2017) enhances question answering by allowing models to reason over document facts and generate multi-word answers. PLATO-LTM (Xu et al., 2022) improves open-domain conversation by maintaining explicit, dynamic persona memory for both the user and chatbot across interactions. MemoryBank (Zhong et al., 2023) enables long-term AI companions capable of recalling past conversations, understanding user personality, and providing empathetic responses over time. RecaLLM (Kynoch et al., 2023) facilitates long-term interaction and temporal understanding in LLMs for factual QA and belief updating. LTM (Nugaliyadde, 2023), LongMem (Wang et al., 2023), SuMem (Wang et al., 1 Feb 2025), and LM2 (Kang et al., 9 Feb 2025) extend the effective context window of LLMs, enabling applications like long-document understanding, summarization, and in-context learning with thousands of examples. Structured Memory (Xing et al., 28 May 2025) demonstrates improved consistency and cross-context reasoning in long texts and multi-turn QA.

In computer vision and robotics, long-term memory is essential for tracking objects, navigating environments, and understanding spatial-temporal dynamics over extended periods. TMN (Fernando et al., 2017) models long-term dependencies for aircraft and pedestrian trajectory prediction. MeMOTR (Gao et al., 2023) integrates long-term memory into a Transformer for multi-object tracking, stabilizing track embeddings and improving association accuracy over long video sequences. 3DLLM-Mem (Hu et al., 28 May 2025) provides embodied 3D LLM agents with spatial-temporal memory for planning, acting, and reasoning in complex 3D environments over long horizons. Video world models (Wu et al., 5 Jun 2025) use geometry-grounded long-term spatial memory to maintain scene consistency during camera revisits, overcoming forgetting in long video generation.

Continual learning, where models learn sequentially from a stream of tasks, inherently requires robust long-term memory to prevent catastrophic forgetting. Long-CL (Huai et al., 15 May 2025) proposes specific task-core memory management and consolidation mechanisms to retain knowledge and adapt parameters effectively over vast streams of tasks.

Addressing Limitations and Scaling

A primary motivation for long-term memory modules is to overcome limitations of standard architectures, particularly the bounded context window of Transformers, which limits their ability to handle long inputs and interactions. Approaches like LongMem (Wang et al., 2023) and SuMem (Wang et al., 1 Feb 2025) explicitly tackle this by augmenting LLMs with external memory banks that can store context far exceeding the native window size (e.g., >65k tokens for LongMem, >160k tokens for SuMem).

Computational and memory efficiency are critical challenges when dealing with long sequences. Dense attention in Transformers scales quadratically with sequence length, making it infeasible for very long contexts. Long-term memory modules mitigate this by selectively storing and retrieving information. MELODI (Chen et al., 4 Oct 2024) achieves significant memory reduction (8x compared to a baseline) through hierarchical compression of KV pairs. SuMem (Wang et al., 1 Feb 2025) offloads its long-term memory to CPU to reduce the GPU memory footprint, making long-context processing feasible on more modest hardware. KV cache management strategies discussed in Cognitive Memory (Shan et al., 3 Apr 2025) also focus on efficient token selection and compression.

Catastrophic forgetting, where learning new tasks causes performance degradation on previously learned tasks, is a major challenge in continual learning and long-term interaction. Long-term memory mechanisms help retain old knowledge. Long-CL (Huai et al., 15 May 2025) directly addresses this with memory management (MemMan) and consolidation (MemCon) strategies that protect task-critical parameters and rehearse important samples. The episodic memory framework (Pink et al., 10 Feb 2025) proposes consolidation into parametric memory as a way to generalize knowledge from episodes while preserving specific instances in external memory. The modular network model (Mari, 2021) suggests that the combinatorial nature of feature sharing across modules, managed by retrieval dynamics, is key to robust storage capacity and resistance to interference.

Performance Evaluation and Benchmarks

The development of long-term memory modules necessitates specialized benchmarks that test the ability to retain and utilize information over extended sequences or interactions. Standard language modeling datasets (Penn Treebank, WikiText-2, Google Billion Word) are used to evaluate perplexity, demonstrating a model's ability to predict the next token based on history (LTM (Nugaliyadde, 2023), LongMem (Wang et al., 2023), MELODI (Chen et al., 4 Oct 2024)).

Question Answering (QA) datasets, particularly those requiring reasoning over long documents (e.g., SQuAD (Ma et al., 2017), NarrativeQA (Xing et al., 28 May 2025)) or historical context (bAbI (Ma et al., 2017), BABILong (Kang et al., 9 Feb 2025), LongBook-QA (Wang et al., 1 Feb 2025)), are critical for evaluating memory retrieval and reasoning. Metrics include Exact Match (EM), F1 score, and BLEU score. RecaLLM (Kynoch et al., 2023) uses the TruthfulQA benchmark and custom synthetic temporal datasets to evaluate factual accuracy and temporal understanding.
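
To make the headline QA metrics concrete, minimal versions of Exact Match and token-level F1 are sketched below; they omit the answer-normalization rules (such as article and punctuation stripping) that individual benchmarks apply.

```python
def exact_match(prediction, reference):
    """EM: 1 if the normalized strings match exactly, else 0."""
    return int(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the tree memory network", "Tree Memory Network"))        # 0
print(round(token_f1("the tree memory network", "Tree Memory Network"), 3)) # 0.857
```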

Dialogue consistency and engagingness are key metrics for conversational AI with long-term memory. PLATO-LTM (Xu et al., 2022) introduces the DuLeMon dataset with explicit persona grounding and evaluates models using human judgments on coherence, consistency, and engagingness, alongside automatic metrics.

Tracking performance in vision tasks is evaluated using metrics like HOTA (Higher Order Tracking Accuracy), AssA (Association Accuracy), and IDF1, as demonstrated by MeMOTR (Gao et al., 2023) on datasets like DanceTrack, MOT17, and BDD100K. Embodied AI tasks require metrics like success rate and navigation efficiency, tested on benchmarks like 3DMem-Bench (Hu et al., 28 May 2025). Video world models evaluate consistency and quality using metrics like PSNR, SSIM, LPIPS, and VBench, often with user studies (Wu et al., 5 Jun 2025).

Continual learning benchmarks, such as MMLongCL-Bench and TextLongCL-Bench (Huai et al., 15 May 2025), measure Average Performance (AP) and Average Forgetting (AF) across long sequences of distinct tasks.

Biological Inspiration and Future Directions

Many long-term memory architectures draw inspiration from biological memory systems, particularly human cognition. The categorization into sensory, short-term, and long-term memory is a common cognitive model applied to LLMs (Shan et al., 3 Apr 2025). The Ebbinghaus Forgetting Curve, describing memory decay and reinforcement, inspires update mechanisms in MemoryBank (Zhong et al., 2023) and is discussed in Cognitive Memory (Shan et al., 3 Apr 2025).

Synaptic plasticity, the biological mechanism of memory storage at synapses, informs computational models of learning and memory. The paper "Computational models of long term plasticity and memory" (Fusi, 2017) reviews models like the Cascade Model and Bidirectional Cascade Model, showing how synaptic complexity (multiple interacting meta-stable states) can achieve superior memory lifetime and capacity compared to simple models. Key findings suggest that biological complexity, including metaplasticity (history-dependent plasticity), is crucial for both rapid learning and long-term stability. These models provide theoretical guidance for designing memory modules with multiple timescales and dynamic update rules.

Modularity in the brain, such as functional specialization in the neocortex, also provides inspiration. The modular network model (Mari, 2021) models long-term memory in association cortex as a network of interconnected Hebbian autoassociator modules, where global memories are combinations of local features. This modularity, combined with structured dynamics and feature sharing, supports high storage capacity and robust retrieval.

Episodic memory, which stores specific, contextualized experiences, is highlighted as a crucial component for long-term LLM agents (Pink et al., 10 Feb 2025). The proposed framework integrates in-context, external episodic, and parametric memory, emphasizing properties like single-shot learning, instance-specificity, and contextualization. Similarly, 3DLLM-Mem (Hu et al., 28 May 2025) uses a dual memory system (working and episodic) for embodied agents.

Future research directions include developing more human-like memory systems that are more context-dependent, exhibit controlled ambiguity, and support memory reconstruction (Shan et al., 3 Apr 2025). Integrating memory with user profiling and external knowledge systems is crucial for personalized, factual agents (Shan et al., 3 Apr 2025, Xu et al., 2022). Learning what and when to remember or forget dynamically, rather than relying on fixed rules, is an active area (Shan et al., 3 Apr 2025). Combining symbolic structures (graphs, tables) with neural representations is explored (Shan et al., 3 Apr 2025). Scalability and efficiency of memory management at the infrastructure level remain critical (Shan et al., 3 Apr 2025, Wang et al., 1 Feb 2025). Extending memory beyond text to handle complex multi-modal sequences, dynamic tree structures, and integrating memory with low-level policy control in embodied agents are also key areas (Fernando et al., 2017, Hu et al., 28 May 2025, Wu et al., 5 Jun 2025). The episodic memory framework (Pink et al., 10 Feb 2025) calls for unified architectures and benchmarks that support all properties of episodic memory for efficient continual learning in agents.
