Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions (2505.00675v2)

Published 1 May 2025 in cs.CL

Abstract: Memory is a fundamental component of AI systems, underpinning LLM-based agents. While prior surveys have focused on memory applications with LLMs (e.g., enabling personalized memory in conversational agents), they often overlook the atomic operations that underlie memory dynamics. In this survey, we first categorize memory representations into parametric and contextual forms, and then introduce six fundamental memory operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression. We map these operations to the most relevant research topics across long-term, long-context, parametric modification, and multi-source memory. By reframing memory systems through the lens of atomic operations and representation types, this survey provides a structured and dynamic perspective on research, benchmark datasets, and tools related to memory in AI, clarifying the functional interplay in LLM-based agents while outlining promising directions for future research. (The paper list, datasets, methods, and tools are available at https://github.com/Elvin-Yiming-Du/Survey_Memory_in_AI.)

Summary

  • The paper introduces a comprehensive taxonomy categorizing AI memory into parametric, contextual unstructured, and contextual structured types.
  • The paper defines six fundamental memory operations—consolidation, indexing, updating, forgetting, retrieval, and compression—to manage memory effectively.
  • The paper maps these operations to four key research topics and outlines future directions including lifelong learning and multi-modal integration.

Memory is a fundamental component of AI systems, particularly LLMs and LLM-based agents, enabling coherent and long-term interactions. While previous surveys have focused on memory applications, they often overlook the underlying atomic operations. This paper proposes a structured and dynamic perspective on memory in AI: it categorizes memory types, defines fundamental memory operations, maps these operations to key research topics, and outlines future directions.

The paper introduces a taxonomy of memory representations (a minimal data-structure sketch follows the list below):

  • Parametric Memory: Knowledge implicitly stored within the model's weights, acquired during training. It offers fast, context-free retrieval but lacks transparency and is difficult to selectively update.
  • Contextual Unstructured Memory: Explicit memory storing heterogeneous, modality-general information (text, images, audio, video) without predefined structures. It enables grounding reasoning in perceptual signals and multi-modal context. This can be short-term (transient, session-level) or long-term (persistent, cross-session).
  • Contextual Structured Memory: Explicit memory organized into interpretable formats like knowledge graphs, tables, or ontologies. It supports symbolic reasoning and precise querying, complementing the associative capabilities of models. Like unstructured memory, it can be short-term or long-term.
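
To make the taxonomy concrete, here is a minimal data-structure sketch; it is not from the paper, and names such as `UnstructuredEntry` and `persistence` are illustrative assumptions chosen only to mirror the parametric / unstructured / structured split and the short-term vs. long-term distinction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any

class Persistence(Enum):
    SHORT_TERM = "short_term"   # transient, session-level
    LONG_TERM = "long_term"     # persistent, cross-session

@dataclass
class ParametricMemory:
    """Knowledge implicit in model weights: fast, context-free, hard to edit selectively."""
    weights: Any                # e.g., a state_dict-like mapping of parameter tensors

@dataclass
class UnstructuredEntry:
    """Raw, modality-general content (text, image, audio, video) without a predefined schema."""
    content: Any
    modality: str               # "text", "image", "audio", "video"
    timestamp: float
    persistence: Persistence = Persistence.SHORT_TERM

@dataclass
class StructuredEntry:
    """Interpretable format, here a knowledge-graph style triple."""
    subject: str
    relation: str
    obj: str
    persistence: Persistence = Persistence.LONG_TERM
```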

Based on these memory types, six fundamental memory operations are defined, categorized into Memory Management and Memory Utilization:

Memory Management: Governs how memory is stored, maintained, and pruned; a toy sketch of these four operations follows the list below.

  • Consolidation: Transforming short-term experiences (e.g., dialogue turns, agent trajectories) into persistent memory. Methods involve summarization, structuring (e.g., into graphs or KBs), or encoding into model parameters.
  • Indexing: Creating auxiliary structures (e.g., entities, timestamps, content-based representations) to organize memory for efficient retrieval. Graph-based, signal-enhanced, and timeline-based indexing approaches exist.
  • Updating: Modifying existing memory in response to new data. This includes intrinsic methods (e.g., selective editing, recursive summarization, memory blending) and extrinsic methods (incorporating user feedback). For parametric memory, this often involves locate-and-edit mechanisms.
  • Forgetting: Selectively removing outdated, irrelevant, or harmful memory content. This can be time-based decay or active forgetting through unlearning techniques for parametric memory or semantic filtering for contextual memory.
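
The four management operations can be pictured as methods on a toy store. The sketch below is an assumption-laden illustration rather than anything prescribed by the survey: string concatenation stands in for an LLM summarizer during consolidation, a keyword map stands in for real indexing, and forgetting uses a plain age threshold.

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    created_at: float
    keywords: set[str] = field(default_factory=set)

class ContextualMemoryStore:
    """Toy contextual memory illustrating consolidation, indexing, updating, and forgetting."""

    def __init__(self):
        self.items: list[MemoryItem] = []
        self.index: dict[str, set[int]] = {}  # keyword -> positions of items mentioning it

    def consolidate(self, session_turns: list[str]) -> MemoryItem:
        # Consolidation: turn transient dialogue turns into one persistent entry.
        summary = " / ".join(session_turns)        # placeholder for an LLM summarizer
        item = MemoryItem(text=summary, created_at=time.time())
        self.items.append(item)
        self._index(len(self.items) - 1, item)
        return item

    def _index(self, pos: int, item: MemoryItem) -> None:
        # Indexing: build an auxiliary keyword index for later retrieval.
        item.keywords = {w.lower() for w in item.text.split() if len(w) > 3}
        for kw in item.keywords:
            self.index.setdefault(kw, set()).add(pos)

    def update(self, pos: int, new_fact: str) -> None:
        # Updating: blend new information into an existing entry and re-index it.
        self.items[pos].text += f" | updated: {new_fact}"
        self._index(pos, self.items[pos])

    def forget(self, max_age_seconds: float) -> None:
        # Forgetting: time-based decay; drop entries older than the threshold and rebuild the index.
        now = time.time()
        keep = [it for it in self.items if now - it.created_at <= max_age_seconds]
        self.items, self.index = [], {}
        for pos, it in enumerate(keep):
            self.items.append(it)
            self._index(pos, it)
```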

Memory Utilization: How stored memory is retrieved and used during inference; a toy retrieval-and-compression sketch follows the list below.

  • Retrieval: Accessing relevant memory content based on a query. Methods include query-centered (improving query formulation), memory-centered (enhancing memory organization/ranking), and event-centered (retrieving based on temporal/causal structure) approaches. Retrieval can span multiple sources and modalities.
  • Compression: Reducing memory size while preserving essential information for efficient storage and reasoning, especially relevant for long contexts. This can be pre-input (compressing the full context before feeding to the model) or post-retrieval (compressing retrieved content). Unlike consolidation, compression focuses on efficiency during inference.
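
Utilization can be sketched in the same spirit: a query-centered retrieval step scores stored entries, and a post-retrieval compression step trims them to a token budget. Lexical overlap and whitespace tokens in the snippet below are placeholder assumptions standing in for embedding similarity and a real tokenizer.

```python
def retrieve(query: str, memories: list[str], k: int = 3) -> list[str]:
    """Query-centered retrieval via lexical overlap (a stand-in for embedding similarity)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        memories,
        key=lambda m: len(q_terms & set(m.lower().split())),
        reverse=True,
    )
    return scored[:k]

def compress(snippets: list[str], token_budget: int = 128) -> str:
    """Post-retrieval compression: keep whole snippets until the whitespace-token budget is hit."""
    kept, used = [], 0
    for s in snippets:
        n = len(s.split())
        if used + n > token_budget:
            break
        kept.append(s)
        used += n
    return "\n".join(kept)

# Usage: context = compress(retrieve("user's favorite city", memory_texts))
```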

The paper maps these operations and memory types to four key research topics:

  1. Long-Term Memory: Focuses on persistent memory (contextual-structured and unstructured) for multi-session dialogues, personalized agents, and retrieval-augmented generation (RAG). It involves all six memory operations, with particular emphasis on management (consolidation of dialogue history, indexing for retrieval) and utilization (retrieval for relevant context, integration for grounded generation). Personalization, a key application, utilizes both model-level adaptation (encoding preferences in parameters) and memory-level augmentation (retrieving user-specific info). Datasets like LongMemEval [wu2024longmemeval] and MemoryBank [zhong2024memorybank] are relevant.
  2. Long-Context Memory: Addresses the management of very long input sequences, which is particularly challenging for Transformers because self-attention scales quadratically with sequence length. It primarily concerns contextual-unstructured and parametric memory (KV cache).
    • Parametric Efficiency: Optimizing the KV cache (parametric memory). Methods include KV cache dropping (static or dynamic selection to remove less important entries), KV cache storing optimization (quantization, low-rank approximation to preserve entries in a smaller footprint), and KV cache selection (selectively loading required entries for faster inference). Papers like H2O [zhang2023ho] and KVQuant [NEURIPS2024_028fcbcf] are examples; a toy cache-eviction sketch appears after this list.
    • Contextual Utilization: Optimizing the use of long contextual memory, addressing issues like "lost in the middle." Methods involve context retrieval (identifying and locating key information, e.g., graph-based, token/fragment selection) and context compression (soft prompt or hard prompt compression to fit context windows). Examples include GraphReader [li-etal-2024-graphreader] and LongLLMLingua [jiang-etal-2024-longLLMlingua]. Benchmarks like LongBench [bai-etal-2024-longbench] are used.
  3. Parametric Memory Modification: Focuses on dynamically adapting the knowledge encoded in model parameters (parametric memory). It mainly involves the Updating and Forgetting operations, sometimes supported by Retrieval (e.g., in RAG-based continual learning).
    • Editing: Localized modification of knowledge without full retraining. Approaches include locating-then-editing (identify and modify specific weights), meta-learning (an editor network predicts weight changes), prompt-based (steer output via prompts), and additional-parameter methods (add external modules). ROME [meng2022locating] and AlphaEdit [fang2024alphaedit] are notable editing methods; a toy rank-one edit sketch appears after this list.
    • Unlearning: Selectively removing unwanted or sensitive information. Strategies include additional-parameter methods (add components to adjust memory), prompt-based (manipulate inputs/ICL), locating-then-unlearning (identify and target parameters), and training objective-based methods (modify loss/optimization). TOFU [maini2024tofu] and WAGLE [jia2024wagle] are recent examples.
    • Continual Learning: Incrementally incorporating new knowledge while preventing catastrophic forgetting. Approaches include regularization-based (constrain updates to important weights, e.g., TaSL [feng2024tasl]) and replay-based methods (reintroduce past samples, e.g., DSI++ [mehta2022dsi++]). LSCS [wang2024towards] explores this in interactive agents.
  4. Multi-Source Memory: Deals with reasoning over and integrating knowledge from diverse sources, including internal parameters and external contextual memories (structured and unstructured, potentially multi-modal). It primarily involves Retrieval and Fusion/Integration operations.
    • Cross-Textual Integration: Combining textual information from multiple structured and unstructured sources. Challenges include reasoning over heterogeneous formats (e.g., ChatDB [hu2023chatdb], StructRAG [li2024structrag]) and resolving factual or semantic conflicts between sources (e.g., RKC-LLM [wang2023resolving], BGC-KC [tan2024blinded]).
    • Multi-modal Coordination: Fusion and retrieval across modalities like text, image, audio, and video. Fusion involves aligning cross-modal information (e.g., embedding into shared space like in UniTranSeR [ma-etal-2022-unitranser] or long-term integration like in LifelongMemory [wang2023lifelongmemory]). Retrieval enables accessing stored knowledge across modalities, often using embedding similarity (e.g., VISTA [zhou2024vista], IGSR [wang2025new] for sticker retrieval).
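
As a concrete illustration of the KV-cache dropping idea in topic 2, the toy below evicts cached entries that have received the least accumulated attention while always keeping a recent window. It is a simplified assumption about the heavy-hitter heuristic popularized by H2O, not that paper's implementation.

```python
import numpy as np

def drop_kv_entries(keys, values, attn_history, budget, recent_window=8):
    """Toy KV-cache eviction: keep the `budget` most-attended entries plus a recent window.

    keys, values: (seq_len, head_dim) cached tensors
    attn_history: (seq_len,) accumulated attention mass each cached token has received
    """
    seq_len = keys.shape[0]
    recent = set(range(max(0, seq_len - recent_window), seq_len))
    # Rank older entries by accumulated attention ("heavy hitters") and keep the top ones.
    older = [i for i in range(seq_len) if i not in recent]
    older.sort(key=lambda i: attn_history[i], reverse=True)
    keep = sorted(recent | set(older[: max(0, budget - len(recent))]))
    return keys[keep], values[keep], attn_history[keep]

# Example: shrink a cache of 32 entries down to a budget of 16.
rng = np.random.default_rng(0)
k, v = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))
scores = rng.random(32)
k2, v2, s2 = drop_kv_entries(k, v, scores, budget=16)
```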

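The locating-then-editing family in topic 3 reduces, at its algebraic core, to writing a new value into an already-located linear layer. The sketch below applies a minimum-norm rank-one update so that the located key maps to the desired value; it deliberately omits the covariance weighting that methods like ROME use to protect other stored associations, so treat it as a toy rather than an implementation of any published editor.

```python
import numpy as np

def rank_one_edit(W: np.ndarray, k_star: np.ndarray, v_star: np.ndarray) -> np.ndarray:
    """Return W' such that W' @ k_star == v_star, changing W minimally in Frobenius norm."""
    residual = v_star - W @ k_star                      # what the layer currently gets wrong
    return W + np.outer(residual, k_star) / (k_star @ k_star)

# Check on random data: after the edit, the located key maps exactly to the new value.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
k_star, v_star = rng.normal(size=32), rng.normal(size=16)
W_edited = rank_one_edit(W, k_star, v_star)
assert np.allclose(W_edited @ k_star, v_star)
```
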
The paper also discusses practical tools supporting memory integration, organized in layers:

  • Components: Foundational infrastructure like vector databases (FAISS [douze2024faiss]), graph databases (Neo4j [neo4j2012]), LLMs (Llama [touvron2023llama], GPT-4 [achiam2023gpt]), and retrieval mechanisms (BM25 [robertson1995okapi], Contriever [izacard2021unsupervised], embeddings); a minimal FAISS retrieval sketch follows this list.
  • Frameworks: Modular interfaces for memory operations, abstracting complex processes (e.g., LlamaIndex [llamaindex], LangChain [langchain], EasyEdit [wang-etal-2024-easyedit]).
  • Memory Layer Systems: Platforms providing orchestration and lifecycle management for memory as a service (e.g., Mem0 [mem0], Zep [rasmussen2025zep], Memobase [memobase2025]).
  • Products: End-user systems leveraging memory for personalization and long-term interaction (e.g., ChatGPT [openai2023chatgpt], Grok [grok2023]).
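
At the components layer, a typical retrieval backbone is a vector index over memory embeddings. The snippet below assumes only that the `faiss` and `numpy` packages are installed and uses random vectors as placeholders for real sentence embeddings produced by an encoder.

```python
import faiss
import numpy as np

dim = 384                                   # typical sentence-embedding dimensionality
index = faiss.IndexFlatIP(dim)              # exact inner-product search (cosine after normalization)

# Placeholder memory embeddings; in practice these come from an encoder over stored memories.
memory_vecs = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(memory_vecs)
index.add(memory_vecs)

# Embed the query the same way, then fetch the five most similar memory entries.
query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)
print(ids[0], scores[0])                    # positions into the memory store and their similarities
```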

Finally, the paper highlights open challenges and future directions, including developing spatio-temporal memory, enabling effective parametric memory retrieval, building lifelong learning agents that integrate different memory types, drawing on brain-inspired memory models (complementary learning systems and hierarchical structures such as K-Lines Theory [MINSKY1980117]), achieving unified memory representations, managing multi-agent memory, and addressing memory threats and safety issues, particularly in machine unlearning [machine-unlearning-lina-2025].
