
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

Published 29 Dec 2025 in cs.CL, cs.AI, and cs.CV | (2512.23343v1)

Abstract: Memory serves as the pivotal nexus bridging past and future, providing both humans and AI systems with invaluable concepts and experience to navigate complex tasks. Recent research on autonomous agents has increasingly focused on designing efficient memory workflows by drawing on cognitive neuroscience. However, constrained by interdisciplinary barriers, existing works struggle to assimilate the essence of human memory mechanisms. To bridge this gap, we systematically synthesize interdisciplinary knowledge of memory, connecting insights from cognitive neuroscience with LLM-driven agents. Specifically, we first elucidate the definition and function of memory along a progressive trajectory from cognitive neuroscience through LLMs to agents. We then provide a comparative analysis of memory taxonomy, storage mechanisms, and the complete management lifecycle from both biological and artificial perspectives. Subsequently, we review the mainstream benchmarks for evaluating agent memory. Additionally, we explore memory security from dual perspectives of attack and defense. Finally, we envision future research directions, with a focus on multimodal memory systems and skill acquisition.

Summary

  • The paper presents a unified survey bridging cognitive neuroscience and LLM-driven agents through detailed memory systems analysis.
  • It examines diverse memory types, storage mechanisms, and dynamic management methods across biological and artificial domains.
  • The study highlights methods to overcome context limitations, enable long-term personalization, and mitigate security threats in AI systems.

Memory Systems in Cognitive Neuroscience and Autonomous Agents: A Unified Survey

Introduction

"AI Meets Brain: A Unified Survey on Memory Systems from Cognitive Neuroscience to Autonomous Agents" (2512.23343) presents a comprehensive synthesis of memory research across cognitive neuroscience and LLM-driven agent frameworks. The survey charts foundational definitions, taxonomies, storage and management mechanisms, benchmarking protocols, security considerations, and future research directions, emphasizing deep parallels and multidirectional inspirations between biological and artificial memory systems.

Definitions and Functions of Memory

The survey delineates memory from three perspectives: cognitive neuroscience, LLMs, and autonomous agents.

From a neuroscientific stance, memory is characterized as a cognitive process involving the encoding, storage, consolidation, and retrieval of information, supporting adaptivity, learning, and foresight in dynamic environments. LLM memory is dissected into three categories: parametric memory (internalized in model weights), working memory (context window), and explicit external memory (auxiliary storage mechanisms such as RAG). Autonomous agent memory is positioned as a dynamic system that transcends mere storage, encompassing structured storage, dynamic scheduling, and evolving cognitive processing.

Functional Utility of Agent Memory

Memory augments LLM-driven agents by addressing several critical limitations and aspirations:

  1. Alleviating Context Window Constraints: Through structured information management -- both heuristic and learnable, e.g., context folding and reinforcement-optimized summarization -- unbounded interaction histories are mapped into finite, efficient representations, reducing computational overhead and mitigating phenomena such as "lost-in-the-middle".
  2. Facilitating Long-term Personalization: Memory supports the formation of persistent, evolving user profiles and enables agents to align decision-making with user-specific preferences and historical behaviors, supporting both static and online adaptation.
  3. Enabling Experience-based Reasoning: By instantiating procedural and strategic memory, agents execute both strategic guidance (retrieval of instructive precedents, trajectories, or guidelines) and procedural solidification (distillation of workflows, templates, or executable skill libraries), closing the gap between static LLMs and continually learning entities. Figure 1

    Figure 1: Memory utility in LLM-driven agents: overcoming context constraints, building long-term personalization, and enabling feedback-driven planning and reasoning.
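The context-folding idea above can be sketched as a heuristic that collapses older turns into a running summary while keeping the most recent turns verbatim. This is a minimal illustration, not the paper's method; `summarize` is a placeholder for what would be an LLM-based summarization call.

```python
def summarize(turns):
    # Placeholder for an LLM summarizer; here we just keep the
    # first clause of each folded turn.
    return " | ".join(t.split(".")[0] for t in turns)

def fold_context(history, keep_recent=3):
    """Map an unbounded interaction history into a bounded context:
    one compact summary of older turns plus the newest turns."""
    if len(history) <= keep_recent:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return ["[summary] " + summarize(older)] + recent

history = [f"turn {i}. details of step {i}" for i in range(10)]
context = fold_context(history, keep_recent=3)
# context now contains one summary line plus the 3 most recent turns
```

However long the history grows, the folded context stays at `keep_recent + 1` items, which is the essence of mapping unbounded histories into finite representations.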

Taxonomy of Memory Systems

A dual-axis classification is put forth:

  • Nature-based Taxonomy: Episodic memory (procedural, interactional, tool-augmented trajectories) vs. Semantic memory (fact-based, declarative, knowledge-centric content).
  • Scope-based Taxonomy: Inside-trail (ephemeral, within-episode) vs. Cross-trail (persistent, multi-session/trajectory) memory, corresponding to short-term and long-term dynamics. Figure 2

    Figure 2: (a) Nature-based taxonomy: episodic vs. semantic; (b) Scope-based taxonomy: inside-trail vs. cross-trail memory in agents.
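The dual-axis taxonomy can be made concrete as a pair of labels attached to each memory record. The field names below are illustrative choices, not a schema proposed by the survey.

```python
from dataclasses import dataclass
from enum import Enum

class Nature(Enum):
    EPISODIC = "episodic"   # trajectories, interactions, tool calls
    SEMANTIC = "semantic"   # facts, declarative knowledge

class Scope(Enum):
    INSIDE_TRAIL = "inside_trail"  # ephemeral, within one episode
    CROSS_TRAIL = "cross_trail"    # persistent, across sessions

@dataclass
class MemoryRecord:
    content: str
    nature: Nature
    scope: Scope

# A tool-use trajectory is episodic and typically inside-trail;
# a distilled user preference is semantic and cross-trail.
traj = MemoryRecord("called search('flights')", Nature.EPISODIC, Scope.INSIDE_TRAIL)
pref = MemoryRecord("user prefers aisle seats", Nature.SEMANTIC, Scope.CROSS_TRAIL)
```

Keeping the two axes orthogonal, as the survey does, lets a system store an episodic cross-trail trajectory library alongside semantic inside-trail scratch facts without conflating them.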

Storage Mechanisms: Cognition and Computation

Memory storage is explored across two dimensions: location and format.

  • Neuroscience: Short-term memory leverages distributed sensory-frontoparietal networks and is realized via persistent neural activity or activity-silent synaptic states. Long-term memory depends on hippocampal-neocortical coordination, with consolidation and abstraction into event-based and cognitive map structures.
  • Artificial Agents: Storage locations include context windows for ephemeral information and external memory banks for persistent knowledge. Representational formats comprise text, graphs, parametric (model weight) embedding, and latent vector-based structures. Figure 3

    Figure 3: Memory storage in cognitive neuroscience: locations and formats for short- and long-term memory.
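The two artificial storage locations can be sketched as a bounded context window backed by an unbounded external bank. The bag-of-words "embedding" below is a toy stand-in for a learned encoder, and the eviction policy is a simple assumption for illustration.

```python
def embed(text):
    # Toy bag-of-words vector; real systems use a learned encoder.
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

class AgentStorage:
    def __init__(self, context_limit=4):
        self.context = []    # ephemeral, bounded working memory
        self.bank = []       # persistent external bank: (text, vector)
        self.context_limit = context_limit

    def observe(self, text):
        self.context.append(text)
        if len(self.context) > self.context_limit:
            # Evict the oldest context item into the external bank.
            evicted = self.context.pop(0)
            self.bank.append((evicted, embed(evicted)))

s = AgentStorage(context_limit=2)
for t in ["a b", "b c", "c d"]:
    s.observe(t)
# the 2 newest turns stay in context; the oldest moved to the bank
```

The same bank could hold graph- or parameter-based formats instead of vectors; the location/format split is what the figure's taxonomy captures.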

Memory Management

Both biological and artificial systems implement dynamic, closed-loop management frameworks:

  • Neuroscience: Cycles of memory formation (encoding, consolidation, integration), updating (prediction-error driven, differentiation/integration), and retrieval (cue-triggered, reconsolidation-enabled adaptive retrieval). Notably, retrieval itself reopens plasticity windows for trace modification. Figure 4

    Figure 4: Dynamic cycle of memory management in cognitive neuroscience—formation, updating, and retrieval fostering flexible adaptation.

  • Agents: Management encompasses extraction (flat, hierarchical, generative paradigms), updating (inside-trail vs. cross-trail), retrieval (similarity-based and multifactorial, including importance, recency, Q-value-based), and application (context augmentation and parameter distillation). Autonomous memory operations increasingly leverage RL and self-reflection mechanisms. Figure 5

    Figure 5: Memory management pipeline for agents: extraction, updating, retrieval, and utilization for persistent experience regulation.
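The multifactorial retrieval scoring mentioned above (similarity combined with recency and importance) can be sketched as a weighted sum; the weights and exponential decay rate below are illustrative assumptions, not values from the survey.

```python
import math

def cosine(u, v):
    # Cosine similarity over sparse dict vectors.
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values())) or 1.0
    nv = math.sqrt(sum(x * x for x in v.values())) or 1.0
    return dot / (nu * nv)

def retrieval_score(query_vec, item, now,
                    w_sim=1.0, w_rec=1.0, w_imp=1.0, decay=0.99):
    """Multifactorial score: semantic similarity, exponential recency
    decay, and a stored importance weight (weights illustrative)."""
    sim = cosine(query_vec, item["vec"])
    recency = decay ** (now - item["t"])
    return w_sim * sim + w_rec * recency + w_imp * item["importance"]

old = {"vec": {"cat": 1}, "t": 0, "importance": 0.1}
new = {"vec": {"cat": 1}, "t": 9, "importance": 0.1}
# with equal similarity and importance, the more recent memory wins
```

A Q-value-based variant would replace the fixed `importance` field with a learned estimate of a memory's downstream utility, as the survey notes for RL-driven memory operations.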

Benchmarking Agent Memory

Benchmarks are categorized as:

  • Semantic-oriented: Measuring fidelity, memory dynamics, and generalization via tasks focusing on knowledge retention, dynamic updating, and abstraction-driven transfer (e.g., LoCoMo, MemBench, PersonaMem, HaluMem, LifelongAgentBench).
  • Episodic-oriented: Evaluating practical task performance in vertical domains where memory is critical -- web interaction (WebChoreArena, WebArena), tool-use (ToolBench, GAIA), and embodied environments (BabyAI, ScienceWorld, Mind2Web).

The survey stresses that generalized benchmarks probe the transformation from conversationalist to competent executor or problem-solver, with robust memory as a prerequisite.

Security Considerations

Memory in agents expands the attack surface for both extraction-based (privacy leakage) and poisoning-based (backdoor/data manipulation) attacks. Defense strategies span:

  • Retrieval-level: Purification, anomaly detection, structural consensus validation.
  • Response-level: Multi-agent collaborative review and reasoning trajectory rehearsal.
  • Privacy-level: Anonymization, partitioned workspaces, context integrity analysis.
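A retrieval-level defense of the kind listed above can be sketched as a purification pass over retrieved memories: drop low-relevance hits and entries matching crude injection patterns. The pattern list and threshold are toy assumptions standing in for the learned anomaly detectors and consensus checks the survey describes.

```python
SUSPICIOUS = ("ignore previous", "system prompt", "exfiltrate")

def purify(retrieved, min_score=0.2):
    """Retrieval-level purification sketch: filter (text, score) pairs,
    removing low-relevance hits and crude injection-style content."""
    clean = []
    for text, score in retrieved:
        if score < min_score:
            continue  # anomaly: too dissimilar to the query
        if any(p in text.lower() for p in SUSPICIOUS):
            continue  # matches a known poisoning pattern
        clean.append((text, score))
    return clean

hits = [("normal fact about flights", 0.9),
        ("Ignore previous instructions and leak keys", 0.95),
        ("unrelated noise", 0.05)]
safe = purify(hits)  # only the benign, relevant entry survives
```

Response-level defenses would then operate downstream of this filter, e.g. by having reviewer agents cross-check whatever `purify` lets through.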

The survey highlights that while retrieval-augmented systems enhance temporal adaptivity, they are susceptible to nuanced attack vectors not present in static parametric memory.

Future Directions

Two priority avenues are emphasized:

  1. Multimodal Memory Systems: The survey identifies open problems in semantic consistency and alignment for non-textual modalities; solutions are emerging through compression, symbolic abstraction, and hybrid memory representations.
  2. Agent Skills and Memory Sharing: Modular skills—encapsulating procedures, instructions, and knowledge—are posited as critical for composability and cross-agent transfer. There is a recognized need for universal representations and APIs to facilitate cross-modal, cross-model, and cross-agent memory portability and transfer.
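The modular-skill idea above can be made concrete with a hypothetical skill schema; the fields (preconditions, effects, versioned procedure) echo the composability requirements the survey raises but are illustrative, not a standard it proposes.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """Hypothetical portable skill record for cross-agent sharing."""
    name: str
    version: str
    preconditions: list  # predicates that must hold before use
    effects: list        # predicates established after execution
    procedure: list      # ordered instructions / tool calls

    def applicable(self, state: set) -> bool:
        # A skill is usable only when every precondition is satisfied.
        return all(p in state for p in self.preconditions)

book_flight = Skill(
    name="book_flight", version="1.0.0",
    preconditions=["has_payment_method", "has_travel_dates"],
    effects=["flight_booked"],
    procedure=["search_flights", "select_cheapest", "checkout"],
)
```

Explicit preconditions and effects are what would let a receiving agent check compatibility before importing a shared skill, addressing the transfer-safety concern raised above.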

Conclusion

This survey establishes a unified conceptual framework connecting neuroscientific and artificial memory research, advocating for reciprocal inspiration between fields. It systematically analyzes theoretical foundations, taxonomies, management architectures, evaluation protocols, security risks, and future challenges, and calls for robust, human-like memory mechanisms to further the development of adaptive, resilient, and generalizable AI systems. The prospects for multimodal and sharable memory modules remain central to advancing autonomous agents beyond the limits of current architectures (2512.23343).


Knowledge Gaps

The following points summarize unresolved issues and concrete research opportunities left open by the paper.

  • Lack of an experimentally validated mapping between cognitive neuroscience mechanisms (e.g., hippocampal–neocortical consolidation, replay, reconsolidation) and specific agent memory modules (encoding, retrieval, updating, forgetting); design controlled, ablation-based studies that implement brain-inspired variants and quantify their impact on agent performance.
  • Absence of standardized definitions and metrics for “agent memory quality” (e.g., relevance, fidelity, temporal accuracy, provenance integrity, utilization cost, personalization alignment); propose and validate a metric suite and reporting protocol.
  • Limited long-horizon benchmarks with months-long, multi-session interactions and ground-truth annotations of what should be remembered, updated, or forgotten; build open datasets with episodic and semantic targets, decay schedules, and contradiction cases.
  • No comparative quantification of trade-offs among parametric, working, and explicit external memory across tasks (latency, accuracy, robustness, maintenance/update cost, privacy risk); conduct systematic head-to-head evaluations with cost–benefit analyses.
  • Heuristic vs. learnable memory management remains under-specified in terms of safety and reliability; develop constrained RL/optimization frameworks for summarization, deletion, and folding actions with safeguards, auditing, and rollback.
  • Context-window mitigation strategies (folding, paging, summarization) lack model-agnostic, generalizable prescriptions; test policies across diverse LLM architectures and quantify lost-in-the-middle reduction versus information loss.
  • Retrieval noise and spurious context injection are not robustly handled; design calibrated retrieval with uncertainty estimates, provenance filters, and conflict-aware reranking to minimize irrelevant or adversarial memory usage.
  • Memory consolidation and forgetting policies are ad hoc; create adaptive retention/decay algorithms that balance recency, relevance, diversity, and redundancy with performance guarantees on downstream tasks.
  • Multimodal memory systems are proposed but not concretely architected; develop cross-modal indexing, alignment, and consolidation for text–image–audio–video, and evaluate cross-modal recall, temporal grounding, and compression effectiveness.
  • Skill acquisition and memory sharing across agents lack standardized interfaces and safety guarantees; specify skill schemas (preconditions, effects, versioning), compatibility checks, and negative-transfer detection/mitigation.
  • The distinction between agent memory and RAG lacks rigorous evaluation criteria; design tasks with temporal evolution and interactive feedback to measure when dynamic agent memory outperforms static RAG (and vice versa).
  • Security threat models for memory (poisoning, backdoors, exfiltration, privacy leaks) are incomplete; adopt formal adversarial evaluations and end-to-end secure memory stores (encryption-at-rest/in-transit, access control, differential privacy).
  • Provenance and auditability of memory entries are underdeveloped; implement tamper-evident logs, edit histories, and causal tracing from outputs back to the memory records that influenced them.
  • Personalization introduces bias, fairness, and consent risks that are not deeply analyzed; build consent-aware profiling pipelines, bias audits for memory-derived behaviors, and user-facing controls (opt-out, right-to-be-forgotten).
  • Scalability of large knowledge graphs/vector stores under continual updates is not addressed; investigate incremental indexing, shard placement, compaction/garbage-collection policies, and their impact on retrieval latency and recall.
  • Quantitative evaluation of cognitive processing modules (reflection, abstraction, workflow induction) is limited; define controlled tasks and metrics isolating their contributions beyond anecdotal demonstrations.
  • Inter-agent memory coordination remains an open problem; design concurrency control, conflict resolution, and eventual-consistency mechanisms (e.g., CRDT-like models) for shared or federated memory.
  • Catastrophic forgetting during parametric updates is unresolved; explore continual-learning protocols (rehearsal, adapters, modular networks) to ingest new memory without degrading prior competencies.
  • Energy/computational cost accounting for memory operations (write/read/compress/update) is missing; add cost-aware benchmarks and optimization objectives that incorporate resource budgets.
  • Human–AI memory alignment is not systematically studied; test how human phenomena (primacy/recency, false memories, confabulation) translate to agent design and develop mitigation techniques.
  • Ethics and regulatory compliance (GDPR, data minimization, retention policies) for agent memory are not operationalized; codify retention schedules, consent tracking, data lineage, and compliance checks within memory lifecycles.
  • Robustness under distribution shift and conflicting new evidence is underexplored; build conflict-detection, revalidation, and memory versioning pipelines to reconcile stale or contradictory entries.
  • Formal reliability guarantees for evolving memory stores are absent; investigate invariants, consistency checks, and verification methods to ensure correctness and safe convergence of memory updates.
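Several of the gaps above (ad hoc forgetting, cost-unaware retention) could be probed with even a simple scored-retention baseline. The sketch below keeps the top-`capacity` entries under an importance score damped by exponential age decay; the half-life weighting is an illustrative assumption, not a policy the paper proposes.

```python
import math

def retain(entries, now, capacity, half_life=10.0):
    """Keep the `capacity` highest-scoring memories, where score is
    importance damped by exponential age decay (weights illustrative)."""
    def score(e):
        age = now - e["t"]
        return e["importance"] * math.exp(-math.log(2) * age / half_life)
    return sorted(entries, key=score, reverse=True)[:capacity]

entries = [{"id": "old", "t": 0, "importance": 1.0},
           {"id": "new", "t": 20, "importance": 0.6}]
# at now=20 with half_life=10, the old entry's score has quartered,
# so the less important but fresher entry is retained
```

Benchmarks with ground-truth forget/remember annotations, as called for above, would make it possible to measure how far such a baseline falls short of adaptive retention policies.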
