Long-Term Memory (LTM) in Biological and AI Systems
- Long-term memory (LTM) is defined by its persistent encoding, vast capacity, and role in cross-episode recall in both biological and artificial systems.
- LTM research integrates methods like vector databases, graph-structured engrams, and parametric/non-parametric architectures to support efficient, multi-hop retrieval and adaptive reasoning.
- Key challenges in LTM include optimizing consolidation, mitigating redundancy, and ensuring privacy while supporting lifelong, dynamic learning.
Long-term memory (LTM) refers, across neuroscience and artificial intelligence, to mechanisms and architectures that enable the persistent encoding, storage, and retrieval of knowledge, experiences, or strategies over extended horizons — spanning long dialogue sessions, lifelong interactions, or even an entire lifetime of events. In both biological and artificial systems, LTM is fundamentally distinguished from short-term or working memory modules by its persistence, capacity, and its role in supporting cross-episode inference, abstraction, and adaptive self-modification. LTM is realized via a wide spectrum of techniques, ranging from associative graphs and vector databases to explicit graph-structured engrams in cortical models and real-time adaptive memory operations in AI agents. This entry surveys the formal definitions, theoretical and empirical principles, system architectures, evaluation protocols, and open challenges in LTM research, as established in the literature.
1. Formal Definitions and Theoretical Foundations
LTM in both biological and artificial systems is characterized by persistent storage and retrieval, outlasting the transient timescales of working memory or session buffer. In classical cognitive models, LTM encodes declarative (semantic, episodic) and procedural knowledge, is accessed via cue-based retrieval, and is subject to forgetting via interference or active suppression (He et al., 2024).
Formalization in AI Systems
An AI agent at time $t$ observes a stimulus $x_t$ and must decide (i) whether and how to store $x_t$ in LTM, (ii) how to retrieve relevant memories for a given query $q$, and (iii) how to prune or consolidate its LTM to maintain efficiency (He et al., 2024):
- Parametric memory, $M_{\theta}$: Information stored implicitly in model parameters $\theta$, e.g., pre-trained transformer weights.
- Non-parametric memory, $M_{\text{ext}}$: Information stored in external structures such as databases, logs, or vector embeddings, accessible via similarity search or key-based retrieval.
Retrieval may proceed via dense embedding similarity:

$$m^* = \arg\max_{m \in M_{\text{ext}}} \operatorname{sim}\!\big(e(q), e(m)\big),$$

or by parametric forward pass $y = f_{\theta}(q)$. Capacity in neural LTM scales with the parameter count $|\theta|$ and the available compute budget (He et al., 2024).
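As an illustration, the non-parametric retrieval path above can be sketched in a few lines. The toy embeddings and memory entries below are invented for demonstration; a real system would use a learned encoder and an approximate-nearest-neighbor index rather than a linear scan:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_emb, memory):
    """Return the stored item whose embedding is most similar to the query."""
    return max(memory, key=lambda item: cosine(query_emb, item["emb"]))

# Hypothetical 3-d embeddings standing in for a real encoder's output.
memory = [
    {"text": "user prefers tea", "emb": [0.9, 0.1, 0.0]},
    {"text": "meeting moved to Friday", "emb": [0.0, 0.8, 0.6]},
]
best = retrieve([1.0, 0.2, 0.0], memory)
# best["text"] == "user prefers tea"
```

The linear `max` scan is O(|memory|) per query; production stores replace it with an indexed search while keeping the same argmax-over-similarity semantics.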
Biological formalization: LTM corresponds to connected subgraphs (engrams) in the global cortical directed graph, with capacity scaling on the order of $\binom{n}{k}$ for $n$ neurons and $k$-sized engrams (Wei et al., 2024).
2. Biological, Cognitive, and Graph-Theoretic Models
Engram Theory and Cortical LTM
Human cortical LTM comprises weakly or strongly connected neural ensembles—engrams—formally defined as connected induced subgraphs with at least one Hamiltonian cycle to guarantee robust recall (Wei et al., 2024). Using probabilistic models of synaptic density, a minimum connectivity threshold ensures that almost all subsets of reasonable ($k$-node) size form such cycle-rich subgraphs. The available storage is exponential in $k$, explaining the immense empirical LTM capacity of cortex.
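The Hamiltonian-cycle criterion can be checked directly on small graphs. The brute-force sketch below treats edges as undirected for simplicity (the cortical model is directed) and is only feasible for toy engram sizes, since it enumerates permutations:

```python
from itertools import permutations

def has_hamiltonian_cycle(nodes, edges):
    """Brute-force Hamiltonian-cycle test; fine only for small node sets."""
    edge_set = set(edges) | {(b, a) for a, b in edges}  # treat as undirected
    first, rest = nodes[0], nodes[1:]
    for perm in permutations(rest):
        cycle = (first,) + perm + (first,)
        if all((u, v) in edge_set for u, v in zip(cycle, cycle[1:])):
            return True
    return False

# A toy 4-node "engram": dense enough to contain the cycle 0-1-2-3-0.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
has_hamiltonian_cycle(nodes, edges)  # True
```

A sparse path graph on the same nodes (e.g. dropping the `(3, 0)` edge) fails the test, mirroring the claim that sufficient connectivity is what makes cycle-rich, recall-robust engrams typical.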
Associative and Small-World Dynamics
Text-driven models (0801.0887) describe LTM as an ever-growing associative net, incrementally updated via working-memory (WM) driven attachment rules. Nodes correspond to lexical concepts; links encode context-conditioned association strengths. The crucial dynamics are:
- Fitness-based preferential attachment in WM:

  $$\Pi_i = \frac{\eta_i k_i}{\sum_j \eta_j k_j},$$

  with $\eta_i$ the Jaccard co-occurrence fitness.
- Weight updates and normalization in LTM:

  $$w_{ij} \leftarrow w_{ij} + \eta_{ij}, \qquad w_{ij} \leftarrow \frac{w_{ij}}{\sum_k w_{ik}}.$$
The resulting graphs exhibit power-law degree distributions $P(k) \sim k^{-\gamma}$, high clustering coefficient, and short average path-length—a scale-free, small-world structure. Iterative WM–LTM–WM loops drive “information amplification” and spontaneous emergence of semantic modules.
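A minimal sketch of the WM-driven update loop, under the simplifying assumptions that fitness is the Jaccard overlap of each concept's accumulated context set and that outgoing weights are renormalized after every window (the paper's exact update rule may differ):

```python
from collections import defaultdict

def jaccard(a, b):
    """Co-occurrence fitness: overlap of two concepts' context sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

class AssociativeLTM:
    def __init__(self):
        self.w = defaultdict(dict)        # w[u][v]: association strength
        self.contexts = defaultdict(set)  # concepts seen alongside each node

    def observe(self, window):
        """A WM window of co-occurring concepts: strengthen links, renormalize."""
        for u in window:
            self.contexts[u].update(window)
        for u in window:
            for v in window:
                if u != v:
                    fit = jaccard(self.contexts[u], self.contexts[v])
                    self.w[u][v] = self.w[u].get(v, 0.0) + fit
            total = sum(self.w[u].values())
            for v in self.w[u]:
                self.w[u][v] /= total     # outgoing weights sum to 1
```

Repeatedly observing windows drawn from text grows the net incrementally; frequently co-occurring concept pairs accumulate weight faster, which is the mechanism behind the preferential-attachment dynamics described above.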
3. Architectures and Mechanisms in Artificial Agents
Memory-Augmented Neural Networks and Explicit LTM Modules
Modern AI systems implement LTM using a spectrum of designs:
- Vector-based external LTM: AI assistant LTM modules typically use a vector database of embedded summaries, events, or knowledge tuples, retrievable via cosine or dot-product similarity (Lee, 2024, Zhang et al., 16 Dec 2025, Yu et al., 5 Jan 2026).
- Graph-structured LTM: Some LLM agents consolidate distilled knowledge into a de-identified graph with nodes (facts/concepts), node embeddings, and typed relations (IsA, HasProperty, etc.), enabling multi-hop retrieval (Zhang et al., 9 Apr 2026).
- Memory abstractions: CogMem’s LTM is a vector-indexed, cross-session store of distilled reasoning strategies; new items are merged, updated, or appended via cosine-similarity thresholds, supporting “direct access” and “focus of attention” modules for session-level reasoning rather than full-context replay (Zhang et al., 16 Dec 2025).
- Multiple memory systems: MMS decomposes STM into high-quality fragments (keywords, cognitive perspectives, episodic/semantic traces) and builds dual retrieval/contextual units for efficient retrieval and generation (Zhang et al., 21 Aug 2025).
- Tool-based LTM interfaces: Agentic Memory exposes LTM read/write/update/delete as discrete actions, trained with progressive reinforcement learning to manage both LTM and STM adaptively across long-horizon tasks (Yu et al., 5 Jan 2026).
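A common thread across several of these designs is a merge-or-append write policy keyed on embedding similarity. The sketch below illustrates the idea with an invented 0.9 threshold and a naive in-place update; it is not the CogMem implementation, just the general pattern:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def write_to_ltm(store, item, threshold=0.9):
    """Merge with the nearest existing entry if similar enough, else append."""
    if store:
        nearest = max(store, key=lambda e: cosine(e["emb"], item["emb"]))
        if cosine(nearest["emb"], item["emb"]) >= threshold:
            nearest["text"] = item["text"]               # overwrite with newer phrasing
            nearest["count"] = nearest.get("count", 1) + 1
            return "merged"
    store.append({**item, "count": 1})
    return "appended"
```

The choice of threshold trades off deduplication against information loss: too low and distinct facts collapse into one entry, too high and near-duplicates accumulate—one of the redundancy concerns revisited in Section 6.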
Biological and Cognitive Inspiration
Architectures are frequently inspired by multiple-memory-systems theory (episodic, semantic, procedural subdivision) (Zhang et al., 21 Aug 2025), event segmentation (boundary-anchored writing) (Zhong et al., 8 Apr 2026), and neocortical columnar organization (modular, distributed storage) (Jiang et al., 2024). These influence both the semantic granularity and compositional design of LTM modules.
4. Application Domains and Empirical Benchmarks
LTM is critical wherever persistent, cross-episode knowledge is required, including:
- Dialogue and conversational AI: LTM supports persona continuity, context-sensitive recall, and long-horizon coherence (Xu et al., 2022, Zhang et al., 9 Apr 2026, Zhong et al., 8 Apr 2026). For instance, PLATO-LTM maintains user and bot persona memory, dynamically updating and retrieving facts for each new turn.
- Lifelong and self-evolving agents: LTM enables agents to adapt and personalize through accumulated interaction history, supporting model evolution by experience (OMNE/GAIA benchmark) (Jiang et al., 2024).
- Long-horizon reasoning and planning: CogMem and LightMem demonstrate that explicit LTM layers dramatically reduce context bloat and hallucination in multi-hop tasks, with accuracy gains from 0.84 to 0.93 and halved token usage after 15 turns (Zhang et al., 16 Dec 2025, Zhang et al., 9 Apr 2026).
- Benchmarking LTM performance: StoryBench and LoCoMo quantify LTM by measuring accuracy, retention, retry-count, and hardest-case correction in multi-turn, high-dependency settings (Wan et al., 16 Jun 2025, Zhang et al., 21 Aug 2025). Ablation studies repeatedly show that adding LTM yields >10 F1–point improvements in multi-hop and temporal QA.
5. Memory Management: Consolidation, Retrieval, Forgetting
Consolidation and Compression
Many LTM systems abstract and merge recent episodes (from mid-term or short-term stores) into permanent, de-duplicated knowledge units via summarization, local graph merging, or ridge regression projections (in video; 2-Video) (Zhang et al., 9 Apr 2026, Santos et al., 31 Jan 2025). Pruning is often driven by retention weights decaying according to Ebbinghaus curves; low-confidence or infrequently accessed nodes are periodically dropped to control growth (Lee, 2024, Zhang et al., 9 Apr 2026).
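The Ebbinghaus-style pruning step can be sketched as follows; the retention form $R = e^{-t/S}$, the hour-based timestamps, and the 0.05 cutoff are illustrative assumptions rather than values from any cited system:

```python
import math

def retention(age_hours, strength):
    """Ebbinghaus-style forgetting curve: R = exp(-t / S)."""
    return math.exp(-age_hours / strength)

def prune(ltm, now_hours, threshold=0.05):
    """Drop entries whose decayed retention falls below the threshold.
    Each access can bump an entry's `strength`, so frequently used
    memories survive while stale ones are garbage-collected."""
    return [m for m in ltm
            if retention(now_hours - m["written_at"], m["strength"]) >= threshold]
```

Raising `strength` on every successful retrieval gives the access-frequency behavior described above: a memory written once and never used decays past the threshold, while a repeatedly reinforced one persists indefinitely.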
Retrieval Strategies
Retrieval from LTM exploits adaptive mechanisms:
- Plan-driven, element-conditioned retrieval aligns query intent with relevant indexed evidence, with retrieval depth estimated per query type (enumeration, single-fact, judgment) (Zhong et al., 8 Apr 2026).
- Embedding-based, multi-stage selection (coarse vector search, fine-grained reranking) maximizes retrieval quality under fixed budget or latency constraints (Zhang et al., 9 Apr 2026).
- One-to-one matched units (retrieval/context) in MMS ensure encoding specificity and prevent informational mismatch (Zhang et al., 21 Aug 2025).
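The coarse-then-fine pattern in the second bullet can be sketched with a cheap stand-in reranker; real systems typically rerank with a cross-encoder or LLM scorer rather than the term-overlap heuristic used here:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def two_stage_retrieve(query_emb, query_terms, store, k_coarse=10, k_final=3):
    """Stage 1: cheap vector search over the whole store.
    Stage 2: finer rerank (here: lexical overlap) over the shortlist only."""
    coarse = sorted(store, key=lambda m: cosine(query_emb, m["emb"]),
                    reverse=True)[:k_coarse]
    reranked = sorted(coarse,
                      key=lambda m: len(set(query_terms) & set(m["text"].split())),
                      reverse=True)
    return reranked[:k_final]
```

Confining the expensive scoring to the `k_coarse` shortlist is what keeps retrieval within a fixed latency budget as the store grows.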
6. Limitations, Ethical Considerations, and Open Challenges
Open Questions and Challenges
- Redundancy and Scalability: LTM modules risk accumulating outdated or redundant information; research targets adaptive gating, hierarchical clustering, and efficient, domain-specific pruning (Zhang et al., 16 Dec 2025, He et al., 2024).
- Catastrophic/gradual forgetting: Unlike LSTM/GRU, pure additive “no-forget” LTM cells maintain unbounded histories but may dilute with noise; conversely, LTM written at only high-salience boundaries can miss fine detail (Nugaliyadde, 2023, Zhong et al., 8 Apr 2026).
- Personalization vs. Privacy: Embedding user preferences and personal histories in persistent LTM modules brings privacy, data retention, and manipulation risks; systems must offer user control (consent, audit, erasure), federated architectures, and robust privacy-preserving retrieval and consolidation (Lee, 2024).
- Evaluation Limitations: Many empirical studies are benchmark-bound; cross-task generalization, multimodal LTM integration, and domain transfer remain unsolved (Zhang et al., 16 Dec 2025, Jiang et al., 2024).
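The gating contrast raised in the catastrophic-forgetting bullet above can be made concrete with toy one-dimensional cell updates: a gated (LSTM-style) cell can attenuate old state, while a purely additive cell only accumulates.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_update(c, x, f_logit, i_logit):
    """LSTM-style cell: the forget gate scales old state before adding input."""
    return sigmoid(f_logit) * c + sigmoid(i_logit) * x

def additive_update(c, x):
    """'No-forget' cell: history only ever accumulates, so noise dilutes signal."""
    return c + x
```

With `f_logit` driven strongly negative, the gated cell discards nearly all of its history in one step; the additive cell has no such mechanism, which is exactly why unbounded additive LTM risks dilution by noise.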
Recommendations and Future Prospects
Emerging directions include:
- End-to-end self-adaptive LTM architectures (see SALM), with RL-trained adapters for storage, retrieval, and forgetting (He et al., 2024).
- Real-time, multimodal, hybrid parametric–nonparametric LTM to boost robustness, lifelong learning, and knowledge transfer (Jiang et al., 2024).
- Expanded ethical frameworks, combining technical, social, and regulatory safeguards for AI systems with human-level LTM capabilities (Lee, 2024).
7. Representative Systematic Table of AI LTM Types
| Memory Type | Storage Location | Retrieval Mechanism | Example Systems / Papers |
|---|---|---|---|
| Parametric LTM | Model weights | Forward pass, gradient update | Transformer pretrain, RL policies |
| Vector LTM (external) | Vector DB | Embedding similarity search | LightMem, CogMem, MMS |
| Graph-structured LTM | External DB | Multihop graph navigation, reranking | LightMem (LTM), HingeMem |
| Hybrid LTM | Both | RAG + parametric fine-tuning | OMNE, Agentic Memory |
Parametric and non-parametric LTM differ substantially in update flexibility, capacity scaling, and retrieval precision. System designs should select and compose LTM modalities according to application needs and anticipated query types (He et al., 2024, Zhang et al., 9 Apr 2026).
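The hybrid row of the table corresponds to RAG-style composition: non-parametric retrieval conditions a parametric forward pass. A stub sketch of that composition, where `parametric_answer` merely stands in for an actual model call:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / denom if denom else 0.0

def parametric_answer(prompt):
    # Stand-in for a forward pass through model weights (parametric LTM).
    return f"answer({prompt})"

def hybrid_answer(query, query_emb, store):
    """Retrieve the best non-parametric memory, then condition the
    parametric model on it (RAG-style composition)."""
    context = max(store, key=lambda m: cosine(query_emb, m["emb"]))["text"] \
        if store else ""
    return parametric_answer(f"{context} | {query}")
```

The division of labor mirrors the table: the external store supplies updatable, inspectable facts, while the parametric component supplies fluency and general knowledge baked into its weights.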
LTM is thus a multifaceted substrate for stability, adaptation, and deep reasoning across both neural and artificial systems. Its implementation in modern AI draws directly on cognitive principles, memory theory, graph combinatorics, and advanced architectural engineering, and is a focal point of ongoing research bridging neuroscience, language modeling, and trustworthy interactive systems.