Multiple Memory Systems (MMS)
- Multiple Memory Systems (MMS) is a framework that divides memory into distinct subsystems, each optimized for specific functions like rapid encoding and gradual abstraction.
- It leverages interdisciplinary insights from cognitive neuroscience and computational models to enable long-term reasoning, continual learning, and personalized multimodal interactions.
- MMS architectures enhance system performance by employing layered memory strategies that improve recall accuracy and efficiency in both single-agent and multi-agent environments.
Multiple Memory Systems (MMS) denotes a class of neurocognitive and computational architectures characterized by explicit subdivision of memory into distinct, interacting subsystems—each optimized for specific functions, storage formats, retrieval processes, and temporal dynamics. The MMS principle is a core construct in human memory research, with direct translational analogs in AI, LLMs, agentic reasoning, and memory management for distributed systems. Modern MMS research encompasses theoretical, algorithmic, and engineering perspectives, and underpins advances in long-term reasoning, continual learning, multimodal agency, and scalable memory management in both individual and multi-agent frameworks.
1. Theoretical Foundations and Biological MMS
The concept of Multiple Memory Systems traces to 20th-century cognitive neuroscience, notably Tulving’s and Squire’s taxonomies, which identified functionally dissociable human memory subsystems: episodic, semantic, procedural, and emotional, each with distinct neural substrates and behavioral profiles. Early philosophical and anatomical notions of passive vs. active recollection evolved through experimental amnesia studies (notably the case of H.M.), lesion research, and formal behavioral paradigms. Empirical findings solidified the dichotomy between declarative memory (episodic/semantic), dependent on the medial temporal lobe (MTL) and neocortex, and non-declarative memory (procedural, priming, associative, and non-associative), supported by the striatum, cerebellum, and amygdala (Pastor, 2020).
Distinct computational models, such as the Atkinson–Shiffrin modal model (1968), Baddeley’s multi-component working memory model, Hebb’s dual-trace theory, and Complementary Learning Systems, formalize MMS as a division of labor: rapid encoding and flexible retrieval co-exist with gradual abstraction, minimizing catastrophic interference and supporting flexible navigation, skill learning, and context-dependent recall.
2. MMS in LLM Agents
Recent research operationalizes the MMS principle in artificial agents, particularly LLM-powered systems. A representative approach instantiates MMS by extracting and structuring multiple long-term fragments from transient short-term interactions. For each dialogue turn, an encoder LLM generates: keywords, cognitive perspectives, episodic summaries, and semantic summaries. These fragments are used to construct two paired unit types per interaction:
- Retrieval Memory Units: Concatenate keywords, raw short-term text, cognitive perspectives, and episodic summaries, optimized for high-recall text-based matching against user queries.
- Contextual Memory Units: Combine keywords, short-term text, cognitive perspectives, and semantic summaries, injected into the generation context for deep knowledge augmentation.
Retrieval units are embedded densely and scored against queries by cosine similarity; at response time, the contextual units paired with the highest-scoring retrieval units are aggregated and fed into the LLM for answer generation. The design mirrors encoding-specificity effects and deepens long-term trace quality, with empirical results showing superior performance to flat or monolithic memory architectures in long-horizon multi-turn tasks (e.g., LoCoMo), including recall@1 and F1/accuracy improvements over prior A-MEM and MemoryBank baselines (Zhang et al., 21 Aug 2025).
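The paired-unit design can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `MemoryPair` structure and function names are invented here, and the dense embeddings are assumed to be produced by some external encoder.

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryPair:
    retrieval_text: str    # keywords + raw turn + cognitive perspectives + episodic summary
    contextual_text: str   # keywords + raw turn + cognitive perspectives + semantic summary
    embedding: list        # dense embedding of retrieval_text (assumed precomputed)

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_context(query_emb, memory, k=2):
    """Rank pairs by cosine similarity of the retrieval-unit embedding,
    then return the contextual texts of the top-k pairs for generation."""
    ranked = sorted(memory, key=lambda m: cosine(query_emb, m.embedding), reverse=True)
    return [m.contextual_text for m in ranked[:k]]
```

The key design point this captures is the asymmetry: scoring happens over the recall-optimized retrieval unit, but the text injected into the generation context is the paired, semantics-optimized contextual unit.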
Key MMS agent metrics include recall@N for retrieval, F1 and BLEU-1 for generation, and latency/storage overhead, with ablation studies confirming that fragment diversity is necessary for robust recall and generative quality.
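For reference, recall@N and token-level F1 (in the standard QA-style formulation) can be computed as below. This sketch assumes retrieved and gold items carry comparable IDs and scores generation via whitespace tokenization, which may differ from the exact protocol used in the cited evaluations.

```python
from collections import Counter

def recall_at_n(retrieved_ids, gold_ids, n):
    """Fraction of gold memory units that appear in the top-n retrieved."""
    hits = set(retrieved_ids[:n]) & set(gold_ids)
    return len(hits) / len(gold_ids)

def token_f1(prediction, reference):
    """Token-level F1: harmonic mean of precision and recall over
    the multiset overlap of predicted and reference tokens."""
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```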
3. Heterogeneous Tiered MMS in Memory OS
The MMS approach is foundational in operating systems for AI, exemplified by MemOS (Li et al., 4 Jul 2025), which implements a three-tier memory hierarchy to bridge fine-grained prompt-context, fast working memory, and persistent model capability layers:
- Plaintext Memory: Explicit, medium-term content—external passages, templates, or graph nodes—governed by time-to-live policies or user archiving.
- Activation Memory: Short-lived, implicit key-value caches and attention weights, providing rapid local recall and session coherence. Can be promoted to higher tiers through repeated use.
- Parameter Memory: Long-term, stable “knowledge” encoded within model weights or LoRA-style parameter adapters. Updated only via secondary adaptation steps (fine-tuning, module installation), and can be offloaded to lower tiers when unused.
All contents are encapsulated within MemCubes, composite structures with comprehensive provenance, access controls, lifespan, behavioral usage metrics (frequency, recency, contextual similarity), and versioning chains. MMS scheduling policies utilize weighted utility scoring and temperature-controlled softmax prioritization to balance recall, retention, and computational budget, automatically orchestrating migration, fusion, lifecycle transitions, and retention/eviction under finite memory constraints. Cross-tier migration allows, for example, promotion of hot plaintext facts to LoRA adapters for inference efficiency—a hybrid between retrieval-augmented generation and parameter learning.
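The weighted-utility-plus-softmax scheduling idea can be illustrated with a small sketch. The specific weights, metric names, and temperature here are illustrative assumptions, not MemOS's actual scoring formula: the point is only that per-unit usage metrics fold into a scalar utility, and a temperature parameter controls how sharply the scheduler prioritizes high-utility units.

```python
import math

def priorities(units, temperature=1.0, weights=(0.5, 0.3, 0.2)):
    """Score each memory unit by a weighted utility over its usage
    metrics (frequency, recency, contextual similarity), then convert
    utilities to scheduling probabilities via a temperature softmax.
    Lower temperature -> sharper prioritization of high-utility units."""
    w_f, w_r, w_s = weights
    utilities = [w_f * u["freq"] + w_r * u["recency"] + w_s * u["sim"] for u in units]
    exps = [math.exp(u / temperature) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]
```

A scheduler built on such scores could, for example, promote the top-probability plaintext units toward activation or parameter tiers and mark the tail for eviction under a fixed memory budget.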
Empirical evaluation demonstrates systematic accuracy/longevity trade-offs, acceleration in prompt and cache utilization, and end-to-end gains in reasoning benchmarks (e.g., LoCoMo, with up to +8.7 points absolute LLM-Judge improvements and strong temporal/multi-hop performance) (Li et al., 4 Jul 2025).
4. MMS for Multimodal and Personalized Agents
Contemporary MMS research addresses the integration of multimodal data and long-horizon personalization. M2A implements an agentic, dual-layer MMS for personalized QA systems (Feng et al., 7 Feb 2026). The architecture comprises:
- RawMessageStore: Append-only, immutable log of utterances (text, images, timestamps) admitting O(1) insertion and range-based retrieval.
- SemanticMemoryStore: High-level, evidence-linked semantic summaries, including captions and image embeddings, built atop the episodic log via automated extraction. Each summary entry holds back-pointers to supporting messages.
Retrieval employs hybrid dense (cosine), sparse (BM25), and visual embedding scoring, fused through Reciprocal Rank Fusion, with agentic separation between the user-facing role (ChatAgent) and the memory-management role (MemoryManager). Iterative multi-path retrieval disambiguates and contextualizes evidence, while online memory updates allow edit/delete/insert cycles for evolving user profiles.
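Reciprocal Rank Fusion itself is a simple, well-known combiner: each retriever contributes 1/(k + rank) per document, so documents ranked highly by several modalities dominate. A minimal sketch follows; the constant k = 60 is the conventional RRF default, not necessarily M2A's setting.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of doc IDs.
    score(d) = sum over rankers of 1 / (k + rank_of_d); unranked
    documents simply contribute nothing for that ranker."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF operates only on ranks, it fuses dense cosine, sparse BM25, and visual-embedding retrievers without having to calibrate their incomparable score scales.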
Experimental evaluation on LoCoMo with injected visual-centric QA confirms the superiority of dual-layer over single-layer retrieval, with measurable gains in average accuracy (e.g., M2A at 44.64% vs. A-MEM 36.26%), and robustness to ablations in retrieval modalities and context window sizes (Feng et al., 7 Feb 2026).
5. Hierarchical MMS for Multi-Agent Systems
In multi-agent contexts, MMS underpins architectures capable of encoding both distributed episodic traces and abstract cross-trial heuristics. G-Memory formalizes a hierarchical MMS using three graph tiers (Zhang et al., 9 Jun 2025):
- Interaction Graph: Fine-grained temporal graphs of agent utterances, with edge structure reflecting sequential dependencies within tasks.
- Query Graph: Nodes index past queries, status, and corresponding interaction graphs; inter-query edges encode semantic or procedural relationships, supporting hop expansion during recall.
- Insight Graph: Nodes encapsulate distilled cross-trial insights extracted via LLM summarization over clusters of related queries, with hyperedges indexing provenance.
Retrieval traverses upward (from Query to Insight Graph) for generalizable schemas and downward (from Query to Interaction Graph) for referential, procedural exemplars. Bi-directional aggregation populates agent prompts with both macroscopic and microscopic memory, enhancing both coarse generalization and granular recall. Experiments across embodied action, QA, and planning benchmarks (e.g., ALFWorld, HotpotQA) consistently demonstrate performance gains (up to +20.89% and +10.12% absolute over no-memory baselines), with ablation confirming complementary contributions of insights and interactions (Zhang et al., 9 Jun 2025).
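The bi-directional traversal can be sketched as dictionary-backed graph lookups. The data layout and function name here are hypothetical simplifications of G-Memory's graph tiers: upward hops over semantically related queries gather distilled insights, while the downward step returns the current query's own fine-grained interaction trace.

```python
def assemble_prompt_memory(query_id, query_to_insights, query_to_trace,
                           related_queries, hops=1):
    """Upward: expand the query node through `hops` levels of inter-query
    edges and collect insights attached to every reached query.
    Downward: fetch the interaction trace recorded for the query itself."""
    frontier = {query_id}
    for _ in range(hops):
        frontier |= {n for q in frontier for n in related_queries.get(q, [])}
    insights = []
    for q in sorted(frontier):  # deterministic order for prompt assembly
        insights.extend(query_to_insights.get(q, []))
    trace = query_to_trace.get(query_id, [])
    return {"insights": insights, "trace": trace}
```

The returned dictionary corresponds to the two memory granularities injected into agent prompts: macroscopic schemas (`insights`) and microscopic procedural exemplars (`trace`).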
6. Open Challenges and Extensions
Despite empirical gains, MMS implementations face open challenges: dynamic consolidation and pruning to address memory bloat, automated weighting/tuning for heterogeneous memory fragments, domain generalization, multimodal expansion (vision, audio), active forgetting mechanisms, and theoretical integration bridging biological, computational, and engineering perspectives. Plug-and-play MMS modules increasingly appear as standardized system resources in agentic and OS-level platforms, laying foundations for continual learning, lifelong agency, and self-evolving multi-agent cooperation (Zhang et al., 21 Aug 2025, Li et al., 4 Jul 2025, Zhang et al., 9 Jun 2025).
Active investigation targets optimization of retrieval scoring, adaptive memory management policies, scalable consolidation algorithms, and integration of perceptual/motor channels, with particular emphasis on continual adaptation, personalization, and resilience to catastrophic interference.
References:
- "Memory systems of the brain" (Pastor, 2020)
- "Multiple Memory Systems for Enhancing the Long-term Memory of Agent" (Zhang et al., 21 Aug 2025)
- "MemOS: A Memory OS for AI System" (Li et al., 4 Jul 2025)
- "M2A: Multimodal Memory Agent with Dual-Layer Hybrid Memory for Long-Term Personalized Interactions" (Feng et al., 7 Feb 2026)
- "G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems" (Zhang et al., 9 Jun 2025)