Dual-Memory Representations in Computational Models
- Dual-memory representations are defined by architectures that split memory into fast-learning episodic systems and slow integrative semantic systems, preserving both detailed experiences and abstract concepts.
- They mitigate catastrophic forgetting by employing replay mechanisms and dual buffering, which improve sample efficiency and support robust relational reasoning.
- Applications include neural sequence modeling, continual learning, spatial navigation, and anomaly detection, with evidence of improved performance metrics across various benchmarks.
Dual-memory representations refer to computational and neural architectures that maintain two parallel or complementary memory systems, typically distinguished by functional specialization—such as rapid, high-fidelity episodic encoding versus slower, integrative semantic abstraction. This construct is motivated by cognitive neuroscience (e.g., the complementary learning systems theory, hippocampal–prefrontal dichotomy) and is instantiated in a variety of algorithms ranging from continual learning and spatial navigation to anomaly detection and neural sequence modeling. Dual-memory frameworks enable the preservation of detailed experiences alongside generalizable knowledge, mitigating catastrophic forgetting, improving sample efficiency, and supporting sophisticated forms of relational and hierarchical reasoning.
1. Theoretical Foundations and Biological Analogs
Dual-memory representations are grounded in the observation that distinct neural substrates implement complementary memory processes. The hippocampus rapidly encodes unique episodes, supporting pattern separation and contextual binding on short timescales (episodic memory), whereas the prefrontal cortex (PFC) or neocortex consolidates regularities and supports slow, schema-driven abstraction (semantic memory). This division is formalized within the Complementary Learning Systems (CLS) theory and is recapitulated in computational models for both artificial agents and continual learning systems (Momennejad, 16 Jan 2024, Kamra et al., 2017).
Empirically, dual-memory architectures are associated with distinct behavioral outcomes, such as:
- Recency and primacy effects in free recall, accounted for by separate memory traces emphasizing recent and early items in a sequence (L- and R-states in non-associative algebraic models) (Reimann, 13 May 2025).
- Robust retention of both specific experiences and semantic generalizations, mapped to hippocampal and prefrontal circuits with differential time constants and learning rules (Momennejad, 16 Jan 2024).
- Efficient integration of new information without catastrophic overwriting, as in sleep-driven replay and synaptic consolidation (Kamra et al., 2017).
2. Formal Models and Computational Mechanisms
Dual-memory representations are realized in diverse geometric, algebraic, and algorithmic forms, each embodying concrete mathematical operators and learning workflows:
2.1 Non-associative Sequence Algebra
In the “memory states” framework (Reimann, 13 May 2025), sequences are encoded via non-associative bundling:
- Left-associative L-state, $L_n = (\cdots((s_1 \oplus s_2) \oplus s_3) \cdots) \oplus s_n$: recency-weighted, rapidly updatable (here $\oplus$ denotes the non-associative bundling operator and $s_i$ the encoded sequence items).
- Right-associative R-state, $R_n = s_1 \oplus (s_2 \oplus (\cdots \oplus (s_{n-1} \oplus s_n)))$: primacy-weighted, chunk-oriented.
- Retrieval uses the mutual information between a cue and each memory state, naturally producing the serial position curve; a minimal numerical sketch of the two folds follows below.
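To make the two folds concrete, the sketch below uses a toy convex-combination operator as a stand-in for the non-associative bundling of (Reimann, 13 May 2025); the operator, weighting parameter, and orthogonal item vectors are illustrative assumptions rather than the framework's actual algebra.

```python
import numpy as np

def bundle(x, y, alpha=0.5):
    """Toy non-associative bundling: convex combination of two traces (illustrative stand-in)."""
    return alpha * x + (1.0 - alpha) * y

def l_state(items, alpha=0.5):
    """Left-associative fold (((s1 ⊕ s2) ⊕ s3) ⊕ ...): later items keep larger weight (recency)."""
    state = items[0]
    for s in items[1:]:
        state = bundle(state, s, alpha)
    return state

def r_state(items, alpha=0.5):
    """Right-associative fold s1 ⊕ (s2 ⊕ (s3 ⊕ ...)): earlier items keep larger weight (primacy)."""
    state = items[-1]
    for s in reversed(items[:-1]):
        state = bundle(s, state, alpha)
    return state

# Four orthogonal item vectors make the positional weights easy to read off.
items = [np.eye(4)[i] for i in range(4)]
print("L-state weights:", l_state(items))   # [0.125 0.125 0.25  0.5  ] -> recency-weighted
print("R-state weights:", r_state(items))   # [0.5   0.25  0.125 0.125] -> primacy-weighted
```

Folding direction alone determines whether early or recent items dominate the stored trace, which is the sense in which the L- and R-states account for primacy and recency effects.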
2.2 Multiscale Predictive Representations
Dual-memory in the RL context involves learning Successor Representations (SR) at different timescales (Momennejad, 16 Jan 2024):
- Fast, low-discount (small $\gamma$) SR: fine-grained, episodic predictions (analogous to hippocampal maps).
- Slow, high-discount (large $\gamma$) SR: abstracted, multistep predictions (PFC-like semantic codes).
- The joint system enables the agent to exploit both detailed trajectories and schematic planning bases; a minimal two-timescale sketch follows below.
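The sketch below computes the closed-form SR, $M_\gamma = (I - \gamma P)^{-1}$, at two discount factors on a toy deterministic ring world; the environment and the particular $\gamma$ values are illustrative assumptions, not those of (Momennejad, 16 Jan 2024).

```python
import numpy as np

def successor_representation(P, gamma):
    """Closed-form successor representation M = (I - gamma * P)^{-1} for transition matrix P."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# Toy 5-state ring world: the policy always moves one state clockwise.
n = 5
P = np.roll(np.eye(n), 1, axis=1)

M_fast = successor_representation(P, gamma=0.3)   # hippocampus-like: short predictive horizon
M_slow = successor_representation(P, gamma=0.95)  # PFC-like: long, abstracted predictive horizon

print(np.round(M_fast[0], 2))  # predictive mass concentrated on the next few states
print(np.round(M_slow[0], 2))  # predictive mass spread over many future states
```

The same state space thus supports both a fine-grained, trajectory-like code and a coarse, schema-like code, differing only in predictive horizon.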
2.3 Systems for Lifelong and Continual Learning
Several dual-memory architectures partition memories into:
- A plastic, fast-learning module (episodic/working memory): e.g., G-EM in spatiotemporal self-organization (Parisi et al., 2018); the working model in sparse-coding SCoMMER (Sarfraz et al., 2022); EMA-updated fast buffers in ITDMS (Wu et al., 13 Jan 2025).
- A stable, slow-learning module (semantic/long-term memory): e.g., G-SM in Gamma-GWR (Parisi et al., 2018); the EMA “semantic” memory in SCoMMER (Sarfraz et al., 2022); the information-theoretically optimized buffer in ITDMS (Wu et al., 13 Jan 2025).
Interactions generally involve replay (generative or rehearsal-based), stochastic consolidation, or bi-directional feature alignment, often gated by performance, diversity, or information-theoretic criteria.
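The schematic sketch below illustrates the control flow these systems share: a plastic working model updated on every step, a stable copy consolidated by an exponential moving average (in the spirit of SCoMMER's semantic memory), and a small rehearsal buffer replayed alongside new data. The placeholder update rule, buffer policy, and hyperparameters are illustrative assumptions, not any cited paper's algorithm.

```python
import random
import numpy as np

class DualMemoryLearner:
    """Schematic fast/slow continual learner (illustrative skeleton, not a published method)."""

    def __init__(self, dim, buffer_size=200, ema_decay=0.999):
        self.fast_w = np.zeros(dim)      # plastic, fast-learning parameters (working/episodic module)
        self.slow_w = np.zeros(dim)      # stable, slowly consolidated parameters (semantic module)
        self.buffer = []                 # small episodic rehearsal buffer
        self.buffer_size = buffer_size
        self.ema_decay = ema_decay

    def _grad_step(self, batch, lr=0.01):
        # Placeholder "gradient" update: nudge fast weights toward the batch mean.
        self.fast_w += lr * (batch.mean(axis=0) - self.fast_w)

    def observe(self, x):
        replay = random.sample(self.buffer, min(8, len(self.buffer)))
        batch = np.stack([x] + replay) if replay else x[None]
        self._grad_step(batch)                                    # fast, plastic update on new + replayed data
        self.slow_w = (self.ema_decay * self.slow_w
                       + (1.0 - self.ema_decay) * self.fast_w)    # slow EMA consolidation
        if len(self.buffer) < self.buffer_size:
            self.buffer.append(x)                                 # naive fill; selection policies vary

learner = DualMemoryLearner(dim=4)
for _ in range(1000):
    learner.observe(np.random.randn(4) + 1.0)                     # synthetic data stream
```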
3. Architectural Instantiations
3.1 Deep Generative and Self-Organizing Dual Memories
DGDMN segregates memory into short-term (STM, per-task VAE-classifiers) and long-term (LTM, integrated generative model), connected by periodic consolidation via replay (Kamra et al., 2017). This allows both rapid task acquisition and robust, global knowledge accumulation.
The Gamma-GWR dual-net framework creates an episodic map growing with sensory novelty (G-EM) and a semantic map that consolidates only when necessary for task discrimination (G-SM), with replay-driven consolidation reinforcing both (Parisi et al., 2018).
3.2 Mechanisms in Reinforcement Learning and Program Synthesis
Dual-memory structures in RL (main + cache memory) optimize for both diversity retention and priority sampling, as in prioritized replay systems where a large main memory supports long-term coverage and a cache focuses on high-error, recent transitions (Ko et al., 2019).
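A rough sketch of the main + cache motif appears below; the buffer sizes, mixing ratio, and TD-error threshold are illustrative choices rather than the exact scheme of (Ko et al., 2019).

```python
import random
from collections import deque

class DualReplayMemory:
    """Large main memory for broad, long-term coverage plus a small cache of recent,
    high-TD-error transitions (illustrative parameters)."""

    def __init__(self, main_size=100_000, cache_size=2_000, error_threshold=1.0):
        self.main = deque(maxlen=main_size)
        self.cache = deque(maxlen=cache_size)
        self.error_threshold = error_threshold

    def add(self, transition, td_error):
        self.main.append(transition)
        if abs(td_error) > self.error_threshold:     # surprising recent transitions also enter the cache
            self.cache.append(transition)

    def sample(self, batch_size, cache_fraction=0.25):
        n_cache = min(int(batch_size * cache_fraction), len(self.cache))
        batch = random.sample(list(self.cache), n_cache) if n_cache else []
        batch += random.sample(list(self.main), min(batch_size - n_cache, len(self.main)))
        return batch
```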
ExpeRepair leverages dual memories in LLM-based program repair: an episodic memory of concrete repair trajectories and a semantic memory of distilled repair heuristics and insights, both feeding into dynamic prompt composition (Mu et al., 12 Jun 2025).
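A hedged sketch of what such dual-memory prompt composition can look like follows; the record fields, lexical retrieval, and prompt wording are hypothetical and do not reproduce ExpeRepair's actual interface (Mu et al., 12 Jun 2025).

```python
def compose_repair_prompt(issue, episodic_memory, semantic_memory, k=2):
    """Compose an LLM repair prompt from dual memories (hypothetical fields and wording).

    episodic_memory: list of {"issue": str, "trajectory": str} records of past repairs.
    semantic_memory: list of distilled repair heuristics (strings).
    """
    # Toy lexical retrieval: rank past repairs by word overlap with the new issue.
    def overlap(record):
        return len(set(issue.lower().split()) & set(record["issue"].lower().split()))
    demos = sorted(episodic_memory, key=overlap, reverse=True)[:k]

    demo_text = "\n\n".join(d["trajectory"] for d in demos)
    insight_text = "\n".join(f"- {h}" for h in semantic_memory)
    return (f"Issue:\n{issue}\n\n"
            f"Similar past repairs (episodic memory):\n{demo_text}\n\n"
            f"General repair insights (semantic memory):\n{insight_text}\n\n"
            f"Propose a patch for the issue above.")
```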
3.3 Dual-Memory in Anomaly Detection and Multimodal Reasoning
Models such as UR-DMU and DREAM maintain separate prototype memories for “normal” and “abnormal” events (Zhou et al., 2023, Guo et al., 2021), enforcing disentangled spaces by explicit prototype separation and (in UR-DMU) uncertainty regularization. This duality enhances discrimination under severe imbalance and aids the detection of novel anomalies.
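The toy NumPy sketch below shows a dual prototype-memory read and a margin-based separation term; it is a generic stand-in, not the exact memory addressing or uncertainty regularization of UR-DMU and DREAM.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def read_memory(queries, prototypes):
    """Attention-based read: similarity-weighted combination of prototype vectors."""
    attn = softmax(queries @ prototypes.T)        # (n_queries, n_prototypes)
    return attn @ prototypes                      # reconstructed features

def separation_loss(normal_protos, abnormal_protos, margin=1.0):
    """Toy separation term: penalize normal/abnormal prototype pairs closer than a margin."""
    dists = np.linalg.norm(normal_protos[:, None, :] - abnormal_protos[None, :, :], axis=-1)
    return np.maximum(0.0, margin - dists).mean()
```

In training, segment features are read against both prototype banks, and the separation term keeps the two banks disentangled so that reconstruction quality can discriminate normal from abnormal events.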
In self-attentive or multi-view sequence processing, dual memories may respectively encode items and higher-order relations, or asynchronous views, supporting flexible retrieval and fusion for relational reasoning and joint prediction (Le et al., 2020, Le et al., 2018).
4. Empirical Evidence and Benchmarks
Studies of dual-memory systems report:
- Substantial reductions in catastrophic forgetting relative to single-memory baselines, with instance and category retention improved by 5–25% across CORe50, Split-CIFAR, and TinyImageNet benchmarks (Parisi et al., 2018, Sarfraz et al., 2022, Wu et al., 13 Jan 2025).
- In ExpeRepair, ablations reveal complementary benefits for episodic (demonstrations) vs. semantic (insights) memory: dual memory substantially lifts pass@1, ESR, and RSR rates by 4–15% on SWE-bench-lite (Mu et al., 12 Jun 2025).
- For video anomaly detection, dual-memory models yield higher AUCs (3–5 points improvement) over single memory/discriminator setups, with ablations confirming necessity of both normal/abnormal banks (Guo et al., 2021, Zhou et al., 2023).
- In navigation, JanusVLN's dual implicit memory yields up to 20% gains in SPL/SR over explicit-memory baselines, indicating that decoupling spatial and semantic caches within a fixed-size memory is advantageous both computationally and functionally (Zeng et al., 26 Sep 2025).
5. Functional Specialization, Interaction, and Information Optimization
A recurrent structural motif is the explicit specialization of fast and slow modules along functional axes:
- Episodic/fast/working components: high plasticity, instance-specific encoding, one-shot integration, rapid updates; risk of interference and overwriting.
- Semantic/slow/long-term components: low plasticity, abstraction, slow aggregation (often via exponential moving average or information-theoretic selection), resilience to drift; supports robust generalization.
Information-theoretic optimization in slow buffers (e.g., minimizing redundancy via Rényi entropy and Cauchy-Schwarz divergence in ITDMS) ensures diversity and representativeness, while fast buffers (often implemented by reservoir sampling) guarantee up-to-date coverage of recent data (Wu et al., 13 Jan 2025).
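For the fast buffer, the standard reservoir-sampling update keeps a fixed-size sample of the stream with O(1) work per item; the snippet below is the textbook Algorithm R, shown only to make the fast-buffer mechanics concrete (the slow buffer's information-theoretic selection is a separate, more involved criterion).

```python
import random

def reservoir_update(buffer, item, item_index, capacity):
    """Algorithm R: after the stream is processed, each item has equal probability
    capacity/N of residing in the fixed-size buffer. item_index is the item's
    0-based position in the stream."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = random.randint(0, item_index)    # uniform over {0, ..., item_index}
        if j < capacity:
            buffer[j] = item

buf = []
for t in range(10_000):                      # synthetic stream
    reservoir_update(buf, t, t, capacity=100)
```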
Replay mechanisms (generative or sample-based) are ubiquitously used to reconcile fast and slow systems, maintain equilibrium in the plasticity–stability trade-off, and prevent recency or class-imbalance bias (Kamra et al., 2017, Sarfraz et al., 2022, Gowda et al., 2023).
6. Extensions: Non-associative, Relational, and Multi-view Dualities
Recent directions extend dual-memory principles beyond strictly episodic–semantic dichotomies:
- Non-associative algebraic frameworks yield dual representations (L and R states) that encode ordered sequences naturally, linking insights about recency/primacy effects to concrete vector-symbolic operations (Reimann, 13 May 2025).
- Dual-memory architectures for relational reasoning separate item memory (high-capacity storage) from relational memory (SAM-constructed binding of items), enabling both memorization and relational queries in sequential and combinatorial domains (Le et al., 2020); a toy binding sketch follows this list.
- In asynchronous multi-view scenarios, dual memories per view (e.g., in DMNC) enable late and early fusion for flexible cross-view interaction (Le et al., 2018).
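To illustrate the item/relational split, the toy sketch below stores items verbatim for direct recall and stores directed pairwise relations in a separate associative matrix; the random item codes and outer-product bindings are simplifying assumptions, not the self-attentive (SAM) binding mechanism of (Le et al., 2020).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 256, 5
items = rng.standard_normal((n, d)) / np.sqrt(d)         # item memory: raw stored vectors

# Relational memory: directed bindings (i -> j) stored as a sum of outer products.
R = np.zeros((d, d))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    R += np.outer(items[i], items[j])

# Item query: recover a stored item from a noisy probe (pure memorization).
probe = items[3] + 0.1 * rng.standard_normal(d)
print("item recall:", np.argmax(items @ probe))           # expected: 3

# Relational query ("what follows item 1?"): unbind through the relational matrix.
answer = items[1] @ R                                     # approximately items[2] plus crosstalk
print("relational recall:", np.argmax(items @ answer))    # expected: 2
```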
7. Open Challenges and Future Directions
Despite the breadth of dual-memory deployments, unresolved issues persist:
- Scaling dual-memory frameworks to high-dimensional, real-world domains while preserving computational tractability (only partially addressed by fixed-size caches and information-theoretic thinning).
- Biological realism of current dual-memory models: architectural abstractions such as Gamma-GWR growth, stochastic EMA updates, and Hebbian/anti-Hebbian rules await alignment with neurophysiological detail (Parisi et al., 2018, Gowda et al., 2023).
- Understanding how non-associative, relational, and heterogeneous dual memories interoperate in complex reasoning and planning pipelines remains an open field of investigation.
Continued development is expected to unify mechanisms for memory specialization, interaction, and optimization, driving progress in lifelong learning, robust reasoning, and cognitive mapping in both biological and artificial agents.