VAE+MHN: Dual Neural Model for Continual Learning
- The VAE+MHN model is a dual-system architecture combining a generative VAE for feature generalization with an MHN for robust episodic memory storage, directly addressing catastrophic forgetting.
- The approach replays compressed latent codes across sequential tasks, achieving performance close to the offline baseline on Split-MNIST with a significant reduction in forgetting.
- Inspired by Complementary Learning Systems theory, the model distinctly separates pattern completion and separation, mirroring neocortical and hippocampal functions in biological memory.
The VAE+MHN model is a dual-system neural architecture motivated by the Complementary Learning Systems (CLS) theory from cognitive neuroscience. It integrates a Variational Autoencoder (VAE) for generalized representation and pattern completion with a Modern Hopfield Network (MHN) for robust memory storage and pattern separation. The architecture is designed to address catastrophic forgetting in continual learning by capturing the functions of memory consolidation observed in biological systems. The VAE+MHN model demonstrates strong continual learning performance and an empirical representational dissociation between its two subsystems, substantiating its effectiveness in both pattern completion and pattern separation.
1. Theoretical Motivation and Architecture
The model is directly inspired by the CLS framework, positing that the brain utilizes two complementary subsystems: the neocortex, responsible for extracting generalized features and supporting pattern completion, and the hippocampus, specialized in storing distinct episodic memories for pattern separation. The VAE+MHN mapping is as follows:
- VAE: Encodes inputs into latent variables that generalize across stimuli, facilitating reconstruction and pattern completion.
- MHN: Stores compressed latent representations as attractor states, supporting robust recall and pattern separation.
During continual learning, as each new task is presented (in Split-MNIST: sequential binary classification across five splits), the VAE learns current input distributions, while the MHN archives a fraction of corresponding latent codes per task (about 5%, ≈600 codes per split).
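As an illustrative sketch (not the authors' released code), the per-task storage step might look like the following, where `vae.encode`, the `memory_bank` list, and the 5% subsampling fraction are assumptions drawn from the description above:

```python
import torch

def store_task_memories(vae, task_images, memory_bank, fraction=0.05):
    """Encode the current split with the VAE and archive ~5% of latent codes as MHN patterns."""
    with torch.no_grad():
        mu, _ = vae.encode(task_images)               # latent codes for this split (assumed encoder API)
    n_keep = max(1, int(fraction * mu.shape[0]))      # ~5%, roughly 600 codes per Split-MNIST split
    idx = torch.randperm(mu.shape[0])[:n_keep]
    memory_bank.append(mu[idx])                       # archived rows later form the MHN pattern matrix
    return memory_bank
```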
2. Mathematical Formulation
VAE Loss Function: The VAE is trained with the standard negative ELBO objective,

$$\mathcal{L}_{\text{VAE}} = \mathbb{E}_{q_\phi(z \mid x)}\!\left[-\log p_\theta(x \mid z)\right] + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),$$

where $q_\phi(z \mid x)$ is the variational posterior and $p(z)$ the prior; the first term quantifies reconstruction accuracy while the second penalizes deviation from the prior.
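A minimal PyTorch sketch of this objective, assuming a Bernoulli decoder over pixels and a diagonal-Gaussian posterior:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction term plus KL divergence to the N(0, I) prior."""
    # E_q[-log p(x|z)] approximated by binary cross-entropy on the reconstruction
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL( N(mu, sigma^2) || N(0, I) )
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```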
MHN Energy Function: The MHN dynamics are governed by the minimization of an energy function of the form

$$E(\xi) = -\frac{1}{\beta} \log \sum_{i=1}^{N} \exp\!\big(\beta\, w_i^{\top} \xi + b_i\big) + \frac{1}{2}\, \xi^{\top} \xi,$$

with $\xi$ as the memory state, $W = [w_1, \dots, w_N]^{\top}$ the weight matrix of stored patterns, $\beta$ an inverse temperature controlling attractor sharpness, and $b$ the bias.
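The following sketch implements this energy and the associated softmax retrieval update under those assumptions (pattern rows `W`, bias `b`, inverse temperature `beta`); it illustrates the form above rather than reproducing the authors' implementation:

```python
import torch

def mhn_energy(xi, W, b, beta):
    """Energy of query/memory state xi given stored pattern rows W and bias b."""
    scores = beta * (W @ xi) + b                          # similarity of xi to each stored pattern
    return -torch.logsumexp(scores, dim=0) / beta + 0.5 * (xi @ xi)

def mhn_retrieve(xi, W, b, beta, steps=1):
    """Iterative update xi <- W^T softmax(beta * W xi + b); one step typically converges."""
    for _ in range(steps):
        xi = W.t() @ torch.softmax(beta * (W @ xi) + b, dim=0)
    return xi
```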
Pattern Separation Measurement: Intra-class separation is captured by pairwise Euclidean distances,

$$d_{ij} = \lVert z_i - z_j \rVert_2,$$

where large $d_{ij}$ within MHN representations signifies strong pattern separation.
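A short sketch of this separation metric, computing the mean pairwise intra-class distance for one class of representations:

```python
import torch

def mean_intra_class_distance(Z):
    """Z: (n, d) representations of samples from a single class."""
    d = torch.cdist(Z, Z, p=2)                    # (n, n) pairwise Euclidean distances
    n = Z.shape[0]
    mask = ~torch.eye(n, dtype=torch.bool)        # exclude self-distances on the diagonal
    return d[mask].mean()
```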
Pattern Completion Measurement: The fidelity of reconstructed images, particularly from occluded inputs, is quantified by the Structural Similarity Index Measure (SSIM). Higher SSIM values indicate superior pattern completion capacity, predominantly observed in VAE reconstructions.
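A hedged sketch of the completion metric using scikit-image's SSIM implementation; the images are assumed to be 2-D grayscale arrays scaled to [0, 1]:

```python
from skimage.metrics import structural_similarity as ssim

def completion_score(original, reconstruction):
    """SSIM between an original image and its reconstruction from an occluded cue; higher = better completion."""
    return ssim(original, reconstruction, data_range=1.0)
```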
3. Continual Learning Protocol
Training proceeds with task-wise presentation. In the initial Split-MNIST task, the VAE is trained on real images; latent codes are then selected for MHN storage. For subsequent splits:
- The MHN receives randomly generated cues to retrieve stored latent representations.
- Retrieved codes are decoded by the VAE to generate synthetic samples.
- These replay samples are interleaved with fresh data, facilitating generative rehearsal and mitigating catastrophic forgetting.
This approach obviates the need for full image storage, relying instead on latent code replay.
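A hedged sketch of one replay-augmented training step for splits after the first; `mhn.retrieve`, `mhn.latent_dim`, `vae.decode`, and the batch composition are illustrative assumptions, and `vae_loss` refers to the objective sketched in Section 2:

```python
import torch

def train_with_replay(vae, mhn, fresh_loader, n_replay, optimizer):
    for x_fresh, _ in fresh_loader:
        # Generative rehearsal: random cues -> attractor retrieval -> decoded pseudo-samples
        cues = torch.randn(n_replay, mhn.latent_dim)
        z_replay = mhn.retrieve(cues)                     # stored latent codes act as attractors
        with torch.no_grad():
            x_replay = vae.decode(z_replay)
        x_batch = torch.cat([x_fresh, x_replay], dim=0)   # interleave fresh and replayed data

        x_recon, mu, logvar = vae(x_batch)
        loss = vae_loss(x_batch, x_recon, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```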
4. Empirical Performance
On Split-MNIST, performance metrics are as follows:
| Model | Accuracy (%) | Catastrophic Forgetting |
|---|---|---|
| VAE+MHN (replay) | 89.71 | Significantly reduced |
| Offline VAE (all classes) | 95.55 | None (upper bound) |
| Sequential VAE (no replay) | 67.75 | Severe |
The VAE+MHN model nearly matches the offline upper baseline, far outperforming vanilla sequential training. Generative replay with compressed latent codes provides effective continual learning and resistance to interference.
5. Representational Dissociation: Pattern Separation vs. Completion
A direct empirical dissociation between MHN and VAE latent subspaces substantiates CLS functional roles:
- Pattern Separation (MHN): Intra-class Euclidean distances among MHN representations are significantly greater than those for VAE latent codes; statistical evaluation (Bonferroni-corrected) confirms the significance of this separation difference (see the comparison sketch below).
- Pattern Completion (VAE): When reconstructing perturbed inputs, VAE reconstructions yield SSIM scores near the offline baseline and higher than MHN-generated reconstructions, indicating effective completion through generalized features.
This division of labor mirrors hippocampal (separation) and neocortical (completion) roles in biological memory systems.
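As an illustration of how such a dissociation test could be run (the paper's exact statistical procedure may differ), the following sketch compares intra-class distance distributions with Welch t-tests and a Bonferroni correction across classes:

```python
from scipy.stats import ttest_ind

def separation_test(mhn_dists_per_class, vae_dists_per_class):
    """Each argument: list of 1-D arrays of intra-class pairwise distances, one per class."""
    n_tests = len(mhn_dists_per_class)
    results = []
    for d_mhn, d_vae in zip(mhn_dists_per_class, vae_dists_per_class):
        t, p = ttest_ind(d_mhn, d_vae, equal_var=False)   # Welch t-test per class
        results.append((t, min(1.0, p * n_tests)))        # Bonferroni-corrected p-value
    return results
```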
6. Biological Implications and Artificial Memory Systems
Findings suggest that robust continual learning in artificial systems benefits from explicitly separating representation spaces for pattern generalization and episodic storage. The MHN sharply encodes newly encountered experiences with minimal interference, promoting memory consolidation. The VAE aggregates statistical regularities, enabling the system to reconstruct full input features even from partial or noisy cues.
A plausible implication is that architectures combining attractor-based memory with generative models may closely approximate the complementary functions of neural memory as described by CLS theory.
7. Significance and Future Directions
The VAE+MHN architecture provides an experimentally validated blueprint for memory in continual learning applications, enabling both retention of prior knowledge and flexible generalization. Its replay-based approach, using compressed latent codes, is computationally tractable and biologically inspired. Further research may investigate scaling to richer multimodal domains, increasingly complex tasks, and direct mapping to cognitive phenomena in neuroscience.
The functional dissociation and empirical results on Split-MNIST establish the VAE+MHN paradigm as a promising template for both artificial continual learning frameworks and mechanistic models of memory consolidation (Jun et al., 15 Jul 2025).