- The paper introduces the MyGO framework that leverages generative models and a wake-sleep cycle to prevent catastrophic forgetting.
- It employs a dual-phase process where the wake phase captures task-specific data and the sleep phase consolidates knowledge via teacher-student distillation.
- Experiments on Split-MNIST and Split-AG News demonstrate high accuracy retention in computer vision and improved knowledge retention in NLP, despite the more complex data distributions of text.
MyGO: Memory Yielding Generative Offline-consolidation for Lifelong Learning Systems
This essay provides an academic summary and critical analysis of the paper "MyGO: Memory Yielding Generative Offline-consolidation for Lifelong Learning Systems" (arXiv:2508.21296). The paper introduces MyGO, a novel framework that addresses catastrophic forgetting in lifelong learning systems through generative models and knowledge distillation.
Introduction to Lifelong Learning and Catastrophic Forgetting
Lifelong learning, a critical objective in AI, seeks to develop models that continuously accrue knowledge without losing previously acquired information; the failure mode in which new learning overwrites old knowledge is known as catastrophic forgetting. Traditional remedies fall into replay-based, regularization-based, and parameter-isolation techniques, each with inherent limitations related to privacy, storage, or scalability. Inspired by biological memory consolidation, MyGO offers a fresh perspective: generative offline consolidation performed within a wake-sleep cycle.
MyGO Framework
Architecture
The MyGO framework is built on two main components:
- Neocortex Net (M_ctx): Contains a shared feature extractor and a scalable set of task-specific classification heads, providing both general representation learning and task-specific decision-making.
- Generative Memories (G_mem): Task-specific generative models trained to capture and reproduce the data distribution of their respective tasks. Implemented as conditional Generative Adversarial Networks (GANs), they are stored to generate pseudo-data for memory replay without preserving any raw input data (a structural sketch follows this list).
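The paper does not restate reference code here, so the sketch below is only a minimal structural interpretation of these two components in PyTorch: all layer sizes, `feature_dim`, and `latent_dim` are illustrative assumptions, not the authors' configuration. What it does show is the essential shape of the architecture, a shared trunk with a growing `ModuleList` of per-task heads, plus a small class-conditional generator of which one instance is stored per task.

```python
import torch
import torch.nn as nn

class NeocortexNet(nn.Module):
    """Shared feature extractor with one task-specific head per task."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.feature_dim = feature_dim
        # Shared trunk; layer sizes are illustrative for 28x28 grayscale input.
        self.extractor = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, feature_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList()  # grows as new tasks arrive

    def add_head(self, num_classes: int) -> int:
        """Attach a fresh classification head for a new task; returns its task id."""
        self.heads.append(nn.Linear(self.feature_dim, num_classes))
        return len(self.heads) - 1

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.heads[task_id](self.extractor(x))


class ConditionalGenerator(nn.Module):
    """Tiny class-conditional generator; one instance is stored per task."""
    def __init__(self, latent_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.latent_dim = latent_dim
        self.embed = nn.Embedding(num_classes, latent_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        h = torch.cat([z, self.embed(labels)], dim=1)
        return self.net(h).view(-1, 1, 28, 28)
```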
Wake-Sleep Cycle
MyGO operates through a two-phase cycle:
- Wake Phase: The model rapidly learns the new task while a task-specific generative model is trained in parallel to capture that task's data distribution. This dual process yields effective task-specific learning and a compact data representation without interfering with previously learned tasks.
- Sleep Phase: The model goes offline to consolidate knowledge. The stored generative models produce pseudo-data ("dreams") that stand in for previous tasks, and this dreamed data is integrated into the network via knowledge distillation. Using a teacher-student configuration, MyGO balances past-task performance against adaptation to the new task (a minimal distillation sketch follows this list).
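A minimal sketch of what the sleep-phase consolidation step could look like, reusing the `NeocortexNet`/`ConditionalGenerator` classes sketched above. The frozen pre-sleep network serves as the teacher; the temperature-softened KL loss is the standard distillation formulation, and the paper's exact loss, sampling budget, and schedule may differ (`samples_per_task`, `temperature`, and `epochs` are assumed names, not the authors').

```python
import copy
import torch
import torch.nn.functional as F

def sleep_consolidation(model, generators, optimizer,
                        samples_per_task=256, temperature=2.0, epochs=1):
    """Consolidate all stored tasks into `model` by distilling dreamed data."""
    teacher = copy.deepcopy(model).eval()  # frozen pre-sleep snapshot = teacher
    model.train()                          # live network = student
    for _ in range(epochs):
        for task_id, gen in enumerate(generators):
            # "Dream" a batch of pseudo-data for this task.
            z = torch.randn(samples_per_task, gen.latent_dim)
            labels = torch.randint(0, gen.embed.num_embeddings,
                                   (samples_per_task,))
            with torch.no_grad():
                dreams = gen(z, labels)
                t_logits = teacher(dreams, task_id)
            # Student matches the teacher's temperature-softened predictions.
            s_logits = model(dreams, task_id)
            loss = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=1),
                F.softmax(t_logits / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Because only generated pseudo-data flows through this loop, no raw samples from earlier tasks ever need to be retained, which is the privacy and storage advantage the framework claims over replay buffers.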
Experimental Setup and Results
Datasets and Architectures
Two benchmarks were used to evaluate MyGO's efficacy:
- Split-MNIST (CV): The MNIST dataset, split into five sequential tasks, tested the model's capacity to prevent forgetting in computer vision (a construction sketch follows this list).
- Split-AG News (NLP): The AG News dataset, divided into two tasks, offered a testbed for natural language processing.
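The paper's exact digit assignment is not restated here; the sketch below follows the conventional Split-MNIST protocol, which pairs consecutive digits into five binary tasks, using standard torchvision utilities.

```python
from torch.utils.data import Subset
from torchvision import datasets, transforms

def make_split_mnist(root="./data", train=True):
    """Split MNIST into five sequential binary tasks over digit pairs."""
    mnist = datasets.MNIST(root, train=train, download=True,
                           transform=transforms.ToTensor())
    tasks = []
    for t in range(5):
        pair = (2 * t, 2 * t + 1)  # task t sees digits {2t, 2t+1}
        idx = [i for i, y in enumerate(mnist.targets.tolist()) if y in pair]
        # Labels are conventionally remapped to {0, 1} (i.e. y - 2*t) so that
        # each task-specific head is a plain binary classifier.
        tasks.append(Subset(mnist, idx))
    return tasks
```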
The Neocortex Net for the CV benchmark consisted of convolutional and feedforward layers, whereas the NLP benchmark employed embedding and linear layers. Both benchmarks used lightweight generative models as memories.
The key metric was Average Accuracy across all seen tasks, providing a holistic measure of MyGO's learning retention capabilities.
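As a concrete reading of that metric: after training on task T, the per-task test accuracies for tasks 1 through T are averaged. A small evaluation helper, assuming the per-head interface from the earlier sketch (the function name and loader setup are illustrative):

```python
import torch

def average_accuracy(model, task_loaders):
    """Mean test accuracy over every task seen so far (higher = less forgetting)."""
    model.eval()
    per_task = []
    with torch.no_grad():
        for task_id, loader in enumerate(task_loaders):
            correct = total = 0
            for x, y in loader:  # y assumed remapped to the head's class range
                pred = model(x, task_id).argmax(dim=1)
                correct += (pred == y).sum().item()
                total += y.numel()
            per_task.append(correct / total)
    return sum(per_task) / len(per_task)
```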
Results Analysis
- Computer Vision (Split-MNIST): MyGO significantly outperformed the sequential fine-tuning baseline, maintaining 97.19% average accuracy across tasks and demonstrating a robust consolidation process.
- Natural Language Processing (Split-AG News): MyGO's absolute accuracy fell marginally below the baseline, but it retained substantially more knowledge of earlier tasks. Despite the more complex feature space of text data, MyGO effectively balances new learning with retention of past knowledge.
Conclusion and Future Directions
MyGO presents an effective framework for mitigating catastrophic forgetting through generative models and knowledge distillation, showing promise across domains as varied as CV and NLP. Future research directions include integrating more advanced generative techniques (e.g., VAEs), refining the distillation process, and scaling MyGO to more complex, diverse task structures. By advancing the principle of generative memory consolidation, MyGO contributes a significant step towards truly adaptive lifelong learning AI systems.