Dark Experience for General Continual Learning: a Strong, Simple Baseline (2004.07211v2)

Published 15 Apr 2020 in stat.ML and cs.LG

Abstract: Continual Learning has inspired a plethora of approaches and evaluation settings; however, the majority of them overlooks the properties of a practical scenario, where the data stream cannot be shaped as a sequence of tasks and offline training is not viable. We work towards General Continual Learning (GCL), where task boundaries blur and the domain and class distributions shift either gradually or suddenly. We address it through mixing rehearsal with knowledge distillation and regularization; our simple baseline, Dark Experience Replay, matches the network's logits sampled throughout the optimization trajectory, thus promoting consistency with its past. By conducting an extensive analysis on both standard benchmarks and a novel GCL evaluation setting (MNIST-360), we show that such a seemingly simple baseline outperforms consolidated approaches and leverages limited resources. We further explore the generalization capabilities of our objective, showing its regularization being beneficial beyond mere performance.

Dark Experience for General Continual Learning: A Strong, Simple Baseline

The paper introduces Dark Experience Replay (DER), a simple approach to General Continual Learning (GCL). DER combines experience replay with knowledge distillation and regularization to mitigate catastrophic forgetting, a persistent challenge in Continual Learning (CL). The authors show that DER, despite its simplicity, surpasses state-of-the-art methods on both standard benchmarks and a newly proposed GCL evaluation setting, MNIST-360.

Addressing the Continual Learning Context

Continual Learning diverges from traditional machine learning paradigms by requiring models to assimilate and retain knowledge from a stream of non-i.i.d. samples without succumbing to catastrophic forgetting. Existing CL methods often depend on task-specific boundaries and identifiers, which limits their practical applicability in real-world scenarios where such clear distinctions are unavailable. The authors respond to this challenge by advancing General Continual Learning, which assumes blurred task boundaries and dynamic shifts in domain and class distributions.

Methodology: Dark Experience Replay

The DER framework maintains a buffer of past examples together with the network's logits recorded for those examples along the optimization trajectory, promoting consistency in the model's predictions over time. Key elements of DER include:

  1. Reservoir Sampling: DER employs reservoir sampling so that every example in the stream has an equal probability of entering the buffer, avoiding any reliance on task boundaries (a minimal sketch follows this list).
  2. Logit Matching: Rather than replaying ground-truth labels alone, DER replays the network's stored logits, which empirically fosters flatter minima and better-calibrated models.
  3. Regularization: DER adds a regularization term that minimizes the Euclidean distance between the stored logits and the current model's outputs on buffered examples, thereby preserving past knowledge (see the training-step sketch after this list).
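
The routine below is a minimal sketch of standard reservoir sampling as used for rehearsal buffers; the function name and interface are illustrative, not the authors' code.

```python
import random

def reservoir(num_seen: int, buffer_size: int) -> int:
    """Return the buffer slot for the incoming example, or -1 to discard it.

    Every example seen so far is retained with equal probability
    buffer_size / (num_seen + 1), with no knowledge of task
    identities or boundaries.
    """
    if num_seen < buffer_size:
        return num_seen                     # buffer not yet full: always store
    slot = random.randint(0, num_seen)      # uniform over [0, num_seen]
    return slot if slot < buffer_size else -1
```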
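Putting the pieces together, a single update can be sketched as follows. This is a hedged reconstruction from the paper's description: the `buffer` object with `sample()` and `add()` methods is hypothetical, and `alpha`/`beta` are the trade-off weights for the logit-matching and label-replay terms (setting `beta = 0` recovers plain DER).

```python
import torch.nn.functional as F

def der_plus_plus_step(model, optimizer, x, y, buffer, alpha=0.5, beta=0.5):
    """One DER++ optimization step: cross-entropy on the stream batch,
    MSE to stored logits (DER term), and cross-entropy on a second
    buffered batch's ground-truth labels (DER++ term)."""
    optimizer.zero_grad()
    logits = model(x)
    loss = F.cross_entropy(logits, y)       # loss on the current stream batch

    if len(buffer) > 0:
        # DER term: match current outputs to logits recorded in the past
        bx, _, bz = buffer.sample()
        loss = loss + alpha * F.mse_loss(model(bx), bz)
        # DER++ term: additionally replay ground-truth labels
        bx2, by2, _ = buffer.sample()
        loss = loss + beta * F.cross_entropy(model(bx2), by2)

    loss.backward()
    optimizer.step()
    buffer.add(x, y, logits.detach())       # store inputs, labels, and logits
```

Matching logits sampled throughout the optimization trajectory, rather than against a single frozen snapshot of the network, is the design choice that distinguishes DER from conventional distillation-based CL methods.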

Experimental Evaluation

The robustness of DER is evaluated through extensive experiments across various settings, including Task-IL, Class-IL, and Domain-IL, using datasets such as CIFAR-10, Tiny ImageNet, and MNIST variants. DER conforms to key GCL requirements, including constant memory utilization, no dependency on task boundaries, and no need for a test-time oracle.

Standard Benchmarks:

  • CIFAR-10 and Tiny ImageNet: DER and its extension, DER++, consistently outperform ER and other replay-based methods in both Task-IL and Class-IL settings. DER++ particularly excels by combining logit replay with ground-truth label replay.
  • Permuted and Rotated MNIST: On these Domain-IL benchmarks, where the input distribution shifts while the set of classes stays fixed, DER performs strongly because it preserves and transfers knowledge across related tasks.

MNIST-360:

  • The newly proposed MNIST-360 evaluation protocol presents a mix of gradual and abrupt distribution changes, simulating real-world continual learning challenges. DER shows notable performance gains over GSS, MER, and other rehearsal methods, reinforcing its utility as a robust baseline for future GCL studies.
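
To make the protocol concrete, the following is a simplified, illustrative sketch of an MNIST-360-style stream (the exact construction in the paper differs in detail): consecutive digit pairs switch abruptly while the rotation angle drifts gradually, so no clean task boundary is ever exposed to the learner.

```python
import torchvision.transforms.functional as TF

def mnist360_stream(dataset, samples_per_pair=1000):
    """Yield (rotated_image, label) pairs from digits 0-8: the active
    class pair changes abruptly, while the rotation angle grows
    gradually from 0 to 360 degrees within each pair."""
    by_class = {c: [img for img, lbl in dataset if lbl == c] for c in range(9)}
    for a, b in [(d, (d + 1) % 9) for d in range(9)]:  # (0,1), (1,2), ..., (8,0)
        for i in range(samples_per_pair):
            angle = 360.0 * i / samples_per_pair        # gradual domain shift
            for c in (a, b):
                img = by_class[c][i % len(by_class[c])]
                yield TF.rotate(img, angle), c
```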

Theoretical and Practical Implications

DER's strong performance can be attributed to its tendency to converge to flatter minima, which translates into more robust knowledge retention and adaptability to new tasks. Furthermore, DER models exhibit better calibration and hence deliver more reliable confidence estimates, a crucial trait for practical, real-world AI systems.
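
As one concrete way to quantify the calibration claim, expected calibration error (ECE) bins predictions by confidence and averages the gap between confidence and accuracy per bin; the sketch below is a standard implementation, not code from the paper.

```python
import torch
import torch.nn.functional as F

def expected_calibration_error(logits, labels, n_bins=10):
    """ECE: population-weighted average of |accuracy - confidence|
    across equal-width confidence bins. Lower means better calibrated."""
    probs = F.softmax(logits, dim=1)
    conf, preds = probs.max(dim=1)
    correct = preds.eq(labels).float()
    ece = logits.new_zeros(())
    width = 1.0 / n_bins
    for k in range(n_bins):
        lo, hi = k * width, (k + 1) * width
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = (correct[mask].mean() - conf[mask].mean()).abs()
            ece = ece + mask.float().mean() * gap
    return ece.item()
```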

The success of DER in various settings hints at promising future directions:

  • Model Robustness: More sophisticated regularization strategies could further enhance the stability and adaptability of continual learning models.
  • Efficient Memory Utilization: Innovations in memory buffer management might offer even more scalable continual learning solutions.
  • Applicability to Diverse Domains: Extending DER concepts to domains like reinforcement learning or unsupervised learning could broaden its applicability and utility.

Conclusion

Dark Experience Replay emerges as a powerful, simple baseline for General Continual Learning. By addressing the limitations of task-specific methods and leveraging the dual strengths of knowledge distillation and rehearsal, DER sets a new standard for continual learning research and applications. The introduction of MNIST-360 as a GCL benchmark provides a crucial step towards more realistic evaluation protocols, encouraging future research to adhere to practical desiderata.

Authors (5)
  1. Pietro Buzzega (11 papers)
  2. Matteo Boschini (17 papers)
  3. Angelo Porrello (32 papers)
  4. Davide Abati (15 papers)
  5. Simone Calderara (64 papers)
Citations (763)