Weighted Ensemble Models Are Strong Continual Learners (2312.08977v4)

Published 14 Dec 2023 in cs.LG, cs.AI, and cs.CV

Abstract: In this work, we study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks, such that the data from the previous tasks becomes unavailable while learning on the current task data. CL is essentially a balancing act between being able to learn on the new task (i.e., plasticity) and maintaining the performance on the previously learned concepts (i.e., stability). Intending to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current tasks. This weighted-ensembled model, which we call Continual Model Averaging (or CoMA), attains high accuracy on the current task by leveraging plasticity, while not deviating too far from the previous weight configuration, ensuring stability. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (or CoFiMA), that selectively weighs each parameter in the weights ensemble by leveraging the Fisher information of the weights of the model. Both variants are conceptually simple, easy to implement, and effective in attaining state-of-the-art performance on several standard CL benchmarks. Code is available at: https://github.com/IemProg/CoFiMA.


Summary

  • The paper introduces CoFiMA, a method that averages model weights across tasks, weighting each parameter by its Fisher information to balance adaptation with retention.
  • It demonstrates that weight-ensembling (parameter averaging) consistently delivers higher accuracy than prior pre-trained-model-based methods on standard continual learning benchmarks.
  • The study also shows that performance depends on the choice of pre-trained backbone, with gains observed for both supervised and self-supervised pre-training.

Understanding the Balance Between Stability and Plasticity in Continual Learning

Continual learning (CL) is a critical area of AI research that tackles the challenge of enabling models to learn from a sequence of tasks without forgetting previously acquired knowledge. This requires a delicate balance between plasticity—adapting to new tasks—and stability—retaining existing knowledge.

Novel Weight-Ensemble Methods for Continual Learning

The paper introduces two related methods to help models strike this balance. The first, Continual Model Averaging (CoMA), averages the parameters of the model fine-tuned on the current task with those of the model from the previous tasks. This fosters plasticity on the new task while anchoring the weights near the previous configuration to preserve past knowledge. For example, a model fine-tuned on task A and a model subsequently fine-tuned on task B (starting from the task-A weights) often lie along a linear path in weight space whose interpolated points remain proficient on both tasks.
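The snippet below is a minimal sketch of this weight-averaging step in PyTorch; the helper name coma_average, the interpolation coefficient alpha, and the state-dict handling are illustrative assumptions rather than the authors' exact implementation:

```python
import copy
import torch

def coma_average(prev_model: torch.nn.Module,
                 curr_model: torch.nn.Module,
                 alpha: float = 0.5) -> torch.nn.Module:
    """Sketch of continual model averaging: per-parameter linear interpolation
    between the weights learned up to the previous task and the weights
    fine-tuned on the current task. Larger alpha favors plasticity (the
    current task); smaller alpha favors stability (previous tasks)."""
    averaged = copy.deepcopy(curr_model)
    prev_sd, curr_sd = prev_model.state_dict(), curr_model.state_dict()
    new_sd = {}
    with torch.no_grad():
        for name, curr_param in curr_sd.items():
            if curr_param.is_floating_point():
                new_sd[name] = alpha * curr_param + (1.0 - alpha) * prev_sd[name]
            else:
                # Integer buffers (e.g. BatchNorm counters) are copied, not averaged.
                new_sd[name] = curr_param
    averaged.load_state_dict(new_sd)
    return averaged
```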

The second approach, Continual Fisher-weighted Model Averaging (CoFiMA), refines CoMA by weighting each parameter in the ensemble according to its task-specific importance, as estimated by Fisher information. Fisher information measures how sensitive the model's predictions are to each parameter, so parameters that matter more for a task contribute more to the combined model. This addresses a shortcoming of plain averaging, which treats all weights equally and can therefore be suboptimal.
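As a rough sketch of how such Fisher-weighted merging could be implemented (the estimator below is a simple empirical diagonal Fisher built from squared gradients; the function names, the eps smoothing term, and the batch budget are assumptions and may differ from the paper's exact procedure):

```python
import copy
import torch
import torch.nn.functional as F

def diagonal_fisher(model, data_loader, device="cpu", max_batches=50):
    """Estimate a diagonal Fisher matrix as the average squared gradient of the
    log-likelihood with respect to each parameter (empirical-Fisher sketch).
    Assumes a classification model that returns logits."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    n_batches = 0
    for x, y in data_loader:
        if n_batches >= max_batches:
            break
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=-1)
        F.nll_loss(log_probs, y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}

def cofima_average(prev_model, curr_model, prev_fisher, curr_fisher, eps=1e-8):
    """Per-parameter average weighted by each model's Fisher values, so that
    parameters deemed important for a task dominate the merged model."""
    merged = copy.deepcopy(curr_model)
    merged_sd = merged.state_dict()
    prev_sd, curr_sd = prev_model.state_dict(), curr_model.state_dict()
    with torch.no_grad():
        for n in prev_fisher:  # only learnable parameters carry Fisher weights
            # eps acts as a uniform prior: if both Fisher estimates are zero,
            # the merge falls back to a plain average of the two weights.
            f_prev, f_curr = prev_fisher[n] + eps, curr_fisher[n] + eps
            merged_sd[n] = (f_prev * prev_sd[n] + f_curr * curr_sd[n]) / (f_prev + f_curr)
    merged.load_state_dict(merged_sd)
    return merged
```

In practice, the Fisher estimate for the previous-task model would be computed on that task's data before it becomes unavailable and carried forward, since continual learning forbids revisiting old data.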

Performance Advantages of CoFiMA

CoFiMA's effectiveness is demonstrated through extensive experiments on standard CL benchmarks, where it consistently outperforms other continual learning methods built on pre-trained models (PTMs), with significant accuracy gains across various datasets. Notably, CoFiMA is robust not only with supervised pre-trained backbones but also with self-supervised ones, underscoring its flexibility and general applicability.

Insights from Model Comparisons

Competing methods, including sequential fine-tuning, prompt-based approaches (L2P and DualPrompt), and experience-replay strategies (DER++), were analyzed alongside CoFiMA. While these methods offer their own advantages, CoFiMA's integration of Fisher information into the weight-ensembling step provides a distinct edge. Its reported accuracy in the continual learning setting comes close to the joint-training upper bound, in which a model is trained on all tasks simultaneously, a setting usually unattainable in real-world applications.

Conclusion

In conclusion, CoFiMA represents a step forward in reconciling the stability-plasticity dilemma central to continual learning. By discerning how relevant each parameter is to different tasks, it produces models that excel on new tasks while resisting catastrophic forgetting of earlier ones. These properties make CoFiMA a strong candidate framework for practical continual learning applications.

It is also worth noting that performance depends on the choice of pre-trained model (PTM) used as the backbone. The paper thoroughly assesses the impact of different architectures, both supervised and self-supervised, providing valuable insight into the role of pre-training in continual learning.
