
Continual Learning Beyond a Single Model (2202.09826v3)

Published 20 Feb 2022 in cs.LG and cs.AI

Abstract: A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models grows. Motivated by this limitation, we study different ensemble models to understand their benefits and drawbacks in continual learning scenarios. Finally, to overcome the high compute cost of ensembles, we leverage recent advances in neural network subspaces to propose a computationally cheap algorithm with a runtime similar to a single model's, yet enjoying the performance benefits of ensembles.
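To make the subspace idea concrete, below is a minimal PyTorch-style sketch (not the paper's code) of a network whose weights are linearly interpolated between two trainable endpoints, so that every point on the segment is a usable model. All names and dimensions here (SubspaceMLP, subspace_predict, the layer sizes) are illustrative assumptions, and the prediction averaging shown is only one way such a subspace can be exploited.

import torch
import torch.nn as nn


class SubspaceMLP(nn.Module):
    """Tiny MLP whose weights live on a line segment between endpoints w0 and w1."""

    def __init__(self, in_dim=784, hidden=256, out_dim=10):
        super().__init__()
        # Two endpoint parameter sets spanning a 1-D weight subspace.
        self.w0 = nn.ParameterList([
            nn.Parameter(0.01 * torch.randn(hidden, in_dim)),
            nn.Parameter(0.01 * torch.randn(out_dim, hidden)),
        ])
        self.w1 = nn.ParameterList([
            nn.Parameter(0.01 * torch.randn(hidden, in_dim)),
            nn.Parameter(0.01 * torch.randn(out_dim, hidden)),
        ])

    def forward(self, x, alpha):
        # Every alpha in [0, 1] selects one network from the subspace.
        w_hidden = (1.0 - alpha) * self.w0[0] + alpha * self.w1[0]
        w_out = (1.0 - alpha) * self.w0[1] + alpha * self.w1[1]
        h = torch.relu(x @ w_hidden.t())
        return h @ w_out.t()


def subspace_predict(model, x, num_samples=3):
    # Average logits from a few subspace points: ensemble-like predictions
    # without storing num_samples independent networks.
    alphas = torch.linspace(0.0, 1.0, num_samples)
    return torch.stack([model(x, a.item()) for a in alphas]).mean(dim=0)


# One illustrative training step: sample a random alpha each step so the
# whole segment, not just its endpoints, is pushed toward low loss.
model = SubspaceMLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x, torch.rand(1).item()), y)
loss.backward()
optimizer.step()

print(subspace_predict(model, x).shape)  # torch.Size([32, 10])

Evaluating a single point on the segment (for example the midpoint, alpha = 0.5) costs one forward pass, which is how subspace methods can stay close to single-model runtime; averaging several samples, as subspace_predict does, trades some of that saving for more ensemble-like behavior.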
