On the Convergence of Continual Learning with Adaptive Methods (2404.05555v2)
Abstract: One objective of continual learning is to prevent catastrophic forgetting when learning multiple tasks sequentially, and existing solutions have largely been driven by the conceptualization of the plasticity-stability dilemma. However, the convergence of continual learning for each sequential task has so far received little study. In this paper, we provide a convergence analysis of memory-based continual learning with stochastic gradient descent, together with empirical evidence that training on the current task causes cumulative degradation of previous tasks. We propose an adaptive method for nonconvex continual learning (NCCL), which adjusts the step sizes for both the previous and current tasks using their gradients. The proposed method achieves the same convergence rate as SGD when the catastrophic forgetting term, which we define in the paper, is suppressed at each iteration. Furthermore, we demonstrate that the proposed algorithm improves continual-learning performance over existing methods on several image classification tasks.
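The abstract only outlines the idea of adapting the step sizes of the previous-task (memory) gradient and the current-task gradient; the paper's actual NCCL update rule is not given here. As an illustration only, the following is a minimal PyTorch-style sketch of a memory-based update in which each of the two gradients gets its own adaptively scaled step size. The function name `ncl_style_step`, the inverse gradient-norm scaling, and the hyperparameters `base_lr` and `eps` are assumptions for the sketch, not the authors' method.

```python
# Illustrative sketch (NOT the paper's NCCL algorithm): one joint update using
# a current-task mini-batch and a mini-batch replayed from episodic memory,
# with a separate, gradient-dependent step size for each of the two gradients.
import torch
import torch.nn.functional as F


def ncl_style_step(model, current_batch, memory_batch, base_lr=0.01, eps=1e-8):
    params = list(model.parameters())
    x_cur, y_cur = current_batch
    x_mem, y_mem = memory_batch

    # Gradient on the current task's mini-batch.
    loss_cur = F.cross_entropy(model(x_cur), y_cur)
    g_cur = torch.autograd.grad(loss_cur, params)

    # Gradient on examples replayed from episodic memory (previous tasks).
    loss_mem = F.cross_entropy(model(x_mem), y_mem)
    g_mem = torch.autograd.grad(loss_mem, params)

    # Adaptive step sizes: shrink each update when its gradient norm is large
    # (an assumed, normalized-SGD-style rule used here only for illustration).
    norm_cur = torch.sqrt(sum((g ** 2).sum() for g in g_cur))
    norm_mem = torch.sqrt(sum((g ** 2).sum() for g in g_mem))
    lr_cur = base_lr / (1.0 + norm_cur + eps)
    lr_mem = base_lr / (1.0 + norm_mem + eps)

    # Apply both scaled gradients in a single parameter update.
    with torch.no_grad():
        for p, gc, gm in zip(params, g_cur, g_mem):
            p -= lr_cur * gc + lr_mem * gm

    return loss_cur.item(), loss_mem.item()
```

In practice the memory batch would be drawn from a small episodic buffer populated while training earlier tasks; the inverse-norm scaling above is just one way to make the two step sizes depend on the gradients.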
Authors: Seungyub Han, Yeongmo Kim, Taehyun Cho, Jungwoo Lee