Continual Learning with Weight Interpolation (2404.04002v2)
Abstract: Continual learning poses a fundamental challenge for modern machine learning systems, requiring models to adapt to new tasks while retaining knowledge from previous ones. Addressing this challenge necessitates the development of efficient algorithms capable of learning from data streams and accumulating knowledge over time. This paper proposes a novel approach to continual learning based on weight consolidation. Our method, a simple yet powerful technique, enhances robustness against catastrophic forgetting by interpolating between old and new model weights after each novel task, effectively merging two models to facilitate exploration of the local minima that emerge after the arrival of new concepts. Moreover, we demonstrate that our approach can complement existing rehearsal-based replay approaches, improving their accuracy and further mitigating the forgetting phenomenon. Additionally, our method provides an intuitive mechanism for controlling the stability-plasticity trade-off. Experimental results show that the proposed weight consolidation approach significantly enhances the performance of state-of-the-art experience replay algorithms. Our implementation is available at https://github.com/jedrzejkozal/weight-interpolation-cl.
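The abstract describes interpolating between the old and new model weights after each task, with a coefficient that controls the stability-plasticity trade-off. Below is a minimal PyTorch sketch of such an interpolation step; the function name `interpolate_weights`, the `alpha` coefficient, and the omission of any neuron permutation alignment before merging are illustrative assumptions, not the authors' exact procedure (see the linked repository for the actual implementation).

```python
import copy
import torch

@torch.no_grad()
def interpolate_weights(old_model, new_model, alpha=0.5):
    """Linearly interpolate the parameters of two models with identical architecture.

    alpha = 0.0 keeps the old (stable) weights, alpha = 1.0 keeps the new
    (plastic) weights; intermediate values trade stability for plasticity.
    NOTE: this sketch skips any permutation alignment of neurons that the
    paper's method may apply before interpolating.
    """
    merged = copy.deepcopy(new_model)

    # Blend trainable parameters.
    old_params = dict(old_model.named_parameters())
    for name, param in merged.named_parameters():
        param.copy_((1.0 - alpha) * old_params[name] + alpha * param)

    # Blend floating-point buffers as well (e.g., BatchNorm running statistics).
    old_buffers = dict(old_model.named_buffers())
    for name, buf in merged.named_buffers():
        if buf.dtype.is_floating_point:
            buf.copy_((1.0 - alpha) * old_buffers[name] + alpha * buf)

    return merged
```

In a hypothetical training loop, one would snapshot the model before each new task, train on that task, then continue from `interpolate_weights(snapshot, model, alpha)`, using `alpha` to tune how much of the newly acquired knowledge is retained versus how strongly old knowledge is preserved.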