Provable Contrastive Continual Learning (2405.18756v1)
Abstract: Continual learning requires learning incremental tasks with dynamic data distributions. It has been observed that training with a combination of contrastive loss and distillation loss yields strong performance in continual learning. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill this gap by establishing theoretical performance guarantees, which reveal how the performance of the model is bounded by the training losses of previous tasks in the contrastive continual learning framework. Our theoretical analysis further supports the idea that pre-training can benefit continual learning. Guided by these guarantees, we propose a novel contrastive continual learning algorithm called CILA, which uses adaptive distillation coefficients for different tasks. Each coefficient is computed simply as the ratio of the average distillation loss to the average contrastive loss over previous tasks. Our method shows substantial improvements on standard benchmarks and achieves new state-of-the-art performance.
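To make the coefficient update concrete, below is a minimal sketch in plain Python of how an adaptive distillation coefficient could be computed as the ratio of the average distillation loss to the average contrastive loss from previous tasks and then used to weight the distillation term. The helper name and all numeric loss values are placeholders for illustration, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): computing an adaptive distillation
# coefficient from loss statistics logged while training on previous tasks.
from statistics import mean

def adaptive_distillation_coefficient(distill_losses, contrastive_losses, default=1.0):
    """Return lambda_t = mean(distillation losses) / mean(contrastive losses)
    over previous tasks; fall back to `default` when no history exists (first task)."""
    if not distill_losses or not contrastive_losses:
        return default
    return mean(distill_losses) / mean(contrastive_losses)

# Example: loss values recorded on earlier tasks (made-up placeholder numbers).
past_distill = [0.82, 0.75, 0.71]
past_contrast = [1.90, 1.70, 1.60]
lam = adaptive_distillation_coefficient(past_distill, past_contrast)

# The current task's objective then combines the two losses as
#   L_total = L_contrastive + lambda_t * L_distillation
l_con, l_dis = 1.55, 0.68  # current-batch loss values (placeholders)
l_total = l_con + lam * l_dis
print(f"lambda = {lam:.3f}, total loss = {l_total:.3f}")
```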