CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models (2403.19137v3)
Abstract: Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned. Owing to their powerful generalizability, pre-trained vision-language models such as Contrastive Language-Image Pre-training (CLIP) have lately gained traction as practical CL candidates. However, the domain mismatch between the pre-training and the downstream CL tasks often calls for finetuning CLIP on the latter. Most existing finetuning methods are deterministic in nature. This makes them overlook the many possible interactions across the input modalities and renders them unsafe for high-risk tasks requiring reliable uncertainty estimation. To address these issues, our work proposes Continual LeArning with Probabilistic finetuning (CLAP) - a probabilistic modeling framework over visual-guided text features per task, thus providing more calibrated CL finetuning. Unlike recent data-hungry anti-forgetting CL techniques, CLAP alleviates forgetting by exploiting the rich pre-trained knowledge of CLIP for weight initialization and distribution regularization of task-specific parameters. Cooperating with a diverse range of existing prompting methods, CLAP can surpass the predominant deterministic finetuning approaches for CL with CLIP. We conclude with out-of-the-box applications of CLAP's superior uncertainty estimation abilities, including novel data detection and exemplar selection within existing CL setups. Our code is available at https://github.com/srvCodes/clap4clip.
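To make the abstract's idea of probabilistic finetuning over visual-guided text features concrete, here is a minimal PyTorch sketch of one plausible realization: a small adapter that fuses visual context into frozen text features, parameterizes a Gaussian over the result, samples via the reparameterization trick, and adds a KL regularizer. The module names, dimensions, the residual fusion, and the standard-normal prior are all illustrative assumptions, not the authors' implementation (per the abstract, CLAP instead regularizes task-specific distributions using CLIP's pre-trained knowledge).

```python
# A minimal sketch of probabilistic finetuning over visual-guided text
# features, in the spirit of the abstract above. Names, dimensions, and the
# prior choice are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticTextAdapter(nn.Module):
    """Maps CLIP-style text features to a Gaussian distribution, conditioned
    on visual context, and samples via the reparameterization trick."""
    def __init__(self, dim: int):
        super().__init__()
        self.mu_head = nn.Linear(dim, dim)
        self.logvar_head = nn.Linear(dim, dim)

    def forward(self, text_feats, image_feats):
        # Visual guidance: residual fusion of the mean image feature into each
        # class's text feature (an assumed, simplified fusion scheme).
        ctx = text_feats + image_feats.mean(dim=0, keepdim=True)
        mu, logvar = self.mu_head(ctx), self.logvar_head(ctx)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        # KL divergence to a standard normal prior; CLAP instead regularizes
        # toward priors derived from CLIP's pre-trained knowledge.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return z, kl

# Stand-in features (random tensors in place of real frozen CLIP encodings).
dim, n_classes, batch = 512, 10, 4
text_feats = F.normalize(torch.randn(n_classes, dim), dim=-1)
image_feats = F.normalize(torch.randn(batch, dim), dim=-1)
labels = torch.randint(0, n_classes, (batch,))

adapter = ProbabilisticTextAdapter(dim)
z, kl = adapter(text_feats, image_feats)
logits = 100.0 * image_feats @ F.normalize(z, dim=-1).t()  # CLIP-style similarity
loss = F.cross_entropy(logits, labels) + 0.01 * kl  # classification + regularizer
loss.backward()
print(f"loss={loss.item():.3f}, kl={kl.item():.3f}")
```

In an actual CL pipeline, `text_feats` and `image_feats` would come from frozen CLIP encoders, a separate adapter would typically be maintained per task, and forgetting would be mitigated by the weight initialization and distribution regularization the abstract describes; sampling multiple `z` draws at test time would also yield the uncertainty estimates used for novel data detection and exemplar selection.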
Authors: Saurav Jha, Dong Gong, Lina Yao