CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning (2403.10245v1)
Abstract: This paper explores the problem of continual learning (CL) of vision-language models (VLMs) in open domains, where the models must perform continual updating and inference on a stream of datasets from diverse seen and unseen domains with novel classes. Such a capability is crucial for various applications in open environments, e.g., AI assistants, autonomous driving systems, and robotics. Current CL studies mostly focus on closed-set scenarios in a single domain with known classes. Large pre-trained VLMs like CLIP have demonstrated superior zero-shot recognition ability, and a number of recent studies leverage this ability to mitigate catastrophic forgetting in CL, but they focus on closed-set CL on a single-domain dataset. Open-domain CL of large VLMs is significantly more challenging due to 1) large class correlations and domain gaps across the datasets, and 2) the forgetting of zero-shot knowledge in the pre-trained VLM in addition to the knowledge learned from the newly adapted datasets. In this work, we introduce a novel approach, termed CoLeCLIP, that learns an open-domain CL model based on CLIP. It addresses these challenges through the joint learning of a set of task prompts and a cross-domain class vocabulary. Extensive experiments on 11 domain datasets show that CoLeCLIP outperforms state-of-the-art methods for open-domain CL under both task- and class-incremental learning settings.
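To make the two components named in the abstract concrete, below is a minimal sketch (not the authors' released implementation) of a model that keeps a frozen backbone, allocates one lightweight learnable prompt per task, and accumulates a shared cross-domain class vocabulary of text embeddings across tasks. The linear `image_encoder`, the additive prompt conditioning, and all shapes are simplifying assumptions standing in for CLIP's frozen encoders and the paper's richer prompt mechanism.

```python
# A minimal sketch of joint task-prompt and class-vocabulary learning.
# The frozen linear layer is a hypothetical stand-in for CLIP's image encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointPromptVocabModel(nn.Module):
    def __init__(self, feat_dim: int = 768, embed_dim: int = 512,
                 prompt_len: int = 4):
        super().__init__()
        self.prompt_len = prompt_len
        self.embed_dim = embed_dim
        # Frozen projection standing in for the pre-trained image encoder.
        self.image_encoder = nn.Linear(feat_dim, embed_dim)
        for p in self.image_encoder.parameters():
            p.requires_grad = False
        self.task_prompts = nn.ParameterList()  # one learnable prompt per task
        self.vocab = nn.ParameterDict()         # class name -> text embedding

    def add_task(self, class_names, text_embeds):
        """Register a new task: allocate its prompt and merge its classes
        into the shared vocabulary (first embedding seen per name wins)."""
        self.task_prompts.append(
            nn.Parameter(0.02 * torch.randn(self.prompt_len, self.embed_dim)))
        for name, emb in zip(class_names, text_embeds):
            if name not in self.vocab:
                self.vocab[name] = nn.Parameter(emb.clone())

    def logits(self, image_feats, task_id):
        """Score images against every class accumulated so far."""
        # Condition the frozen image features on the task prompt; a simple
        # additive shift here, whereas the paper's mechanism is richer.
        shift = self.task_prompts[task_id].mean(dim=0)
        img = F.normalize(self.image_encoder(image_feats) + shift, dim=-1)
        txt = F.normalize(torch.stack(list(self.vocab.values())), dim=-1)
        return img @ txt.t()  # cosine similarities over the whole vocabulary
```

In a sketch like this, only the current task's prompt (and, optionally, the vocabulary entries) would be trained with a cross-entropy loss over `logits(...)`, while earlier prompts and the backbone stay frozen, so that previously learned tasks and the VLM's zero-shot knowledge are not overwritten.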
Authors: Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, Peng Wang