CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning (2403.10245v1)

Published 15 Mar 2024 in cs.CV

Abstract: This paper explores the problem of continual learning (CL) of vision-language models (VLMs) in open domains, where the models need to perform continual updating and inference on a stream of datasets from diverse seen and unseen domains with novel classes. Such a capability is crucial for various applications in open environments, e.g., AI assistants, autonomous driving systems, and robotics. Current CL studies mostly focus on closed-set scenarios in a single domain with known classes. Large pre-trained VLMs like CLIP have demonstrated superior zero-shot recognition ability, and a number of recent studies leverage this ability to mitigate catastrophic forgetting in CL, but they focus on closed-set CL within a single-domain dataset. Open-domain CL of large VLMs is significantly more challenging due to 1) large class correlations and domain gaps across the datasets and 2) the forgetting of zero-shot knowledge in the pre-trained VLMs in addition to the knowledge learned from the newly adapted datasets. In this work, we introduce a novel approach, termed CoLeCLIP, that learns an open-domain CL model based on CLIP. It addresses these challenges through joint learning of a set of task prompts and a cross-domain class vocabulary. Extensive experiments on 11 domain datasets show that CoLeCLIP outperforms state-of-the-art methods for open-domain CL under both task- and class-incremental learning settings.
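The abstract describes CoLeCLIP only at a high level. As a minimal, hypothetical sketch of the general idea it names (frozen CLIP-style encoders paired with per-task learnable prompts and a shared cross-domain class vocabulary), the PyTorch snippet below is illustrative only: the encoder stand-ins, the OpenDomainCLHead structure, and all class names are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): CLIP-style zero-shot classification with
# per-task learnable prompts and a shared cross-domain class vocabulary.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 512  # CLIP-style joint embedding size (assumption)

class FrozenEncoder(nn.Module):
    """Stand-in for a frozen pre-trained CLIP tower (image or text)."""
    def __init__(self, in_dim, out_dim=EMBED_DIM):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)
        for p in self.parameters():            # keep the pre-trained tower frozen
            p.requires_grad_(False)

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

class OpenDomainCLHead(nn.Module):
    """Hypothetical head: one learnable prompt per task plus a shared class vocabulary."""
    def __init__(self, num_tasks):
        super().__init__()
        # Lightweight per-task prompt vectors added to image features.
        self.task_prompts = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(EMBED_DIM)) for _ in range(num_tasks)]
        )
        # Cross-domain vocabulary: class name -> frozen text embedding.
        self.vocab = {}

    def add_classes(self, names, text_embeds):
        # Classes from any new domain extend the shared vocabulary; existing
        # entries are left untouched, which is one way to retain zero-shot knowledge.
        for name, emb in zip(names, text_embeds):
            self.vocab.setdefault(name, emb.detach())

    def logits(self, img_embeds, task_id, temperature=0.07):
        prompted = F.normalize(img_embeds + self.task_prompts[task_id], dim=-1)
        class_mat = torch.stack(list(self.vocab.values()))  # [num_classes, EMBED_DIM]
        return prompted @ class_mat.t() / temperature       # cosine-similarity logits

# Usage: classify a batch against whatever classes the vocabulary currently holds.
image_enc, text_enc = FrozenEncoder(in_dim=2048), FrozenEncoder(in_dim=768)
head = OpenDomainCLHead(num_tasks=3)
head.add_classes(["cat", "dog"], text_enc(torch.randn(2, 768)))
scores = head.logits(image_enc(torch.randn(4, 2048)), task_id=0)
print(scores.shape)  # torch.Size([4, 2])
```

A real setup would plug actual CLIP encoders in place of the stand-ins and train only the prompts and any newly added vocabulary entries; the paper should be consulted for CoLeCLIP's actual objectives and prompt design.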

Authors (9)
  1. Yukun Li (34 papers)
  2. Guansong Pang (82 papers)
  3. Wei Suo (12 papers)
  4. Chenchen Jing (10 papers)
  5. Yuling Xi (2 papers)
  6. Lingqiao Liu (113 papers)
  7. Hao Chen (1005 papers)
  8. Guoqiang Liang (22 papers)
  9. Peng Wang (831 papers)
Citations (6)