
CLIPArTT: Adaptation of CLIP to New Domains at Test Time (2405.00754v2)

Published 1 May 2024 in cs.CV and cs.LG

Abstract: Pre-trained vision-language models (VLMs), exemplified by CLIP, demonstrate remarkable adaptability across zero-shot classification tasks without additional training. However, their performance diminishes in the presence of domain shifts. In this study, we introduce CLIP Adaptation duRing Test-Time (CLIPArTT), a fully test-time adaptation (TTA) approach for CLIP, which constructs text prompts automatically during inference for use as text supervision. Our method employs a unique, minimally invasive text prompt tuning process, wherein multiple predicted classes are aggregated into a single new text prompt, used as a pseudo-label to re-classify inputs in a transductive manner. Additionally, we pioneer the standardization of TTA baselines (e.g., TENT) in the realm of VLMs. Our findings demonstrate that, without requiring additional transformations or new trainable modules, CLIPArTT enhances performance dynamically across non-corrupted datasets such as CIFAR-100, corrupted datasets like CIFAR-100-C and ImageNet-C, and synthetic datasets such as VisDA-C. This research underscores the potential for improving VLMs' adaptability through novel test-time strategies, offering insights for robust performance across varied datasets and environments. The code can be found at: https://github.com/dosowiechi/CLIPArTT.git
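Based only on the abstract's description, the sketch below illustrates what one such adaptation step could look like with the OpenAI `clip` package: the top-K predicted classes for each image are aggregated into a single new text prompt, which then serves as a pseudo-label for re-classifying the batch. The label set, K, the learning rate, the choice to update only LayerNorm parameters, and the simplified cross-entropy target are all illustrative assumptions, not the paper's exact method; see the linked repository for the authors' implementation.

```python
# Hedged sketch of a CLIPArTT-style test-time adaptation step (PyTorch + clip).
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.float()  # keep fp32 weights; fp16 gradients would complicate this sketch

class_names = ["airplane", "automobile", "bird", "cat", "deer"]  # placeholder label set
text_tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

# Assumption: adapt only LayerNorm affine parameters, in the spirit of TENT.
ln_params = [p for m in model.modules()
             if isinstance(m, torch.nn.LayerNorm) for p in m.parameters()]
optimizer = torch.optim.Adam(ln_params, lr=1e-3)

def adapt_step(images, K=3):
    """One test-time step on a batch of preprocessed images (B, 3, 224, 224)."""
    image_feats = F.normalize(model.encode_image(images), dim=-1)

    # Predict the top-K classes per image with the current (adapting) model.
    with torch.no_grad():
        class_feats = F.normalize(model.encode_text(text_tokens), dim=-1)
        topk = (image_feats @ class_feats.T).topk(K, dim=-1).indices

    # Aggregate the K predicted classes into a single new text prompt per
    # image, used as a pseudo-label (e.g. "a photo of a cat or a deer").
    pseudo_prompts = ["a photo of a " + " or a ".join(class_names[j] for j in row)
                      for row in topk.tolist()]
    pseudo_feats = F.normalize(
        model.encode_text(clip.tokenize(pseudo_prompts).to(device)), dim=-1)

    # Re-classify each image against the batch's pseudo prompts; pulling each
    # image toward its own prompt is a simplified stand-in for the paper's
    # transductive targets.
    logits = 100.0 * image_feats @ pseudo_feats.T
    loss = F.cross_entropy(logits, torch.arange(len(images), device=device))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```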

Authors (8)
  1. Gustavo Adolfo Vargas Hakim
  2. David Osowiechi
  3. Mehrdad Noori
  4. Milad Cheraghalikhani
  5. Ali Bahri
  6. Moslem Yazdanpanah
  7. Ismail Ben Ayed
  8. Christian Desrosiers