
Improving CLIP Robustness with Knowledge Distillation and Self-Training (2309.10361v1)

Published 19 Sep 2023 in cs.CV, cs.LG, and cs.MM

Abstract: This paper examines the robustness of a multi-modal computer vision model, CLIP (Contrastive Language-Image Pretraining), in the context of unsupervised learning. The main objective is twofold: first, to evaluate the robustness of CLIP, and second, to explore strategies for augmenting its robustness. To achieve this, we introduce a novel approach named LP-CLIP. This technique involves the distillation of CLIP features through the incorporation of a linear probing layer positioned atop its encoding structure. This newly added layer is trained utilizing pseudo-labels produced by CLIP, coupled with a self-training strategy. The LP-CLIP technique offers a promising approach to enhance the robustness of CLIP without the need for annotations. By leveraging a simple linear probing layer, we aim to improve the model's ability to withstand various uncertainties and challenges commonly encountered in real-world scenarios. Importantly, our approach does not rely on annotated data, which makes it particularly valuable in situations where labeled data might be scarce or costly to obtain. Our proposed approach increases the robustness of CLIP, achieving SOTA results compared to supervised techniques on various datasets.
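
The abstract describes the LP-CLIP pipeline at a high level: a frozen CLIP encoder produces zero-shot pseudo-labels, and those pseudo-labels supervise a linear probing layer trained on the same frozen image features. The following is a minimal sketch of that idea, assuming the OpenAI `clip` package; the prompt template, confidence threshold, and optimizer settings are illustrative assumptions rather than the authors' exact recipe.

```python
# Minimal sketch of the LP-CLIP idea from the abstract: a linear probe trained
# on frozen CLIP image features, supervised only by CLIP's own zero-shot
# pseudo-labels (self-training, no ground-truth annotations).
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()  # the CLIP encoder stays frozen throughout

# Example class names (CIFAR-10); any label set works.
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
text_tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_feats = F.normalize(model.encode_text(text_tokens).float(), dim=-1)

# Linear probing layer on top of CLIP's visual encoder (512-d for ViT-B/32).
probe = nn.Linear(512, len(class_names)).to(device)
optimizer = torch.optim.SGD(probe.parameters(), lr=1e-3, momentum=0.9)

def train_step(images):
    """images: a batch of unlabeled, preprocessed images (B, 3, 224, 224)."""
    with torch.no_grad():
        img_feats = F.normalize(model.encode_image(images).float(), dim=-1)
        # Zero-shot CLIP predictions act as pseudo-labels (the teacher).
        zs_logits = 100.0 * img_feats @ text_feats.T
        conf, pseudo_labels = zs_logits.softmax(dim=-1).max(dim=-1)
        keep = conf > 0.5  # assumed confidence filter, common in self-training
    if keep.sum() == 0:
        return None
    logits = probe(img_feats[keep])  # the student: a single linear layer
    loss = F.cross_entropy(logits, pseudo_labels[keep])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, robustness comes from distilling CLIP's zero-shot knowledge into a lightweight classifier trained only on confident pseudo-labels, so no annotated data enters the loop.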

Authors (4)
  1. Clement Laroudie (1 paper)
  2. Andrei Bursuc (55 papers)
  3. Mai Lan Ha (4 papers)
  4. Gianni Franchi (36 papers)
Citations (5)