Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models (2402.12336v2)

Published 19 Feb 2024 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: Multi-modal foundation models like OpenFlamingo, LLaVA, and GPT-4 are increasingly used for various real-world tasks. Prior work has shown that these models are highly vulnerable to adversarial attacks on the vision modality. These attacks can be leveraged to spread fake information or defraud users, and thus pose a significant risk, which makes the robustness of large multi-modal foundation models a pressing problem. The CLIP model, or one of its variants, is used as a frozen vision encoder in many large vision-language models (LVLMs), e.g. LLaVA and OpenFlamingo. We propose an unsupervised adversarial fine-tuning scheme to obtain a robust CLIP vision encoder, which yields robustness on all vision downstream tasks (LVLMs, zero-shot classification) that rely on CLIP. In particular, we show that stealth attacks on users of LVLMs by a malicious third party providing manipulated images are no longer possible once the original CLIP model is replaced with our robust one. No retraining or fine-tuning of the downstream LVLMs is required. The code and robust models are available at https://github.com/chs20/RobustVLM
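To make the core idea of unsupervised adversarial fine-tuning concrete, here is a minimal NumPy sketch. It is an illustrative toy, not the paper's actual method or models: the linear "encoder", dimensions, and hyperparameters are assumptions. The inner loop (PGD) perturbs each unlabeled input within an ℓ∞ ball to push its embedding away from the frozen original encoder's clean embedding; the outer loop updates the trainable encoder to pull the adversarial embedding back toward that clean target, so no labels are needed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb, n = 8, 4, 32

W0 = rng.normal(size=(d_emb, d_in)) / np.sqrt(d_in)  # frozen "original" encoder
W = W0 + 0.3 * rng.normal(size=W0.shape)             # encoder being fine-tuned
X = rng.normal(size=(n, d_in))                       # unlabeled "images"
eps, alpha, lr = 0.1, 0.025, 2e-3                    # illustrative hyperparameters

def pgd_attack(W, x, target, steps=10):
    """Inner maximization: l_inf-bounded perturbation of x that maximizes
    the squared distance between W @ x_adv and the clean target embedding."""
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        grad = 2.0 * W.T @ (W @ x_adv - target)      # d/dx ||W x - t||^2
        x_adv = np.clip(x_adv + alpha * np.sign(grad), x - eps, x + eps)
    return x_adv

def robust_loss(W, X):
    """Mean adversarial embedding distance to the frozen clean embeddings."""
    return float(np.mean([np.sum((W @ pgd_attack(W, x, W0 @ x) - W0 @ x) ** 2)
                          for x in X]))

loss_before = robust_loss(W, X)
for _ in range(50):                                  # outer minimization (SGD)
    for x in X:
        target = W0 @ x          # unsupervised target: the clean embedding
        x_adv = pgd_attack(W, x, target)
        W -= lr * 2.0 * np.outer(W @ x_adv - target, x_adv)
loss_after = robust_loss(W, X)
```

Because the fine-tuned encoder is trained to match the original encoder's clean embeddings even under attack, it can be swapped in as a drop-in replacement for the frozen vision encoder of a downstream model, which is the property the paper exploits for LVLMs.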

Authors (4)
  1. Christian Schlarmann
  2. Naman Deep Singh
  3. Francesco Croce
  4. Matthias Hein