An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models (2403.09766v1)

Published 14 Mar 2024 in cs.CV

Abstract: Different from traditional task-specific vision models, recent large VLMs can readily adapt to different vision tasks by simply using different textual instructions, i.e., prompts. However, a well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations. Furthermore, the concern is exacerbated by the phenomenon that the same adversarial perturbations can fool different task-specific models. Given that VLMs rely on prompts to adapt to different tasks, an intriguing question emerges: Can a single adversarial image mislead all predictions of VLMs when a thousand different prompts are given? This question essentially introduces a novel perspective on adversarial transferability: cross-prompt adversarial transferability. In this work, we propose the Cross-Prompt Attack (CroPA). This proposed method updates the visual adversarial perturbation with learnable prompts, which are designed to counteract the misleading effects of the adversarial image. By doing this, CroPA significantly improves the transferability of adversarial examples across prompts. Extensive experiments are conducted to verify the strong cross-prompt adversarial transferability of CroPA with prevalent VLMs including Flamingo, BLIP-2, and InstructBLIP in various different tasks. Our source code is available at \url{https://github.com/Haochen-Luo/CroPA}.
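
The abstract describes CroPA as jointly updating two quantities: the visual adversarial perturbation, optimized so the VLM produces an attacker-chosen output regardless of the prompt, and a learnable prompt perturbation, optimized in the opposite direction to counteract the image perturbation and thereby force it to generalize across prompts. The sketch below illustrates this min-max idea in PyTorch under stated assumptions: the `caption_loss` helper and all hyperparameter values (step sizes, iteration count, prompt-update interval) are hypothetical placeholders, not the authors' released implementation (see the linked GitHub repository for that).

```python
import torch

def caption_loss(model, image, prompt_embeds, target_text):
    """Hypothetical helper: the loss (e.g. cross-entropy) of the VLM
    generating `target_text` given the image and prompt embeddings.
    Its exact form depends on the specific VLM interface."""
    raise NotImplementedError

def cropa_attack(model, image, prompt_embeds, target_text,
                 steps=2000, eps=16 / 255, alpha_img=1 / 255,
                 alpha_prompt=0.01, prompt_update_every=10):
    # Image perturbation (constrained to an L_inf ball of radius eps)
    # and learnable prompt perturbation, both initialised to zero.
    delta_img = torch.zeros_like(image, requires_grad=True)
    delta_prompt = torch.zeros_like(prompt_embeds, requires_grad=True)

    for step in range(steps):
        loss = caption_loss(model, image + delta_img,
                            prompt_embeds + delta_prompt, target_text)
        grad_img, grad_prompt = torch.autograd.grad(
            loss, [delta_img, delta_prompt])

        with torch.no_grad():
            # The image perturbation *minimises* the loss (targeted attack).
            delta_img -= alpha_img * grad_img.sign()
            delta_img.clamp_(-eps, eps)

            # The prompt perturbation *maximises* the loss at a lower
            # frequency, acting as an adversary that the image
            # perturbation must overcome.
            if step % prompt_update_every == 0:
                delta_prompt += alpha_prompt * grad_prompt.sign()

    # Only the perturbed image is returned; the prompt perturbation exists
    # solely to make the image perturbation transfer across prompts.
    return (image + delta_img).detach()
```

At test time the perturbed image would be paired with ordinary, unperturbed prompts; cross-prompt transferability is then measured by how often the attacker's target output is produced across a large set of such prompts.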

References (45)
  1. Controlled caption generation for images through adversarial attacks. arXiv preprint arXiv:2107.03050, 2021.
  2. Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022.
  3. Openflamingo: An open-source framework for training large autoregressive vision-language models. arXiv preprint arXiv:2308.01390, 2023.
  4. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. arXiv preprint arXiv:1712.02051, 2017a.
  5. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM workshop on artificial intelligence and security, pp.  15–26, 2017b.
  6. Instructblip: Towards general-purpose vision-language models with instruction tuning. ArXiv, abs/2305.06500, 2023. URL https://api.semanticscholar.org/CorpusID:258615266.
  7. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  8. Making the v in vqa matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  6904–6913, 2017.
  9. Segpgd: An effective and efficient adversarial attack for evaluating and boosting segmentation robustness. In European Conference on Computer Vision, pp.  308–325. Springer, 2022.
  10. A systematic survey of prompt engineering on vision-language foundation models. arXiv preprint arXiv:2307.12980, 2023a.
  11. A survey on transferability of adversarial examples across deep neural networks. arXiv preprint arXiv:2310.17626, 2023b.
  12. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  770–778, 2016.
  13. Black-box adversarial attacks with limited queries and information. In International conference on machine learning, pp. 2137–2146. PMLR, 2018.
  14. On the efficacy of adversarial data collection for question answering: Results from a large-scale randomized study. arXiv preprint arXiv:2106.00872, 2021.
  15. Longhorns at DADC 2022: How many linguists does it take to fool a question answering model? A systematic approach to adversarial attacks. arXiv preprint arXiv:2206.14729, 2022.
  16. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pp. 12888–12900. PMLR, 2022.
  17. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597, 2023.
  18. Adversarial vqa: A new benchmark for evaluating the robustness of vqa models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  2042–2051, 2021.
  19. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755. Springer, 2014.
  20. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
  21. Enhancing cross-task black-box transferability of adversarial examples with dispersion reduction. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, pp.  940–949, 2020.
  22. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  23. Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  1765–1773, 2017.
  24. Fast feature fool: A data independent approach to universal adversarial perturbations. arXiv preprint arXiv:1707.05572, 2017.
  25. Cross-domain transferability of adversarial perturbations. Advances in Neural Information Processing Systems, 32, 2019.
  26. Task-generalizable adversarial attack based on perceptual metric. arXiv preprint arXiv:1811.09020, 2018.
  27. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
  28. Learning transferable adversarial perturbations. Advances in Neural Information Processing Systems, 34:13950–13962, 2021.
  29. Human-adversarial visual question answering. Advances in Neural Information Processing Systems, 34:20346–20359, 2021.
  30. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  31. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017.
  32. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
  33. Towards efficient adversarial training on vision transformers. In European Conference on Computer Vision, pp.  307–325. Springer, 2022.
  34. Fooling vision and language models despite localization and attention mechanism. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.  4951–4961, 2018.
  35. Exact adversarial attack to image captioning via structured output learning with latent variables. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  4135–4144, 2019.
  36. Auto-encoding scene graphs for image captioning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  10685–10694, 2019.
  37. Exploring visual relationship for image captioning. In Proceedings of the European conference on computer vision (ECCV), pp.  684–699, 2018.
  38. Reliable evaluation of adversarial transferability. arXiv preprint arXiv:2306.08565, 2023.
  39. Towards adversarial attack on vision-language pre-training models. In Proceedings of the 30th ACM International Conference on Multimedia, pp.  5005–5013, 2022a.
  40. Fooled by imagination: Adversarial attack to image captioning via perturbation in complex domain. In 2020 IEEE International Conference on Multimedia and Expo (ICME), pp.  1–6. IEEE, 2020.
  41. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068, 2022b.
  42. On evaluating adversarial robustness of large vision-language models. arXiv preprint arXiv:2305.16934, 2023.
  43. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv preprint arXiv:2306.05685, 2023.
  44. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023.
  45. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023.
Authors (4)
  1. Haochen Luo (8 papers)
  2. Jindong Gu (101 papers)
  3. Fengyuan Liu (15 papers)
  4. Philip Torr (172 papers)
Citations (16)