
Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models (2404.10335v4)

Published 16 Apr 2024 in cs.CV

Abstract: Adversarial attacks, particularly targeted transfer-based attacks, can be used to assess the adversarial robustness of large vision-language models (VLMs), allowing for a more thorough examination of potential security flaws before deployment. However, previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structures. Furthermore, because adversarial semantics appear unnatural, the generated adversarial examples transfer poorly. These issues limit the utility of existing methods for assessing robustness. To address them, we propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted, targeted adversarial examples via score matching. Specifically, AdvDiffVLM uses Adaptive Ensemble Gradient Estimation to modify the score during the diffusion model's reverse generation process, ensuring that the produced adversarial examples carry natural, targeted adversarial semantics and therefore transfer better. Simultaneously, to improve the quality of the adversarial examples, we use a GradCAM-guided Mask method to disperse adversarial semantics throughout the image rather than concentrating them in a single area. Finally, AdvDiffVLM embeds more target semantics into the adversarial examples over multiple iterations. Experimental results show that our method generates adversarial examples 5x to 10x faster than state-of-the-art transfer-based adversarial attacks while maintaining higher image quality. Moreover, the adversarial examples generated by our method transfer better than those of previous transfer-based attacks. Notably, AdvDiffVLM can successfully attack a variety of commercial VLMs in a black-box setting, including GPT-4V.
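The core mechanism the abstract describes, steering the reverse diffusion process with a masked adversarial gradient so that target semantics spread across the image, can be sketched on toy data. Everything below is a simplified stand-in, not the paper's implementation: a linear map plays the surrogate image encoder, a decay-to-mean term plays the denoising score, and the paper's adaptive ensemble weighting is reduced to a single surrogate with a fixed guidance weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a linear map as the surrogate image encoder,
# and a fixed vector as the embedding of the target semantics.
D = 16                       # flattened "image" dimension
W = rng.normal(size=(8, D))  # surrogate encoder (stand-in for e.g. CLIP)
target = rng.normal(size=8)  # target embedding

def surrogate_grad(x):
    """Gradient of 0.5 * ||W x - target||^2 with respect to x."""
    return W.T @ (W @ x - target)

def saliency_mask(x):
    """Crude GradCAM-like stand-in: down-weight already-salient pixels so
    adversarial semantics spread across the image instead of clustering."""
    sal = np.abs(W.T @ (W @ x))
    sal = sal / (sal.max() + 1e-8)
    return 1.0 - 0.5 * sal   # values in [0.5, 1.0]

def guided_reverse_process(x0, steps=50, eta=0.05, guide=0.5):
    """Toy reverse diffusion: a decay-to-mean 'score' plus masked
    adversarial guidance pushing the embedding toward the target."""
    x = x0.copy()
    for _ in range(steps):
        score = -x                               # toy denoising score
        g = surrogate_grad(x)
        g = g / (np.linalg.norm(g) + 1e-8)       # normalize each step
        x = x + eta * (score - guide * saliency_mask(x) * g)
    return x

x0 = rng.normal(size=D)                          # start from noise
x_adv = guided_reverse_process(x0)               # with adversarial guidance
x_plain = guided_reverse_process(x0, guide=0.0)  # plain reverse process

print("guided distance:", np.linalg.norm(W @ x_adv - target))
print("plain  distance:", np.linalg.norm(W @ x_plain - target))
```

Even in this toy setting, the guided trajectory ends measurably closer to the target embedding than the unguided one, which is the effect the score modification is designed to produce; the real method replaces each stand-in with an ensemble of VLM encoders, a pretrained diffusion model, and GradCAM saliency.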

References (33)