On the Robustness of Large Multimodal Models Against Image Adversarial Attacks (2312.03777v2)
Abstract: Recent advances in instruction tuning have led to the development of state-of-the-art Large Multimodal Models (LMMs). Given the novelty of these models, the impact of visual adversarial attacks on LMMs has not been thoroughly examined. We conduct a comprehensive study of the robustness of various LMMs against different adversarial attacks, evaluated across tasks including image classification, image captioning, and Visual Question Answering (VQA). We find that, in general, LMMs are not robust to visual adversarial inputs. However, our findings suggest that context provided to the model via prompts, such as the question in a QA pair, helps to mitigate the effects of visual adversarial inputs. Notably, the LMMs evaluated demonstrated remarkable resilience to such attacks on the ScienceQA task, with only an 8.10% drop in performance compared to their visual counterparts, which dropped 99.73%. We also propose a new approach to real-world image classification, which we term query decomposition. By incorporating existence queries into our input prompt, we observe diminished attack effectiveness and improvements in image classification accuracy. This research highlights a previously under-explored facet of LMM robustness and sets the stage for future work aimed at strengthening the resilience of multimodal systems in adversarial environments.
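A minimal sketch of the kind of setup the abstract describes, assuming PyTorch, torchvision, and the torchattacks library: an adversarial image is crafted against a surrogate classifier and then paired with a decomposed prompt (an existence query followed by the classification question). The surrogate model, attack budget, and prompt wording here are illustrative assumptions, not the authors' exact experimental configuration.

```python
# Sketch only: surrogate model, attack parameters, and prompts are assumptions.
import torch
import torchattacks
from torchvision import models
from torchvision.models import ResNet50_Weights

# 1) Craft an adversarial image against a surrogate image classifier with PGD.
#    (Input normalization is omitted for brevity; torchattacks expects inputs in [0, 1].)
surrogate = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()
attack = torchattacks.PGD(surrogate, eps=8 / 255, alpha=2 / 255, steps=10)

image = torch.rand(1, 3, 224, 224)   # stand-in for a real image batch in [0, 1]
label = torch.tensor([207])          # stand-in ImageNet label
adv_image = attack(image, label)     # perturbed image to feed to the LMM under test


# 2) "Query decomposition": prepend an existence query to the classification prompt.
def query_decomposition_prompts(candidate_label: str) -> list[str]:
    """Return the two-step prompt sequence for the LMM (wording is hypothetical)."""
    return [
        f"Is there a {candidate_label} in the image? Answer yes or no.",
        "What is the main object in the image? Answer with one word.",
    ]


prompts = query_decomposition_prompts("dog")
# Each prompt would be paired with `adv_image` and sent to the LMM being evaluated;
# per the abstract, the extra textual context from the existence query reduces the
# attack's effectiveness on the downstream classification answer.
```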
Authors: Alejandro Aparcedo, Young Kyun Jang, Ser-Nam Lim, Xuanming Cui