
Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image

Published 22 Feb 2024 in cs.CV, cs.AI, cs.CR, and cs.LG | (2402.14899v3)

Abstract: Multimodal LLMs (MLLMs), with their strong ability to understand both text and images, have received great attention. To achieve better reasoning with MLLMs, Chain-of-Thought (CoT) reasoning has been widely explored, which further improves MLLMs' explainability by providing intermediate reasoning steps. Despite the strong capabilities MLLMs demonstrate in multimodal reasoning, recent studies show that MLLMs still suffer from adversarial images. This raises the following open questions: Does CoT also enhance the adversarial robustness of MLLMs? What do the intermediate reasoning steps of CoT entail under adversarial attacks? To answer these questions, we first generalize existing attacks to CoT-based inference by attacking its two main components, i.e., the rationale and the answer. We find that CoT does improve MLLMs' adversarial robustness against existing attack methods by leveraging the multi-step reasoning process, but not substantially. Based on these findings, we further propose a novel attack method, termed the stop-reasoning attack, that attacks the model while bypassing the CoT reasoning process. Experiments on three MLLMs and two visual reasoning datasets verify the effectiveness of the proposed method. We show that the stop-reasoning attack can result in misled predictions and outperforms baseline attacks by a significant margin.

Summary

  • The paper introduces a novel stop-reasoning attack that disrupts chain-of-thought reasoning in multimodal large language models.
  • The methodology extends existing attacks to target both the answer and the rationale components of CoT inference, revealing critical vulnerabilities.
  • Experimental results show that stop-reasoning attacks substantially weaken the nominal robustness gained from chain-of-thought strategies.

Evaluating Adversarial Robustness in Multimodal LLMs with Chain-of-Thought Reasoning

Multimodal LLMs (MLLMs) have demonstrated impressive capabilities in image understanding by combining visual processing with LLMs. However, like conventional vision models, they remain susceptible to adversarial images, and it has been unclear how integrating chain-of-thought (CoT) reasoning affects this vulnerability. This study examines MLLMs' robustness against adversarial images and introduces a novel "stop-reasoning" attack. While CoT reasoning marginally enhances robustness against existing adversarial strategies, the stop-reasoning attack neutralizes these gains by halting the CoT reasoning process.

Introduction

MLLMs enrich image understanding by combining vision and text inputs to generate coherent image-to-text outputs. Despite these advances, such models remain vulnerable to adversarial examples, which exploit model weaknesses through imperceptible perturbations. CoT reasoning improves model transparency by exposing intermediate reasoning steps and may also strengthen adversarial robustness. However, how much robustness CoT reasoning actually contributes remains underexplored. This investigation centers on whether CoT reasoning genuinely fortifies MLLMs against adversarial images and introduces an attack specifically designed to exploit the CoT reasoning process.

To address these concerns, our study examines:

  • The influence of CoT reasoning on the adversarial robustness of MLLMs, and whether certain attack modalities effectively undermine models that use CoT.
  • Insights into the rationales generated by CoT when MLLMs mispredict answers in the presence of adversarial images (see Figure 1).

    Figure 1: The chain-of-thought reasoning provides an explanation for the incorrect predictions made by multimodal LLMs when confronted with adversarial images. The phrases highlighted in red misrepresent the actual image content.

Methodology: Attack Strategy Design

The research formulates a dual-component attack targeting both the rationale and answer components of CoT reasoning:

  • Answer Attack: This attack maximizes the cross-entropy loss between the predicted and ground-truth answers via Projected Gradient Descent (PGD), pushing the model toward an incorrect answer.
  • Rationale Attack: This attack maximizes the KL divergence between the rationales produced on the clean and the adversarial image, perturbing the image so that the intermediate reasoning steps are led astray.
  • Stop-Reasoning Attack: This novel attack interrupts rationale formation by steering the model toward an answer-only output template, so that it bypasses the reasoning path entirely and answers directly without justification (see Figure 2; a minimal sketch of all three objectives follows the figure caption).

    Figure 2: Attack pipeline diagram. First, the original textual question (with choices) and the input image (containing a horse carriage) are fed into an MLLM, and a clean prediction (top) is generated. Then, an adversarial prediction is generated from the perturbed image and the original text input.
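
To make the three objectives concrete, here is a minimal sketch, assuming a PyTorch-style MLLM wrapper `model` that returns per-token logits for a given image and prompt. `model`, `answer_ids`, `clean_rationale_logits`, and `stop_template_ids` are illustrative placeholders rather than the authors' actual interface, and the loss details are simplified.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, image, prompt, loss_fn, eps=8/255, alpha=1/255, steps=100):
    """Generic L-infinity PGD loop that ascends `loss_fn` over the input image."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model, adv, prompt)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()               # take an ascent step
            adv = image + (adv - image).clamp(-eps, eps)  # project back into the eps-ball
            adv = adv.clamp(0.0, 1.0)                     # keep a valid image
    return adv.detach()

def answer_attack_loss(model, adv_image, prompt, answer_ids):
    # Cross-entropy on the ground-truth answer tokens; maximizing it pushes
    # the model away from the correct answer.
    logits = model(adv_image, prompt)                     # (seq_len, vocab_size)
    return F.cross_entropy(logits[-len(answer_ids):], answer_ids)

def rationale_attack_loss(model, adv_image, prompt, clean_rationale_logits):
    # KL divergence between the rationale distributions on the clean and the
    # adversarial image; maximizing it derails the intermediate reasoning steps.
    adv_logits = model(adv_image, prompt)
    return F.kl_div(F.log_softmax(adv_logits, dim=-1),
                    F.softmax(clean_rationale_logits, dim=-1),
                    reduction="batchmean")

def stop_reasoning_loss(model, adv_image, prompt, stop_template_ids):
    # Negative cross-entropy on an answer-only template (e.g. "The answer is (X).");
    # maximizing this loss makes the template likely, so the rationale is skipped.
    logits = model(adv_image, prompt)
    return -F.cross_entropy(logits[-len(stop_template_ids):], stop_template_ids)
```

With `functools.partial`, each objective plugs into the same loop, e.g. `pgd_attack(model, image, prompt, partial(stop_reasoning_loss, stop_template_ids=template_ids))`.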

Experimental Results: Comparative Analysis

Across diverse MLLMs and datasets, we observed:

  • CoT reasoning marginally improves model robustness, but only against the answer and rationale attacks, as shown by evaluations on MiniGPT-4, OpenFlamingo, and LLaVA.
  • Answer attacks frequently alter both the answer and the rationale, indicating that the two components of CoT reasoning change together under attack (see Figure 3).

    Figure 3: Prediction with CoT. The complete inference process, illustrated on the left, can be divided into two components: the rationale and the answer, presented on the right. The rationale comprises a sequence of intermediate reasoning steps employed for deducing the final answer.
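
To illustrate this decomposition, here is a small sketch of splitting a CoT output into rationale and answer, assuming the model is prompted to end with a line of the form "The answer is (X)." (a common CoT convention used for illustration, not necessarily the paper's exact template).

```python
# Illustrative only: the prompt template and the answer marker are assumptions,
# not necessarily the exact format used in the paper.
COT_PROMPT = (
    "Question: {question}\n"
    "Options: {options}\n"
    "Answer the question step by step, and finish with 'The answer is (X).'"
)

def split_cot_output(generated_text: str) -> tuple[str, str]:
    """Separate the intermediate reasoning steps (rationale) from the final answer."""
    marker = "The answer is"
    if marker in generated_text:
        rationale, answer = generated_text.rsplit(marker, 1)
        return rationale.strip(), answer.strip(" ().")
    # No marker found: the model answered directly without the expected format,
    # so treat the whole output as the answer with an empty rationale.
    return "", generated_text.strip()
```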

The stop-reasoning attack undermines robustness far more effectively than the rationale or answer attacks, indicating that much of the CoT-induced resilience is illusory. Further statistical analysis confirms that the rationale is altered in most successful attacks, while precisely identifying and modifying the critical rationale information remains difficult.
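
As a rough illustration of how such a comparison can be quantified, here is a hedged sketch of an attack-success-rate computation; `predict_answer` and `attack_fn` are hypothetical callables wrapping MLLM inference and an attack loop such as the PGD sketch above, not the paper's evaluation code.

```python
from typing import Callable, Iterable, Tuple

def attack_success_rate(model,
                        dataset: Iterable[Tuple],   # (image, question, answer) triples
                        attack_fn: Callable,        # e.g. a PGD-based attack as sketched above
                        predict_answer: Callable) -> float:
    """Fraction of correctly answered samples whose answer is flipped by the attack."""
    flipped, correct = 0, 0
    for image, question, answer in dataset:
        clean_pred = predict_answer(model, image, question)
        if clean_pred != answer:
            continue                                # only attack samples the model gets right
        correct += 1
        adv_image = attack_fn(model, image, question)
        adv_pred = predict_answer(model, adv_image, question)
        flipped += int(adv_pred != answer)
    return flipped / max(correct, 1)
```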

Implications

While CoT reasoning introduces nominal robustness gains against traditional adversarial strategies, the stop-reasoning attack effectively undermines these defenses. Our findings highlight the need for defenses that safeguard the CoT reasoning path in MLLMs; improving adversarial resistance will likely require architectures hardened against such targeted attacks. At the same time, CoT-enhanced models make adversarial missteps easier to interpret, which promises future security advances through clearer rationalization.

Conclusion

Although CoT reasoning brings slight improvements to the adversarial robustness of MLLMs, the stop-reasoning attack effectively dismantles these gains by bypassing the reasoning process. This study deepens understanding of adversarial attack mechanics within CoT paradigms and offers concrete insights for developing robust MLLM defenses against adversarial threats. By exposing these latent vulnerabilities, the work paves the way for the secure deployment of multimodal models in critical applications.
