Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security (2404.05264v2)

Published 8 Apr 2024 in cs.CR and cs.CV

Abstract: Multimodal LLMs (MLLMs) demonstrate remarkable capabilities that increasingly influence many aspects of daily life and continually push the boundary of AGI. The image modality, which carries rich semantic information and is mathematically more continuous than other modalities, greatly enhances the functionality of MLLMs when integrated. However, this integration is a double-edged sword: it offers attackers an expansive surface for highly covert and harmful attacks. The pursuit of reliable AI systems such as powerful MLLMs has become a pivotal area of contemporary research. In this paper, we demonstrate the multifaceted risks associated with incorporating image modalities into MLLMs. We first delineate the foundational components and training processes of MLLMs. We then construct a threat model, outlining the security vulnerabilities intrinsic to MLLMs. Moreover, we analyze and summarize existing scholarship on MLLM attack and defense mechanisms, culminating in suggestions for future research on MLLM security. Through this comprehensive analysis, we aim to deepen the academic understanding of MLLM security challenges and advance the development of trustworthy MLLM systems.
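The "more continuous mathematical nature" of images is precisely what makes gradient-based adversarial perturbation practical against MLLM vision inputs. The sketch below is a minimal, hypothetical illustration (not a method from the surveyed paper) of an L-infinity-bounded PGD-style attack that nudges an image's embedding toward an attacker-chosen target; `ToyImageEncoder`, the hyperparameters, and the target embedding are all illustrative assumptions standing in for a real MLLM vision tower.

```python
# Illustrative sketch only: PGD-style perturbation that pushes an image's embedding
# toward an attacker-chosen target embedding under a small L-infinity budget.
# ToyImageEncoder is a hypothetical stand-in for an MLLM vision encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyImageEncoder(nn.Module):
    """Hypothetical stand-in for a CLIP-style vision tower."""

    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def pgd_embedding_attack(encoder, image, target_emb, eps=8 / 255, alpha=2 / 255, steps=40):
    """L_inf-bounded PGD that maximizes cosine similarity to target_emb."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        sim = F.cosine_similarity(encoder(adv), target_emb, dim=-1).mean()
        grad = torch.autograd.grad(sim, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()               # ascend on similarity
            adv = image + (adv - image).clamp(-eps, eps)  # project into the eps-ball
            adv = adv.clamp(0.0, 1.0)                     # keep pixel values valid
    return adv.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = ToyImageEncoder().eval()
    benign = torch.rand(1, 3, 64, 64)   # benign input image
    target = torch.randn(1, 64)         # embedding the attacker wants to mimic
    adv = pgd_embedding_attack(encoder, benign, target)
    print("max pixel change:", (adv - benign).abs().max().item())
    print("cosine sim before:", F.cosine_similarity(encoder(benign), target).item())
    print("cosine sim after: ", F.cosine_similarity(encoder(adv), target).item())
```

Because the pixel budget `eps` is small, the perturbed image remains visually indistinguishable from the original while its embedding, and therefore the downstream language model's behavior, can shift substantially; this is the covertness the abstract refers to.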

Authors (5)
  1. Yihe Fan (4 papers)
  2. Yuxin Cao (16 papers)
  3. Ziyu Zhao (28 papers)
  4. Ziyao Liu (22 papers)
  5. Shaofeng Li (16 papers)