A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends (2407.07403v2)

Published 10 Jul 2024 in cs.CV

Abstract: With the significant development of large models in recent years, Large Vision-LLMs (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional LLMs, LVLMs present great potential and challenges due to its closer proximity to the multi-resource real-world applications and the complexity of multi-modal processing. However, the vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage. In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks. Specifically, we first introduce the background of attacks targeting LVLMs, including the attack preliminary, attack challenges, and attack resources. Then, we systematically review the development of LVLM attack methods, such as adversarial attacks that manipulate model outputs, jailbreak attacks that exploit model vulnerabilities for unauthorized actions, prompt injection attacks that engineer the prompt type and pattern, and data poisoning that affects model training. Finally, we discuss promising research directions in the future. We believe that our survey provides insights into the current landscape of LVLM vulnerabilities, inspiring more researchers to explore and mitigate potential safety issues in LVLM developments. The latest papers on LVLM attacks are continuously collected in https://github.com/liudaizong/Awesome-LVLM-Attack.

PDF HTML Abstract

Overview of Attacks on Large Vision-LLMs: Insights and Future Directions

In the context of advancing artificial intelligence, Large Vision-LLMs (LVLMs) have demonstrated considerable success in executing complex tasks that involve the integration of visual and linguistic information. This integration, while enhancing capability, simultaneously exposes LVLMs to a wider array of security threats unseen in traditional LLMs or unimodal systems. The vulnerabilities of LVLMs arise primarily from their multi-modal nature, which introduces unique challenges and new attack vectors. This paper offers an exhaustive survey of attack methodologies targeting LVLMs, focusing on adversarial, jailbreak, prompt injection, and data poisoning/backdoor attacks.

Types of Attacks

Adversarial Attacks: These attacks manipulate the inputs to LVLMs to cause erroneous or specific outputs. Notably, adversarial perturbations can extend from the vision domain into multi-modal settings, exploiting vulnerabilities across both visual and textual inputs. The ML community has explored various attack strategies across white-box, gray-box, and black-box scenarios, each differing in terms of the attacker’s knowledge of the model’s architecture and parameters.

Jailbreak Attacks: By exploiting the alignment mechanisms of models, jailbreak attacks aim to bypass restrictions designed to prevent unauthorized actions or the generation of harmful content. These attacks can involve crafting adversarial perturbations or manipulating prompts to extract sensitive information or execute unintended commands.

Prompt Injection Attacks: These attacks modify the input prompts to influence the model’s behavior, potentially causing it to generate unwanted responses. The survey details how prompt injection can be particularly hazardous in applications requiring precise outputs, such as those used in sensitive contexts like healthcare.

Data Poisoning/Backdoor Attacks: These attacks involve compromising the model during its training phase by introducing malicious data that embeds triggers for future undesired behavior. Such backdoors can be activated later, leading to misclassifications or exploitable conditions.

Methodology and Insights

The paper methodically categorizes past research efforts, exploring how they have contributed to understanding LVLM vulnerabilities. It critically examines the available resources, proposed methodologies, and defense mechanisms against these attacks. The survey incorporates insights from various studies, categorizing them into different types of attacks, modalities impacted, and specific methodologies employed. In the process, it highlights some sophisticated attack methodologies, such as dual optimization of adversarial prefixes in jailbreak attacks and perturbation techniques that leverage surrogate models for gray-box adversarial efforts.

Implications and Future Directions

The exploration of LVLM attacks underscores the necessity for robust security measures in designing future AI systems. The paper's taxonomy of attacks serves as a critical framework for researchers aiming to fortify LVLMs against these evolving threats. Practically, the insights can assist in developing stronger defense strategies, such as adaptive and dynamic checking mechanisms, improved alignment techniques, and enhanced robustness against adversarial inputs.

Speculatively, future developments in AI should focus on improving model robustness and transferability of defense mechanisms across different LVLM architectures. Of particular interest is the need to address the challenges posed by the integration of large volumes of training data, which can introduce biases and vulnerabilities that attackers may exploit. As this field matures, researchers are encouraged to examine cross-modal interactions in adversarial contexts and explore how multi-modal understanding can bolster model security.

Conclusion

This paper's survey of LVLM attack methodologies forms an essential resource for academic and industry practitioners focused on AI safety and security. By systematically analyzing existing attacks and proposing future research trajectories, it provides a foundational understanding necessary to develop more resilient and secure multimodal AI systems. The integration of theory and real-world application proposed by this survey prompts a forward-looking view into the dynamic landscape of AI security, essential for advancing both technological understanding and practical implementations in the field of machine learning safety.