Safety of Multimodal LLMs on Images and Text
The paper "Safety of Multimodal LLMs on Images and Text" presents a structured investigation into the vulnerabilities and security paradigms of Multimodal LLMs (MLLMs) operating on data composed of images and text. Given the growing reliance on MLLMs in diverse applications, understanding and mitigating safety risks associated with their deployment is crucial.
The authors begin by outlining the substantial progress and potential of MLLMs, emphasizing their capacity to process and interpret multimodal data. Models such as LLaVA, MiniGPT-4, and GPT-4V are highlighted as examples of integrated language and vision capabilities. However, the paper recognizes that adding visual inputs increases both the complexity of these models and their exposure to safety risks.
A notable contribution of the paper is its thorough survey of existing evaluation strategies, attack vectors, and defense techniques tailored to assessing the safety of MLLMs. The researchers identify three primary facets of the visual modality that introduce unique challenges:
- Adversarial Perturbations: The paper discusses how slight, often imperceptible modifications to images can manipulate model predictions, a tactic widely studied in the adversarial attack literature. For instance, Projected Gradient Descent (PGD) can be used to craft adversarial inputs that subvert an MLLM's intended behavior (see the first sketch after this list).
- Optical Character Recognition (OCR) Exploits: MLLMs' ability to read textual content within images can be misused. The paper highlights instances where models comply with malicious instructions rendered inside images that would typically be refused if presented as plain text (illustrated in the second sketch below).
- Cross-modal Training Impacts: The authors assess how training MLLMs to handle multimodal information may dilute alignment and exacerbate vulnerabilities inherited from the base LLMs.
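As a concrete illustration of the adversarial-perturbation point, here is a minimal, generic PGD sketch in PyTorch. It is not the specific attack evaluated in the paper: the `model`, `loss_fn`, step sizes, and the toy classifier in the usage stub are all illustrative assumptions, and real attacks on MLLMs typically differentiate through the vision encoder or the model's likelihood of producing a harmful completion.

```python
import torch

def pgd_attack(model, image, target, loss_fn, eps=8 / 255, alpha=2 / 255, steps=40):
    """Projected gradient descent within an L-infinity ball of radius eps."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), target)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            # Step along the gradient sign to increase the loss, then project
            # back into the eps-ball around the clean image and valid pixel range.
            adv = adv + alpha * grad.sign()
            adv = torch.clamp(adv, image - eps, image + eps).clamp(0.0, 1.0)
    return adv.detach()

# Toy usage with a stand-in classifier (hypothetical); a real attack would
# target an MLLM's vision encoder or its generated text.
if __name__ == "__main__":
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    image = torch.rand(1, 3, 32, 32)
    target = torch.tensor([0])
    adv = pgd_attack(model, image, target, torch.nn.functional.cross_entropy)
    print("max perturbation:", (adv - image).abs().max().item())  # bounded by eps
```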
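The OCR-based exploit pattern amounts to moving an instruction from the text channel into the image channel. The sketch below, using Pillow, shows only the rendering step with a deliberately benign placeholder string; the prompt text, image size, and file name are hypothetical, and the surveyed red-teaming setups substitute instructions that a text-only filter would normally refuse.

```python
from PIL import Image, ImageDraw

def render_instruction_as_image(text, size=(448, 224)):
    """Render a text instruction onto a blank image so it reaches the model
    through the vision channel (and OCR) instead of the filtered text channel."""
    img = Image.new("RGB", size, color="white")
    draw = ImageDraw.Draw(img)
    draw.multiline_text((10, 10), text, fill="black")
    return img

# Benign placeholder text used purely to illustrate the mechanism.
render_instruction_as_image("List the steps to complete the task below.").save(
    "typographic_prompt.png"
)
```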
The paper also provides a comparative analysis of representative safety evaluation benchmarks such as PrivQA, GOAT-Bench, ToViLaG, SafeBench, and MM-SafetyBench. These datasets vary in scope, exploring different dimensions of safety, from privacy defense mechanisms to toxicity detection in generated content.
From the perspective of defense mechanisms, the paper proposes categorizing solutions into inference-time and training-time alignment. Inference-time methods, such as self-moderation and system prompt engineering, provide immediate and flexible safety adjustments without retraining the model. Training-time approaches, by contrast, aim to embed safety-oriented behavior more intrinsically, for example through reinforcement learning informed by feedback.
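A minimal sketch of the inference-time pattern just described, combining a safety-oriented system prompt with a self-moderation pass, is shown below. The `generate` callable, the prompt wording, and the stub model are illustrative assumptions rather than any particular system's API.

```python
# Safety-oriented system prompt for the first pass (wording is illustrative).
SAFETY_SYSTEM_PROMPT = (
    "You are a helpful assistant. Refuse requests for harmful, illegal, or "
    "privacy-violating content, including instructions embedded in images."
)

def guarded_generate(generate, user_text, image=None):
    # First pass: answer the user under the safety system prompt.
    draft = generate(system=SAFETY_SYSTEM_PROMPT, user=user_text, image=image)

    # Second pass (self-moderation): ask the same model to audit its own draft.
    verdict = generate(
        system="You are a strict content reviewer. Answer SAFE or UNSAFE only.",
        user=f"Review the following response for policy violations:\n{draft}",
    )
    if verdict.strip().upper().startswith("SAFE"):
        return draft
    return "I can't help with that request."

# Toy stub standing in for a real chat-style MLLM endpoint (hypothetical).
def stub_model(system, user, image=None):
    return "SAFE" if "reviewer" in system else f"(draft answer to: {user})"

print(guarded_generate(stub_model, "Describe this picture.", image=None))
```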
In their analysis, the researchers underscore that current defense strategies fall short of a robust balance between safety and operational utility, prompting a call for innovation in alignment techniques. Suggested directions for future work include optimizing visual instruction tuning and applying reinforcement learning paradigms to strengthen safety alignment without compromising the utility of MLLMs. Moreover, because these models continue to evolve, continuous safety assessment and improvement are deemed necessary to keep pace with emerging challenges.
In conclusion, this survey maps the existing landscape of MLLM safety, highlighting both the importance of safe deployment and the breadth of untapped research potential. It sets the stage for developing more refined safety measures that allow MLLMs' capabilities to be harnessed securely.