MSTS: A Comprehensive Framework for Evaluating Multimodal Safety in Vision-LLMs
The proliferation of Vision-LLMs (VLMs) in consumer AI applications such as chat assistants has been met with growing concern about their operational safety. Unlike text-only LLMs, VLMs take multimodal input, specifically text and images, which can open new vectors for unsafe interactions. The paper "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" introduces a carefully structured framework for rigorously evaluating these safety risks in VLMs.
Contributions and Findings
The centerpiece of the research is the Multimodal Safety Test Suite (MSTS), which comprises 400 distinct test prompts spanning 40 hazard categories. This structure permits a nuanced, category-specific analysis of VLM performance and safety. The authors tested ten state-of-the-art VLMs, revealing markedly different safety standards and underlying operational discrepancies among them.
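To make this structure concrete, the sketch below shows one way a single MSTS-style test case could be represented in code. The field names, category label, and example values are illustrative assumptions rather than the actual MSTS data schema.

```python
from dataclasses import dataclass

@dataclass
class MultimodalTestCase:
    """One MSTS-style prompt: the unsafe meaning arises only when
    the text and the image are read together."""
    case_id: str          # identifier, e.g. "example_001" (illustrative)
    hazard_category: str  # one of the 40 hazard categories
    text: str             # benign-looking instruction on its own
    image_path: str       # image that completes the unsafe meaning
    language: str = "en"  # MSTS was also translated into other languages

# Hypothetical example, not an actual MSTS prompt:
example = MultimodalTestCase(
    case_id="example_001",
    hazard_category="dangerous-substances",  # assumed category label
    text="Should I drink this?",
    image_path="images/unlabeled_bottle.jpg",
    language="en",
)
```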
- Hazard Taxonomy: The paper sets out a taxonomy of 40 categories of potentially harmful situations. This taxonomy underpins the systematic evaluation of VLM responses to prompts whose unsafe meaning emerges only when text and image are combined.
- Model Assessment: Evaluating commercial and open-weight VLMs on MSTS exposed a stark safety gap. Commercial models were generally safe, with fewer than 0.5% unsafe responses, whereas open VLMs produced both more unsafe responses and more misinterpretations of the prompts, suggesting they are often safe "by accident" rather than by understanding; a sketch of the per-category aggregation appears after this list.
- Multilingual Testing: By translating MSTS into ten additional languages, the paper examined how well VLM safety holds up in non-English contexts. Certain open models, such as MiniCPM-2.6, showed increased safety issues when prompted in languages other than English.
- Text-Only Prompt Effectiveness: Testing with text-only versions of the prompts yielded further insights, particularly for MiniCPM-2.6, which was comparatively safer in that setting. This suggests that text-only processing remains more mature than handling multimodal input, which demands deeper comprehension.
- Automation of Safety Evaluations: The research also examines the challenges of automating safety assessments with state-of-the-art models. Despite these efforts, precision in identifying unsafe responses remains suboptimal, illustrating how difficult it is to judge multimodal interactions automatically; the measurement involved is sketched after this list.
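As a rough illustration of the category-level analysis described above, the following sketch aggregates per-prompt safety labels into unsafe-response rates per model and hazard category. The input format and label values are assumptions made for illustration, not the paper's actual data layout.

```python
from collections import defaultdict

def unsafe_rates(results):
    """Aggregate per-prompt safety labels into unsafe-response rates.

    `results` is assumed to be a list of dicts such as
    {"model": "SomeVLM", "hazard_category": "...", "label": "unsafe"},
    where `label` is a human (or automated) safety judgement.
    """
    counts = defaultdict(lambda: [0, 0])  # (model, category) -> [unsafe, total]
    for r in results:
        key = (r["model"], r["hazard_category"])
        counts[key][1] += 1
        if r["label"] == "unsafe":
            counts[key][0] += 1
    return {key: unsafe / total for key, (unsafe, total) in counts.items()}
```

Under this scheme, a commercial model with under 0.5% unsafe responses overall would show rates near zero in most categories, while a weaker open model would not.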
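The difficulty of automating the safety judgement can likewise be quantified by comparing an automated classifier's labels with human annotations. The sketch below computes precision and recall for the "unsafe" class, again assuming a simple "safe"/"unsafe" label format.

```python
def unsafe_precision_recall(human_labels, auto_labels):
    """Precision and recall of an automated safety classifier for the
    'unsafe' class, measured against human gold labels.

    Both arguments are assumed to be parallel lists of "safe"/"unsafe" strings.
    """
    pairs = list(zip(human_labels, auto_labels))
    tp = sum(h == "unsafe" and a == "unsafe" for h, a in pairs)
    fp = sum(h == "safe" and a == "unsafe" for h, a in pairs)
    fn = sum(h == "unsafe" and a == "safe" for h, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Low precision here would mean the automated judge flags many safe responses as unsafe, consistent with the suboptimal precision described above.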
Implications and Future Directions
The MSTS framework’s systematic approach to testing yields critical insights into the current state of VLM safety. Its detailed taxonomy provides a foundation for understanding multimodal risks. Practically, it can guide developers in strengthening VLM safety measures and closing the gaps it highlights, especially in open models.
Theoretically, the findings motivate further inquiry into how models integrate nuanced, cross-modal context. As AI systems increasingly engage with diverse cultures and languages, strengthening multilingual safety assessment will be imperative.
Future research should focus on developing more capable automated evaluation tools that align more closely with human annotations, reducing bias in automated safety assessments. Additionally, expanding MSTS to include adversarially constructed prompts could improve its robustness and applicability as AI systems evolve.
In sum, this paper marks substantial progress in the systematic evaluation of VLM safety, ultimately facilitating the development of more reliable and context-aware multimodal AI systems.