REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM (2505.04673v1)

Published 7 May 2025 in cs.CL and cs.AI

Abstract: Vision LLMs (VLLMs) represent a significant advancement in artificial intelligence by integrating image-processing capabilities with textual understanding, thereby enhancing user interactions and expanding application domains. However, their increased complexity introduces novel safety and ethical challenges, particularly in multi-modal and multi-turn conversations. Traditional safety evaluation frameworks, designed for text-based, single-turn interactions, are inadequate for addressing these complexities. To bridge this gap, we introduce the REVEAL (Responsible Evaluation of Vision-Enabled AI LLMs) Framework, a scalable and automated pipeline for evaluating image-input harms in VLLMs. REVEAL includes automated image mining, synthetic adversarial data generation, multi-turn conversational expansion using crescendo attack strategies, and comprehensive harm assessment through evaluators like GPT-4o. We extensively evaluated five state-of-the-art VLLMs, GPT-4o, Llama-3.2, Qwen2-VL, Phi3.5V, and Pixtral, across three important harm categories: sexual harm, violence, and misinformation. Our findings reveal that multi-turn interactions result in significantly higher defect rates compared to single-turn evaluations, highlighting deeper vulnerabilities in VLLMs. Notably, GPT-4o demonstrated the most balanced performance as measured by our Safety-Usability Index (SUI) followed closely by Pixtral. Additionally, misinformation emerged as a critical area requiring enhanced contextual defenses. Llama-3.2 exhibited the highest MT defect rate ($16.55 \%$) while Qwen2-VL showed the highest MT refusal rate ($19.1 \%$).

Authors (2)

Madhur Jindal (2 papers)
Saurabh Deshpande (7 papers)

Summary

Overview of REVEAL: Multi-turn Evaluation Framework for Image-Input Harms in Vision LLMs

The paper "REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLMs" introduces an innovative framework designed to assess the safety and ethical challenges posed by Vision LLMs (VLLMs). By incorporating multi-modal capabilities—combining image processing with textual understanding—VLLMs enhance user interactions but also introduce novel vulnerabilities, particularly in multi-turn conversations. Traditional evaluations, focusing on single-turn conversations, fall short in addressing these complexities.

The REVEAL Framework

The REVEAL Framework stands out by providing a scalable, automated pipeline for evaluating potential harms in VLLMs. Its methodology includes automated image mining, synthetic adversarial data generation, and extensive harm assessment using powerful evaluators like GPT-4o. The framework consists of multiple components, each tailored to fulfill specific roles in the evaluation process, ensuring comprehensive analysis across diverse harm categories such as sexual exploitation, violence, and misinformation. Remarkable features of the framework include:

Automated Image Mining: This involves sourcing real-world images based on tailored image search queries, ensuring contextual relevance to the harm policies defined in the evaluation. The mined images serve as inputs for generating adversarial contexts.
Synthetic Adversarial Data Generation: Queries generated from extracted images undergo transformation into multi-turn conversational contexts, specifically designed to embed potential harm through the crescendo attack technique, which gradually increases the harmfulness of the conversation.
Evaluator Block: Utilizing GPT-4o, the evaluator systematically assesses generated conversations for harmful content based on the defined safety metrics.

Evaluation and Results

The paper evaluated five state-of-the-art VLLMs: GPT-4o, Llama-3.2, Qwen2-VL, Phi3.5V, and Pixtral. Across various harm policies, multi-turn interactions revealed significantly higher defect rates compared to single-turn evaluations, underscoring deeper vulnerabilities in VLLMs. Notably, GPT-4o exhibited balanced performance, with a lower defect rate in multi-turn evaluations while maintaining a minimal refusal incidence, demonstrating robust safety alignment.

Multi-turn evaluations yielded superior insights—higher misinformation defect rates, for instance, reflected models' challenges in handling nuanced, context-rich misinformation attacks. Additionally, the proposal of the Safety-Usability Index (SUI) presents a novel metric capturing the harmony between safety and usability, with GPT-4o and Pixtral scoring favorably under this index. Importantly, text-only evaluations of GPT-4o consistently showed lower defect rates, further validating the need for ongoing enhancement of multi-modal safety protocols.

Implications and Future Directions

As the integration of VLLMs becomes more prevalent, the REVEAL Framework offers critical insights for developers, researchers, and policymakers by illustrating the multifaceted safety challenges and vulnerabilities inherent in these systems. By prioritizing adaptable, automated evaluation processes, REVEAL presents a robust diagnostic tool to guide the development of safer, more reliable AI systems. Its modular design supports expansion to accommodate emerging harm categories and modeling techniques, ensuring continual relevance and efficiency.

While promising, this research emphasizes the importance of a contextualized, policy-driven approach to safety assessments in AI. Developers are urged to balance scalability with safeguard implementations tailored to specific application contexts. As technology progresses, frameworks like REVEAL will be crucial in maintaining ethical standards and securing responsible AI deployment.

In conclusion, REVEAL represents a significant step forward in enhancing the robustness of safety evaluations for VLLMs, providing a practical, comprehensive framework equipped to tackle the complexities of modern AI systems in real-world settings. The availability of REVEAL's components to the public invites further research, fostering advancement in AI safety.

Related Papers

YouTube

Show All Videos