Summary of the Red Teaming Visual LLMs Study
Introduction
The emergence of Vision-LLMs (VLMs), which combine the textual and visual processing capabilities of LLMs, has broadened the spectrum of AI applications. Despite the evident progress in VLMs, the lack of systematic red teaming benchmarks prompted the introduction of the Red Teaming Visual LLM (RTVLM) dataset. This newly constructed dataset assesses VLMs in areas crucial for secure deployment: Faithfulness, Safety, Privacy, and Fairness.
RTVLM Dataset Construction
RTVLM includes ten subtasks, each designed to target specific vulnerabilities within VLMs. The dataset ensures novelty by using images generated via diffusion techniques and human-annotated or GPT-4 generated questions. In evaluating Faithfulness, the dataset includes text and visual misleading tasks, and image order processing; Privacy is assessed through the distinction between public figures and private individuals, while Safety tests model responses to ethically risky inputs. For Fairness, VLMs are evaluated for bias towards individuals of varying races and genders.
Experimental Results
Upon evaluation, it was found that VLMs exhibit performance gaps in red teaming tasks and often lack red teaming alignment. The dataset served to benchmark and perform detailed analysis on 10 prominent VLMs, highlighting up to a 31% performance gap with GPT-4V. Incorporating RTVLM for Supervised Fine-tuning (SFT) into models like LLaVA-v1.5 improved performance significantly on the RTVLM test set and related benchmarks without degrading general performance, suggesting the necessity of incorporating red teaming alignment in the training process.
Red Teaming Alignment and Conclusions
The paper elucidates that current alignment practices in VLMs are insufficient when encountering red teaming scenarios. It also empirically demonstrates that directly aligning models with RTVLM improves both the safety and robustness of model outputs. The paper concludes by underscoring the importance of VLM security, and the RTVLM dataset is posited as a valuable asset for advancing model security measures.