- The paper identifies visual safety information leakage (VSIL) in multimodal large language models, where sensitive visual data inadvertently appears in text queries.
- The study constructs VLSBench, a specialized benchmark ensuring text queries do not reveal sensitive image details, to expose shortcomings in textual-only safety evaluations.
- Experimental results show that multimodal alignment strategies significantly outperform simple textual methods in mitigating safety risks.
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Introduction to Multimodal Safety Concerns
The rise of Multimodal LLMs (MLLMs), which integrate visual inputs with traditional text-based LLMs, has raised significant safety concerns about their applications. The paper identifies a previously unexplored issue termed "Visual Safety Information Leakage" (VSIL), in which sensitive visual content inadvertently leaks into the accompanying textual query. This leakage leads to misleading safety evaluations: simple text-based alignment methods can achieve seemingly comparable safety results without genuinely addressing the visual component of multimodal interactions.
Figure 1: Overview of our work. We identify a problem in current multimodal safety data samples, namely visual safety information leakage (VSIL). This leakage leads to a counter-intuitive result: simple textual SFT-based alignment methods can achieve nearly the same high safety rates as multimodal approaches. We therefore construct VLSBench, which prevents visual leakage. This newly proposed benchmark discourages textual alignment shortcuts and motivates more dedicated multimodal alignment methods to better address this challenging task.
VSIL occurs when the content of an image, which is itself harmful, is explicitly described in its accompanying text query. This undermines existing safety evaluations, since a safety decision can be made from the text alone, without any actual consideration of the visual input. The paper quantifies VSIL by running the LlamaGuard model on text queries in isolation and comparing its judgments against the unsafe labels of the full image-text pairs, showing that the textual query alone is frequently enough to flag a sample as unsafe.
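As a rough illustration of this kind of check, the sketch below runs a text-only safety classifier on the textual queries alone; if a query is already flagged as unsafe without the image, the sample likely leaks its visual information into the text. The `meta-llama/LlamaGuard-7b` checkpoint, its standard chat-template usage, and the example query are assumptions for illustration, not the paper's exact evaluation setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def is_unsafe_text_only(query: str) -> bool:
    """Return True if the text-only classifier flags the query as unsafe without seeing any image."""
    chat = [{"role": "user", "content": query}]
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(
        input_ids=input_ids, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
    )
    verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return verdict.strip().lower().startswith("unsafe")

# Hypothetical text queries taken from a "multimodal" safety benchmark.
queries = ["How can I use the lockpicks shown in this picture to break into a car?"]
leak_rate = sum(is_unsafe_text_only(q) for q in queries) / len(queries)
print(f"Fraction of queries flagged unsafe from text alone: {leak_rate:.2%}")
```

A high fraction here suggests the benchmark's safety signal lives in the text rather than the image, which is exactly the VSIL pattern described above.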
Figure 2: Four examples from current benchmarks showcasing the problem of visual safety information leakage. The information leaked from the visual modality into the textual query is marked in red.
Construction of VLSBench
To address these flaws in current evaluation benchmarks, the authors introduce VLSBench, a dataset specifically designed to prevent VSIL by ensuring that the textual query does not reveal sensitive information present in the image. The construction of VLSBench involves generating image-text pairs that resist visual leakage through an iterative synthesis process.
Figure 3: Overview of the VLSBench construction pipeline. Our pipeline is designed to prevent visual leakage and includes four steps: (a) harmful query and image description generation; (b) mitigating visual leakage from the textual harmful query; (c) iterative image generation from the image description; (d) final filtration ensuring image-text pairs are matched and harmful.
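A minimal sketch of how these four steps could be orchestrated is shown below. All of the callables (`describe_image`, `detoxify_query`, `generate_image`, and so on) are hypothetical stand-ins for the LLM and text-to-image components, and the control flow is an assumption based on the figure caption rather than the authors' released implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VLSBenchSample:
    query: str        # harmful intent, with the sensitive detail removed from the text
    image_desc: str   # description carrying the sensitive visual information
    image_path: str

def build_sample(
    raw_query: str,
    describe_image: Callable[[str], str],          # (a) derive an image description from the harmful query
    detoxify_query: Callable[[str, str], str],     # (b) rewrite the query so it no longer leaks the description
    query_leaks: Callable[[str, str], bool],       # (b) check whether the query still reveals the image content
    generate_image: Callable[[str], str],          # (c) text-to-image model; returns a path to the image
    image_matches: Callable[[str, str], bool],     # (c) check that the image actually depicts the description
    pair_is_harmful: Callable[[str, str], bool],   # (d) final filter: matched and genuinely harmful together
    max_image_rounds: int = 3,
) -> Optional[VLSBenchSample]:
    """One pass of the synthesis loop; returns None if a leak-free, matched pair cannot be produced."""
    # (a) Harmful query and image description generation.
    image_desc = describe_image(raw_query)

    # (b) Mitigate visual leakage from the textual harmful query.
    query = detoxify_query(raw_query, image_desc)
    if query_leaks(query, image_desc):
        return None  # the sensitive detail could not be moved out of the text

    # (c) Iterative image generation from the image description.
    image_path = ""
    for _ in range(max_image_rounds):
        image_path = generate_image(image_desc)
        if image_matches(image_path, image_desc):
            break
    else:
        return None  # no generated image matched the description

    # (d) Final filtration: keep only matched, genuinely harmful image-text pairs.
    if not pair_is_harmful(query, image_path):
        return None
    return VLSBenchSample(query=query, image_desc=image_desc, image_path=image_path)
```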
Experimental Findings and Implications
The paper's experiments demonstrate that on VLSBench, models relying purely on textual alignment significantly underperform those trained with multimodal alignment strategies. This clear performance gap underscores the need for more nuanced multimodal alignment techniques once visual safety information leakage is properly controlled.
Figure 4: Textual SFT compared with multimodal SFT on our VLSBench. Dashed lines indicate the average safety rate across the three base models.
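For reference, the safety rate in this comparison is the proportion of model responses that an evaluator judges to be safe on the benchmark's harmful image-text prompts. A minimal sketch, with made-up verdicts standing in for a judge model's outputs:

```python
from typing import Iterable

def safety_rate(verdicts: Iterable[bool]) -> float:
    """Fraction of responses judged safe (e.g., a refusal or a harmless answer)."""
    verdicts = list(verdicts)
    return sum(verdicts) / len(verdicts) if verdicts else 0.0

# Hypothetical per-sample verdicts from a judge model on VLSBench prompts.
textual_sft_verdicts = [True, False, False, True, False]
multimodal_sft_verdicts = [True, True, False, True, True]

print(f"Textual SFT safety rate:    {safety_rate(textual_sft_verdicts):.0%}")
print(f"Multimodal SFT safety rate: {safety_rate(multimodal_sft_verdicts):.0%}")
```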
Conclusion
In conclusion, the paper emphasizes that addressing VSIL is crucial for realistic multimodal safety evaluation. The introduction of VLSBench is a step toward this goal, providing a challenging benchmark that reflects the complexity of real-world multimodal data without the oversimplified crutch of text-only leakage. The findings suggest that future research should focus on strengthening multimodal safety alignment methods so that visual inputs are genuinely integrated and secured in AI systems.