
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models (2508.06142v1)

Published 8 Aug 2025 in cs.CV

Abstract: In the rapidly evolving landscape of Multimodal LLMs (MLLMs), the safety concerns of their outputs have earned significant attention. Although numerous datasets have been proposed, they may become outdated with MLLM advancements and are susceptible to data contamination issues. To address these problems, we propose SDEval, the first safety dynamic evaluation framework to controllably adjust the distribution and complexity of safety benchmarks. Specifically, SDEval mainly adopts three dynamic strategies: text, image, and text-image dynamics to generate new samples from original benchmarks. We first explore the individual effects of text and image dynamics on model safety. Then, we find that injecting text dynamics into images can further impact safety, and conversely, injecting image dynamics into text also leads to safety risks. SDEval is general enough to be applied to various existing safety and even capability benchmarks. Experiments across safety benchmarks, MLLMGuard and VLSBench, and capability benchmarks, MMBench and MMVet, show that SDEval significantly influences safety evaluation, mitigates data contamination, and exposes safety limitations of MLLMs. Code is available at https://github.com/hq-King/SDEval



Summary

  • The paper presents a dynamic evaluation framework combining text, image, and text-image strategies to enhance safety assessments of MLLMs.
  • It demonstrates that adaptive evaluation methods can significantly impact safety performance, with models like InternVL experiencing nearly 10% degradation.
  • The framework balances safety and capability evaluations, offering a robust tool for developing trustworthy multimodal AI systems.

SDEval: Safety Dynamic Evaluation for Multimodal LLMs

Multimodal LLMs (MLLMs) have made significant strides in recent years, extending the success of LLMs into high-level vision tasks. However, their outputs face increasing scrutiny over safety, particularly their potential to generate harmful or untruthful content. The paper "SDEval: Safety Dynamic Evaluation for Multimodal LLMs" proposes a novel evaluation framework designed to assess and improve the safety of MLLMs through dynamic evaluation methodologies.

Introduction to SDEval

SDEval addresses the limitations of existing static safety benchmarks by introducing a dynamic evaluation framework that adapts to the rapidly evolving field of MLLMs. The framework comprises three primary dynamic strategies: text dynamics, image dynamics, and text-image dynamics, each designed to generate new samples from existing benchmarks, thereby creating a more challenging and relevant evaluation environment (Figure 1).

Figure 1: Dynamic Evaluation vs Static Evaluation. Dynamic evaluation can generate diverse variants from static benchmarks with flexibly adjustable complexity.
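
To make the dynamic evaluation loop concrete, the sketch below shows one way a per-sample dynamic could be chosen and applied before querying a model; the sample format, model call, and safety judge are illustrative assumptions rather than the paper's actual interface.

```python
# Minimal sketch of a dynamic evaluation loop in the spirit of SDEval.
# The sample format, the dynamics, the model call, and the safety judge
# are illustrative assumptions, not the paper's actual API.
import random

def evaluate_dynamically(benchmark, model, dynamics, judge, seed=0):
    """Perturb each benchmark sample with a randomly chosen dynamic,
    query the model, and score the response with a safety judge."""
    rng = random.Random(seed)
    scores = []
    for sample in benchmark:                 # e.g. {"text": str, "image": PIL.Image}
        dynamic = rng.choice(dynamics)       # a text, image, or text-image dynamic
        new_sample = dynamic(sample)         # perturbed copy of the original sample
        response = model(new_sample["text"], new_sample["image"])
        scores.append(judge(new_sample, response))  # 1 if handled safely, else 0
    return sum(scores) / len(scores)         # safety rate on the dynamic benchmark
```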

Dynamic Evaluation Strategies

Text Dynamics

Text dynamics assess whether MLLMs can correctly interpret safety risks expressed through varied linguistic structures. Strategies include word replacement, sentence paraphrasing, adding descriptive text, introducing typos, mixing languages, and using chain-of-thought prompts. These techniques test the model's robustness to nuances of safety-related language.
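
As a rough illustration of text dynamics, the snippet below sketches a few rule-based perturbations (typo injection, word replacement, chain-of-thought prompting); the paper's implementations may differ, for example by using an LLM for paraphrasing.

```python
# Illustrative rule-based text dynamics; the paper's versions may rely on
# LLM-based rewriting, so these simple functions are assumptions for intuition.
import random

def introduce_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly swap adjacent letters to simulate typos."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def replace_words(text: str, substitutions: dict) -> str:
    """Replace words using a caller-supplied synonym mapping."""
    return " ".join(substitutions.get(w.lower(), w) for w in text.split())

def add_cot_prompt(text: str) -> str:
    """Append a chain-of-thought instruction to the original query."""
    return text + " Let's think step by step before answering."
```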

Image Dynamics

Image dynamics aim to evaluate the MLLM's ability to recognize safety hazards presented visually. Techniques include basic augmentations (spatial and color transformations) and advanced manipulations (object and text insertion, generative transformations) to assess the model's consistency in identifying harmful content in altered visual data.
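
The basic augmentations can be approximated with standard image libraries. The sketch below uses Pillow (assuming RGB inputs) for a spatial flip-and-rotate, a color transform, and naive text insertion; the advanced generative manipulations described above are not reproduced here.

```python
# Basic image dynamics approximated with Pillow (assumes RGB inputs); the
# advanced manipulations (object insertion, generative transforms) are omitted.
from PIL import Image, ImageDraw, ImageEnhance, ImageOps

def spatial_transform(img: Image.Image) -> Image.Image:
    """Simple spatial augmentation: horizontal mirror plus a small rotation."""
    return ImageOps.mirror(img).rotate(15, expand=True, fillcolor="white")

def color_transform(img: Image.Image, factor: float = 1.5) -> Image.Image:
    """Simple color augmentation: boost saturation by the given factor."""
    return ImageEnhance.Color(img).enhance(factor)

def insert_text(img: Image.Image, text: str) -> Image.Image:
    """Naive text insertion: overlay a string onto a copy of the image."""
    out = img.copy()
    ImageDraw.Draw(out).text((10, 10), text, fill="red")
    return out
```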

Text-Image Dynamics

This approach evaluates the model's ability to process and understand safety risks that arise from interactions between the text and image modalities. By blending text and image dynamics, SDEval assesses the model's capability to maintain safety in multimodal contexts, which is crucial for catching vulnerabilities exposed by cross-modal jailbreaking methods such as FigStep and HADES (Figure 2).

Figure 2: The whole framework of SDEval. Specifically, the dynamic generation process of SDEval consists of text dynamics, image dynamics, and text-image dynamics.
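
For intuition, the sketch below shows one way a text-image dynamic could relocate a query from the text channel into the image, loosely following typography-style attacks such as FigStep; the canvas layout and the benign carrier prompt are assumptions for illustration, not the paper's implementation.

```python
# One possible text-image dynamic: move the instruction from the text channel
# into the image, loosely in the spirit of typography attacks like FigStep.
# The layout and the benign carrier prompt below are illustrative assumptions.
from PIL import Image, ImageDraw

def inject_text_into_image(img: Image.Image, query: str):
    """Render the query onto an extended canvas above the original image and
    return the new image together with a benign textual prompt."""
    canvas = Image.new("RGB", (img.width, img.height + 60), "white")
    canvas.paste(img.convert("RGB"), (0, 60))
    ImageDraw.Draw(canvas).text((10, 20), query, fill="black")
    benign_text = "Please follow the instruction shown at the top of the image."
    return canvas, benign_text
```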

Experimental Results

The framework was tested against established safety benchmarks such as MLLMGuard and VLSBench. The results, as illustrated in Table 1 and Table 2, highlight the efficacy of dynamic strategies in significantly altering model evaluations. For instance, models like InternVL experienced a notable safety performance degradation of nearly 10%, underscoring the substantial impact of dynamic approaches on safety evaluation.

Capability Evaluation and Balance

Beyond safety, SDEval's dynamic strategies were also applied to capability evaluations using benchmarks such as MMVet and MMBench. The results confirmed that SDEval effectively increases evaluation complexity, affecting model performance on both safety and capability metrics. This raises essential considerations for maintaining a balance, as posited by the AI 45° Law, which advocates harmonious development of safety and capability (Figure 3).

Figure 3: Balance scatter plot between MLLM capability and safety under the AI 45° Law.

Conclusion

SDEval offers a flexible, dynamic framework for assessing and improving the safety evaluation of MLLMs. By integrating diverse generative dynamics, SDEval not only mitigates data leakage and static dataset limitations but also aligns with the ongoing advancements in MLLM development. The framework's robust design and ability to extend to capability evaluation mark it as a pivotal tool in ensuring the secure and effective deployment of MLLMs.

Overall, SDEval presents a significant advancement in the safety evaluation of multimodal models, offering insights and methodologies crucial for the future development of trustworthy AI systems.
