- The paper presents a dynamic evaluation framework combining text, image, and text-image strategies to enhance safety assessments of MLLMs.
- It demonstrates that adaptive evaluation methods can significantly impact safety performance, with models like InternVL experiencing nearly 10% degradation.
- The framework balances safety and capability evaluations, offering a robust tool for developing trustworthy multimodal AI systems.
SDEval: Safety Dynamic Evaluation for Multimodal LLMs
Multimodal LLMs (MLLMs) have made significant strides in recent years, extending the success of LLMs into the domain of high-level vision tasks. However, their outputs are increasingly scrutinized for safety, particularly the potential for generating harmful or untruthful content. The paper "SDEval: Safety Dynamic Evaluation for Multimodal LLMs" proposes a novel evaluation framework designed to assess and improve the safety of MLLMs through dynamic evaluation methodologies.
Introduction to SDEval
SDEval addresses the limitations of existing static safety benchmarks by introducing a dynamic evaluation framework capable of adapting to the rapidly evolving field of MLLMs. The framework introduces three primary dynamic strategies: text dynamics, image dynamics, and text-image dynamics, each designed to generate new samples from existing benchmarks, thereby creating a more challenging and relevant evaluation environment.
Figure 1: Dynamic Evaluation vs Static Evaluation. Dynamic evaluation can generate diverse variants from static benchmarks with flexibly adjustable complexity.
Dynamic Evaluation Strategies
Text Dynamics
Text dynamics focus on assessing whether MLLMs can correctly interpret safety risks expressed through varied linguistic structures. This involves strategies such as word replacement, sentence paraphrasing, adding descriptive text, introducing typos, linguistic mix, and utilizing chain-of-thought prompts. These techniques test the model's robustness in understanding safety-related language nuances.
Image Dynamics
Image dynamics aim to evaluate the MLLM's ability to recognize safety hazards presented visually. Techniques include basic augmentations (spatial and color transformations) and advanced manipulations (object and text insertion, generative transformations) to assess the model's consistency in identifying harmful content in altered visual data.
Text-Image Dynamics
This approach evaluates the model's ability to process and understand safety risks through complex interactions between text and image modalities. By blending text and image dynamics, SDEval assesses the model's capability to maintain safety in multimodal contexts, which is crucial for catching vulnerabilities exposed by cross-modal jailbreaking methods like Figstep and HADES.
Figure 2: The whole framework of SDEval. Specifically, the dynamic generation process of SDEval consists of text dynamics, image dynamics, and text-image dynamics.
Experimental Results
The framework was tested against established safety benchmarks such as MLLMGuard and VLSBench. The results, as illustrated in Table 1 and Table 2, highlight the efficacy of dynamic strategies in significantly altering model evaluations. For instance, models like InternVL experienced a notable safety performance degradation of nearly 10%, underscoring the substantial impact of dynamic approaches on safety evaluation.
Capability Evaluation and Balance
Beyond safety, SDEval's dynamic strategies were also applied to capability evaluations using benchmarks such as MMVet and MMBench. The results confirmed that SDEval effectively increases evaluation complexity, impacting model performance across both safety and capability metrics. This raises essential considerations for maintaining a balance, as posited by the AI 45∘ Law, which advocates for harmonious development in safety and performance.

Figure 3: We present the balance scatter plot between MLLM capability and safety under the AI 45∘ Law.
Conclusion
SDEval offers a flexible, dynamic framework for assessing and improving the safety evaluation of MLLMs. By integrating diverse generative dynamics, SDEval not only mitigates data leakage and static dataset limitations but also aligns with the ongoing advancements in MLLM development. The framework's robust design and ability to extend to capability evaluation mark it as a pivotal tool in ensuring the secure and effective deployment of MLLMs.
Overall, SDEval presents a significant advancement in the safety evaluation of multimodal models, offering insights and methodologies crucial for the future development of trustworthy AI systems.