
T2VSafetyBench-TI2V: Multimodal Safety Benchmark

Updated 1 December 2025
  • The benchmark extends traditional T2V safety evaluation by incorporating image prompts to assess new cross-modal risks in video generation.
  • It tests modality-specific and compositional vulnerabilities through three defined scenarios: unsafe image with unsafe text, safe image with unsafe text, and unsafe image with safe text.
  • The dataset, derived from Tiny-T2VSafetyBench, includes 2085 prompt instances across 14 risk categories to facilitate proactive safety detection before video synthesis.

T2VSafetyBench-TI2V is a benchmark for evaluating the safety of Text-and-Image-to-Video (TI2V) generative models, with a focus on detecting and proactively mitigating multimodal safety risks. It extends text-to-video (T2V) safety evaluation to cover the compositional effects and interactions that arise when text and image prompts are combined in state-of-the-art video generation systems. Developed in the context of emerging models such as CogVideoX and frameworks like ConceptGuard, T2VSafetyBench-TI2V systematically tests the ability of TI2V models to identify and resist unsafe content generation before video synthesis, covering a broad spectrum of social, legal, and ethical risk categories (Ma et al., 24 Nov 2025).

1. Motivation and Distinction from Prior T2V Safety Work

T2VSafetyBench-TI2V directly responds to the limitations of text-only safety benchmarks by accounting for risks introduced by image prompts and their potential interaction with text. Modern TI2V systems accept both natural language descriptions and visual cues, introducing new vectors for “visual jailbreaks,” where innocuous text can be subverted by suggestive or illicit images, or vice versa. Unlike T2VSafetyBench, which targets the T2V setting with a focus on temporally emergent risks (scenes that appear innocuous frame-wise but are harmful in sequence), T2VSafetyBench-TI2V is built to systematically pose cross-modal risks, enabling evaluation of both modality-specific and compositional vulnerabilities. This approach permits zero-shot detection stress tests in anticipatory safety pipelines, a requirement largely unmet in traditional post-generation auditing workflows (Miao et al., 8 Jul 2024, Ma et al., 24 Nov 2025).

2. Benchmark Structure and Scenario Design

The core of T2VSafetyBench-TI2V is its explicit modeling of three “cross-modal risk” scenarios, each constituting a distinct form of unsafe content presentation:

  1. Unsafe Image + Unsafe Text (UI+UT): Both modalities specify or imply harmful content, often reinforcing the targeted risk.
  2. Safe Image + Unsafe Text (SI+UT): The text alone encodes the risk category, with the image remaining benign.
  3. Unsafe Image + Safe Text (UI+ST): Only the visual reference is hazardous, with the textual input being an innocuous or sanitized rewrite.

These scenarios support evaluation of modality-specific vulnerabilities as well as emergent risks from text-image alignment and interaction. All models are evaluated on proactive, pre-generation detection, i.e., flagging or blocking risky prompt pairs before video synthesis (Ma et al., 24 Nov 2025).
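As a minimal sketch of how the three scenarios and a pre-generation check could be represented, the Python below encodes each scenario as a pair of per-modality flags. The `Scenario` enum, the `should_block` function, the risk scores, and the 0.5 threshold are all hypothetical names and values chosen for illustration; they are not taken from the benchmark or from ConceptGuard.

```python
from enum import Enum


class Scenario(Enum):
    """The three cross-modal scenarios as (image_is_unsafe, text_is_unsafe)."""

    UNSAFE_IMAGE_UNSAFE_TEXT = (True, True)
    SAFE_IMAGE_UNSAFE_TEXT = (False, True)
    UNSAFE_IMAGE_SAFE_TEXT = (True, False)

    @property
    def image_unsafe(self) -> bool:
        return self.value[0]

    @property
    def text_unsafe(self) -> bool:
        return self.value[1]


def should_block(image_score: float, text_score: float, threshold: float = 0.5) -> bool:
    """Hypothetical gate: block when either modality's risk score reaches the threshold.

    The scores are assumed to come from some upstream detector; this rule is
    an illustrative assumption, not the method evaluated in the benchmark.
    """
    return max(image_score, text_score) >= threshold


# Every scenario has at least one unsafe modality, so a correct detector
# should flag all three before any video is synthesized.
for scenario in Scenario:
    print(scenario.name, scenario.image_unsafe or scenario.text_unsafe)
```

A per-modality maximum is the simplest possible rule. It treats the two inputs independently, so it would not by itself capture risks that only emerge from the interaction between image and text.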

3. Dataset Construction and Metadata

T2VSafetyBench-TI2V builds upon the Tiny-T2VSafetyBench subset, originally a curated dataset of 695 unsafe text prompts spanning 14 fine-grained content risks. Each text prompt undergoes:

  • Synthesis of an Unsafe-Image Prompt using Grok-3, designed to describe a static scene embodying the same (or analogous) risk.
  • Generation of a corresponding Safe-Image Prompt, paired to a sanitized rewrite of the text.
  • Construction of a Safe-Text Prompt, manually rewritten to ensure removal of any hazardous semantic content.

Images are generated via Stable Diffusion 3.5 at 1024 × 1024 resolution and curated for fidelity and alignment with the intended risk. Each text prompt thus yields three test pairs, reflecting the cross-modal scenarios above. The final test set comprises 695 × 3 = 2085 prompt instances. Category representation per 695 prompts is captured in the table below.
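The expansion from 695 source prompts to 2085 test pairs can be sketched as follows. The record layout (`SourcePrompt`, `PromptPair`), the scenario label strings, and the placeholder file names are assumptions made for illustration; the actual dataset schema is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePrompt:
    """One source prompt with its derived variants (hypothetical schema)."""

    unsafe_text: str
    safe_text: str
    unsafe_image: str  # placeholder identifier, e.g. a file name
    safe_image: str


@dataclass(frozen=True)
class PromptPair:
    scenario: str
    image: str
    text: str


def expand(p: SourcePrompt) -> list[PromptPair]:
    """Build the three image-text pairs, one per cross-modal scenario."""
    return [
        PromptPair("unsafe image + unsafe text", p.unsafe_image, p.unsafe_text),
        PromptPair("safe image + unsafe text", p.safe_image, p.unsafe_text),
        PromptPair("unsafe image + safe text", p.unsafe_image, p.safe_text),
    ]


# Synthetic stand-ins for the 695 source prompts.
prompts = [
    SourcePrompt(f"unsafe-text-{i}", f"safe-text-{i}", f"unsafe-img-{i}.png", f"safe-img-{i}.png")
    for i in range(695)
]
pairs = [pair for p in prompts for pair in expand(p)]
print(len(pairs))  # 2085
```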

Risk Category               Prompt Count
Pornography                 85
Borderline Pornography      55
Violence                    45
Gore                        61
Disturbing Content          43
Public Figures              27
Discrimination              50
Political Sensitivity       58
Copyright/Trademark         21
Illegal Activities          50
Misinformation              38
Sequential Action Risk      55
Dynamic Variation Risk      35
Coherent Contextual Risk    72
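
As a quick consistency check, the per-category counts in the table can be summed to confirm that they account for all 695 source prompts and, at three pairs per prompt, for the 2085 test instances. The dictionary name `CATEGORY_COUNTS` is only an illustrative choice; the numbers are copied from the table.

```python
# Per-category prompt counts, copied from the table above.
CATEGORY_COUNTS = {
    "Pornography": 85,
    "Borderline Pornography": 55,
    "Violence": 45,
    "Gore": 61,
    "Disturbing Content": 43,
    "Public Figures": 27,
    "Discrimination": 50,
    "Political Sensitivity": 58,
    "Copyright/Trademark": 21,
    "Illegal Activities": 50,
    "Misinformation": 38,
    "Sequential Action Risk": 55,
    "Dynamic Variation Risk": 35,
    "Coherent Contextual Risk": 72,
}

total = sum(CATEGORY_COUNTS.values())
print(len(CATEGORY_COUNTS))  # 14
print(total)                 # 695
print(total * 3)             # 2085

# Share of source prompts per category, largest first.
for name, count in sorted(CATEGORY_COUNTS.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count / total:.1%}")
```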

Annotation is ground-truth by construction; that is, risk labels inherit directly from whether the underlying prompt (text or image) is “safe” or “unsafe” as per manual rewriting and curation. There is no reliance on post-hoc human rating or inter-annotator agreement since the data is generated and curated to directly instantiate or withhold harmful content (Ma et al., 24 Nov 2025).

4. Risk Taxonomy and Definitions

T2VSafetyBench-TI2V evaluates over an extended taxonomy of 14 safety-relevant categories, adapted from the T2VSafetyBench protocol but including augmented or refined definitions for the TI2V context. These include:

  • Pornography: Explicit sexual acts or nudity.
  • Borderline Pornography: Partial nudity or content with strong erotic undertones.
  • Violence: Depiction or implication of physical harm, including via threats or weaponry.
  • Gore: Visible wounds, blood, or dismemberment.
  • Disturbing Content: Visually traumatic or unsettling scenes outside the scope of explicit gore.
  • Public Figures: Representations violating privacy or defaming real individuals.
  • Discrimination: Hate speech, slurs, or strongly biased depictions.
  • Political Sensitivity: Promotion of extremist ideologies or interference in democratic processes.
  • Copyright/Trademark: Use or imitation of protected intellectual property or branding.
  • Illegal Activities, Misinformation, Sequential Action Risk, Dynamic Variation Risk, Coherent Contextual Risk: These categories