- The paper introduces a self-evolving multimodal jailbreak framework that dynamically coordinates text and image attacks to bypass I2V model safety mechanisms.
- It employs a hierarchical system with SACU, MTPU, and TAU modules that refine attack strategies through reinforcement learning and memory-based retrieval.
- Empirical results show up to 96% attack success on various benchmarks, highlighting the framework's efficacy for vulnerability analysis in generative AI.
RunawayEvil: A Self-Evolving Multimodal Jailbreak System for Image-to-Video Generative Models
Introduction and Motivation
This paper introduces "RunawayEvil: Jailbreaking the Image-to-Video Generative Models" (2512.06674), a comprehensive framework targeting the underexplored vulnerabilities of image-to-video (I2V) generative models. Contemporary I2V systems, widely used in creative, commercial, and entertainment domains, take cross-modal (image and text) inputs and synthesize dynamic video content. Jailbreak attacks against text-to-image (T2I) and text-to-video (T2V) models are increasingly studied, but these approaches remain largely ineffective against I2V models: they rely on static attack patterns, unimodal perturbations, and manual prompt engineering, all of which fail to exploit cross-modal synergies or evade integrated safety mechanisms.
RunawayEvil directly addresses these technical gaps by proposing a self-evolving multimodal jailbreak framework that dynamically amplifies its attack strategies, executes coordinated perturbations on both text and image modalities, and iteratively circumvents sophisticated cross-modal security defenses.
Figure 1: Visualization of successful jailbreaks using RunawayEvil, which unleashes the full potential of multimodal jailbreaks.
RunawayEvil Framework and Architecture
The RunawayEvil pipeline is structured around a "Strategy-Tactic-Action" paradigm, a hierarchical architecture composed of three interlinked modules:
- Strategy-Aware Command Unit (SACU): Functions as the decision core. It employs RL to customize attack strategies based on semantic and visual input features, and uses an LLM-based agent to autonomously explore and expand its strategy space, drawing on a curated memory bank of successful attacks. SACU’s self-evolution mechanism moves beyond static attack designs, improving both adaptivity and attack efficacy.
- Multimodal Tactical Planning Unit (MTPU): Operationalizes SACU’s selected strategies by generating coherent sets of adversarial textual prompts and corresponding image manipulation instructions. By integrating memory-augmented retrieval from successful attack records, MTPU preserves cross-modal coordination and maximizes the likelihood of bypassing multimodal safety checks.
- Tactical Action Unit (TAU): Executes the multimodal attack instructions and leverages an MLLM-based video safety evaluator to provide tactical feedback, thereby closing the adaptive attack loop.
Figure 2: RunawayEvil’s multimodal jailbreak framework with an adaptive closed-loop among SACU, MTPU, and TAU for effective cross-modal jailbreaks against I2V models.
The closed-loop collaboration among these modules leverages cross-modal feedback and dynamic strategy refinement, allowing the system to intensify its attack with each iteration without human intervention.
SACU: Self-Evolutionary Mechanism
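SACU’s self-evolutionary process unfolds in two sequential phases, combining the LLM-driven strategy exploration and the RL-based strategy customization described above.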
This architecture eliminates manual prompt engineering and increases the flexibility and adaptivity of jailbreaking tactics, effectively scaling with the diversity and complexity of input distributions.
Multimodal Jailbreak Execution
Upon convergence of SACU’s evolution, RunawayEvil performs coordinated multimodal attacks by:
- Selecting input-aware strategies for text and image perturbations.
- Generating attack instructions using both contemporary strategy and historical context.
- Iteratively executing image edits and prompt mutations through TAU until the MLLM-based safety evaluator deems the output unsafe.
This iterative execution exploits feedback from safety evaluators, dynamically adjusting attack vectors in response to I2V model security mechanisms.
Figure 4: Comparative visualization of jailbreak performance across multiple methods, highlighting RunawayEvil’s capacity to induce unsafe video outputs in I2V models.
Empirical Results and Quantitative Findings
Extensive experiments span four open-source I2V architectures (Open-Sora 2.0, CogVideoX, Wan2.2-TI2V-5B, DynamiCrafter) and multiple benchmarks (COCO2017, JailBreakV-28K, MM-SafetyBench). Key results include:
- Attack Success Rate (ASR): RunawayEvil achieves up to 93.0% ASR on COCO2017 and 96.0% on MM-SafetyBench, outperforming T2I/T2V attack baselines extended to the I2V setting by 58.5%–79%. These baselines rarely exceed 50% ASR when transferred unimodally to I2V models, indicating that RunawayEvil’s multistep, cross-modal approach is substantially more effective.
- Cross-Modal Synergy: Unimodal attacks (text-only/image-only) yield ASR below 52%, while dual-modal independent attacks improve modestly. RunawayEvil's collaborative, adaptive approach delivers the highest ASR, confirming the importance of cross-modal feedback and joint perturbation.
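As a point of reference for the figures above, ASR is conventionally the share of attempts whose output a safety evaluator judges unsafe. The sketch below is an assumption about that bookkeeping, not the paper's evaluation code: the function names and the toy verdict lists are hypothetical, and the margin is computed as a difference in percentage points.

```python
from typing import Sequence


def attack_success_rate(verdicts: Sequence[bool]) -> float:
    """Return ASR as a percentage: the share of attempts flagged unsafe.

    `verdicts` holds one boolean per evaluated sample (True = the safety
    evaluator judged the generated output unsafe). Hypothetical helper.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    return 100.0 * sum(verdicts) / len(verdicts)


def margin_in_points(asr_method: float, asr_baseline: float) -> float:
    """Difference between two ASR values, in percentage points."""
    return asr_method - asr_baseline


# Toy verdicts for illustration only; these are not results from the paper.
method_verdicts = [True] * 8 + [False] * 2
baseline_verdicts = [True] * 3 + [False] * 7

method_asr = attack_success_rate(method_verdicts)
baseline_asr = attack_success_rate(baseline_verdicts)
print(method_asr, baseline_asr, margin_in_points(method_asr, baseline_asr))
```

Reporting the gap in percentage points rather than as a relative change keeps comparisons across benchmarks with different baseline levels unambiguous.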
Iterative analysis shows ASR consistently increases with attack rounds for all evaluators.
Figure 5: ASR increases with the number of attack iterations under both Qwen-VL and LLaVA-Next safety evaluators; multi-stage feedback enhances robustness and success rates.
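One way to read an ASR-versus-iterations curve is as a cumulative statistic: a sample counts as a success at round k if the evaluator flagged it at any round up to k. The following sketch assumes that definition, which the summary does not spell out, and uses a hypothetical input format (the first flagged round per sample, or `None` if never flagged).

```python
from typing import List, Optional, Sequence


def cumulative_asr(
    first_flagged_round: Sequence[Optional[int]], max_rounds: int
) -> List[float]:
    """Return ASR (in percent) after each round 1..max_rounds.

    first_flagged_round[i] is the 1-based round at which sample i was first
    judged unsafe, or None if it never was. Hypothetical data layout.
    """
    n = len(first_flagged_round)
    if n == 0:
        raise ValueError("need at least one sample")
    curve = []
    for k in range(1, max_rounds + 1):
        hits = sum(1 for r in first_flagged_round if r is not None and r <= k)
        curve.append(100.0 * hits / n)
    return curve


# Toy data for illustration only; not taken from the paper.
rounds = [1, 2, None, 3, 1]
print(cumulative_asr(rounds, max_rounds=3))  # [40.0, 60.0, 80.0]
```

Under this definition the curve can never decrease, so what distinguishes methods or evaluators is how quickly it rises and where it levels off.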
Parameter sensitivity studies indicate that increasing text/image concealment slightly lowers the stealth-related measures (NSFW index, LPIPS) but only marginally affects ASR, suggesting that subtler cross-modal adversarial edits do not compromise efficacy.
Figure 6: Parameter sensitivity analysis reveals the impact of concealment levels on ASR, NSFW indices, and LPIPS across Wan2.2-TI2V-5B and DynamiCrafter.
Ablation and Module Analysis
Ablation studies verify the synergistic value of SACU’s internal agents:
- The Strategy Memory Bank boosts adaptation via historical context.
- The RL-enhanced Strategy Customization Agent improves adaptability over randomized mechanisms.
- The Strategy Exploration Agent continuously expands tactical diversity.
Feedback-driven adaptive iteration decisively improves attack efficacy compared to fixed-step or unimodal approaches.
Implications and Outlook
RunawayEvil establishes a robust multimodal adversarial framework for real-world vulnerability exploration of I2V models, setting a new technical standard for security benchmarking in large-scale video generation. Its adaptive, closed-loop design anticipates future advancements in I2V safety filter techniques and can drive research in robust multimodal defenses, adversarial training, and anomaly detection. The paradigm’s reliance on RL and experience-driven expansion primes it for integration into automated model red-teaming pipelines.
Practically, the methodology enables model builders and security researchers to probe multimodal safety vulnerabilities systematically, guiding the design of more resilient cross-modal defense mechanisms. Theoretically, the results highlight the necessity of joint-modal adversarial thinking and reward-based strategy evolution in red-teaming next-generation generative models.
Conclusion
RunawayEvil presents the first self-evolving multimodal jailbreak framework for I2V generative models, integrating dynamic strategy discovery, adaptive RL customization, and cross-modal tactical execution. Empirical results demonstrate robust generalization and substantial improvements in attack success rates over existing unimodal approaches. The framework provides an essential tool for vulnerability analysis and guides future research on secure multimodal generative AI systems.