Unresolved defense against joint-modal implicit attacks

Establish effective detection and defense mechanisms for joint-modal implicit malicious attacks against multimodal large language models (MLLMs), in which the image and the text are each benign on their own but jointly express harmful intent. Such defenses should remain consistently robust across diverse and out-of-domain settings.

Background

The paper emphasizes that existing defenses primarily address explicit attacks confined to a single modality, leaving systems vulnerable to joint-modal implicit threats, where harmful intent emerges only from the combined interpretation of benign-looking image and text inputs.
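As a rough illustration of why per-modality filtering can miss this class of input, the toy sketch below compares a guard that checks each modality separately with one that also inspects the image-text combination. Everything in it is hypothetical and chosen only for illustration: the tag and word lexicons, the `Query` type, and the function names are assumptions, and none of it reflects how ImpForge or CrossGuard is implemented. A real system would replace the lookups with learned classifiers.

```python
from dataclasses import dataclass

# Hypothetical toy lexicons (assumptions for illustration only).
UNSAFE_TEXT_TERMS = {"unsafe_word"}
UNSAFE_IMAGE_TAGS = {"unsafe_tag"}
# Hypothetical (image tag, text word) pairs that are harmless alone
# but treated as harmful when they appear together.
RISKY_COMBINATIONS = {("object_x", "procedure_y")}


@dataclass(frozen=True)
class Query:
    image_tags: frozenset  # stand-in for the output of an image tagger
    text: str


def _words(text: str) -> set:
    return set(text.lower().split())


def unimodal_guard(q: Query) -> bool:
    """Flag the query if either modality looks unsafe on its own."""
    text_flag = bool(_words(q.text) & UNSAFE_TEXT_TERMS)
    image_flag = bool(q.image_tags & UNSAFE_IMAGE_TAGS)
    return text_flag or image_flag


def joint_guard(q: Query) -> bool:
    """Flag the query if either modality, or their combination, looks unsafe."""
    if unimodal_guard(q):
        return True
    words = _words(q.text)
    return any(tag in q.image_tags and word in words
               for tag, word in RISKY_COMBINATIONS)


implicit = Query(frozenset({"object_x"}), "describe procedure_y for this")
benign = Query(frozenset({"object_x"}), "what is shown here")
```

In this toy setup, `implicit` passes the unimodal guard because neither its image tags nor its text match an unsafe lexicon, yet the joint guard flags it. The `benign` query, which shares the same image, is flagged by neither.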

The authors explicitly state that this emerging threat is largely unresolved, motivating their contributions (ImpForge for data generation and CrossGuard for defense). The open problem concerns achieving reliable, comprehensive defenses that consistently detect and refuse such implicit multimodal attacks across benchmarks and practical scenarios.

References

Unfortunately, this emerging threat remains largely unresolved, as shown in Figure 1 (d).

— CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks (2510.17687 - Zhang et al., 20 Oct 2025) in Section 1: Introduction
