Unresolved defense against joint-modal implicit attacks
Establish effective detection and defense mechanisms against joint-modal implicit malicious attacks on multimodal large language models (MLLMs), in which the image and the text are each benign in isolation but jointly express harmful intent, and ensure that these mechanisms remain robust across diverse and out-of-domain settings.
References
Unfortunately, this emerging threat remains largely unresolved, as shown in Figure 1 (d).
— CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
(2510.17687 - Zhang et al., 20 Oct 2025) in Section 1: Introduction