Generalizing robustness across semantic and modality perturbations in multimodal LLMs

Develop mechanisms for Multimodal Large Language Models that generalize across both semantic perturbations (e.g., misleading or irrelevant content in text or images) and modality perturbations (e.g., input- or embedding-level noise), so that robustness is maintained without relying on task-specific or attack-specific defenses.

Background

The paper diagnoses modality interference in Multimodal LLMs and proposes a fine-tuning framework combining heuristic and adversarial perturbations with consistency regularization. While this approach improves robustness, the authors emphasize that PGD-based adversarial training is a partial solution confined to a bounded embedding-level perturbation space.
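The embedding-level PGD and consistency regularization described above can be sketched minimally. The snippet below is an illustrative sketch, not the paper's implementation: it substitutes a toy linear softmax head (the `W`, `b` parameters are hypothetical stand-ins for a model's classifier) so the input-gradient can be written in closed form, runs a few signed-gradient ascent steps projected onto an L-infinity ball, and measures a KL-based consistency term between clean and perturbed predictions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pgd_embedding_perturbation(W, b, embeds, labels, eps=0.05, alpha=0.02, steps=3):
    """PGD in embedding space for a toy linear softmax head (illustrative only).

    Ascends the cross-entropy loss via signed input-gradients, projecting the
    perturbation back onto an L-infinity ball of radius eps after each step.
    """
    k = W.shape[1]
    onehot = np.eye(k)[labels]
    delta = np.zeros_like(embeds)
    for _ in range(steps):
        p = softmax((embeds + delta) @ W + b)
        grad = (p - onehot) @ W.T          # closed-form dCE/d(input) for a linear head
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta

def consistency_kl(W, b, embeds, delta):
    """KL(clean || perturbed) between predictive distributions, as a
    consistency regularizer encouraging stable predictions under perturbation."""
    p = softmax(embeds @ W + b)
    q = softmax((embeds + delta) @ W + b)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(8, 3)), np.zeros(3)
    embeds = rng.normal(size=(4, 8))          # stand-in for fused multimodal embeddings
    labels = np.array([0, 1, 2, 0])
    delta = pgd_embedding_perturbation(W, b, embeds, labels)
    print(np.abs(delta).max(), consistency_kl(W, b, embeds, delta))
```

In a training loop, the total objective would combine the clean task loss, the loss on the perturbed embeddings, and the consistency term; the key limitation the authors note is visible here as well: `delta` is confined to the eps-ball, so semantic perturbations (e.g., a misleading caption) lie outside the space this defense covers.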

In the Limitations section, the authors highlight the need for defenses that can uniformly handle both semantic perturbations (e.g., misleading descriptions or irrelevant images) and modality perturbations, acknowledging that designing perturbations is inherently open-ended and that current methods do not comprehensively address all forms of interference.

Developing mechanisms that generalize across both semantic and modality perturbations remains an open and challenging direction.

References

Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models (2505.19616, Cai et al., 26 May 2025), Appendix, Limitations section.