Adversarial Illusions in Multi-Modal Embeddings (2308.11804v4)

Published 22 Aug 2023 in cs.CR, cs.AI, and cs.LG

Abstract: Multi-modal embeddings encode texts, images, thermal images, sounds, and videos into a single embedding space, aligning representations across different modalities (e.g., associate an image of a dog with a barking sound). In this paper, we show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an image or a sound, an adversary can perturb it to make its embedding close to an arbitrary, adversary-chosen input in another modality. These attacks are cross-modal and targeted: the adversary can align any image or sound with any target of his choice. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks and modalities, enabling a wholesale compromise of current and future tasks, as well as modalities not available to the adversary. Using ImageBind and AudioCLIP embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, zero-shot classification, and audio retrieval. We investigate transferability of illusions across different embeddings and develop a black-box version of our method that we use to demonstrate the first adversarial alignment attack on Amazon's commercial, proprietary Titan embedding. Finally, we analyze countermeasures and evasion attacks.


Summary

  • The paper demonstrates that small, imperceptible perturbations can realign multi-modal embeddings to arbitrary targets, unveiling a new adversarial threat.
  • It employs white-box, transfer-based, and query-based attacks, with success rates often exceeding 99% at aligning inputs with adversary-chosen targets in models such as ImageBind and AudioCLIP.
  • The findings highlight the urgent need for robust defenses in multi-modal systems to safeguard against adversarial exploitation in practical applications.

An Analysis of Adversarial Illusions in Multi-Modal Embeddings

The paper "Adversarial Illusions in Multi-Modal Embeddings" presents a paper on the vulnerabilities of multi-modal embeddings to adversarial attacks. Specifically, it introduces the concept of "adversarial illusions" where inputs from one modality, such as an image or sound, are perturbed to align closely with an arbitrary input from another modality in the embedding space. The implications of this research are critical as multi-modal models become more integral to various machine learning pipelines, highlighting an underexplored attack surface within these systems.

Multi-Modal Embeddings and Their Vulnerabilities

Multi-modal embeddings are designed to encode various input types—such as text, images, and sounds—into a common embedding space. The primary objective is to align related inputs across different modalities, like associating an image of a dog with a barking sound. Modern encoders such as ImageBind and AudioCLIP achieve this through contrastive learning on large datasets, yielding embeddings that support a wide range of downstream tasks without explicit training for those tasks.
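
As a concrete illustration of why a shared space supports task-agnostic use, the following minimal sketch performs zero-shot classification purely by comparing embeddings. Here `encode_image` and `encode_text` are hypothetical stand-ins for the image and text towers of an ImageBind- or CLIP-style encoder, each assumed to return a 1-D feature tensor.

```python
# Minimal sketch: zero-shot classification in a shared embedding space.
# `encode_image` and `encode_text` are hypothetical encoder functions
# (stand-ins for the towers of an ImageBind/AudioCLIP-style model).
import torch
import torch.nn.functional as F

def zero_shot_classify(image, class_names, encode_image, encode_text):
    """Return the class whose text embedding is closest to the image embedding."""
    img_emb = F.normalize(encode_image(image), dim=-1)                     # (D,)
    txt_embs = F.normalize(
        torch.stack([encode_text(f"a photo of a {c}") for c in class_names]), dim=-1
    )                                                                      # (N, D)
    scores = txt_embs @ img_emb                                            # cosine similarities
    return class_names[int(scores.argmax())]
```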

However, the paper identifies a critical vulnerability: adversarial perturbations can exploit this cross-modal alignment by shifting an input's representation toward any desired target in another modality. Such cross-modal adversarial alignment subverts the emergent relationships the embeddings are meant to capture and can compromise multiple downstream tasks and modalities at once.
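
In notation introduced here only for illustration (the paper's exact loss and distance measure may differ), the attack can be framed as an embedding-space optimization: given encoders $E_a$ and $E_b$ for the attacker's and target modalities, an input $x$, an adversary-chosen target $y$, and a perturbation budget $\epsilon$,

$$
\min_{\delta \,:\, \|\delta\|_\infty \le \epsilon} \; 1 - \cos\!\bigl(E_a(x + \delta),\, E_b(y)\bigr).
$$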

Research Methodology and Findings

The authors develop and test their attack under several threat models: white-box, transfer-based, query-based, and hybrid. In the white-box setting, where model internals are fully accessible, they show that small, imperceptible perturbations suffice to align an input's embedding with an arbitrary cross-modal target. In the transfer setting, adversarial examples crafted on surrogate encoders carry over to the target models with high attack success rates.
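
The following is a minimal sketch of a white-box alignment attack of this kind, written as a standard projected-gradient-descent (PGD) loop. `encode_image` is an assumed differentiable image encoder, `target_emb` is the embedding of the adversary-chosen target from another modality, and the loss, step size, and iteration count are illustrative rather than the paper's exact settings.

```python
# Hedged sketch of a white-box embedding-alignment attack via PGD.
# Assumes a differentiable `encode_image` and a precomputed `target_emb`.
import torch
import torch.nn.functional as F

def align_attack(x, target_emb, encode_image, eps=8/255, alpha=1/255, steps=100):
    """Perturb image x (values in [0, 1]) so its embedding approaches target_emb."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        emb = encode_image(x + delta)
        # Descend on 1 - cosine similarity to pull the embedding toward the target.
        loss = 1.0 - F.cosine_similarity(emb, target_emb, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                 # signed gradient step
            delta.clamp_(-eps, eps)                            # project to l_inf ball
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)      # keep pixels valid
        delta.grad.zero_()
    return (x + delta).detach()
```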

The paper also introduces query-based attacks, which are essential against proprietary models accessible only through an API. Even with a limited query budget, these attacks succeed, and the authors use them to demonstrate what they describe as the first adversarial alignment attack on Amazon's commercial, proprietary Titan embedding. Evaluations on open-source embeddings (ImageBind, AudioCLIP) and on the black-box Titan embedding confirm the vulnerability across diverse settings; on zero-shot tasks with ImageBind and AudioCLIP, for example, adversarial success rates often exceed 99% under minimal perturbation.
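
A query-based variant can be sketched under the assumption that the attacker observes only the embedding returned by an API call. In the sketch below, `embed` is a hypothetical black-box endpoint (a stand-in for a proprietary encoder such as Titan), and the greedy random-search strategy is illustrative, not the paper's actual query-efficient algorithm.

```python
# Hedged sketch of a query-based (black-box) alignment attack:
# greedily keep random l_inf-bounded proposals that raise cosine
# similarity between embed(image) and the target embedding.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def black_box_align(x, target_emb, embed, eps=8/255, queries=5000, patch=16):
    """x: H x W x C float image in [0, 1]; embed: black-box embedding API."""
    rng = np.random.default_rng(0)
    x_adv = x.copy()
    best = cosine(embed(x_adv), target_emb)
    h, w = x.shape[:2]
    for _ in range(queries):
        cand = x_adv.copy()
        i = rng.integers(0, h - patch)
        j = rng.integers(0, w - patch)
        cand[i:i + patch, j:j + patch] += rng.choice([-eps, eps])  # flip a random square
        cand = np.clip(cand, x - eps, x + eps)                     # stay inside l_inf ball
        cand = np.clip(cand, 0.0, 1.0)
        score = cosine(embed(cand), target_emb)
        if score > best:                                           # keep only improvements
            best, x_adv = score, cand
    return x_adv
```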

Theoretical and Practical Implications

The theoretical implication of this work is that alignment in the embedding space can be exploited across modalities and across current and future downstream tasks, raising questions about the robustness of multi-modal models. Practically, the findings call for urgent consideration of security mechanisms in multi-modal systems to prevent adversarial exploitation.

In conclusion, the paper presents a clear challenge for continued research in machine learning security. The results encourage further exploration of robust embedding techniques and the development of effective defenses against adversarial attacks. Defensive strategies like feature distillation, anomaly detection, and certifiable robustness are discussed, but their limitations against adaptive adversaries highlight the complexity of securing multi-modal systems. Researchers and practitioners must collaborate closely to develop solutions that enhance the resilience of models to such vulnerabilities, thus ensuring the safe integration of multi-modal embeddings in practical applications.
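
As one concrete example of the preprocessing-style countermeasures discussed, the sketch below flags an input whose embedding shifts sharply after a JPEG round trip, since perturbations concentrated in high-frequency detail tend to be disrupted by lossy compression. `encode_image` is a hypothetical encoder returning a NumPy vector, the threshold would need calibration, and, as the paper notes, defenses of this sort can be evaded by adaptive attacks.

```python
# Hedged sketch of a JPEG-consistency check as an anomaly detector.
# A large embedding shift after lossy compression suggests an adversarial input.
import io
import numpy as np
from PIL import Image

def jpeg_round_trip(img: Image.Image, quality: int = 50) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def looks_adversarial(img: Image.Image, encode_image, threshold: float = 0.15) -> bool:
    e_raw = encode_image(img)
    e_jpg = encode_image(jpeg_round_trip(img))
    cos = float(np.dot(e_raw, e_jpg) / (np.linalg.norm(e_raw) * np.linalg.norm(e_jpg)))
    return (1.0 - cos) > threshold  # embedding moved a lot => suspicious
```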
