Adversarial Illusions in Multi-Modal Embeddings (2308.11804v4)
Abstract: Multi-modal embeddings encode texts, images, thermal images, sounds, and videos into a single embedding space, aligning representations across different modalities (e.g., associating an image of a dog with a barking sound). In this paper, we show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an image or a sound, an adversary can perturb it to make its embedding close to an arbitrary, adversary-chosen input in another modality. These attacks are cross-modal and targeted: the adversary can align any image or sound with any target of his choice. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks and modalities, enabling a wholesale compromise of current and future tasks, as well as modalities not available to the adversary. Using ImageBind and AudioCLIP embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, zero-shot classification, and audio retrieval. We investigate transferability of illusions across different embeddings and develop a black-box version of our method that we use to demonstrate the first adversarial alignment attack on Amazon's commercial, proprietary Titan embedding. Finally, we analyze countermeasures and evasion attacks.
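To make the attack idea concrete, below is a minimal PGD-style sketch of how an input could be perturbed so its embedding drifts toward an adversary-chosen target embedding from another modality. The `embed_image` callable and the hyperparameters are illustrative assumptions, not the paper's actual encoders or settings; in the paper, the embeddings come from models such as ImageBind or AudioCLIP, and a black-box variant is also developed.

```python
# Hypothetical sketch of an "adversarial illusion" (white-box case):
# perturb an image within an L_inf budget so its embedding approaches an
# arbitrary target embedding (e.g., of a text or a sound) in the shared space.
# embed_image is a placeholder for a multi-modal encoder; eps/alpha/steps are
# illustrative values, not the paper's.
import torch
import torch.nn.functional as F

def craft_illusion(x, target_emb, embed_image, eps=8/255, alpha=1/255, steps=200):
    """Return x + delta with ||delta||_inf <= eps whose embedding is close to target_emb."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        emb = embed_image(torch.clamp(x + delta, 0, 1))
        # Maximize cosine similarity to the adversary-chosen target embedding.
        loss = -F.cosine_similarity(emb, target_emb, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed gradient step
            delta.clamp_(-eps, eps)             # stay within the perturbation budget
            delta.grad.zero_()
    return torch.clamp(x + delta, 0, 1).detach()
```

Because the objective is defined purely on embedding proximity, the same loop applies to other input modalities (e.g., audio waveforms) and requires no knowledge of the downstream task; the black-box version described in the abstract would replace the exact gradient with query-based estimates.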