
Diffusion Models for Imperceptible and Transferable Adversarial Attack (2305.08192v2)

Published 14 May 2023 in cs.CV

Abstract: Many existing adversarial attacks generate $L_p$-norm perturbations on image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without $L_p$-norm constraints, yet lacking transferability of attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further "deceive" the diffusion model which can be viewed as an implicit recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, DiffAttack, is the first that introduces diffusion models into the adversarial attack field. Extensive experiments on various model structures, datasets, and defense methods have demonstrated the superiority of our attack over the existing attack methods.

Authors (6)
  1. Jianqi Chen
  2. Hao Chen
  3. Keyan Chen
  4. Yilan Zhang
  5. Zhengxia Zou
  6. Zhenwei Shi

Summary

Insights into "Diffusion Models for Imperceptible and Transferable Adversarial Attack"

The paper "Diffusion Models for Imperceptible and Transferable Adversarial Attack" introduces a novel approach to adversarial attacks by leveraging the inherent generative and discriminative capabilities of diffusion models. The proposed method, termed DiffAttack, seeks to address the perennial trade-off in adversarial machine learning between attack imperceptibility and transferability. The work stands out by taking a departure from the conventional LpL_p-norm-based methodologies, which have historically stuck to direct pixel manipulation, and instead initiates perturbations within the latent space of diffusion models.

Methodological Advancements

DiffAttack's framework builds on the intrinsic characteristics of diffusion models, originally designed for image synthesis, to generate perturbations that remain imperceptible to humans while carrying robust semantic content. The approach exploits two key properties of diffusion models: their ability to produce visually natural outputs, and the implicit recognition ability they acquire from training on extensive datasets, which supports transfer-based attacks across varied models and defenses.

Rather than altering pixel values directly, the method perturbs the latent representation of the input image. This manipulation is complemented by content-preserving structures that keep the resulting adversarial examples faithful to the semantic intent of the source images. Additionally, by minimizing the variance of the cross-attention maps between text tokens and image pixels, DiffAttack "deceives" the diffusion model, treating it as an implicit recognition surrogate and distracting its attention away from the target regions, which improves the attack's transferability to black-box models. A sketch of this optimization is given below.
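The following is a minimal sketch of such an optimization loop, assuming the input image has already been inverted to a diffusion latent. The helpers `decode_latent` and `cross_attention_maps` are hypothetical stand-ins for the diffusion-model plumbing, and the loss weights and step counts are illustrative; this is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def latent_space_attack(z_init, classifier, label, steps=30, lr=1e-2,
                        w_content=1.0, w_deceive=0.1):
    # z_init: diffusion latent obtained by inverting the clean image (assumed given).
    # decode_latent and cross_attention_maps are hypothetical, differentiable helpers.
    z = z_init.clone().detach().requires_grad_(True)
    x_clean = decode_latent(z_init).detach()          # reference image for content preservation
    opt = torch.optim.AdamW([z], lr=lr)

    for _ in range(steps):
        x_adv = decode_latent(z)                      # image generated from the perturbed latent

        # (1) Adversarial objective: push the classifier away from the true label.
        adv_loss = -F.cross_entropy(classifier(x_adv), label)

        # (2) Content preservation: keep the adversarial image close to the original.
        content_loss = F.mse_loss(x_adv, x_clean)

        # (3) "Deceive" term: reduce the variance of the text-image cross-attention
        #     maps so attention is spread away from the target object.
        attn = cross_attention_maps(z)                # shape (tokens, H, W), hypothetical
        deceive_loss = attn.flatten(1).var(dim=1).mean()

        loss = adv_loss + w_content * content_loss + w_deceive * deceive_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    return decode_latent(z).detach()
```

Minimizing the cross-attention variance flattens the attention distribution instead of letting it concentrate on the labeled object, which is one plausible reading of the "deceive" objective described above.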

Experimental Insights

The research demonstrates DiffAttack's superiority through comprehensive evaluations across a broad spectrum of network architectures, including CNNs, Transformers, and MLPs, on datasets such as ImageNet-Compatible, CUB-200-2011, and Stanford Cars. The empirical results indicate that DiffAttack consistently outperforms traditional pixel-based and other unrestricted attack approaches in transferability, measured by the top-1 accuracy of attacked models, and in imperceptibility, measured by FID. Notably, while the diffusion-based approach trades away some white-box attack success relative to pixel-based attacks, its robustness against various defenses underscores the practicality of attacking in latent space.
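As a side note on evaluation, the FID-based imperceptibility comparison can be reproduced with off-the-shelf tooling. The snippet below is an evaluation sketch using torchmetrics, with `clean_images` and `adv_images` assumed to be uint8 tensors of shape (N, 3, H, W); it is not the paper's exact pipeline.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Lower FID between the clean and adversarial sets indicates the adversarial
# images stay closer to the natural image distribution (better imperceptibility).
fid = FrechetInceptionDistance(feature=2048)
fid.update(clean_images, real=True)    # clean_images: (N, 3, H, W) uint8, assumed
fid.update(adv_images, real=False)     # adv_images:   (N, 3, H, W) uint8, assumed
print(f"FID (clean vs. adversarial): {fid.compute().item():.2f}")
```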

Theoretical and Practical Implications

DiffAttack marks a significant methodological shift by incorporating diffusion models into adversarial attacks, opening a pathway to rethink attack strategies and defense mechanisms in machine learning systems. The findings imply an expansion of the attack surface beyond traditional pixel perturbations toward more abstract manipulations that preserve high image fidelity while transferring across models.

The research points to further investigation of diffusion models and their role in adversarial machine learning. By demonstrating perturbations that are both less perceptible and more transferable, the paper challenges current assumptions about the imperceptibility-transferability trade-off and motivates the design of AI systems that are resilient to such semantically driven attacks.

In sum, the methodological framework laid out in this paper not only stretches the boundaries of adversarial attack strategies but also signifies a burgeoning trend where synthesis-centric models like diffusion models are increasingly being exploited for adversarial applications, thus calling for heightened vigilance in AI safety and resilience discussions.
