Generating Adversarial Examples with Adversarial Networks: An Analytical Overview
The paper "Generating Adversarial Examples with Adversarial Networks" by Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song introduces AdvGAN, a method that generates adversarial examples with Generative Adversarial Networks (GANs). The work aims for high attack success rates against state-of-the-art defenses in both semi-whitebox and black-box attack settings, while keeping perturbations perceptually realistic and generation efficient.
Motivation and Problem Definition
Deep Neural Networks (DNNs), while highly successful across various domains, remain vulnerable to adversarial examples—inputs altered by small, often imperceptible perturbations that cause incorrect predictions. Traditional attack methods—such as the Fast Gradient Sign Method (FGSM) or optimization-based approaches—typically optimize a perturbation separately for each input and require white-box access to the target model at attack time, limiting their practicality and efficiency.
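For concreteness, FGSM takes a single step in the sign direction of the input gradient of the loss. The sketch below illustrates this on a toy logistic-regression "model" with an analytic gradient (a hypothetical stand-in for a DNN; the weights and inputs are made-up values):

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """One-step FGSM on a toy logistic-regression model:
    x_adv = x + eps * sign(d loss / d x)."""
    # Forward pass: sigmoid probability of class 1.
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    # Gradient of the binary cross-entropy loss w.r.t. the input x.
    grad_x = (p - y) * w
    # FGSM step: move each coordinate in the gradient's sign direction.
    return x + eps * np.sign(grad_x)

# Toy example: a 3-feature input attacked with eps = 0.1.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.4, -0.1])
x_adv = fgsm_perturb(x, w, b=0.0, y=1.0, eps=0.1)
```

Note that computing `grad_x` requires the model's parameters—exactly the white-box access that AdvGAN avoids at attack time.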
AdvGAN addresses these challenges by leveraging GANs to generate perturbations, ensuring perceptual realism and operational efficiency. The AdvGAN framework comprises a generator that learns to produce perturbations and a discriminator that maintains the realism of the generated examples. This setup allows the generator, once trained, to produce adversarial perturbations swiftly without necessitating continuous access to the target model.
AdvGAN Framework and Attack Strategies
The AdvGAN architecture features three primary components:
- Generator (G): Outputs perturbations when fed with original instances.
- Discriminator (D): Discerns whether inputs are genuine or perturbed.
- Target Network (f): Used for computing adversarial loss.
The training objective for AdvGAN combines an adversarial loss (encouraging misclassification by the target network), a GAN loss (keeping perturbed instances perceptually close to real ones), and a hinge loss that bounds the magnitude of the perturbation. This combined objective yields realistic adversarial examples that efficiently fool target models.
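The combined generator objective can be sketched numerically as follows. This is a simplified, illustrative version (the function and argument names are ours, not the paper's code), assuming a targeted attack and scalar weights alpha and beta:

```python
import numpy as np

def advgan_loss(logits_adv, target, d_fake, perturb, alpha, beta, c):
    """Sketch of AdvGAN's generator objective:
    L = L_adv + alpha * L_GAN + beta * L_hinge."""
    # Adversarial loss: negative log-probability of the target class under
    # the target network f, evaluated on the perturbed input.
    probs = np.exp(logits_adv) / np.sum(np.exp(logits_adv))
    l_adv = -np.log(probs[target])
    # GAN loss (generator side): push the discriminator's output on the
    # perturbed input toward "real".
    l_gan = -np.log(d_fake)
    # Hinge loss: penalize perturbations whose L2 norm exceeds the bound c,
    # keeping the adversarial example perceptually close to the original.
    l_hinge = max(0.0, np.linalg.norm(perturb) - c)
    return l_adv + alpha * l_gan + beta * l_hinge
```

In training, the generator minimizes this quantity while the discriminator is updated adversarially against it, as in a standard GAN.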
Semi-Whitebox and Black-Box Attacks
In the semi-whitebox setting, white-box access to the target model is required only during training; once trained, the generator produces perturbations with a single feed-forward pass and no further queries to the model. AdvGAN demonstrates high attack success rates in this scenario, assessed on datasets such as MNIST and CIFAR-10.
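Attack-time inference in this setting reduces to a single generator pass. A minimal sketch, using a made-up fixed map as a stand-in for a trained generator:

```python
import numpy as np

def generate_adv(x, generator, clip_min=0.0, clip_max=1.0):
    """Semi-whitebox inference: once trained, the generator alone produces
    adversarial examples -- no queries to the target model f."""
    delta = generator(x)                      # single feed-forward pass
    return np.clip(x + delta, clip_min, clip_max)

# Hypothetical stand-in for a trained generator: a fixed perturbation map.
toy_generator = lambda x: 0.05 * np.sign(x - 0.5)
x = np.array([0.2, 0.8, 0.5])
x_adv = generate_adv(x, toy_generator)
```

The clipping keeps adversarial examples inside the valid input range (here assumed to be [0, 1] pixel intensities).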
In the black-box context, AdvGAN employs a distillation strategy to train a surrogate model that approximates the unknown target. With static distillation, the surrogate is fit once on the target's queried outputs; with dynamic distillation, the surrogate and the generator are updated alternately, so the surrogate tracks the target's behavior on the generator's evolving adversarial examples. With these strategies, AdvGAN achieves significant success, far surpassing traditional transferability-based attacks.
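The distillation step fits the surrogate to the target's output distribution. A minimal sketch of the per-example objective (names are illustrative; in practice this is averaged over queried inputs and minimized over the surrogate's parameters):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_probs, student_logits):
    """Cross-entropy between the black-box target's output distribution
    (obtained by querying it) and the local surrogate's prediction."""
    student_probs = softmax(student_logits)
    return -np.sum(teacher_probs * np.log(student_probs))
```

Under dynamic distillation, this loss is re-minimized as the generator improves, so the surrogate stays accurate precisely where the attack operates.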
Experimental Results and Findings
Empirical evaluations highlight AdvGAN's superior performance across several metrics:
- Success Rates: AdvGAN consistently achieves high attack success rates under semi-whitebox and black-box conditions.
- State-of-the-Art Defenses: Against robust defense mechanisms like adversarial training (Adv.), ensemble adversarial training (Ens.), and iterative adversarial training (Iter.Adv.), AdvGAN-generated examples maintain higher attack success rates compared to FGSM and optimization-based methods.
- Efficiency: AdvGAN's generation process is notably faster due to its feed-forward architecture, suitable for large-scale or real-time applications.
Implications and Future Directions
The implications of AdvGAN are twofold:
- Adversarial Training: AdvGAN-generated examples can enhance adversarial training methods, potentially leading to more resilient models.
- Security and Robustness: Understanding the capabilities of AdvGAN in generating robust adversarial examples can inform security practices and the development of more robust AI systems.
Future research may delve into:
- Extending AdvGAN across domains: Exploring its applicability beyond image data to domains like text, speech, or reinforcement learning.
- Enhancing Distillation Methods: Further optimizing the dynamic distillation strategies to improve black-box attack performance.
- Defense Mechanisms: Developing adaptive defenses that can effectively mitigate AdvGAN-style attacks.
Overall, AdvGAN exemplifies a significant advancement in adversarial example generation, combining efficiency with perceptual realism to challenge and potentially improve the robustness of modern DNNs.