The Robust Manifold Defense: Adversarial Training using Generative Models (1712.09196v5)

Published 26 Dec 2017 in cs.CV, cs.CR, cs.LG, and stat.ML

Abstract: We propose a new type of attack for finding adversarial examples for image classifiers. Our method exploits spanners, i.e. deep neural networks whose input space is low-dimensional and whose output range approximates the set of images of interest. Spanners may be generators of GANs or decoders of VAEs. The key idea in our attack is to search over latent code pairs to find ones that generate nearby images with different classifier outputs. We argue that our attack is stronger than searching over perturbations of real images. Moreover, we show that our stronger attack can be used to reduce the accuracy of Defense-GAN to 3%, resolving an open problem from the well-known paper by Athalye et al. We combine our attack with normal adversarial training to obtain the most robust known MNIST classifier, significantly improving the state of the art against PGD attacks. Our formulation involves solving a min-max problem, where the min player sets the parameters of the classifier and the max player is running our attack, and is thus searching for adversarial examples in the low-dimensional input space of the spanner. All code and models are available at https://github.com/ajiljalal/manifold-defense.git

Authors (4)
  1. Ajil Jalal (18 papers)
  2. Andrew Ilyas (39 papers)
  3. Constantinos Daskalakis (111 papers)
  4. Alexandros G. Dimakis (133 papers)
Citations (171)

Summary

The Robust Manifold Defense: Adversarial Training using Generative Models

The paper presents a novel approach to both attacking and defending deep neural network (DNN) classifiers using "spanners" derived from generative models such as GANs and VAEs. A spanner maps a low-dimensional latent space to high-dimensional outputs that approximate the target image distribution. The core of the proposed "overpowered attack" is to search for pairs of latent codes that generate nearby images yet receive different classifier outputs, yielding a stronger attack than traditional searches over pixel-space perturbations of real images.
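In symbols, the attack can be sketched as a search over latent pairs; the L2 proximity constraint, the threshold epsilon, and the choice of disagreement loss below are illustrative assumptions rather than the paper's exact notation:

```latex
\max_{\substack{z,\, z' \;:\; \|G(z) - G(z')\|_2 \le \epsilon}}
  \mathcal{L}\big(f_{\theta}(G(z)),\, f_{\theta}(G(z'))\big)
```

Here G is the spanner and f_theta the classifier. The paper's defense wraps a min over theta around this objective, so the classifier is trained against adversarial pairs found in the spanner's low-dimensional input space.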

Key Findings and Methodology

  1. Overpowered Attack: The authors introduce an adversarial attack that operates in the latent space of spanners. By identifying latent code pairs (z, z') such that the generated images G(z) and G(z') are similar but lead to significantly different classifications, the attack circumvents defenses such as Defense-GAN, reducing its adversarial accuracy to as low as 3% (a code sketch of this latent-space search follows the list).
  2. Robust Manifold Defense: The paper proposes a defense that combines standard adversarial training with the overpowered attack. It is formulated as a min-max optimization problem in which the inner maximization runs the attack in the spanner's latent space and the outer minimization updates the classifier parameters, hardening the classifier against spanner-generated adversarial examples.
  3. Significant Numerical Results: On MNIST, the resulting training procedure raises state-of-the-art adversarial accuracy against PGD attacks from 91.88% to 96.26%.
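The following is a minimal sketch of the latent-pair search from item 1, assuming a pretrained spanner G and classifier f given as PyTorch modules. The penalized (rather than hard-constrained) objective, the optimizer, and all hyperparameters are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def overpowered_attack(G, f, latent_dim, batch=64, steps=200, lr=0.05, lam=100.0):
    """Hedged sketch: search for latent pairs (z, z2) whose generated images
    are close in pixel space but receive different classifier predictions."""
    # Freeze the generator and classifier; only the latent codes are optimized.
    for p in list(G.parameters()) + list(f.parameters()):
        p.requires_grad_(False)

    z = torch.randn(batch, latent_dim, requires_grad=True)
    z2 = (z.detach() + 0.01 * torch.randn_like(z)).requires_grad_(True)
    opt = torch.optim.Adam([z, z2], lr=lr)

    for _ in range(steps):
        x, x2 = G(z), G(z2)
        # Disagreement term: cross-entropy between the two predicted
        # distributions, large when the classifier labels the pair differently.
        logp = F.log_softmax(f(x), dim=1)
        p2 = F.softmax(f(x2), dim=1)
        disagreement = -(p2 * logp).sum(dim=1)
        # Proximity penalty: soft version of a hard ||G(z) - G(z2)|| <= eps constraint.
        closeness = ((x - x2) ** 2).flatten(1).sum(dim=1)
        # Maximize disagreement while penalizing image distance.
        loss = (-disagreement + lam * closeness).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    return z.detach(), z2.detach()
```

Pairs that end up nearby in image space but differently classified serve as adversarial examples; in the defense, such pairs would be fed back into training so the min player (the classifier) is updated against them.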

Theoretical and Practical Implications

From a theoretical standpoint, the paper suggests that leveraging generative models as spanners in adversarial settings can yield more powerful and efficient attacks, because the search for adversarial examples takes place in a low-dimensional space. This perspective may influence future research in both adversarial attack strategies and robust classifier design, offering a new paradigm for addressing adversarial vulnerability in neural networks.

Practically, the paper offers a defense mechanism that can be integrated into existing adversarial training pipelines to enhance robustness. This could have direct applications in areas requiring high-confidence classification, such as autonomous driving and security systems.

Future Directions

The success of the Robust Manifold Defense suggests further exploration and refinement of generative spanners for various types of classifiers and datasets beyond MNIST and CelebA. Additionally, development of more sophisticated spanners could reduce the dimensionality gap further, potentially improving both adversarial attack and defense capabilities. Studies might also focus on extending these methodologies to address adversarial challenges in more complex and dynamic real-world environments. Overall, the framework laid out by the authors provides a foundational step towards holistic adversarial defenses incorporating generative models.