Understanding and Improving Fast Adversarial Training (2007.02617v2)

Published 6 Jul 2020 in cs.LG, cs.CR, cs.CV, and stat.ML

Abstract: A recent line of work focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. (2020) showed that $\ell_\infty$-adversarial training with fast gradient sign method (FGSM) can fail due to a phenomenon called "catastrophic overfitting", when the model quickly loses its robustness over a single epoch of training. We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se -- its main role being simply to reduce the magnitude of the perturbation. Moreover, we show that catastrophic overfitting is not inherent to deep and overparametrized networks, but can occur in a single-layer convolutional network with a few filters. In an extreme case, even a single filter can make the network highly non-linear locally, which is the main reason why FGSM training fails. Based on this observation, we propose a new regularization method, GradAlign, that prevents catastrophic overfitting by explicitly maximizing the gradient alignment inside the perturbation set and improves the quality of the FGSM solution. As a result, GradAlign allows to successfully apply FGSM training also for larger $\ell_\infty$-perturbations and reduce the gap to multi-step adversarial training. The code of our experiments is available at https://github.com/tml-epfl/understanding-fast-adv-training.

Citations (269)

Summary

  • The paper identifies catastrophic overfitting in FGSM, showing that model robustness abruptly declines when the inner maximization fails.
  • It demonstrates that random initialization in FGSM reduces effective perturbation magnitude rather than truly diversifying the threat model.
  • The paper introduces GradAlign, a novel regularizer that improves gradient alignment and achieves competitive robustness in adversarial training.

Overview of "Understanding and Improving Fast Adversarial Training"

The paper "Understanding and Improving Fast Adversarial Training" by Maksym Andriushchenko and Nicolas Flammarion examines and enhances adversarial training methods with a focus on making them computationally efficient, specifically addressing the challenges associated with the Fast Gradient Sign Method (FGSM). The authors aim to improve the robustness of machine learning models training on adversarial examples which remain susceptible to input perturbations, affecting model predictions.

Core Contributions

  1. Catastrophic Overfitting: The paper elaborates on catastrophic overfitting, a phenomenon in which the robustness gained during adversarial training is abruptly lost, sometimes within a single epoch. This overfitting occurs when models trained with fast adversarial methods such as FGSM can no longer adequately solve the inner maximization problem. The authors show that this issue is not exclusive to deep, overparametrized networks but can also manifest in a single-layer convolutional model with only a few filters.
  2. Role of Random Initialization: Although adding a random step to FGSM, as proposed in prior work (Wong et al., 2020), can delay failure, the paper argues that randomness is not important per se: its main effect is to reduce the effective perturbation magnitude rather than to truly diversify the threat model, and it does not prevent catastrophic overfitting.
  3. Gradient Alignment and Local Linearity: The authors introduce gradient alignment as a measure of the local linearity of deep networks. A drop in gradient alignment, coinciding with catastrophic overfitting, indicates that FGSM can no longer effectively solve the inner maximization problem, supporting the hypothesis that increased non-linearity of the model within the perturbation set is what causes FGSM training to fail.
  4. GradAlign Regularizer: The paper introduces GradAlign, a regularization method that explicitly maximizes gradient alignment within the perturbation set to prevent catastrophic overfitting. GradAlign is shown to keep FGSM training robust even for larger $\ell_\infty$-perturbations and to narrow the gap to multi-step adversarial training methods (see the sketch after this list).
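
The GradAlign penalty described in the paper rewards input gradients that point in the same direction at the clean input and at a random point inside the $\ell_\infty$ ball, roughly $\lambda\,\mathbb{E}_{\eta \sim \mathcal{U}([-\epsilon,\epsilon]^d)}\big[1 - \cos\big(\nabla_x \ell(x, y; \theta),\ \nabla_x \ell(x + \eta, y; \theta)\big)\big]$. Below is a minimal PyTorch-style sketch of this idea; the function name and implementation details are illustrative rather than taken from the authors' released code.

    import torch
    import torch.nn.functional as F

    def grad_align_reg(model, x, y, eps):
        """Cosine-misalignment penalty between input gradients at the clean
        input and at a uniformly sampled point inside the l_inf ball of
        radius eps (a sketch of the GradAlign idea, not the reference code)."""
        # Input gradient at the clean point
        x1 = x.clone().detach().requires_grad_(True)
        g1 = torch.autograd.grad(F.cross_entropy(model(x1), y), x1,
                                 create_graph=True)[0]

        # Input gradient at a random point inside the perturbation set
        eta = torch.empty_like(x).uniform_(-eps, eps)
        x2 = (x + eta).clone().detach().requires_grad_(True)
        g2 = torch.autograd.grad(F.cross_entropy(model(x2), y), x2,
                                 create_graph=True)[0]

        # 1 - cosine similarity, averaged over the batch
        cos = F.cosine_similarity(g1.flatten(1), g2.flatten(1), dim=1)
        return (1.0 - cos).mean()

Because the penalty is built from input gradients, optimizing it requires differentiating through those gradients (the create_graph=True calls), which is the double-backpropagation cost mentioned under Future Work below.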

Experimental Findings

The experiments underscore several key observations:

  • Models trained with FGSM combined with GradAlign show competitive adversarial robustness without experiencing catastrophic overfitting (a sketch of the combined training step follows this list).
  • Across datasets like CIFAR-10 and SVHN, GradAlign demonstrated effectiveness in enhancing the FGSM method while avoiding the pitfalls faced by other FGSM-based and fast adversarial training methods.
  • FGSM training on higher-dimensional datasets such as ImageNet did not consistently exhibit catastrophic overfitting, suggesting that the training dynamics are partly dataset-specific.
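
To make the FGSM + GradAlign combination concrete, here is a hedged sketch of a single training step that reuses the grad_align_reg function from the earlier sketch. The step size alpha, the regularization weight lam, and the assumption that inputs lie in [0, 1] are illustrative choices, not the paper's exact hyperparameters.

    def fgsm_gradalign_step(model, x, y, opt, eps, lam, alpha=None):
        """One training step: FGSM adversarial example plus GradAlign penalty.
        Assumes the imports and grad_align_reg from the sketch above."""
        alpha = eps if alpha is None else alpha

        # FGSM: a single signed-gradient step from the clean input
        x_req = x.clone().detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)[0]
        x_adv = (x + alpha * grad.sign()).clamp(0.0, 1.0).detach()

        # Adversarial cross-entropy plus the gradient-alignment regularizer
        loss = F.cross_entropy(model(x_adv), y) + lam * grad_align_reg(model, x, y, eps)

        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

No projection step is needed here because a single FGSM step with alpha <= eps stays inside the $\ell_\infty$ ball by construction.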

Implications and Future Directions

This research has both theoretical and practical implications for the area of adversarial training:

  • Theoretical Insights: The paper highlights gradient alignment as a diagnostic tool for understanding when and why fast adversarial training methods like FGSM fail. It also argues that the randomness in FGSM variants mainly reduces the effective perturbation strength rather than diversifying the threat model.
  • Practical Improvements: By introducing GradAlign, the authors provide a more robust yet still computationally efficient variant of FGSM training. This offers a promising avenue for practitioners who rely on adversarial training to secure models against adversarial threats without the computational expense of multi-step methods such as PGD.
  • Future Work: The need for efficient implementations of gradient-based regularizers motivates methods that avoid double backpropagation, which could make approaches like GradAlign more practical at large scale. Further exploration of the relationship between catastrophic overfitting and dataset-specific properties could yield more uniform strategies for addressing adversarial vulnerabilities across varied contexts.

The paper suggests that by understanding and improving fast adversarial training, it is possible to significantly enhance model robustness efficiently. Continuing work in this area may focus on optimizing regularizers and ensuring broad applicability across diverse model architectures and input datasets.