Analyzing the Limits of Adversarial Training for Norm-Bounded Adversarial Examples
The paper "Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples" undertakes a comprehensive investigation into the capacity of adversarial training to enhance the robustness of deep neural networks. The paper systematically examines multiple facets of adversarial training, establishing new benchmarks for robustness against norm-bounded adversarial perturbations.
Key Findings
The authors explore a variety of factors influencing adversarial robustness, including model architecture, training losses, activation functions, and the use of unlabeled data. They demonstrate significant advancements over previous state-of-the-art results on the CIFAR-10 and CIFAR-100 datasets. Notably, larger model architectures, Swish/SiLU activation functions, and model weight averaging together yield superior adversarial robustness. On CIFAR-10, the paper reports 65.88% accuracy under attack when additional unlabeled data is incorporated, a substantial improvement over the previous state of the art.
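To ground the discussion, here is a minimal sketch of ℓ∞ projected gradient descent (PGD) adversarial training, the baseline the paper builds on. It is written in PyTorch purely for illustration; the model, optimizer, perturbation budget (8/255), step size, and number of attack steps are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, steps=10):
    """Inner maximization: l-infinity PGD with a random start (illustrative values)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + step_size * grad.sign()                # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                          # stay in the valid pixel range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: one optimizer step on the adversarial examples."""
    model.eval()                      # keep batch-norm statistics fixed while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice this step is repeated over mini-batches exactly like standard training; the findings below concern what happens when the loss, architecture, data, and checkpointing around this loop are varied.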
Detailed Analysis
- Training Losses and Optimization: The paper finds that TRADES, a loss that pairs a standard cross-entropy term on clean inputs with a KL-divergence term pulling predictions on adversarial examples toward the clean predictions, consistently outperforms classical adversarial training in their setup (a sketch of this objective appears after this list). This challenges earlier reports suggesting that classical adversarial training can match state-of-the-art methods. The authors also show that certain loss choices are prone to gradient masking, underlining the importance of loss-function selection.
- Inner Maximization Variability: The authors investigate the effect of changing the loss used for the inner maximization and find that such modifications can substantially affect robustness. In particular, they observe that using a margin loss during training can lead to gradient masking (see the margin-loss sketch after this list), emphasizing that the inner-maximization loss must be chosen with care.
- Model Scale and Activation Functions: Larger networks deliver consistent robustness improvements, in line with previous findings. The paper also underscores the role of the activation function, with Swish/SiLU (x · σ(x)) providing notable gains over plain rectified linear units.
- Unlabeled Data Utilization: The research confirms that integrating unlabeled data through pseudo-labeling can substantially boost robustness (see the pseudo-labeling sketch after this list), offering a way to improve robust generalization without enlarging the labeled dataset.
- Model Weight Averaging: Averaging model weights over training emerges as a critical component in attaining robustness, giving an ensembling-like effect at far lower computational cost (see the weight-averaging sketch after this list). Its tendency to find flatter solutions points to a promising direction for further exploration.
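As referenced in the first bullet, the TRADES objective replaces the plain adversarial cross-entropy with a clean cross-entropy term plus a KL-divergence regularizer between predictions on clean and perturbed inputs. The sketch below is an illustrative PyTorch rendering, not the paper's implementation; the regularization weight `beta` and the attack schedule are assumed values.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, eps=8/255, step_size=2/255, steps=10, beta=6.0):
    """TRADES-style objective: clean cross-entropy + beta * KL between clean and adversarial predictions."""
    model.eval()
    p_clean = F.softmax(model(x), dim=1).detach()

    # Inner maximization: perturb the input to maximize the KL term.
    x_adv = x + 0.001 * torch.randn_like(x)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="batchmean")
        grad = torch.autograd.grad(kl, x_adv)[0]
        x_adv = x_adv + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    x_adv = x_adv.detach()

    # Outer minimization: accuracy on clean data plus the robustness regularizer.
    model.train()
    logits_clean = model(x)
    kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                  F.softmax(logits_clean, dim=1), reduction="batchmean")
    return F.cross_entropy(logits_clean, y) + beta * kl
```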
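The second bullet notes that the choice of inner-maximization loss matters; a common alternative to cross-entropy is a multi-class margin loss, sketched below. The helper is illustrative, and as the paper cautions, training against a margin-based inner loss can induce gradient masking.

```python
import torch

def margin_loss(logits, y):
    """Best wrong-class logit minus the true-class logit (higher = more adversarial)."""
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    others = logits.clone()
    others.scatter_(1, y.unsqueeze(1), float("-inf"))  # exclude the true class
    best_wrong = others.max(dim=1).values
    return (best_wrong - true_logit).mean()
```

Inside a PGD loop, this loss would simply replace the `F.cross_entropy` call used for the attack above.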
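For the unlabeled-data bullet, the core mechanism is pseudo-labeling: a standardly trained (non-robust) classifier assigns labels to the extra images, and labeled and pseudo-labeled examples are then mixed within each adversarial-training batch. The loader name, device handling, and mixing policy below are assumptions for illustration.

```python
import torch

@torch.no_grad()
def pseudo_label(standard_model, unlabeled_loader, device="cuda"):
    """Assign hard pseudo-labels to unlabeled images with a standardly trained classifier."""
    standard_model.eval()
    images, labels = [], []
    for x in unlabeled_loader:            # loader assumed to yield image batches only
        x = x.to(device)
        labels.append(standard_model(x).argmax(dim=1).cpu())
        images.append(x.cpu())
    return torch.cat(images), torch.cat(labels)
```

The resulting pairs are treated like ordinary labeled data during adversarial training, typically with a fixed fraction of each mini-batch drawn from the pseudo-labeled pool.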
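Finally, the weight-averaging bullet refers to keeping an exponential moving average (EMA) of the parameters alongside training and evaluating the averaged copy. A minimal sketch follows; the decay value is an assumed hyperparameter.

```python
import copy
import torch

class WeightAverager:
    """Maintain an exponential moving average of a model's parameters."""

    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.ema_model = copy.deepcopy(model).eval()
        for p in self.ema_model.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        for ema_p, p in zip(self.ema_model.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```

Calling `update` after every optimizer step and evaluating robustness on `ema_model` gives an ensembling-like benefit at the cost of a single extra parameter copy.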
Implications and Future Directions
The methods and results presented hold substantial practical implications for deploying robust AI models in adversarial contexts. The paper suggests further research into nuanced interactions between training strategies and architecture choices. While the paper sets fresh benchmarks, it also proposes that fundamentally new approaches may ultimately be required to reach even greater robustness.
From a theoretical standpoint, the findings encourage exploration into the dynamics of adversarial training, especially concerning loss landscapes and the role of model architecture. As adversarial robustness becomes increasingly crucial in real-world applications, this research provides a vital foundation for future advancements in robust AI systems.
In conclusion, this paper pushes the boundaries of adversarial training, offering immediate advancements while laying the groundwork for future exploration of the limits and potential of current methodologies in adversarial robustness.