Overfitting in Adversarially Robust Deep Learning
Introduction
In adversarially robust deep learning, where models are trained to withstand adversarial perturbations, overfitting presents a significant challenge. Unlike conventional deep learning, where overparameterization and prolonged training typically do not harm generalization, adversarially trained models exhibit "robust overfitting": a phenomenon in which robust test performance degrades significantly toward the end of training. The paper's experiments span multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models (ℓ∞ and ℓ2), underscoring the detrimental impact of overfitting in adversarial settings.
Key Findings
- Robust Overfitting Phenomenon: Robust overfitting occurs consistently across datasets and perturbation models. For example, when training on CIFAR-10 under an ℓ∞ perturbation, the robust test error drops sharply after the first learning rate decay but then steadily increases for the remainder of training, a clear signature of robust overfitting. The same pattern appears in all tested settings, including SVHN, CIFAR-100, and ImageNet. (A minimal adversarial training sketch illustrating this setup follows this list.)
- Implications of Early Stopping: Early stopping significantly mitigates robust overfitting. The paper shows that nearly all recent algorithmic gains in adversarial training can be matched simply by stopping early. For example, the TRADES method, which reports state-of-the-art robust performance, achieves 43.4% robust test error on CIFAR-10 when stopped early, compared to 50.6% if training is allowed to converge, and standard PGD-based adversarial training matches TRADES once early stopping is applied. (The checkpoint-selection logic appears in the training sketch after this list.)
- Learning Rate Schedules: Several learning rate schedules (piecewise decay, multiple decay, linear decay, cyclic, and cosine) were examined for their impact. The piecewise decay schedule, in which the learning rate is reduced sharply at fixed epochs, induced the most pronounced robust overfitting but also produced the best checkpoints before overfitting set in. Smoothing out the learning rate adjustments did not prevent robust overfitting. (A scheduler sketch comparing these options follows this list.)
- Classical and Modern Regularization Methods: Classical regularization (ℓ1 and ℓ2 penalties) and modern data augmentation techniques (cutout and mixup) were also evaluated. While these methods provided some benefit, none was as effective as early stopping when used in isolation: heavier regularization tended to over-regularize the models, and augmentation alone did not fully prevent the degradation of robust performance. (A short regularization sketch follows this list.)
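
The following is a minimal sketch of the setup described above, not the authors' released code: PGD-based adversarial training under an ℓ∞ perturbation with a piecewise learning rate decay, recording the robust test error after every epoch so the dip-then-rise pattern (and the best checkpoint, i.e. early stopping) can be observed. The names `model`, `train_loader`, and `test_loader`, as well as the hyperparameters (ε = 8/255, step size 2/255, 10 PGD steps, 200 epochs), are illustrative assumptions rather than values quoted from the paper.

```python
import copy
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generate l-infinity PGD adversarial examples around a clean batch x in [0, 1]."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back onto the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_epoch(model, loader, opt, device="cuda"):
    """One epoch of PGD-based adversarial training."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        opt.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        opt.step()

def robust_error(model, loader, device="cuda"):
    """Robust (PGD) classification error on a held-out loader."""
    model.eval()
    wrong, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            wrong += (model(x_adv).argmax(1) != y).sum().item()
        total += y.numel()
    return wrong / total

# Piecewise-decay training run; `model`, `train_loader`, `test_loader` are assumed to exist.
epochs = 200
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)

history, best_err, best_state = [], 1.0, None
for epoch in range(epochs):
    train_epoch(model, train_loader, opt)
    sched.step()
    err = robust_error(model, test_loader)
    history.append(err)                       # robust error dips after the first decay, then rises
    if err < best_err:                        # keeping the best checkpoint amounts to early stopping
        best_err, best_state = err, copy.deepcopy(model.state_dict())
```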
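To compare schedules, a small helper can build each of the learning rate schedules listed above from PyTorch's built-in schedulers and be dropped into the training loop in place of the piecewise schedule. The milestone placement and cyclic bounds below are illustrative choices, not the paper's exact configuration.

```python
import torch

def make_scheduler(opt, kind, epochs):
    """Return one per-epoch learning-rate schedule of the kinds compared in the paper."""
    if kind == "piecewise":   # drop by 10x at fixed epochs; "multiple decay" simply adds more milestones
        return torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)
    if kind == "linear":      # linear decay toward zero over the run
        return torch.optim.lr_scheduler.LambdaLR(opt, lambda e: 1.0 - e / epochs)
    if kind == "cosine":      # cosine annealing from the initial rate down to zero
        return torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    if kind == "cyclic":      # ramp the rate up, then back down, over one cycle
        return torch.optim.lr_scheduler.CyclicLR(opt, base_lr=1e-4, max_lr=0.1,
                                                 step_size_up=epochs // 2,
                                                 step_size_down=epochs // 2,
                                                 cycle_momentum=False)
    raise ValueError(f"unknown schedule: {kind}")
```

Usage mirrors the sketch above: construct one scheduler per run (e.g. `sched = make_scheduler(opt, "cosine", epochs)`) and call `sched.step()` once per epoch.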
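For the regularization baselines, ℓ2 regularization corresponds to the optimizer's weight-decay term, an ℓ1 penalty can be added directly to the training loss, and mixup can be layered on top of the adversarial examples. The sketch below is one plausible arrangement, not the paper's implementation: the mixing coefficient, the placement of mixup relative to the attack, and the weight-decay value are assumptions, and `model` is assumed to exist.

```python
import torch
import torch.nn.functional as F

def mixup_adversarial_loss(model, x_adv, y, num_classes=10, alpha=1.4):
    """Soft-label cross-entropy on mixup combinations of adversarial examples."""
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(x_adv.size(0), device=x_adv.device)
    x_mix = lam * x_adv + (1 - lam) * x_adv[perm]
    y1 = F.one_hot(y, num_classes).float()
    y_mix = lam * y1 + (1 - lam) * y1[perm]
    log_p = F.log_softmax(model(x_mix), dim=1)
    return -(y_mix * log_p).sum(dim=1).mean()

# l2 regularization is just a (larger-than-default, illustrative) weight-decay term;
# an l1 penalty can instead be added to the loss, e.g.
#   loss = loss + l1_coef * sum(p.abs().sum() for p in model.parameters())
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-3)
```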
Empirical Observations
- Adversarially Trained Models' Curves: Adversarially trained models exhibit a double-descent generalization curve, much like standard training, but this does not alleviate robust overfitting. Increasing the size of the hypothesis class improved robust performance yet did not counteract the overfitting observed during training.
- Validation-Based Early Stopping: Validation-based early stopping was tested and found effective. Holding out a validation set of 1,000 examples from the CIFAR-10 training data, checkpoints selected on validation performance closely matched those chosen by monitoring test performance, achieving 46.9% robust error versus 46.7%. (A data-split sketch follows this list.)
- Final Performance: Among the standard explicit regularizers, ℓ2 regularization was the most effective, though still less beneficial on its own than early stopping. Data augmentation combined with semi-supervised learning showed promise, particularly together with early stopping, achieving a robust test error of 40.2% and highlighting the value of integrating multiple methods.
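
Below is a minimal sketch of the validation-based variant: hold out 1,000 CIFAR-10 training examples, evaluate robust error on that split after each epoch, and keep the best checkpoint. It reuses `train_epoch` and `robust_error` from the earlier sketch and assumes `model`, `opt`, and `sched` have already been constructed; the batch sizes and transforms are illustrative rather than taken from the paper.

```python
import copy
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Carve a 1,000-example validation split out of the CIFAR-10 training set.
full_train = datasets.CIFAR10("data", train=True, download=True,
                              transform=transforms.ToTensor())
val_size = 1000
train_set, val_set = random_split(full_train, [len(full_train) - val_size, val_size])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)

best_err, best_state = 1.0, None
for epoch in range(200):
    train_epoch(model, train_loader, opt)        # PGD adversarial training as sketched earlier
    sched.step()
    val_err = robust_error(model, val_loader)    # robust error on the held-out split
    if val_err < best_err:                       # select the checkpoint by validation performance
        best_err, best_state = val_err, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)                # the early-stopped model
```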
Implications and Future Directions
The findings underscore the complexity of training adversarially robust models and the distinct behavior of overfitting in this context. Early stopping emerges as a crucial strategy to maintain robust performance, yet the interaction between different regularization techniques and training protocols necessitates further exploration. Future research should focus on:
- Hybrid Techniques: Combining early stopping with advanced regularization and data augmentation strategies to find synergistic effects.
- Theoretical Foundations: Developing theoretical models to better understand robust overfitting and guide the design of more resilient training schemes.
- Scalability: Evaluating the generalization of these findings to larger and more diverse datasets, including industrial-scale applications.
The code and models from this paper are publicly available, providing a valuable resource for further innovation in adversarially robust deep learning.
Conclusion
This paper highlights the nuanced challenges of overfitting in adversarially robust training and reaffirms the efficacy of early stopping as a practical way to maintain robust performance. By systematically evaluating different datasets, training methods, and regularization techniques, it steers the community toward more effective strategies against robust overfitting, marking a pivotal step in advancing adversarially robust deep learning.