Overfitting in Adversarially Robust Deep Learning
Introduction
In adversarially robust deep learning, where models are trained to withstand adversarial perturbations, overfitting presents a significant challenge. Unlike conventional deep learning, where overparameterization and prolonged training typically do not harm generalization, adversarially trained models exhibit "robust overfitting": a phenomenon in which robust test performance degrades significantly toward the end of training. The paper's experiments span multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models (ℓ∞ and ℓ2), underscoring the detrimental impact of overfitting in adversarial settings.
Key Findings
- Robust Overfitting Phenomenon: Robust overfitting occurs consistently across datasets and perturbation models. For example, when training on CIFAR-10 under an ℓ∞ perturbation, the robust test error drops sharply after the first learning rate decay but then steadily increases for the remainder of training, a clear signature of robust overfitting. The same pattern appears in all tested settings, including SVHN, CIFAR-100, and ImageNet. (A minimal adversarial training sketch illustrating this setup follows this list.)
- Implications of Early Stopping: Early stopping significantly mitigates robust overfitting. The paper shows that nearly all recent algorithmic gains in adversarial training can be matched simply by stopping early. For example, the TRADES method, which reports state-of-the-art robust performance, achieves 43.4% robust test error on CIFAR-10 when stopped early, compared to 50.6% if training is allowed to converge, and standard PGD-based adversarial training matches TRADES once early stopping is applied. (The checkpoint-selection logic appears in the training sketch after this list.)
- Learning Rate Schedules: Several learning rate schedules (piecewise decay, multiple decay, linear decay, cyclic, and cosine) were examined for their impact. The piecewise decay schedule, in which the learning rate is reduced sharply at fixed epochs, induced the most pronounced robust overfitting but also produced the best checkpoints before overfitting set in. Smoothing out the learning rate adjustments did not prevent robust overfitting. (A scheduler sketch comparing these options follows this list.)
- Classical and Modern Regularization Methods: Classical regularization (ℓ1 and ℓ2 penalties) and modern data augmentation techniques (cutout and mixup) were also evaluated. While these methods provided some benefit, none was as effective as early stopping when used in isolation: heavier regularization tended to over-regularize the models, and augmentation alone did not fully prevent the degradation of robust performance. (A short regularization sketch follows this list.)
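
The following is a minimal sketch of the setup described above, not the authors' released code: PGD-based adversarial training under an ℓ∞ perturbation with a piecewise learning rate decay, recording the robust test error after every epoch so the dip-then-rise pattern (and the best checkpoint, i.e. early stopping) can be observed. The names `model`, `train_loader`, and `test_loader`, as well as the hyperparameters (ε = 8/255, step size 2/255, 10 PGD steps, 200 epochs), are illustrative assumptions rather than values quoted from the paper.

```python
import copy
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generate l-infinity PGD adversarial examples around a clean batch x in [0, 1]."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back onto the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_epoch(model, loader, opt, device="cuda"):
    """One epoch of PGD-based adversarial training."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        opt.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        opt.step()

def robust_error(model, loader, device="cuda"):
    """Robust (PGD) classification error on a held-out loader."""
    model.eval()
    wrong, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            wrong += (model(x_adv).argmax(1) != y).sum().item()
        total += y.numel()
    return wrong / total

# Piecewise-decay training run; `model`, `train_loader`, `test_loader` are assumed to exist.
epochs = 200
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)

history, best_err, best_state = [], 1.0, None
for epoch in range(epochs):
    train_epoch(model, train_loader, opt)
    sched.step()
    err = robust_error(model, test_loader)
    history.append(err)                       # robust error dips after the first decay, then rises
    if err < best_err:                        # keeping the best checkpoint amounts to early stopping
        best_err, best_state = err, copy.deepcopy(model.state_dict())
```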
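To compare schedules, a small helper can build each of the learning rate schedules listed above from PyTorch's built-in schedulers and be dropped into the training loop in place of the piecewise schedule. The milestone placement and cyclic bounds below are illustrative choices, not the paper's exact configuration.

```python
import torch

def make_scheduler(opt, kind, epochs):
    """Return one per-epoch learning-rate schedule of the kinds compared in the paper."""
    if kind == "piecewise":   # drop by 10x at fixed epochs; "multiple decay" simply adds more milestones
        return torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)
    if kind == "linear":      # linear decay toward zero over the run
        return torch.optim.lr_scheduler.LambdaLR(opt, lambda e: 1.0 - e / epochs)
    if kind == "cosine":      # cosine annealing from the initial rate down to zero
        return torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    if kind == "cyclic":      # ramp the rate up, then back down, over one cycle
        return torch.optim.lr_scheduler.CyclicLR(opt, base_lr=1e-4, max_lr=0.1,
                                                 step_size_up=epochs // 2,
                                                 step_size_down=epochs // 2,
                                                 cycle_momentum=False)
    raise ValueError(f"unknown schedule: {kind}")
```

Usage mirrors the sketch above: construct one scheduler per run (e.g. `sched = make_scheduler(opt, "cosine", epochs)`) and call `sched.step()` once per epoch.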
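For the regularization baselines, ℓ2 regularization corresponds to the optimizer's weight-decay term, an ℓ1 penalty can be added directly to the training loss, and mixup can be layered on top of the adversarial examples. The sketch below is one plausible arrangement, not the paper's implementation: the mixing coefficient, the placement of mixup relative to the attack, and the weight-decay value are assumptions, and `model` is assumed to exist.

```python
import torch
import torch.nn.functional as F

def mixup_adversarial_loss(model, x_adv, y, num_classes=10, alpha=1.4):
    """Soft-label cross-entropy on mixup combinations of adversarial examples."""
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(x_adv.size(0), device=x_adv.device)
    x_mix = lam * x_adv + (1 - lam) * x_adv[perm]
    y1 = F.one_hot(y, num_classes).float()
    y_mix = lam * y1 + (1 - lam) * y1[perm]
    log_p = F.log_softmax(model(x_mix), dim=1)
    return -(y_mix * log_p).sum(dim=1).mean()

# l2 regularization is just a (larger-than-default, illustrative) weight-decay term;
# an l1 penalty can instead be added to the loss, e.g.
#   loss = loss + l1_coef * sum(p.abs().sum() for p in model.parameters())
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-3)
```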
Empirical Observations
- Adversarially Trained Models' Curves: Adversarially trained models exhibit a double-descent generalization curve, much like standard training, but this does not alleviate robust overfitting. Increasing the size of the hypothesis class improved robust performance yet did not counteract the overfitting observed during training.
- Validation-Based Early Stopping: Validation-based early stopping was tested and found effective. Holding out a validation set of 1,000 examples from the CIFAR-10 training data, checkpoints selected on validation performance closely matched those chosen by monitoring test performance, achieving 46.9% robust error versus 46.7%. (A data-split sketch follows this list.)
- Final Performance: Among the standard explicit regularizers, ℓ2 regularization was the most effective, though still less beneficial on its own than early stopping. Data augmentation combined with semi-supervised learning showed promise, particularly together with early stopping, achieving a robust test error of 40.2% and highlighting the value of integrating multiple methods.
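
Below is a minimal sketch of the validation-based variant: hold out 1,000 CIFAR-10 training examples, evaluate robust error on that split after each epoch, and keep the best checkpoint. It reuses `train_epoch` and `robust_error` from the earlier sketch and assumes `model`, `opt`, and `sched` have already been constructed; the batch sizes and transforms are illustrative rather than taken from the paper.

```python
import copy
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Carve a 1,000-example validation split out of the CIFAR-10 training set.
full_train = datasets.CIFAR10("data", train=True, download=True,
                              transform=transforms.ToTensor())
val_size = 1000
train_set, val_set = random_split(full_train, [len(full_train) - val_size, val_size])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)

best_err, best_state = 1.0, None
for epoch in range(200):
    train_epoch(model, train_loader, opt)        # PGD adversarial training as sketched earlier
    sched.step()
    val_err = robust_error(model, val_loader)    # robust error on the held-out split
    if val_err < best_err:                       # select the checkpoint by validation performance
        best_err, best_state = val_err, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)                # the early-stopped model
```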
Implications and Future Directions
The findings underscore the complexity of training adversarially robust models and the distinct behavior of overfitting in this context. Early stopping emerges as a crucial strategy to maintain robust performance, yet the interaction between different regularization techniques and training protocols necessitates further exploration. Future research should focus on:
- Hybrid Techniques: Combining early stopping with advanced regularization and data augmentation strategies to find synergistic effects.
- Theoretical Foundations: Developing theoretical models to better understand robust overfitting and guide the design of more resilient training schemes.
- Scalability: Evaluating the generalization of these findings to larger and more diverse datasets, including industrial-scale applications.
The code and models from this paper are publicly available, providing a valuable resource for further innovation in adversarially robust deep learning.
Conclusion
This paper highlights the nuanced challenges of overfitting in adversarially robust training and reaffirms the efficacy of early stopping as a practical way to maintain robust performance. By systematically evaluating different datasets, training methods, and regularization techniques, it steers the community toward more effective strategies against robust overfitting, marking a pivotal step in advancing adversarially robust deep learning.