- The paper shows that r-separated datasets theoretically allow for perfect accuracy and robustness, challenging the assumed trade-off in deep learning.
- It introduces a locally Lipschitz classifier, constructed via rounding a distance-based function, to achieve perfect astuteness within a perturbation radius.
- Empirical evaluations reveal that robust training methods produce smoother decision boundaries but suffer from generalization gaps that can be mitigated with Dropout.
This paper, "A Closer Look at Accuracy vs. Robustness" (2003.02460), investigates the widely observed phenomenon where achieving adversarial robustness in deep neural networks often leads to a decrease in standard test accuracy, suggesting a potential inherent trade-off. The authors challenge this notion by exploring the properties of real image datasets and current robust training methods.
The central argument of the paper is that, contrary to the idea of an inevitable trade-off, both high accuracy and robustness should be simultaneously achievable on standard image classification datasets. The authors introduce the notion of r-separation: any two examples from different classes lie at distance at least $2r$ from each other in input space. Through empirical measurements, they demonstrate that MNIST, CIFAR-10, SVHN, and Restricted ImageNet are indeed r-separated for values of r larger than the ℓ∞ perturbation radii typically used in adversarial robustness evaluations.
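As a rough illustration of how this separation can be measured (the authors' exact measurement code may differ), the NumPy sketch below computes the smallest ℓ∞ distance between any pair of examples from different classes; the dataset is then r-separated for r equal to half that value. The function name and the assumption of flattened inputs scaled to [0, 1] are illustrative choices, not taken from the paper.

```python
import numpy as np

def min_interclass_distance(X, y):
    """Smallest l_inf distance between examples of different classes.

    X: array of shape (n, d), flattened inputs scaled to [0, 1].
    y: array of shape (n,), integer class labels.
    The dataset is r-separated for r = returned value / 2.
    """
    min_dist = np.inf
    for c in np.unique(y):
        A = X[y == c]      # examples of class c
        B = X[y != c]      # examples of every other class
        # Brute-force pairwise distances; subsample X for large datasets.
        for a in A:
            d = np.abs(B - a).max(axis=1)   # l_inf distance from a to each b
            min_dist = min(min_dist, d.min())
    return min_dist
```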
Based on this empirical finding, the paper provides a theoretical result (Theorem 3.2) showing that if a data distribution is r-separated, there exists a classifier that is both perfectly accurate and robust to perturbations of size up to r. This classifier is constructed by rounding an underlying function that is locally Lipschitz around the data points. Specifically, for the function $f(x)_i = \frac{1}{r}\,\mathrm{dist}(x, \mathcal{X}^{(i)})$, where $\mathcal{X}^{(i)}$ is the support of class $i$, the function $f$ is $1/r$-locally Lipschitz near the data and the classifier $g(x) = \arg\min_i f(x)_i$ achieves perfect astuteness (robust accuracy) with radius r. This theoretical existence result suggests that the accuracy-robustness trade-off observed in practice is not dictated by the fundamental properties of these datasets.
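To make the construction concrete, here is a minimal sketch of such a distance-based classifier, with the (unknown) class supports $\mathcal{X}^{(i)}$ approximated by the training examples of each class; that substitution, and names like `make_rounded_classifier`, are illustrative and not part of the theorem itself.

```python
import numpy as np

def make_rounded_classifier(X_train, y_train, r):
    """Distance-based classifier in the spirit of Theorem 3.2 (sketch).

    f(x)_i = dist_inf(x, examples of class i) / r
    g(x)   = argmin_i f(x)_i
    With the true class supports, f is 1/r-locally Lipschitz near the data
    and g is astute with radius r; with finite samples this reduces to a
    1-nearest-neighbor rule under the l_inf metric.
    """
    classes = np.unique(y_train)
    supports = [X_train[y_train == c] for c in classes]

    def f(x):
        # Scaled l_inf distance from x to the nearest example of each class.
        return np.array([np.abs(S - x).max(axis=1).min() / r for S in supports])

    def g(x):
        return classes[np.argmin(f(x))]

    return f, g
```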
To understand the discrepancy between theory and practice, the paper empirically evaluates several existing robust training methods: Gradient Regularization (GR), Locally Linear Regularization (LLR), Adversarial Training (AT), Robust Self-Training (RST), and TRADES. The authors measure test accuracy, adversarial test accuracy, and the empirical Lipschitz constant of the trained models on a synthetic staircase dataset and on real image datasets (MNIST, SVHN, CIFAR-10, Restricted ImageNet). The empirical Lipschitz constant of a classifier $f$ at radius $\epsilon$ is estimated as $\frac{1}{n}\sum_{i=1}^{n} \max_{x_i' \in B_\infty(x_i, \epsilon)} \frac{\|f(x_i) - f(x_i')\|_1}{\|x_i - x_i'\|_\infty}$.
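In code, this quantity is typically approximated by ascending the ratio with a PGD-style search inside the ℓ∞ ball. The PyTorch sketch below is an assumed implementation; the number of steps, step size, and random initialization are illustrative and not necessarily the settings used in the paper.

```python
import torch

def empirical_lipschitz(model, x, eps, steps=10, step_size=None):
    """Approximate (1/n) * sum_i max_{x' in B_inf(x_i, eps)}
                        ||f(x_i) - f(x')||_1 / ||x_i - x'||_inf
    by PGD-style ascent on the ratio. Inputs are assumed to lie in [0, 1].
    """
    if step_size is None:
        step_size = eps / 4
    model.eval()
    x = x.detach()
    with torch.no_grad():
        fx = model(x)                                     # reference outputs f(x_i)

    def ratio(x_prime):
        num = (model(x_prime) - fx).abs().sum(dim=1)      # ||f(x_i) - f(x')||_1
        den = (x_prime - x).flatten(1).abs().amax(dim=1)  # ||x_i - x'||_inf
        return num / den.clamp_min(1e-12)

    # Random start inside the ball so the denominator is nonzero.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    x_adv.requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(ratio(x_adv).sum(), x_adv)[0]
        with torch.no_grad():
            x_adv += step_size * grad.sign()              # ascend the ratio
            x_adv.copy_((x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1))
    with torch.no_grad():
        return ratio(x_adv).mean().item()
```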
The experimental results highlight two key observations about current methods:
- Local Lipschitzness: Methods that achieve higher adversarial robustness (AT, TRADES, RST) also tend to produce classifiers with lower empirical Lipschitz constants, indicating greater smoothness. TRADES, which explicitly penalizes deviations within an adversarial ball, often produces the smoothest models.
- Generalization Gap: The robust training methods (AT, TRADES, RST) exhibit significantly larger generalization gaps than naturally trained models or models trained with GR/LLR. The gap appears in standard accuracy (train vs. test accuracy) and is even more pronounced for adversarial accuracy (train adversarial accuracy vs. test adversarial accuracy). In other words, these methods achieve robustness on the training data but struggle to generalize that robustness to unseen test data.
Further exploring the generalization issue, the authors experiment with adding Dropout, a standard regularization technique, to the robust training methods on SVHN and CIFAR-10. They find that incorporating Dropout effectively narrows the generalization gaps for AT, RST, and TRADES, leading to improved test accuracy and adversarial test accuracy. Dropout also tends to decrease the test empirical Lipschitz constant for these methods, suggesting it helps in achieving smoother, more generalized decision boundaries.
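The paper treats Dropout as an off-the-shelf regularizer, so the only implementation question is where to insert it. The PyTorch sketch below uses a toy network (not the paper's architectures) purely to illustrate adding `nn.Dropout` layers to the classifier head; training with AT, TRADES, or RST then proceeds unchanged.

```python
import torch.nn as nn

class SmallCNNWithDropout(nn.Module):
    """Toy convolutional classifier illustrating Dropout placement when
    combined with robust training (architecture is illustrative only)."""

    def __init__(self, num_classes=10, p_drop=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_drop),                   # regularizes the robust objective
            nn.Linear(64 * 4 * 4, 256), nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```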
The paper concludes that the observed accuracy-robustness trade-off is not an intrinsic property of image classification tasks on standard benchmarks, but rather a consequence of limitations in current robust training algorithms, particularly concerning their generalization ability. The findings suggest that future research should focus on improving generalization techniques in robust training, potentially by redesigning other components of the deep learning pipeline, such as network architectures or optimization methods, in conjunction with robustness-inducing loss functions and generalization tools like Dropout. The paper's code is made available for reproducibility and further research.