Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?
Abstract: Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image classification networks. In its most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with ℓ∞ perturbations limited to eps = 8/255. With leading scores of the currently best performing models of around 60% of the baseline, it is fair to characterize this benchmark as quite challenging. Despite its general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator of robustness that could be generalized to practical applications. Our argument against this is two-fold and supported by extensive experiments presented in this paper. We argue that I) the alteration of data by AutoAttack with ℓ∞, eps = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers. We also show that other attack methods are much harder to detect while achieving similar success rates. II) Results on low-resolution data sets like CIFAR10 do not generalize well to higher-resolution images, as gradient-based attacks appear to become even more detectable with increasing resolution.
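To make the ℓ∞ budget in the abstract concrete, the following is a minimal sketch of what "perturbations limited to eps = 8/255" means for pixel values scaled to [0, 1]. It is not AutoAttack and not the authors' code; the helper names `project_linf` and `linf_distance` and the toy four-pixel "image" are assumptions made for illustration only.

```python
# Illustrative sketch only: the l-inf constraint, not an attack.
EPS = 8 / 255  # budget quoted in the abstract, assuming pixels in [0, 1]


def project_linf(x_adv, x_clean, eps=EPS):
    """Clip each pixel of x_adv into [x_clean - eps, x_clean + eps] and [0, 1]."""
    projected = []
    for a, c in zip(x_adv, x_clean):
        lo = max(0.0, c - eps)  # lower bound: budget and valid pixel range
        hi = min(1.0, c + eps)  # upper bound: budget and valid pixel range
        projected.append(min(max(a, lo), hi))
    return projected


def linf_distance(x_a, x_b):
    """Largest absolute per-pixel difference between two flattened images."""
    return max(abs(a - b) for a, b in zip(x_a, x_b))


# Hypothetical flattened image and an unconstrained perturbed copy of it.
x_clean = [0.0, 0.5, 1.0, 0.25]
x_raw = [0.2, 0.45, 0.9, 0.26]

x_adv = project_linf(x_raw, x_clean)
print(linf_distance(x_adv, x_clean) <= EPS + 1e-12)
```

Under this constraint every pixel may move by up to 8 of 255 intensity levels at once, which is the scale of alteration whose detectability the abstract questions.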
- Square Attack: a query-efficient blackbox adversarial attack via random search. In ECCV.
- Towards Evaluating the Robustness of Neural Networks. IEEE Symposium on Security and Privacy (SP), 39–57.
- A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets. arXiv:1707.08819.
- An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19: 297–301.
- RobustBench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670.
- Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack. In ICML.
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML.
- advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch. arXiv, abs/1902.07623.
- Explaining and Harnessing Adversarial Examples. CoRR, abs/1412.6572.
- SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain. arXiv preprint arXiv:2103.03000.
- Adversarial examples in the physical world. arXiv:1607.02533.
- Deep Learning Face Attributes in the Wild. In Proceedings of International Conference on Computer Vision (ICCV).
- Detecting AutoAttack Perturbations in the Frequency Domain. In ICML 2021 Workshop on Adversarial Machine Learning.
- Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv, abs/1706.06083.
- DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. CVPR, 2574–2582.
- Technical Report on the CleverHans v2.1.0 Adversarial Examples Library. arXiv: Learning.
- Foolbox: A Python toolbox to benchmark the robustness of machine learning models. arXiv, abs/1707.04131.
- Wide Residual Networks. In BMVC.