Overview of "Natural Adversarial Examples"
The paper "Natural Adversarial Examples" presents two innovative datasets, ImageNet-A and ImageNet-O, designed to expose the vulnerabilities of machine learning models, particularly in computer vision. These datasets contain natural images that exploit shared weaknesses in existing classifiers, demonstrating that robust model performance on standard benchmarks like ImageNet does not necessarily translate to real-world robustness.
Contributions and Methodology
The primary contributions of the paper include:
- ImageNet-A: This dataset comprises real-world images from ImageNet classes that models should classify correctly but frequently misclassify. The images are selected with an adversarial filtration technique. On this dataset, a DenseNet-121 model achieves only around 2% accuracy, a dramatic drop from its performance on the standard ImageNet test set.
- ImageNet-O: This is an out-of-distribution (OOD) detection dataset for ImageNet models. Its images belong to classes not included in the ImageNet training set, yet models frequently assign them to ImageNet classes with high confidence, showing that the models fail to flag OOD samples.
The datasets were curated through an adversarial filtration process: candidate photographs that fooled a fixed ResNet-50 model were retained, so the resulting examples are natural and unmodified yet consistently challenging. This process highlights the classifiers' reliance on spurious patterns and simple cues rather than robust, generalizable features.
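To make the filtration step concrete, the sketch below keeps a candidate image only if a fixed, pretrained ResNet-50 misclassifies it. This is a minimal illustration of the idea, not the paper's released pipeline: the file paths, the candidate list, and the `fools_resnet50` helper are assumptions.

```python
# Sketch of adversarial filtration: keep a natural, unmodified photograph only
# if a fixed, pretrained ResNet-50 misclassifies it. Paths, the candidate
# list, and the helper name are illustrative assumptions.
import torch
from PIL import Image
from torchvision import models, transforms

filter_model = models.resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def fools_resnet50(image_path: str, true_class_idx: int) -> bool:
    """Return True if the fixed filter model gets this image wrong."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return filter_model(x).argmax(dim=1).item() != true_class_idx

# Usage sketch: candidates is a list of (image_path, imagenet_class_index)
# pairs; only images that fool the filter model are kept.
# kept = [(p, y) for p, y in candidates if fools_resnet50(p, y)]
```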
Experimental Results
The experiments reveal several critical insights:
- Performance Degradation: Existing models such as DenseNet-121 and ResNet-50 suffer drastic accuracy reductions on ImageNet-A; ResNet-50, for instance, drops to approximately 2.17% top-1 accuracy (a minimal evaluation sketch follows this list).
- Data Augmentation and Training Interventions: Adversarial training and popular augmentation techniques such as Mixup and CutMix provide only minimal improvements on ImageNet-A, and even the more sophisticated AugMix yields limited gains, illustrating the robustness gap on naturally occurring adversarial examples.
- Increased Dataset Size: Training with an order of magnitude more data (e.g., from ImageNet-21K) slightly improved performance, indicating that more labeled data could help but is not a complete solution.
- Architectural Changes: Adjustments to model architecture have a more pronounced impact. Self-attention mechanisms and increased capacity both help: ResNet-152 is more robust than smaller ResNets, and incorporating Squeeze-and-Excitation (SE) modules yields additional gains. The paper also shows that vision Transformers such as DeiT, despite not being convolutional networks, improve on these benchmarks yet remain susceptible to natural adversarial examples.
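For readers who want to reproduce accuracy numbers of this kind, the following is a minimal evaluation sketch. It assumes an ImageFolder-style local copy of ImageNet-A and a list giving the position of each of its 200 classes within the 1,000 ImageNet classes (a mapping distributed with the dataset); the dataset path and the `preprocess` and `imagenet_a_class_indices` names are illustrative.

```python
# Sketch: top-1 accuracy when a 1,000-way ImageNet classifier is evaluated on
# ImageNet-A, whose 200 classes are a subset of the 1,000 ImageNet classes.
# The 1,000-way logits are restricted to those positions before the argmax.
import torch

@torch.no_grad()
def subset_top1_accuracy(model, loader, class_indices):
    """Accuracy over a loader whose targets index into `class_indices`.

    `class_indices` lists, in target order, where each ImageNet-A class sits
    among the 1,000 ImageNet classes (this mapping ships with the dataset).
    """
    model.eval()
    correct = total = 0
    for images, targets in loader:
        logits = model(images)[:, class_indices]   # keep only the subset's logits
        correct += (logits.argmax(dim=1) == targets).sum().item()
        total += targets.numel()
    return correct / total

# Usage sketch (paths, preprocessing, and the index list are assumptions):
# from torchvision import datasets, models
# dataset = datasets.ImageFolder("imagenet-a/", transform=preprocess)
# loader = torch.utils.data.DataLoader(dataset, batch_size=64)
# acc = subset_top1_accuracy(models.resnet50(pretrained=True), loader,
#                            imagenet_a_class_indices)
```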
Implications
The paper underscores the limitations of current models' robustness to distribution shifts and adversarial examples. The findings suggest several theoretical and practical implications:
- Theoretical: The weaknesses shared across diverse models point to fundamental issues in the representations these architectures learn, calling for a rethinking of design principles for model architectures and training paradigms beyond conventional settings.
- Practical: In real-world applications, models need to be resilient against naturally occurring adversarial examples. Failure to handle such cases can have significant implications, especially in safety-critical applications like autonomous driving or medical diagnosis.
Future Directions
Future research could focus on developing more robust training techniques that specifically address the types of distribution shifts represented in ImageNet-A and ImageNet-O. Potential areas of exploration include:
- Enhancing data augmentation techniques to better simulate real-world variations.
- Investigating architectural innovations that can generalize beyond standard benchmarks.
- Integrating more sophisticated uncertainty estimation methods to improve OOD detection (a minimal baseline sketch follows this list).
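As a concrete starting point for that last item, the sketch below scores inputs with the maximum softmax probability (MSP), a standard OOD-detection baseline of the kind ImageNet-O is used to evaluate, and summarizes detection quality with AUPR. The model, data batches, and variable names are assumptions, and scikit-learn is assumed to be available for the AUPR computation.

```python
# Sketch: maximum-softmax-probability (MSP) anomaly scoring. A lower maximum
# softmax probability marks an input as more anomalous; AUPR summarizes how
# well the scores separate in-distribution images from ImageNet-O images.
import torch
import torch.nn.functional as F
from sklearn.metrics import average_precision_score

@torch.no_grad()
def msp_anomaly_scores(model, images):
    """Anomaly score per image: negative maximum softmax probability."""
    probs = F.softmax(model(images), dim=1)
    return -probs.max(dim=1).values

# Usage sketch: in-distribution images get label 0, ImageNet-O images label 1,
# and AUPR is computed over the combined anomaly scores.
# scores_in  = msp_anomaly_scores(model, in_distribution_batch)
# scores_out = msp_anomaly_scores(model, imagenet_o_batch)
# labels = torch.cat([torch.zeros_like(scores_in), torch.ones_like(scores_out)])
# scores = torch.cat([scores_in, scores_out])
# aupr = average_precision_score(labels.numpy(), scores.numpy())
```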
In conclusion, the datasets introduced in this paper provide valuable benchmarks for advancing the robustness of computer vision models. The insights derived from this research highlight the need for continued innovation to bridge the gap between benchmark performance and real-world applicability.