Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations (1807.01697v5)

Published 4 Jul 2018 in cs.LG, cs.AI, cs.CV, cs.NE, and stat.ML

Abstract: In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Unlike recent robustness research, this benchmark evaluates performance on commonplace corruptions not worst-case adversarial corruptions. We find that there are negligible changes in relative corruption robustness from AlexNet to ResNet classifiers, and we discover ways to enhance corruption robustness. Then we propose a new dataset called Icons-50 which opens research on a new kind of robustness, surface variation robustness. With this dataset we evaluate the frailty of classifiers on new styles of known objects and unexpected instances of known classes. We also demonstrate two methods that improve surface variation robustness. Together our benchmarks may aid future work toward networks that learn fundamental class structure and also robustly generalize.

Authors (2)
  1. Dan Hendrycks (63 papers)
  2. Thomas G. Dietterich (28 papers)
Citations (188)

Summary

  • The paper introduces the ImageNet-C and Icons-50 benchmarks to evaluate neural network resilience against commonplace corruptions and surface variations.
  • It rigorously measures performance using metrics like mean Corruption Error (mCE) and Relative mCE across various architectures.
  • Enhancement techniques such as adaptive preprocessing and multiscale designs show promise, but further innovation is needed for real-world robustness.

Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations

The paper "Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations" by Dan Hendrycks and Thomas G. Dietterich rigorously evaluates neural network classifiers against unpredictable and commonplace image distortions. Rather than focusing on adversarial examples, this work addresses everyday phenomena such as noise, blur, and surface changes using two newly defined benchmarks: ImageNet-C for corruption robustness and Icons-50 for surface variation robustness. This shift towards more typical visual challenges is pivotal, particularly for the deployment of neural networks in safety-critical applications, where the models' ability to generalize over unpredictable input variations is non-negotiable.

ImageNet-C: Common Corruption Benchmark

ImageNet-C is constructed by applying 75 algorithmically generated distortions, drawn from 15 corruption types in the noise, blur, weather, and digital categories at five severity levels, to ImageNet validation images. This design yields a comprehensive measure of classifier robustness, quantified through the mean Corruption Error (mCE). A complementary measure, Relative mCE, subtracts each model's clean error before normalizing, capturing how much performance degrades under corruption rather than the absolute error rate. Analysis of architectures from AlexNet to ResNet-50 reveals that progress in classifier accuracy does not directly translate into robustness improvements: although mCE decreases as architectures advance, relative robustness remains largely unchanged, suggesting that further development in model architecture is critical.
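
Concretely, following the paper's definitions, the metrics can be written as follows, where E^f_{s,c} denotes classifier f's top-1 error on corruption type c at severity s:

```latex
% Corruption Error for classifier f on corruption type c,
% normalized by AlexNet's errors over severities s = 1..5
\mathrm{CE}^{f}_{c} =
  \frac{\sum_{s=1}^{5} E^{f}_{s,c}}{\sum_{s=1}^{5} E^{\mathrm{AlexNet}}_{s,c}}

% mCE averages CE over the set C of corruption types in ImageNet-C
\mathrm{mCE}^{f} = \frac{1}{|C|} \sum_{c \in C} \mathrm{CE}^{f}_{c}

% Relative CE measures degradation from clean error, again AlexNet-normalized;
% Relative mCE is its average over corruption types
\mathrm{Relative\;CE}^{f}_{c} =
  \frac{\sum_{s=1}^{5} \bigl(E^{f}_{s,c} - E^{f}_{\mathrm{clean}}\bigr)}
       {\sum_{s=1}^{5} \bigl(E^{\mathrm{AlexNet}}_{s,c} - E^{\mathrm{AlexNet}}_{\mathrm{clean}}\bigr)}
```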

Enhancements and Implications

Several robustness enhancement techniques were put to the test. Noteworthy methods include preprocessing with Contrast Limited Adaptive Histogram Equalization (CLAHE), which improved corruption robustness while leaving accuracy essentially unaffected, and multiscale architectures such as Multigrid Networks, which offered superior resistance to noise corruptions. Models with greater feature aggregation, such as ResNeXt and DenseNet, also proved unexpectedly robust to corruption, pointing to architecture design as a promising lever for improving robustness.
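
To make the preprocessing idea concrete, here is a minimal CLAHE sketch using OpenCV. It is an illustration under assumed defaults (clip limit, tile size, application to the LAB lightness channel), not the paper's exact pipeline.

```python
# Minimal CLAHE preprocessing sketch (illustrative; parameters are assumptions).
import cv2

def clahe_preprocess(image_bgr, clip_limit=2.0, tile_grid_size=(8, 8)):
    """Equalize local contrast before feeding the image to a classifier."""
    # Work in LAB space so only the lightness channel is equalized.
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
    l_eq = clahe.apply(l)
    lab_eq = cv2.merge((l_eq, a, b))
    return cv2.cvtColor(lab_eq, cv2.COLOR_LAB2BGR)

# Usage: img = cv2.imread("example.jpg"); img = clahe_preprocess(img)
```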

Icons-50: Surface Variation Benchmark

Turning to surface variation robustness, Icons-50 poses two challenges, style robustness and subtype robustness, by evaluating how well models generalize to new renderings and unseen subtypes of previously encountered classes. Notably, standard architectures like ResNet, DenseNet, and ResNeXt struggled both to adapt to new styles, as shown by testing on held-out Microsoft-style icons, and to handle unseen subtypes in Icons-50 as well as in other datasets such as CIFAR-100 and ImageNet-22K.
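
To illustrate the style-robustness protocol, the following sketch trains on icons from every platform style except one and tests on the held-out style. The record structure and the load_icons50 helper are hypothetical, and the paper's actual splits may differ.

```python
# Hedged sketch of a held-out-style split for surface variation evaluation.
from collections import namedtuple

Icon = namedtuple("Icon", ["image_path", "class_label", "style"])

def split_by_held_out_style(records, held_out_style="microsoft"):
    """Return (train, test) where the test set contains only the unseen style."""
    train = [r for r in records if r.style != held_out_style]
    test = [r for r in records if r.style == held_out_style]
    return train, test

# records = load_icons50()                      # hypothetical loader
# train, test = split_by_held_out_style(records)
# A classifier fit on `train` is then scored on `test` to measure style robustness.
```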

Future Directions

The benchmarks and datasets developed here provide critical tools to guide future research toward more robust visual recognition models. Benchmark-driven analysis suggests that while some enhancement techniques yield marginal gains, substantial performance gaps remain across both forms of robustness delineated by ImageNet-C and Icons-50. Future work should target more adaptive and generalizable network architectures capable of handling real-world visual variability. In particular, the robustness gains observed in larger, feature-aggregating architectures point to a potential path forward through better choices of network depth, width, and feature integration. The focus on realistic distortions also argues for broader applicability and resilience in AI systems deployed in dynamic and adverse environments.

In conclusion, by establishing standardized benchmarks and demonstrating methods to enhance resilience, this work illuminates critical pathways for improving the robustness of neural networks against commonplace corruptions and surface variations, laying the groundwork for future systems that generalize robustly beyond the confines of current training paradigms.