Generalisation in humans and deep neural networks (1808.08750v3)

Published 27 Aug 2018 in cs.CV, cs.AI, cs.LG, q-bio.NC, and stat.ML

Abstract: We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well known DNNs (ResNet-152, VGG-19, GoogLeNet) we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error-patterns between humans and DNNs when the signal gets weaker. Secondly, we show that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on, yet they display extremely poor generalisation abilities when tested on other distortion types. For example, training on salt-and-pepper noise does not imply robustness on uniform white noise and vice versa. Thus, changes in the noise distribution between training and testing constitute a crucial challenge to deep learning vision systems that can be systematically addressed in a lifelong machine learning approach. Our new dataset consisting of 83K carefully measured human psychophysical trials provides a useful reference for lifelong robustness against image degradations set by the human visual system.

Authors (6)
  1. Robert Geirhos
  2. Carlos R. Medina Temme
  3. Jonas Rauber
  4. Heiko H. Schütt
  5. Matthias Bethge
  6. Felix A. Wichmann
Citations (564)

Summary

  • The paper demonstrates that human object recognition remains robust under diverse image degradations, unlike deep neural networks.
  • The study finds that DNNs trained on a specific distortion can surpass humans on that distortion, yet fail to generalise to other noise types.
  • The research emphasizes the need to integrate human perceptual principles to develop more adaptable and resilient AI systems.

Generalisation in Humans and Deep Neural Networks

This paper presents a detailed examination of the robustness and generalization capabilities of humans and deep neural networks (DNNs) when faced with various image degradations. It provides a comprehensive comparative analysis that highlights key differences in visual recognition performance between human observers and three popular DNN architectures: ResNet-152, VGG-19, and GoogLeNet.

Summary of Findings

The research investigates object recognition under twelve types of image distortion and finds that humans are consistently more robust than DNNs. Human performance degrades gracefully even as the signal-to-noise ratio falls, whereas the DNNs' error patterns increasingly diverge from those of human observers, including a bias towards misclassifying heavily distorted inputs into a small number of categories.
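To make this kind of comparison concrete, the sketch below evaluates a pretrained ResNet-152 (one of the three architectures tested) at increasing levels of additive Gaussian pixel noise. This is a simplified stand-in rather than the paper's protocol, which uses sixteen entry-level categories and carefully calibrated distortion levels; `batch` and `targets` are assumed to come from an ImageNet-style data loader.

```python
import torch
from torchvision import models, transforms

# One of the three architectures evaluated in the paper, with standard
# ImageNet weights (the weights enum requires torchvision >= 0.13).
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1).eval()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

@torch.no_grad()
def top1_accuracy(images, labels, noise_std=0.0):
    """Top-1 accuracy on a batch of images in [0, 1], optionally degraded
    with additive Gaussian noise before the usual normalization."""
    if noise_std > 0:
        images = torch.clamp(images + noise_std * torch.randn_like(images), 0, 1)
    preds = model(normalize(images)).argmax(dim=1)
    return (preds == labels).float().mean().item()

# Sweeping the noise level traces out an accuracy-vs-degradation curve
# (`batch` and `targets` are assumed to come from a data loader):
# for std in (0.0, 0.05, 0.1, 0.2, 0.4):
#     print(std, top1_accuracy(batch, targets, noise_std=std))
```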

Notably, DNNs trained directly on a specific degradation can surpass human performance on that exact distortion type. However, they generalise poorly to other kinds of noise: networks trained on salt-and-pepper noise performed poorly under uniform noise, and vice versa. This specificity and lack of transfer stand in stark contrast to human recognition capabilities.
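As a minimal illustration of the two noise families named above, the NumPy sketch below implements simple versions of salt-and-pepper and additive uniform noise; the parameter values are illustrative, not the paper's calibrated levels.

```python
import numpy as np

rng = np.random.default_rng(0)

def salt_and_pepper(img, p):
    """Set a fraction p of pixels to pure black or white (50/50)."""
    out = img.copy()
    mask = rng.random(img.shape) < p
    out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(img.dtype)
    return out

def uniform_noise(img, width):
    """Add zero-mean uniform noise of the given width, clipped to [0, 1]."""
    return np.clip(img + rng.uniform(-width / 2, width / 2, img.shape), 0.0, 1.0)

# Grayscale image with values in [0, 1]; a random array stands in here.
img = rng.random((224, 224))
train_style = salt_and_pepper(img, p=0.2)    # seen during training
test_style = uniform_noise(img, width=0.6)   # unseen at test time
```

A network fit only to the first corruption encounters markedly different pixel statistics under the second, which is precisely the train/test mismatch the paper identifies.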

Implications for Machine Learning and Neuroscience

From a practical standpoint, the findings suggest that current deep learning models need substantial improvement for applications requiring robust environmental adaptability. The paper emphasizes the importance of systematic approaches in addressing these challenges, such as lifelong machine learning paradigms that facilitate continuous adaptation rather than reliance on predefined datasets.

Theoretically, the research opens pathways for exploring the mechanisms in human cognition that afford such robust generalization, possibly informing future neural architectures. Concepts prevalent in human vision, such as neural normalization and local gain control, might inform more resilient models. Moreover, the paper underscores the value of developing DNNs with an intrinsic shape bias rather than a texture bias, enhancing resilience against noise and distortion.
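As a rough sketch of what local gain control could look like inside a network, the function below divides each activation by the pooled energy of its spatial neighbourhood. The window size and constants are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def divisive_normalization(x, kernel_size=5, sigma=1.0, eps=1e-6):
    """Divide each activation by the local root-mean-square energy of its
    spatial neighbourhood (x has shape [batch, channels, height, width]).
    Window size and constants here are illustrative, not from the paper."""
    energy = F.avg_pool2d(x ** 2, kernel_size, stride=1,
                          padding=kernel_size // 2)
    return x / (sigma + torch.sqrt(energy + eps))
```

A layer like this damps responses in high-energy regions, which can reduce a network's sensitivity to additive perturbations.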

Future Research Directions

The research prompts several avenues for advancing the field. Chief among them is the need for architectures and training protocols that inherently support non-i.i.d. settings, where data are not independent and identically distributed; this is the norm rather than the exception in real-world applications. Understanding and replicating human-like generalization remain essential for achieving robust DNN performance.
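A toy sketch of such a protocol on synthetic data is shown below: the model is trained on one distortion family at a time and re-evaluated on all families after each stage, which exposes both forgetting and failure to transfer. The model, data, and distortion settings are stand-ins, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def distort(x, kind):
    """Apply a toy distortion; 'clean' (or any unknown kind) is a no-op."""
    if kind == "uniform":
        return (x + 0.4 * (torch.rand_like(x) - 0.5)).clamp(0, 1)
    if kind == "salt_and_pepper":
        mask = torch.rand_like(x) < 0.2
        return torch.where(mask, (torch.rand_like(x) > 0.5).float(), x)
    return x

# Synthetic stand-in for an image dataset.
images, labels = torch.rand(256, 1, 28, 28), torch.randint(0, 10, (256,))

for stage in ["uniform", "salt_and_pepper"]:
    for _ in range(20):                        # train on one family only
        opt.zero_grad()
        loss_fn(model(distort(images, stage)), labels).backward()
        opt.step()
    with torch.no_grad():                      # then test on every family
        for kind in ["clean", "uniform", "salt_and_pepper"]:
            acc = (model(distort(images, kind)).argmax(1)
                   == labels).float().mean().item()
            print(f"after training on {stage}: accuracy on {kind} = {acc:.2f}")
```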

Furthermore, future work might explore integrating insights from human perceptual processes, such as feedback mechanisms and dynamic adaptation, into machine learning frameworks. This cross-disciplinary approach could drive the next generation of AI systems that better bridge the gap between technical performance and practical robustness.

Conclusion

This paper significantly contributes to understanding the disparities in generalization capabilities between humans and deep learning systems. It provides a rigorous benchmark and associated data for further exploration. Addressing the challenges highlighted by these findings will be critical for both improving AI robustness and developing more accurate models of human visual cognition. The authors' comprehensive approach and openly accessible data underscore the paper's value as a resource for advancing research in machine learning and cognitive neuroscience.