- The paper demonstrates that while DNNs excel on standard tasks, their accuracy drops far more sharply than human performance when images are degraded.
- The study uses controlled experiments comparing human subjects with AlexNet, GoogLeNet, and VGG-16 across grayscale, contrast reduction, noise, and eidolon distortions.
- The findings suggest that integrating bio-inspired strategies into DNN training could improve their resilience, bridging the gap with human perceptual robustness.
A Comparative Analysis of Deep Neural Networks and Human Object Recognition Under Degradations
The paper "Comparing Deep Neural Networks against Humans: Object Recognition When the Signal Gets Weaker" provides a nuanced examination of the robustness of deep neural networks (DNNs) in comparison to human visual systems under various image degradation conditions. The research focuses on understanding how image processing differs between the primate visual system and DNNs, particularly in object recognition tasks when signals are weakened by image manipulations.
Methodology
The researchers employed a multi-faceted experimental design, testing both human participants and three specific DNN architectures: AlexNet, GoogLeNet, and VGG-16. Human and DNN performance was assessed under four types of image degradation: conversion to grayscale, reduction of image contrast, addition of uniform white noise, and novel eidolon distortions. Trials were designed to isolate feedforward processing by using brief image presentations (200 ms) followed by a high-contrast noise mask, limiting the contribution of recurrent processing.
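To make these manipulations concrete, here is a minimal NumPy/Pillow sketch of grayscale conversion, contrast reduction, and additive uniform noise. The parameter values and the file name `example.jpg` are illustrative rather than the paper's settings; eidolon distortions are omitted because they require the authors' dedicated toolbox.

```python
import numpy as np
from PIL import Image

def to_grayscale(img: np.ndarray) -> np.ndarray:
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return img @ np.array([0.299, 0.587, 0.114])

def reduce_contrast(img: np.ndarray, level: float) -> np.ndarray:
    """Blend the image toward mid-gray; level=1.0 keeps full contrast."""
    return level * img + (1.0 - level) * 0.5

def add_uniform_noise(img: np.ndarray, width: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Add uniform noise drawn from [-width, width], then clip to [0, 1]."""
    return np.clip(img + rng.uniform(-width, width, size=img.shape), 0.0, 1.0)

# Degrade one image at an illustrative contrast/noise setting.
rng = np.random.default_rng(0)
img = np.asarray(Image.open("example.jpg").convert("RGB"), np.float64) / 255.0
degraded = add_uniform_noise(reduce_contrast(to_grayscale(img), 0.3), 0.2, rng)
```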
Findings
The experiments revealed that while DNNs often surpass human accuracy on the standard ImageNet benchmark, they are markedly less resilient under non-standard conditions. When images are converted to grayscale, DNNs exhibit a notable drop in performance relative to humans, who appear less reliant on color cues. More strikingly, under reduced contrast and added noise the disparity widens: human observers maintain higher accuracy and show less bias in their response distributions than DNNs, which tend to converge erroneously on a handful of categories as signal quality degrades.
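This response-bias observation can be quantified; one simple measure, sketched below in the spirit of the paper's analysis (the helper is illustrative, not the authors' code), is the Shannon entropy of the distribution of predicted categories: a classifier that collapses onto a few categories yields a low-entropy response distribution, while an unbiased one spreads its responses more evenly.

```python
from collections import Counter
import numpy as np

def response_entropy(predicted_labels: list[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution of predictions;
    low entropy signals a collapse onto a few categories."""
    counts = np.array(list(Counter(predicted_labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(response_entropy(["dog"] * 80 + ["cat"] * 20))          # ~0.72 bits
print(response_entropy(["dog", "cat", "car", "chair"] * 25))  # 2.0 bits
```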
Under eidolon distortions, in which image coherence parameters were varied, human observers significantly outperformed DNNs, indicating that human visual processing generalizes robustly to novel distortions it has not previously encountered. This superior generalization suggests that humans may rely on additional processing mechanisms, possibly mid-level vision strategies such as depth layering, that current DNNs lack.
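A generic way to obtain such accuracy-versus-distortion curves is to sweep a degradation parameter and re-evaluate a pretrained network at each level. The sketch below assumes a hypothetical `degrade(img, level)` callable standing in for any manipulation (including eidolon parameters from the authors' toolbox) and a hypothetical `dataset` yielding (image tensor, ImageNet label) pairs; it is not the paper's evaluation code.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

def accuracy_vs_level(model, dataset, degrade, levels):
    """Top-1 accuracy of `model` at each degradation level."""
    model.eval()
    results = {}
    with torch.no_grad():
        for level in levels:
            correct = total = 0
            for img, label in dataset:  # img: 3xHxW float tensor in [0, 1]
                x = TF.normalize(degrade(img, level),
                                 mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]).unsqueeze(0)
                correct += int(model(x).argmax(dim=1).item() == label)
                total += 1
            results[level] = correct / total
    return results

# Example with an illustrative contrast-reduction degradation.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
reduce_contrast = lambda img, c: c * img + (1 - c) * 0.5
# curve = accuracy_vs_level(model, dataset, reduce_contrast, [1.0, 0.3, 0.1])
```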
Discussion
The results highlight critical areas where DNNs diverge from human-like processing, particularly in robustness to common image distortions. Such findings emphasize the need for augmented training regimes that include varying forms of image degradation, or for architectural innovations that mirror human adaptive strategies, potentially taking cues from bio-inspired mechanisms such as contrast gain control.
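As one illustration of such an augmented training regime, degradations can be injected directly into a standard data-augmentation pipeline. The transform below is a hedged sketch using torchvision; the probabilities and noise range are illustrative choices, not values validated by the paper.

```python
import torch
from torchvision import transforms

class RandomUniformNoise:
    """Add uniform noise of a randomly drawn width (illustrative range)."""
    def __init__(self, max_width: float = 0.35):
        self.max_width = max_width

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        width = torch.rand(1).item() * self.max_width
        return (img + (torch.rand_like(img) * 2 - 1) * width).clamp(0, 1)

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomGrayscale(p=0.3),            # sometimes remove color cues
    transforms.ColorJitter(contrast=(0.1, 1.0)),  # sometimes reduce contrast
    transforms.ToTensor(),
    RandomUniformNoise(),                         # additive noise of random strength
])
```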
From a theoretical standpoint, these insights underline the limitations of current DNN models as analogs for biological vision when image inputs are less than ideal. Practically, the research urges the computer vision community to refine network designs and training paradigms, supported by behavioral benchmarks that measure models against human performance under challenging conditions.
Implications for Future Research
Moving forward, the findings warrant exploration of hybrid models that incorporate the elements of human vision responsible for its robustness. They also point to a broader need for interdisciplinary collaboration between neuroscientists and artificial intelligence researchers to identify, and then integrate, the biological strategies that make human vision so effective.
Overall, this paper contributes substantially to understanding where DNNs falter relative to human object recognition, and it raises vital considerations for advancing computer vision in real-world scenarios with varying image quality. The shared data and methods provide a valuable resource for ongoing efforts to bridge the gap between artificial and natural vision systems.