- The paper leverages the Fisher Information Matrix to quantify perceptual sensitivity by identifying extremal eigen-distortions in neural network representations.
- It reveals that early network layers and biologically inspired models predict human visual sensitivity better than deeper CNN layers do.
- Psychophysical experiments highlight the value of neuroscience-informed design for AI systems intended to match human perception.
Analyzing Eigen-Distortions in Hierarchical Image Representations
The paper, titled "Eigen-Distortions of Hierarchical Representations," examines how closely hierarchical image representations in neural networks align with human perceptual sensitivity. The researchers evaluate how well these computational models predict human sensitivity to image distortions, using a method derived from Fisher information.
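Concretely, the paper models a candidate representation as a differentiable mapping from an image $x$ to a response vector $f(x)$, assumed to be corrupted by additive white Gaussian noise. Under that assumption, the Fisher Information Matrix (FIM) reduces to a simple Jacobian form:

$$
F(x) = J_f(x)^\top J_f(x), \qquad J_f(x) = \frac{\partial f(x)}{\partial x},
$$

and the predicted discriminability of a small image perturbation $\varepsilon$ scales with $\varepsilon^\top F(x)\,\varepsilon$. The most- and least-noticeable distortions are therefore the eigenvectors of $F(x)$ with the largest and smallest eigenvalues.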
A primary contribution of the paper is the use of this FIM to analyze the perceptual sensitivity implied by different neural architectures. The authors derive model-based sensitivity predictions by computing the eigenvectors of the FIM that correspond to the most and least noticeable distortion directions. Because the FIM is far too large to construct explicitly for full-resolution images, these extremal eigenvectors are found iteratively, using only matrix-vector products with the Jacobian. This approach lets the researchers quantify how well changes in a model's representation align with human visual sensitivity.
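A minimal sketch of that computation in JAX, assuming a differentiable model response `f` and the white-noise form $F = J^\top J$; the helper names and the toy model here are illustrative assumptions, not taken from the paper's code:

```python
import jax
import jax.numpy as jnp

def fisher_vector_product(f, x, v):
    """Compute F v = J^T (J v) without ever materializing the Jacobian J."""
    _, jv = jax.jvp(f, (x,), (v,))   # forward mode: J v
    _, vjp_fn = jax.vjp(f, x)
    (ftv,) = vjp_fn(jv)              # reverse mode: J^T (J v)
    return ftv

def max_eigendistortion(f, x, n_iter=100, seed=0):
    """Power iteration for the most-discriminable distortion direction."""
    v = jax.random.normal(jax.random.PRNGKey(seed), x.shape)
    v = v / jnp.linalg.norm(v)
    for _ in range(n_iter):
        fv = fisher_vector_product(f, x, v)
        v = fv / jnp.linalg.norm(fv)
    return v  # unit-norm top eigenvector of F = J^T J

# Toy stand-in for a model layer: a pointwise nonlinearity after local mixing.
def toy_model(x):
    return jnp.tanh(x + 0.5 * jnp.roll(x, 1))

x = jnp.linspace(0.0, 1.0, 64)
e_max = max_eigendistortion(toy_model, x)
```

The least-noticeable direction can be found the same way by iterating on $\lambda_{\max} I - F$, where $\lambda_{\max}$ is the largest eigenvalue recovered above.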
The empirical evaluation compares human detection thresholds for the extremal eigen-distortions generated from distinct layers of several models, including the VGG16 network and simpler models of early visual processing that are fit to human ratings of image distortions. A key finding is that the early layers of VGG16 match human sensitivity reasonably well, outperforming the network's deeper layers. Moreover, simple models structured to mirror early biological visual processing outperform both the deep CNN layers and a generic shallow CNN trained on the same human ratings.
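To make the threshold procedure concrete, here is a small sketch of how test stimuli might be constructed from an eigen-distortion; the placeholder image, random direction, and amplitude range are illustrative assumptions, not values from the paper:

```python
import jax
import jax.numpy as jnp

def eigendistortion_stimulus(image, direction, alpha):
    """Perturb an image along a unit-norm eigen-distortion at amplitude alpha,
    clipping to the displayable pixel range."""
    return jnp.clip(image + alpha * direction, 0.0, 1.0)

# Illustrative usage with placeholder data: a uniform gray image and a random
# unit-norm direction standing in for a model's extremal eigenvector.
image = jnp.full((64, 64), 0.5)
direction = jax.random.normal(jax.random.PRNGKey(0), image.shape)
direction = direction / jnp.linalg.norm(direction)
amplitudes = jnp.geomspace(0.01, 1.0, num=8)  # log-spaced test amplitudes
stimuli = [eigendistortion_stimulus(image, direction, a) for a in amplitudes]
```

Observers compare the original image against such distorted versions at each amplitude; the smallest amplitude at which the distortion is reliably detected gives the discrimination threshold used to score each model.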
Through rigorous psychophysical experiments, the paper exposes a limitation of traditional cross-validation metrics: models that fit held-out rating data equally well can generalize very differently to novel distortions, so model predictions must be tested directly against human perception. In particular, it shows how biologically constrained models with fewer layers can outperform complex deep networks, because their structural constraints act as a strong regularizer and align with known physiological mechanisms.
The results suggest that a solid grounding in human neuroscience is valuable when developing AI systems meant to mimic human perceptual capabilities. The paper raises important considerations for applying deep learning architectures in domains that model or interact with human cognition.
Future work could extend these methods to architectures beyond VGG16 and deepen the human behavioral analyses used to evaluate model performance. These insights could also inform the design of more sophisticated hierarchical image representations, potentially improving AI performance in applications ranging from computer vision to more nuanced human-interactive systems.