MindSet: Vision. A toolbox for testing DNNs on key psychological experiments (2404.05290v1)

Published 8 Apr 2024 in cs.CV and cs.AI

Abstract: Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision. In almost all cases these benchmarks are observational in the sense they are composed of behavioural and brain responses to naturalistic images that have not been manipulated to test hypotheses regarding how DNNs or humans perceive and identify objects. Here we introduce the toolbox MindSet: Vision, consisting of a collection of image datasets and related scripts designed to test DNNs on 30 psychological findings. In all experimental conditions, the stimuli are systematically manipulated to test specific hypotheses regarding human visual perception and object recognition. In addition to providing pre-generated datasets of images, we provide code to regenerate these datasets, offering many configurable parameters which greatly extend the dataset versatility for different research contexts, and code to facilitate the testing of DNNs on these image datasets using three different methods (similarity judgments, out-of-distribution classification, and decoder method), accessible at https://github.com/MindSetVision/mindset-vision. We test ResNet-152 on each of these methods as an example of how the toolbox can be used.

Summary

The paper introduces MindSet: Vision, a toolbox that uses 30 psychological findings to benchmark deep neural networks against human visual perception.
It provides customizable datasets and scripts for methods like out-of-distribution classification, similarity judgment, and decoder analysis.
The toolbox is intended to show where artificial vision aligns with, or departs from, human visual processing, and so to guide future refinements of DNN models.

Bridging the Gap Between Deep Neural Networks and Human Vision Through Psychological Phenomena

Introduction

Deep neural networks (DNNs) have emerged as the leading approach to object recognition and are often treated as models of biological vision. However, most benchmarks that evaluate these models are observational: they consist of behavioural and brain responses to naturalistic images that have not been manipulated to test specific hypotheses. The paper introduces MindSet: Vision, a toolbox that uses 30 psychological findings to test DNNs against human visual perception and object recognition with systematically manipulated stimuli. Its aim is to sharpen our understanding of how DNN models relate to human vision by providing a configurable framework for controlled experiments.

Overview of MindSet: Vision

MindSet: Vision provides a collection of pre-generated image datasets together with scripts that regenerate them under configurable parameters, which makes the toolbox adaptable to different research contexts. It also includes scripts for testing DNNs with three methods: similarity judgments, out-of-distribution classification, and a decoder method. The authors run ResNet-152 through each method as an example of how the toolbox can be used.

Datasets

The datasets span a wide array of visual phenomena, from low- and mid-level vision effects, such as Weber's Law and Gestalt phenomena, to visual illusions and complex object recognition tasks. They probe sensitivity to emergent features and non-accidental properties, susceptibility to visual illusions, and the robustness of DNN object recognition to variations in visual presentation. Each dataset is tied to an established psychological finding about human vision, so a model's behaviour can be compared directly with the human result.
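As an illustration of what one of these findings asserts, Weber's Law says that the just-noticeable difference in a stimulus grows in proportion to the stimulus magnitude, so the ratio of the two stays roughly constant. The sketch below checks that ratio on a handful of numbers. The function names and the measurements are invented for illustration; they are not taken from the paper or from the toolbox's code.

```python
def weber_fraction(jnd: float, baseline: float) -> float:
    """Return jnd / baseline, the just-noticeable difference relative to magnitude."""
    if baseline <= 0:
        raise ValueError("baseline magnitude must be positive")
    return jnd / baseline


def is_roughly_constant(values, tolerance=0.05):
    """True when the spread between the largest and smallest value is within tolerance."""
    return max(values) - min(values) <= tolerance


# Synthetic (baseline, JND) pairs, made up for this example only.
toy_measurements = [(10.0, 1.0), (20.0, 2.0), (40.0, 4.0)]

fractions = [weber_fraction(jnd, base) for base, jnd in toy_measurements]

# If Weber's Law holds, the fractions should be about the same at every baseline.
print("Weber fractions:", fractions)
print("Consistent with Weber's Law:", is_roughly_constant(fractions))
```

For a DNN, the analogous question would be whether some measure of its discrimination error scales with stimulus magnitude in the same way; how that error is obtained depends on the testing method used.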

Testing Methods

MindSet: Vision proposes three primary methods for assessing DNNs: out-of-distribution classification, similarity judgment analysis, and the decoder method. Each targets a different aspect of visual processing. Out-of-distribution classification tests whether a model can classify images that deviate from its training distribution. Similarity judgment analysis compares the model's internal activation patterns across stimuli. The decoder method extracts the visual information encoded at various network layers. Together, these methods help relate DNN representations to human perceptual judgments.
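The similarity judgment idea can be sketched in a few lines: take the activation vector a network produces for a base stimulus, and ask which of two manipulated variants it treats as more similar. The example below is a minimal sketch under stated assumptions. It uses cosine similarity as the comparison metric, which is one common choice and is assumed here rather than confirmed as the toolbox's default, and the activation vectors are toy values rather than real network outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length activation vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy activation vectors (hypothetical): in practice these would be read
# from a chosen layer of the network for each stimulus image.
base = [1.0, 0.0, 2.0]
variant_a = [1.0, 0.1, 1.9]   # small change to the base activations
variant_b = [0.0, 2.0, 0.5]   # large change to the base activations

sim_a = cosine_similarity(base, variant_a)
sim_b = cosine_similarity(base, variant_b)

# The variant with the higher score is the one the model "judges" as closer.
closer = "variant_a" if sim_a > sim_b else "variant_b"
print(f"sim(base, variant_a) = {sim_a:.3f}")
print(f"sim(base, variant_b) = {sim_b:.3f}")
print("Model treats", closer, "as more similar to the base stimulus")
```

The model's ordering of the two variants can then be compared with the ordering humans report for the same stimuli.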

Implications and Future Directions

By systematically challenging DNNs with experimentally manipulated stimuli grounded in psychological research, MindSet: Vision makes it possible to identify where artificial and biological vision agree and where they diverge. The toolbox highlights differences in how the two process visual information, and those differences point to ways DNNs could be refined to better emulate human visual perception.

Conclusion

MindSet: Vision brings the empirical tradition of psychological research on visual perception together with contemporary computational modeling. Its curated datasets and three testing methods let researchers run controlled, hypothesis-driven tests on DNNs rather than relying on observational benchmarks alone. Building DNNs that reproduce human visual capabilities remains difficult, but the toolbox provides a concrete starting point for measuring and narrowing the gaps.
