MindSet: Vision. A toolbox for testing DNNs on key psychological experiments (2404.05290v1)
Abstract: Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision. In almost all cases these benchmarks are observational, in the sense that they are composed of behavioural and brain responses to naturalistic images that have not been manipulated to test hypotheses regarding how DNNs or humans perceive and identify objects. Here we introduce the toolbox MindSet: Vision, consisting of a collection of image datasets and related scripts designed to test DNNs on 30 psychological findings. In all experimental conditions, the stimuli are systematically manipulated to test specific hypotheses regarding human visual perception and object recognition. In addition to providing pre-generated datasets of images, we provide code to regenerate these datasets, with many configurable parameters that greatly extend the datasets' versatility for different research contexts, and code to facilitate the testing of DNNs on these image datasets using three different methods (similarity judgments, out-of-distribution classification, and a decoder method), accessible at https://github.com/MindSetVision/mindset-vision. We test ResNet-152 with each of these methods as an example of how the toolbox can be used.
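As a rough illustration of the similarity-judgment idea named in the abstract, the sketch below compares a base stimulus with two manipulated variants by the cosine similarity of their activation vectors. It is not the toolbox's API: the function names, the use of cosine similarity, and the toy activation values are all assumptions made for illustration. In practice the vectors would come from a layer of a network such as ResNet-152.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closer_variant(base, variant_a, variant_b):
    """Report which variant is more similar to the base stimulus.

    Hypothetical helper: returns the label of the closer variant
    together with both similarity scores.
    """
    sim_a = cosine_similarity(base, variant_a)
    sim_b = cosine_similarity(base, variant_b)
    label = "a" if sim_a >= sim_b else "b"
    return label, sim_a, sim_b

# Toy activation vectors (made-up values, standing in for DNN features).
base = [1.0, 0.0, 1.0]
variant_a = [1.0, 0.1, 0.9]   # small change relative to the base
variant_b = [0.0, 1.0, 0.0]   # orthogonal to the base

choice, sim_a, sim_b = closer_variant(base, variant_a, variant_b)
print(choice, round(sim_a, 3), round(sim_b, 3))
```

A human-like model would be expected to rank the variants in the same order that human observers do for a given manipulation; the comparison itself is only as meaningful as the layer and similarity metric chosen.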