Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study
The paper explores using methodologies from cognitive psychology to interpret the often opaque operations of deep neural networks (DNNs). Drawing on theoretical foundations laid by developmental psychology, the authors examine how the shape bias, a tendency first documented in children's word learning to categorize objects by shape rather than color, manifests in DNNs trained for one-shot learning on ImageNet data, specifically in Matching Networks (MNs) and an Inception baseline model.
Key Findings and Analysis
- Shape Bias in DNNs:
- The paper finds that state-of-the-art DNNs trained for one-shot learning exhibit a strong shape bias, paralleling observations in human cognitive development: children predominantly extend new word labels to objects of similar shape rather than to those matching in color.
- This shape bias varies greatly with the model's random initialization and fluctuates dynamically over the course of training, indicating that ostensibly identical models can converge to qualitatively different solutions.
- Implications for Model Interpretability:
- This investigation provides empirical evidence that neural networks possess implicit inductive biases. Such findings underscore the rich insights available from incorporating theories from cognitive psychology, which provide frameworks for hypothesizing and testing the biases guiding neural network behavior.
- The consistent propagation of bias across composed model components, for example from the Inception embedding into the Matching Networks built on it, suggests that biases can be inherited through model architecture, warranting careful consideration during model selection and integration in practical applications.
- Practical Applications and Considerations:
- Recognizing and gauging inherent biases is crucial, especially in domains where said biases may detract from model efficacy (e.g., modeling fruit categories where color is primary).
- Design modifications and post-hoc techniques, like strategic seed initialization or selective model tuning, could mitigate undesired biases or leverage desired ones, depending on the use case.
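The probe protocol the paper adapts from developmental psychology can be sketched in a few lines: present a probe object alongside a shape-match and a color-match exemplar, and record which one the model's representation favors. The sketch below is a minimal illustration, not the authors' implementation; the `embed` function stands in for a trained model's feature extractor (e.g., an Inception embedding), and the image inputs are placeholders.

```python
import numpy as np

def shape_bias(embed, probes, shape_matches, color_matches):
    """Estimate shape bias as the fraction of probe trials where the
    probe's embedding is closer (by cosine similarity) to its
    shape-match exemplar than to its color-match exemplar.

    `embed` is a placeholder for a trained model's feature extractor;
    each trial supplies one probe, one shape match, and one color match.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    shape_choices = 0
    for probe, s, c in zip(probes, shape_matches, color_matches):
        e_p, e_s, e_c = embed(probe), embed(s), embed(c)
        if cos(e_p, e_s) > cos(e_p, e_c):
            shape_choices += 1
    return shape_choices / len(probes)
```

Running this score across models trained from different random seeds, or across checkpoints of a single training run, would expose the seed- and time-dependent variation in bias that the paper reports, and could inform the kind of strategic seed selection mentioned above.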
Theoretical and Practical Implications
These findings play a dual role in advancing cognitive science and improving machine learning systems:
- For Cognitive Modeling: The presented methodology offers a new computational framework to replicate human-like cognitive biases in neural network architectures. The convergence of machine behavior with human psychological data presents opportunities to explore human cognitive theory validation or to hypothesize new psychological principles rooted in empirical machine learning analysis.
- For Machine Learning Development: The strategic application of cognitive psychology techniques as an auxiliary tool furnishes a more comprehensive interpretive layer over state-of-the-art learning models. As neural networks are increasingly deployed to solve complex, high-stakes problems, interpreting their behavior through established psychological paradigms holds promise for enhanced transparency and reliability.
Future Directions
This paper lays foundational work for extending cognitive-psychology experiments to other facets of artificial intelligence, motivating further examination of biases in more nuanced machine cognition tasks that parallel human mental constructs. Leveraging the extensive body of psychological research in ongoing AI development promises productive interdisciplinary collaboration, potentially unlocking deeper understanding and cross-disciplinary advancement in both fields.
In conclusion, the cross-application of cognitive psychology to DNN interpretability presents an innovative approach to probing the computational idiosyncrasies of established neural models, offering pathways to both refine machine learning techniques and provide new insights into human cognitive processes.