Universal dimensions of visual representation (2408.12804v1)

Published 23 Aug 2024 in q-bio.NC and cs.CV

Abstract: Do neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision or because they learn universal features of natural image processing? We characterized the universality of hundreds of thousands of representational dimensions from visual neural networks with varied construction. We found that networks with varied architectures and task objectives learn to represent natural images using a shared set of latent dimensions, despite appearing highly distinct at a surface level. Next, by comparing these networks with human brain representations measured with fMRI, we found that the most brain-aligned representations in neural networks are those that are universal and independent of a network's specific characteristics. Remarkably, each network can be reduced to fewer than ten of its most universal dimensions with little impact on its representational similarity to the human brain. These results suggest that the underlying similarities between artificial and biological vision are primarily governed by a core set of universal image representations that are convergently learned by diverse systems.

Summary

  • The paper demonstrates that deep neural networks learn universal latent dimensions shared across architectures, initializations, and training objectives.
  • It introduces metrics for universality and brain similarity, revealing that highly shared dimensions align closely with fMRI data from human visual cortex.
  • The findings imply that a small subspace of dimensions drives semantic organization and representational similarity, offering new insights for neural network design.

Universal Dimensions of Visual Representation

The paper "Universal dimensions of visual representation" by Zirui Chen and Michael F. Bonner aims to explore the extent to which neural network models of vision capture universal features of natural image processing and how these features align with human brain representations. The research investigates whether the brain-aligned representations in neural networks result from shared architectural constraints and task objectives or from intrinsic universal properties of natural images.

Overview of the Study

The primary objective of this paper is to characterize the universality of visual representations in deep neural networks (DNNs) and their alignment with human brain representations as measured by fMRI. By analyzing over 200,000 representational dimensions from neural networks with varied architectures and training objectives, the authors seek to identify dimensions that are universally shared across networks and those that align with human visual cortex representations.

Methodology

The methodology involves several key steps:

  1. Universality and Brain Similarity Metrics: The authors develop metrics to quantify universality (extent to which a dimension is shared across networks) and brain similarity (extent to which a dimension aligns with human brain representations). Universality is computed as the average prediction accuracy of a latent dimension from one network using activations from other networks. Brain similarity is computed similarly using fMRI activations from human subjects as predictors.
  2. Network Variation and Data: Four sets of deep neural networks are analyzed, with variations in initializations, architectures, training objectives, and untrained random weights. These networks are evaluated against a large set of natural images from the Microsoft Common Objects in Context (COCO) database and fMRI data from the Natural Scenes Dataset (NSD).
  3. Analysis Scope: Representational dimensions are examined across all layers of the networks, from early to deep layers, to capture both low-level and high-level visual properties.
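The cross-network prediction idea behind the universality metric can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' pipeline: the synthetic activations, the plain least-squares predictor, and the `universality` function name are all assumptions standing in for the paper's actual networks and regression procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two networks' activations on a shared image set:
# rows are images, columns are units. Both nets mix a common latent structure.
n_images = 500
shared = rng.normal(size=(n_images, 5))
net_a = shared @ rng.normal(size=(5, 64)) + 0.1 * rng.normal(size=(n_images, 64))
net_b = shared @ rng.normal(size=(5, 48)) + 0.1 * rng.normal(size=(n_images, 48))

def universality(dim_a, other_net, train_frac=0.8):
    """Predict one latent dimension of network A from network B's activations
    via least squares; return the out-of-sample correlation as the score."""
    n_train = int(len(dim_a) * train_frac)
    w, *_ = np.linalg.lstsq(other_net[:n_train], dim_a[:n_train], rcond=None)
    pred = other_net[n_train:] @ w
    return float(np.corrcoef(pred, dim_a[n_train:])[0, 1])

# Score the first latent dimension of net A against net B's activations.
score = universality(net_a[:, 0], net_b)
```

In the paper, the same logic is applied with fMRI voxel activations as predictors to yield the brain-similarity score; here the two synthetic networks share a latent subspace by construction, so the score comes out high.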

Results

The paper reveals several notable findings:

  1. Universal Dimensions Across Networks: Regardless of initialization, architecture, or task objective, neural networks learn a core set of latent dimensions that are universally shared. These dimensions span all depths of network layers.
  2. Brain-Aligned Representations: The universal dimensions show high alignment with human brain representations. Specifically, dimensions that are predictably shared across networks are the ones that align most closely with fMRI data from human visual cortex.
  3. Effect of Training: Trained networks exhibit a strong nonlinear relationship between universality and brain similarity, highlighting that the most universal dimensions are learned representational properties not present in untrained networks.
  4. Semantic Organization: High-level visual representations in these universal dimensions encode semantic properties that group images into meaningful categories, such as people, animals, and objects.
  5. Representational Similarity: Conventional Representational Similarity Analysis (RSA) between neural networks and visual cortex indicates that similarities are predominantly driven by a small subspace of highly universal dimensions. Even drastic reductions to just five or ten dimensions retain high RSA scores.
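The claim in the last point can be made concrete with a toy RSA computation. This is a sketch under stated assumptions: the synthetic "network" and "brain" data share a low-dimensional latent space by construction, and an SVD-based reduction to five components stands in for selecting the most universal dimensions, which is not the paper's exact selection procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the feature vectors of every image pair (upper triangle)."""
    c = np.corrcoef(features)                 # image-by-image correlations
    iu = np.triu_indices_from(c, k=1)
    return 1.0 - c[iu]

def rsa(feat_x, feat_y):
    """RSA score: correlation between the two systems' RDMs."""
    return float(np.corrcoef(rdm(feat_x), rdm(feat_y))[0, 1])

# Toy stand-ins (not real data): a network and a "brain" representation
# that share a 5-dimensional latent structure plus independent noise.
n_images = 100
latent = rng.normal(size=(n_images, 5))
net = latent @ rng.normal(size=(5, 200)) + 0.1 * rng.normal(size=(n_images, 200))
brain = latent @ rng.normal(size=(5, 300)) + 0.1 * rng.normal(size=(n_images, 300))

full_rsa = rsa(net, brain)

# Reduce the network to its top 5 principal components (a crude proxy for
# keeping only its most shared dimensions) and recompute RSA.
u, s, _ = np.linalg.svd(net - net.mean(axis=0), full_matrices=False)
reduced_rsa = rsa(u[:, :5] * s[:5], brain)
```

Because the shared latent structure dominates both representations, the five-component reduction retains nearly all of the full RSA score, mirroring the qualitative pattern the paper reports for universal dimensions.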

Implications and Future Directions

The findings suggest that a core set of visual representations is learned across diverse neural networks and shared with the human brain, irrespective of specific architectural or task-related constraints. This highlights a potentially fundamental aspect of how visual systems, both artificial and biological, adapt to the statistics of natural images.

Several exciting directions emerge from this paper:

  • Cross-Modal Representations: Extending this approach to other modalities, such as language, to examine whether vision models and LLMs share representational dimensions.
  • Universal Initializations: Investigating whether these universal dimensions can be embedded into networks at the initialization stage to enhance learning efficiency and robustness.
  • Species Comparisons: Comparing the universality of visual representations across different species to understand shared versus species-specific processing mechanisms.

Conclusion

This paper provides robust evidence that the most brain-aligned dimensions in visual neural networks are universal, transcending specific network configurations. These universal dimensions capture both low-level and high-level image properties, offering new insights into the convergent evolution of artificial and biological vision systems. The implications of this research extend to improving neural network design and understanding the fundamental principles governing visual representation.