- The paper demonstrates that neural networks share a core set of similar features despite variations in individual neuron representations.
- The study employs bipartite matching, sparse linear (LASSO) prediction, and spectral clustering to rigorously analyze internal representations.
- The findings suggest that these convergent features can enhance model compression and ensemble methods for efficient neural network design.
Convergent Learning in Neural Networks: An Examination of Representation Similarity
The paper "Convergent Learning: Do different neural networks learn the same representations?" explores the intriguing question of whether neural networks trained independently, yet with identical architectures and data, converge to similar internal representations. Given the opacity of neural networks due to their non-linear computations and a multitude of parameters, the analysis of their intermediate transformations is challenging yet crucial for understanding and improving machine learning models.
Methodological Framework
The authors approach this investigation using an experimental framework where multiple neural networks are trained from different random initializations. The core inquiry revolves around determining the extent of similarity in the features learned by these networks. The paper employs three primary techniques to analyze alignments:
- Bipartite Matching: A one-to-one correspondence is sought between neurons in the two networks, typically by correlating unit activations over a common set of inputs, to test whether each unit in one network has an equivalent counterpart in the other (first sketch after this list).
- Sparse Prediction: This technique relaxes strict one-to-one mappings, allowing one-to-few relationships. It fits LASSO models that predict each unit in one network as a sparse linear combination of units in the other, capturing the semi-distributed nature of representations (second sketch after this list).
- Spectral Clustering: By constructing neuron similarity graphs, spectral methods identify many-to-many mappings, revealing shared subspaces and higher-dimensional relationships between networks (third sketch after this list).
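A minimal sketch of the one-to-one matching idea, assuming `acts_a` and `acts_b` hold activations of the same layer in two independently trained networks, recorded over the same inputs (random placeholders stand in for real activations here). The Hungarian algorithm finds the permutation of units that maximizes total cross-network correlation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
acts_a = rng.standard_normal((1000, 64))  # placeholder: (n_examples, n_units)
acts_b = rng.standard_normal((1000, 64))

# Correlation between every unit in network A and every unit in network B.
a = (acts_a - acts_a.mean(0)) / acts_a.std(0)
b = (acts_b - acts_b.mean(0)) / acts_b.std(0)
corr = a.T @ b / len(a)  # (64, 64) cross-correlation matrix

# The Hungarian algorithm minimizes cost, so negate to maximize correlation;
# the result is the best one-to-one alignment between units.
rows, cols = linear_sum_assignment(-corr)
print("mean correlation of matched pairs:", corr[rows, cols].mean())
```

Units whose correlation stays low even under the best permutation are natural candidates for the network-specific, rare features discussed below.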
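The sparse-prediction step can be sketched with scikit-learn's `Lasso`. The synthetic `target` below is a stand-in for a single network-B unit that is actually a blend of three network-A units, so the recovered support illustrates a one-to-few mapping:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
acts_a = rng.standard_normal((1000, 64))  # placeholder network-A activations
# Hypothetical network-B unit: a noisy mixture of three network-A units.
target = acts_a[:, :3] @ np.array([0.5, 0.3, 0.2]) + 0.01 * rng.standard_normal(1000)

# The L1 penalty drives most coefficients to zero, so the surviving
# nonzero weights reveal which few A-units jointly explain the B-unit.
model = Lasso(alpha=0.05).fit(acts_a, target)
support = np.flatnonzero(model.coef_)
print("network-A units selected to predict the B unit:", support)
```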
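For the many-to-many view, a hedged sketch pools units from both networks into one similarity graph (absolute correlation as the affinity) and lets spectral clustering find groups. Clusters that mix A-units and B-units suggest a shared subspace rather than matched individual neurons; the cluster count of 10 is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(2)
acts_a = rng.standard_normal((1000, 64))  # placeholder activations
acts_b = rng.standard_normal((1000, 64))
acts = np.hstack([acts_a, acts_b])  # pool units: columns 0-63 = A, 64-127 = B

# Affinity = absolute correlation between all 128 pooled units.
z = (acts - acts.mean(0)) / acts.std(0)
affinity = np.abs(z.T @ z / len(z))

labels = SpectralClustering(n_clusters=10, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
for k in range(10):
    members = np.flatnonzero(labels == k)
    print(f"cluster {k}: {(members < 64).sum()} A-units, {(members >= 64).sum()} B-units")
```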
Significant Findings
The investigation uncovers several enlightening insights into neural network behavior:
- Core and Unique Features: Some features are reliably learned across networks (a core set), while others are network-specific, implying diversity in learning trajectories even under near-identical training conditions.
- Subspace Learning: Groups of neurons tend to span low-dimensional subspaces that are consistent across networks, even though the specific basis vectors differ; this highlights the semi-distributed nature of neural representations (see the subspace-angle sketch after this list).
- Mixed Representation Codes: The paper reveals that neural codes are neither entirely local nor fully distributed but exist on a spectrum between these two paradigms.
- Convergence in Activation Distributions: Despite internal variability, networks converge in the distribution of average neuron activations, reinforcing the notion that feature learning is reliable beyond individual neuron variations (see the sorted-activation sketch after this list).
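The subspace claim can be illustrated with principal angles. In this hedged sketch, two hypothetical sets of three units are random mixtures of the same latent directions, so `scipy.linalg.subspace_angles` reports angles near zero even though no individual unit in one set matches a unit in the other:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(3)
latent = rng.standard_normal((1000, 3))        # shared latent directions
span_a = latent @ rng.standard_normal((3, 3))  # network A's units: one mixing
span_b = latent @ rng.standard_normal((3, 3))  # network B's units: another mixing

# All principal angles are ~0 degrees: the two sets of units span the same
# subspace even though their individual basis vectors (neurons) differ.
print(np.rad2deg(subspace_angles(span_a, span_b)))
```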
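The distributional convergence can be checked without matching neurons at all: sort each network's per-unit mean activations and compare the curves. A small sketch, with random placeholders standing in for real mean activations:

```python
import numpy as np

rng = np.random.default_rng(4)
mean_a = np.abs(rng.standard_normal(64))  # placeholder per-unit mean activations
mean_b = np.abs(rng.standard_normal(64))

# Sorting discards neuron identity; if the sorted curves nearly coincide,
# the two networks allocate activation mass in the same way overall.
gap = np.max(np.abs(np.sort(mean_a) - np.sort(mean_b)))
print("max gap between sorted mean-activation curves:", gap)
```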
Practical and Theoretical Implications
From a practical perspective, understanding which features converge across networks can guide the design of more robust and efficient architectures and learning algorithms. For instance, knowing which features recur reliably could improve ensemble methods or aid model compression by pruning redundant units while preserving the shared core.
Theoretically, the findings contribute to the broader exploration of neural representation theory. They lay groundwork for advancing concepts of representation redundancy and diversity, potentially informing better generalization in deep learning frameworks.
Future Directions
The paper suggests several avenues for future research:
- Investigation of model compression by selectively removing non-essential features.
- Exploration of ensemble optimization by leveraging shared and unique features across networks.
- Application of these insights to architectures with varied configurations or datasets, broadening the scope and robustness of the conclusions drawn.
In summary, this research provides a foundation for understanding the extent of feature convergence in neural networks, challenging researchers to explore the mechanisms governing the balance between unique and shared learned representations. Such examination is critical for building more capable and adaptive machine learning systems.