
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth (2010.15327v2)

Published 29 Oct 2020 in cs.LG

Abstract: A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of effects of depth and width on the learned representations. In this paper, we study this fundamental question. We begin by investigating how varying depth and width affects model hidden representations, finding a characteristic block structure in the hidden representations of larger capacity (wider or deeper) models. We demonstrate that this block structure arises when model capacity is large relative to the size of the training set, and is indicative of the underlying layers preserving and propagating the dominant principal component of their representations. This discovery has important ramifications for features learned by different models, namely, representations outside the block structure are often similar across architectures with varying widths and depths, but the block structure is unique to each model. We analyze the output predictions of different model architectures, finding that even when the overall accuracy is similar, wide and deep models exhibit distinctive error patterns and variations across classes.

Citations (247)

Summary

  • The paper demonstrates that overparameterized models develop a characteristic block structure in hidden representations dominated by the first principal component.
  • It employs centered kernel alignment (CKA) to systematically compare representations across varying depths and widths on benchmark datasets.
  • Results reveal that architecture-specific features affect task performance, offering insights for network compression and tailored model design.

Analysis of Neural Networks: Depth, Width, and Learned Representations

The paper "Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth" explores the influence of neural network architecture on internal representations, specifically focusing on variations in depth and width. This paper addresses a fundamental, yet underexplored, aspect of neural network design and provides detailed insights into how architectural choices affect learned representations and model performance.

The empirical investigation uses ResNet architectures trained on CIFAR-10, CIFAR-100, and ImageNet. The authors employ centered kernel alignment (CKA) to measure the similarity of hidden representations between models with different architectures. This analysis reveals a characteristic block structure in the hidden representations of deep or wide networks, particularly when model capacity is high relative to the size of the training data. The block structure consists of a contiguous range of layers whose representations are highly similar to one another, attributable to a dominant first principal component that is preserved across those layers.
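As a concrete illustration, below is a minimal sketch of linear CKA in NumPy. The paper itself uses a minibatch estimator of CKA to scale to large models; this full-batch version, applied here to random stand-in activation matrices, conveys the core computation.

```python
import numpy as np

def linear_cka(x, y):
    """Linear centered kernel alignment between two activation matrices.

    x: (n_examples, n_features_x) activations from one layer
    y: (n_examples, n_features_y) activations from another layer
    Returns a similarity score in [0, 1].
    """
    # Center each feature (column) so the implied Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

# Random stand-ins for two layers' activations; widths may differ.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(512, 256))
acts_b = rng.normal(size=(512, 1024))
print(linear_cka(acts_a, acts_b))  # near 0 for unrelated random features
print(linear_cka(acts_a, acts_a))  # exactly 1 for identical representations
```

Computing this score for every pair of layers and plotting the result as a heatmap is how the block structure becomes visible: it appears as a large, contiguous square of layers with near-identical representations.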

A significant finding of this paper is the identification of the conditions under which the block structure arises: it emerges primarily in overparameterized models. Experiments that reduce the training-set size show that even narrower or shallower models develop block structure once their capacity becomes large relative to the available data. In other words, what matters is not absolute width or depth but model capacity relative to the amount of training data.

Furthermore, the paper probes the implications of the block structure by analyzing its relationship with a model's principal components. The results show that the block structure corresponds to layers in which the first principal component accounts for most of the representational variance and is preserved and propagated through many consecutive hidden layers. This phenomenon raises interesting questions about the nature of information processing in overparameterized architectures and whether such structure could be leveraged for network compression without substantial performance loss.
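A hedged sketch of this diagnostic follows: for each layer's activation matrix, one can compute the fraction of variance explained by the top principal component, which should approach 1 for layers inside the block structure. The function names are illustrative, not the authors' code.

```python
import numpy as np

def first_pc_variance_fraction(acts):
    """Fraction of total variance captured by the first principal component.

    acts: (n_examples, n_features) activations from a single layer.
    """
    acts = acts - acts.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to per-component variances.
    s = np.linalg.svd(acts, compute_uv=False)
    return (s[0] ** 2) / np.sum(s ** 2)

def first_pc_scores(acts):
    """Per-example projections onto the top principal component.

    These n-dimensional score vectors are comparable across layers of
    different widths, so a high |cosine similarity| between two layers'
    scores suggests the dominant component is being propagated.
    """
    acts = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(acts, full_matrices=False)
    return u[:, 0] * s[0]
```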

Cross-comparisons between different model architectures reveal that, while representations outside the block structure are similar across models, representations inside the block structure are unique to each model, even across different random initializations. This specificity has implications for understanding generalization and transferability: block-structure representations appear to be idiosyncratic to the individual trained model rather than shared features of the task, shaped primarily by architecture and training run.

On the practical side, the research examines how architectural variations affect model predictions. Despite similar overall accuracy, wide networks perform better on certain kinds of classes, such as scenes, while deeper networks perform better on others, such as consumer goods. This suggests that even when two architectures achieve comparable aggregate metrics, their error patterns and per-class accuracies can differ substantially.
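One way to surface such differences is to compare per-class accuracies between two models, sketched below with synthetic stand-in predictions; a real analysis would substitute the test-set predictions of a trained wide model and a trained deep model.

```python
import numpy as np

def per_class_accuracy(preds, labels, num_classes):
    """Accuracy of `preds` against `labels`, broken down by class."""
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = (preds[mask] == c).mean()
    return acc

# Synthetic stand-ins: two models with ~70% accuracy but independent errors.
rng = np.random.default_rng(0)
num_classes = 100
labels = rng.integers(0, num_classes, size=10_000)
wide_preds = np.where(rng.random(10_000) < 0.7, labels,
                      rng.integers(0, num_classes, size=10_000))
deep_preds = np.where(rng.random(10_000) < 0.7, labels,
                      rng.integers(0, num_classes, size=10_000))

# Per-class accuracy gap: positive entries favor the wide model.
gap = (per_class_accuracy(wide_preds, labels, num_classes)
       - per_class_accuracy(deep_preds, labels, num_classes))
print("classes where the wide model leads most:", np.argsort(gap)[-5:])
```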

This paper elucidates the nuanced effects of architectural choices on neural network behavior, contributing to a deeper empirical understanding of model design and representation learning. It points to practical directions such as exploiting block-structure insights for model compression, and to the task-specific consequences of architecture, suggesting future work on optimizing architectures for particular datasets and tasks. The presented methodology and findings pave the way for more informed architectural decisions in deep learning.
