- The paper demonstrates that pre-training with random labels can speed up subsequent fine-tuning on real labels, an effect termed positive transfer.
- The study reveals that network weights align with data principal components even with random labels, uncovering underlying data structures.
- The paper finds that while early layers benefit from random label training, specialization in later layers may hinder performance on downstream tasks.
Overview of "What Do Neural Networks Learn When Trained With Random Labels?"
This paper investigates the learning mechanisms of deep neural networks (DNNs) trained on natural image data bearing entirely random labels. Such a scenario, despite its prevalent use for studying phenomena like memorization and generalization, remains incompletely understood in the literature. The research presented here focuses particularly on the alignment between the principal components of network parameters and data during training with random labels. It provides both analytical proofs and empirical evidence of this alignment effect across various network architectures, including VGG16 and ResNet18, using datasets like CIFAR10 and ImageNet.
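As a concrete illustration of this setup, the following minimal PyTorch sketch trains a small network on CIFAR10 with its labels replaced by uniformly random class indices. The architecture, hyperparameters, and training budget here are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of the random-label setup (illustrative model and
# hyperparameters, not the paper's configuration): CIFAR10 images keep their
# pixel statistics, but every label is replaced by a uniformly random class.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=T.ToTensor())

# Overwrite labels with random ones, drawn once and kept fixed across epochs
# (memorization only makes sense if the random assignment does not change).
gen = torch.Generator().manual_seed(0)
train_set.targets = torch.randint(0, 10, (len(train_set),), generator=gen).tolist()

loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                      nn.Linear(512, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # a real memorization run would use many more epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```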
Main Contributions
- Positive Transfer from Random Labels: The paper demonstrates that pre-training DNNs on randomly labeled data can make subsequent training faster when fine-tuning on new datasets with real labels. This speed-up is termed positive transfer, and it persists even after accounting for simple effects such as changes in weight scale.
- Alignment of Principal Components: Through analytical proofs, it is shown that both convolutional and fully connected networks exhibit an alignment between the principal components of their weights and those of the input data (a rough empirical check is sketched after this list). This alignment implies the network learns something about the underlying data distribution, even with random labels.
- Competing Specialization Effects: Despite positive transfer at the earlier layers, specialization effects at later network layers can obscure this benefit. The paper explores how specialization can harm downstream tasks because fewer neurons remain active, effectively lowering the network's usable capacity.
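A rough way to probe the alignment claim empirically, continuing from the sketch above, is to compare the leading eigenvectors of the data covariance with those of the covariance of the first-layer weights. This is a sketch under the assumption of a fully connected first layer; `model` and `train_set` are the illustrative objects from the previous example.

```python
# Sketch: compare the principal directions of the input data with those of the
# first-layer weights after random-label training. Assumes `model` and
# `train_set` from the previous sketch; all names are illustrative.
import torch

def top_eig(matrix):
    """Eigenvectors (columns) and eigenvalues of the covariance of the rows
    of `matrix`, sorted by decreasing eigenvalue."""
    centered = matrix - matrix.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)      # ascending order
    return eigvecs.flip(-1), eigvals.flip(0)       # descending order

# Data covariance, estimated from a subsample of flattened images.
images = torch.stack([train_set[i][0].flatten() for i in range(2000)])
data_vecs, _ = top_eig(images)

# Weight covariance: each row of the first Linear layer acts as one "filter".
weight_vecs, _ = top_eig(model[1].weight.detach())  # weight shape: (512, 3072)

# Alignment score: |cosine| between the i-th data PC and the i-th weight PC.
k = 10
alignment = (data_vecs[:, :k].T @ weight_vecs[:, :k]).diagonal().abs()
print(f"|cos| alignment of the top {k} principal components:", alignment)
```

Values close to 1 indicate that the corresponding principal directions coincide; the paper's claim is that the leading directions align, with the relation between data and weight eigenvalues following the "increasing then decreasing" shape discussed below.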
Methodology & Analysis
- Covariance Alignment: By comparing the covariance matrices of the network weights and of the input data, the researchers show that the two share principal directions. This alignment supports the view that, despite the randomness of the labels, DNNs acquire a structured representation of the input distribution.
- Speed of Learning: The analysis relates the eigenvectors of the data and weight covariances and constructs a function mapping data eigenvalues to weight eigenvalues that helps explain the faster training upon fine-tuning. Notably, this mapping exhibits an "increasing then decreasing" pattern.
- Experimental Validation: To validate the theoretical findings, extensive experiments are conducted across multiple network architectures and hyperparameters. They show that pre-training on random labels often accelerates downstream training, although the specialization effects noted above can offset this benefit in some settings (a schematic version of the comparison is sketched after this list).
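A schematic version of the transfer comparison might look as follows; the `pretrained_model` and `real_loader` names, the accuracy threshold, and the training budget are placeholders rather than the paper's actual protocol.

```python
# Sketch of the positive-transfer comparison (illustrative protocol): fine-tune
# on real labels starting either from a fresh initialization or from weights
# pre-trained on random labels, and compare how quickly each run reaches a
# fixed training-accuracy threshold.
import copy
import torch

def epochs_to_reach(model, loader, threshold=0.5, max_epochs=30, lr=0.01):
    """Train on real labels; return the first epoch whose training accuracy
    exceeds `threshold` (or max_epochs if it is never reached)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(1, max_epochs + 1):
        correct, total = 0, 0
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images)
            loss = loss_fn(logits, labels)
            loss.backward()
            optimizer.step()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
        if correct / total >= threshold:
            return epoch
    return max_epochs

# `pretrained_model` is assumed to be the network after random-label training
# (e.g. `model` from the first sketch); `fresh_model` is the same architecture
# with freshly initialized weights.
fresh_model = copy.deepcopy(pretrained_model)
for module in fresh_model.modules():
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

# `real_loader` is assumed to yield CIFAR10 images with their true labels.
print("from scratch:", epochs_to_reach(fresh_model, real_loader))
print("random-label pretrained:", epochs_to_reach(pretrained_model, real_loader))
```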
Implications and Future Directions
The paper's findings have substantial implications for transfer learning, offering insights into why certain initialization strategies might be advantageous and suggesting new ways to study task transfer scenarios. Furthermore, the exploration of the alignment and specialization stages lends a deeper understanding of the phases of learning and of model capacity during training.
This work also sets the stage for future research into alignment-driven strategies for network training and initialization. It opens pathways to investigate the roles and limits of over-parameterization in DNNs beyond the classical generalization and memorization paradigms. Such insights could be pivotal in devising more efficient and robust training methodologies for neural networks, particularly in contexts with limited labeled data or high label noise.
In conclusion, "What Do Neural Networks Learn When Trained With Random Labels?" makes a significant contribution to understanding the learning dynamics of DNNs trained under random labels. The observations on principal component alignment and transfer effects underline the latent ability of DNNs to capture useful structure in the input distribution even without meaningful supervision, offering compelling directions for further empirical and theoretical research in artificial intelligence.