Critical Learning Periods in Deep Neural Networks (1711.08856v3)

Published 24 Nov 2017 in cs.LG, q-bio.NC, and stat.ML

Abstract: Similar to humans and animals, deep artificial neural networks exhibit critical periods during which a temporary stimulus deficit can impair the development of a skill. The extent of the impairment depends on the onset and length of the deficit window, as in animal models, and on the size of the neural network. Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training. To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. Counterintuitively, information rises rapidly in the early phases of training, and then decreases, preventing redistribution of information resources in a phenomenon we refer to as a loss of "Information Plasticity". Our analysis suggests that the first few epochs are critical for the creation of strong connections that are optimal relative to the input data distribution. Once such strong connections are created, they do not appear to change during additional training. These findings suggest that the initial learning transient, under-scrutinized compared to asymptotic behavior, plays a key role in determining the outcome of the training process. Our findings, combined with recent theoretical results in the literature, also suggest that forgetting (decrease of information in the weights) is critical to achieving invariance and disentanglement in representation learning. Finally, critical periods are not restricted to biological systems, but can emerge naturally in learning systems, whether biological or artificial, due to fundamental constraints arising from learning dynamics and information processing.

Citations (88)

Summary

  • The paper demonstrates that a brief interference in early training can permanently impair a DNN's ability to learn complex representations.
  • The authors use image classification with simulated visual deficits to reveal a critical period phenomenon analogous to biological systems.
  • The study finds that early inflexible connections limit further training adaptability by reducing information plasticity.

Overview of Critical Learning Periods in DNNs

Deep Neural Networks (DNNs), much like living organisms, can experience critical periods in their training during which temporary disruptions cause long-term effects on their abilities. The paper investigates this phenomenon in detail, showing how a brief interference in the early stages of a DNN's training can permanently impair its capability to develop certain skills, much as a sensory deficit during a critical period of post-natal development can lead to permanent learning deficits in animals.

Sensitivity to Training Conditions

The paper simulates a deficit on commonly used image classification tasks. For instance, images were blurred for a number of epochs at the beginning of training to mimic cataracts, which in humans and other animals cause a permanent visual acuity impairment if not corrected during a critical period after birth. If the artificial blurring was not removed within an early time window, the network's final performance declined significantly compared to the unaffected case, and this drop persisted regardless of additional training, signifying the existence of a critical learning period in the DNN.
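The deficit protocol above can be sketched in a few lines. The sketch below is illustrative and not the authors' exact pipeline; a naive box blur stands in for the cataract-like deficit, and the function names (`box_blur`, `maybe_apply_deficit`) are hypothetical helpers, applied only while the simulated deficit window is open.

```python
def box_blur(image, k=3):
    """Naive box blur on a 2D list of floats (kernel size k, edge-clipped)."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += image[ii][jj]
                        count += 1
            out[i][j] = total / count
    return out

def maybe_apply_deficit(image, epoch, deficit_epochs):
    """Blur the input only while the simulated deficit window is open;
    after `deficit_epochs` the network sees clean images again."""
    return box_blur(image) if epoch < deficit_epochs else image
```

In a training loop, `maybe_apply_deficit` would wrap each batch, so that "removing the cataract" is simply the epoch counter crossing `deficit_epochs`.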

Information Plasticity and Network Analysis

A key finding is that information in the network's weights does not increase monotonically during training. After an initial surge, the information content decreases, which the authors describe as a loss of "Information Plasticity". Early in training the network is highly flexible and its connections adapt swiftly to the input data; once strong connections aligned with the data distribution are formed, they become rigid and subsequent training does not modify them. This observation helps explain the network's sensitivity to early deficits and how DNNs prioritize different scales of features in the data during learning.
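To make the Fisher Information measurement concrete, here is a minimal sketch for a single logistic-regression unit; the paper works with full networks and approximations, so this shows only the underlying quantity. For a sigmoid output with probability p, the score is (y - p)·x_k, and taking the expectation over the model's own label distribution gives a Fisher diagonal of p(1 - p)·x_k² per weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fisher_diagonal(weights, inputs):
    """Diagonal of the Fisher Information of the weights for a single
    sigmoid unit, averaged over the inputs. Uses the closed form
    E_y[(y - p)^2] = p(1 - p) under the model's own label distribution."""
    fisher = [0.0] * len(weights)
    for x in inputs:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for k, xk in enumerate(x):
            fisher[k] += p * (1.0 - p) * xk * xk
    return [f / len(inputs) for f in fisher]
```

Tracking this quantity per layer across epochs is, in spirit, how one would watch the rise and fall of effective connectivity that the paper describes; for real networks it requires per-sample gradients and an approximation of the full Fisher matrix.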

Implications and Broader Context

This research underscores the importance of initial training conditions for a DNN's final performance. Critical learning periods are not restricted to biological entities but are a fundamental aspect of learning systems, arising from inherent dynamics of information processing and optimization constraints during training. These findings have implications for practices such as transfer learning, where a network pre-trained on one task is fine-tuned for another. Additionally, the fact that DNNs can forget, reducing the information in their weights to improve generalization, presents intriguing parallels to biological forgetting mechanisms, pointing to a shared principle between natural and artificial learning systems.

In conclusion, the characteristics of these critical periods in DNNs, including their underlying causes and consequences, deepen our understanding not just of neural network training, but of learning as a universal information-processing phenomenon.
