An Empirical Study of Example Forgetting during Deep Neural Network Learning (1812.05159v3)

Published 12 Dec 2018 in cs.LG and stat.ML

Abstract: Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a `forgetting event' to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set's (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.

Authors (6)
  1. Mariya Toneva (23 papers)
  2. Alessandro Sordoni (53 papers)
  3. Remi Tachet des Combes (23 papers)
  4. Adam Trischler (50 papers)
  5. Yoshua Bengio (601 papers)
  6. Geoffrey J. Gordon (30 papers)
Citations (658)

Summary

Overview of "An Empirical Study of Example Forgetting during Deep Neural Network Learning"

The paper "An Empirical Study of Example Forgetting during Deep Neural Network Learning" by Mariya Toneva et al. presents a thorough investigation into the dynamics of example forgetting in neural networks during the training process on single classification tasks. The paper is driven by the phenomenon of catastrophic forgetting, where neural networks lose previously acquired knowledge upon learning new tasks. Unlike traditional settings involving clear distributional shifts across tasks, this work focuses on whether similar forgetting dynamics occur when data belongs to a singular task distribution.

Key Findings

  1. Forgetting Events: The authors define a "forgetting event" as occurring when a training example transitions from being correctly classified to being misclassified over the course of training with Stochastic Gradient Descent (SGD). The study spans several benchmark datasets, such as MNIST, permuted MNIST, and CIFAR-10 (a minimal sketch of how such events can be counted appears after this list).
  2. Unforgettable Examples: A notable finding is that certain examples are never forgotten once learned, termed "unforgettable". The set of (un)forgettable examples is stable across random seeds and correlates strongly across different neural architectures.
  3. Dataset Reduction: A further result is that a significant fraction of the training data can be removed without harming the model's generalization. Specifically, the authors show that up to 35% of CIFAR-10's training examples can be omitted while maintaining state-of-the-art performance.
  4. Correlation with Noisy Examples: Examples with noisy labels are found to undergo forgetting events more frequently, suggesting that forgetting dynamics can serve as a useful signal for identifying noisy or outlier examples within a dataset.

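To make the definition concrete, the sketch below (a minimal illustration under assumed bookkeeping, not the authors' released code) counts forgetting events from a recorded correctness history: a boolean matrix whose entry (t, i) indicates whether example i was classified correctly the t-th time it appeared in a mini-batch.

```python
import numpy as np

def count_forgetting_events(correct_matrix: np.ndarray) -> np.ndarray:
    """Count forgetting events per training example.

    correct_matrix: boolean array of shape (num_presentations, num_examples);
    entry (t, i) is True if example i was classified correctly the t-th time
    it was presented during SGD training. A forgetting event for example i is
    a transition from correct at presentation t to incorrect at t + 1.
    """
    prev, curr = correct_matrix[:-1], correct_matrix[1:]
    return np.logical_and(prev, ~curr).sum(axis=0)

# Toy correctness history: 5 presentations of 4 training examples.
history = np.array([
    [False, True,  True,  True],   # presentation 1
    [True,  True,  False, True],   # presentation 2: example 2 is forgotten
    [True,  False, True,  True],   # presentation 3: example 1 is forgotten
    [True,  True,  True,  True],   # presentation 4
    [True,  True,  True,  True],   # presentation 5
])

events = count_forgetting_events(history)
print(events)  # [0 1 1 0]

# "Unforgettable" examples: learned by the end and never forgotten.
unforgettable = (events == 0) & history[-1]
print(unforgettable)  # [ True False False  True]
```

The resulting per-example counts are what the dataset-reduction experiments rank on, as sketched in the next section.
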
Implications and Future Directions

The findings of this paper provide insights into the efficiency of neural network training. The identification of unforgettable examples opens avenues for optimizing data usage in machine learning models, contributing to resource-efficient learning without compromising accuracy. Moreover, the stability of forgetting dynamics across different seeds and architectures suggests that these phenomena reflect intrinsic properties of the data rather than specificities of the training scheme.
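
As a rough illustration of that data-reduction idea (again a sketch, not the paper's released pipeline), the per-example counts from the previous snippet can be used to decide which examples to keep: examples are ordered by increasing number of forgetting events, and the least-forgotten fraction is dropped first.

```python
import numpy as np

def prune_by_forgetting(events: np.ndarray, drop_fraction: float) -> np.ndarray:
    """Return indices of training examples to keep.

    events: per-example forgetting-event counts (see the previous sketch).
    drop_fraction: fraction of the dataset to remove, starting with the
    examples that were forgotten least often (the unforgettable ones).
    """
    order = np.argsort(events)               # least-forgotten examples first
    n_drop = int(drop_fraction * len(events))
    return np.sort(order[n_drop:])            # keep the rest, in original order

# Example: drop 35% of a toy dataset of 10 examples.
counts = np.array([0, 3, 0, 1, 5, 0, 2, 0, 4, 1])
keep = prune_by_forgetting(counts, drop_fraction=0.35)
print(keep)  # indices of the examples retained for training
```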

From a theoretical perspective, understanding forgetting events can contribute to ongoing discussions about the generalization capabilities of neural networks. The paper aligns with interpretations of deep networks as max-margin classifiers and motivates further investigation into the implicit regularization effects of SGD.

Looking forward, these insights could significantly impact other areas of machine learning and AI, such as reinforcement learning, where continual learning challenges are prevalent. Additionally, the concept of example forgetting might be valuable in unsupervised and semi-supervised learning settings.

Conclusion

This paper provides robust empirical evidence that example forgetting is a critical aspect of the learning process and can be exploited to enhance model efficiency and performance. While it does not claim any groundbreaking theoretical advancements, it offers a solid foundation for future explorations into optimizing neural network training by leveraging the dynamics of example forgetting. The potential applications across various domains of AI research make it a valuable contribution to the understanding of deep learning processes.