An Examination of Curriculum Learning in Deep Neural Network Training
The paper "On The Power of Curriculum Learning in Training Deep Networks" by Hacohen and Weinshall presents a comprehensive analysis of Curriculum Learning (CL) for deep neural networks, focusing specifically on Convolutional Neural Networks (CNNs) used for image recognition tasks. CL builds on the premise that learning can be more efficient if training examples are introduced in order of difficulty, akin to how humans are taught new skills. Though the concept has historical roots in human and animal learning paradigms, this work explores its implications and efficacy within the context of deep learning.
Core Contributions
The research identifies two main problems in applying CL to machine learning: scoring the difficulty of each training example, and deciding the pace at which harder examples are introduced. The paper addresses these by employing transfer learning and bootstrapping methods to assess example difficulty, and several pacing functions to govern how data batches are sampled over the course of training.
The empirical evaluation spans several network architectures and datasets (CIFAR-10, CIFAR-100, and subsets of ImageNet), illustrating the robustness of the proposed CL methods. The primary outcome is that CL not only accelerates convergence but can also improve the final performance of the trained networks compared with traditional random sampling.
Experimental Insights
Two scoring methods are explored for arranging the data by difficulty:
- Transfer scoring: Extracts features for each training image with a large pre-trained network (such as Inception) and trains a simple classifier on those features; examples that this classifier assigns to the correct class with higher confidence are ranked as easier.
- Bootstrapping: First trains the target network conventionally, without a curriculum, then ranks the training examples by that network's confidence and retrains from scratch using the resulting order.
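The step shared by both scoring methods, ranking examples by a scorer's confidence, can be sketched as follows. The function name and toy scores below are illustrative; in the paper the score is a classifier's confidence in the true label, not hand-picked numbers.

```python
import numpy as np

def difficulty_order(true_class_confidence):
    """Return example indices sorted easiest-first: the higher a
    scorer's confidence in the true class, the easier the example."""
    return np.argsort(-np.asarray(true_class_confidence))

# Toy confidences standing in for a pre-trained scorer's softmax
# probability of the correct label for each training example.
scores = [0.2, 0.9, 0.6, 0.95]
order = difficulty_order(scores)
print(order)  # easiest example (index 3) comes first
```

A curriculum then exposes the network to a growing prefix of this sorted order, as governed by the pacing function.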
Both methods are shown to improve learning speed and accuracy. Several pacing strategies (fixed exponential, varied exponential, and single-step pacing) were also evaluated for their effect on training dynamics; among these, transfer scoring combined with fixed exponential pacing typically offered the largest improvements.
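A fixed exponential pacing function of the kind evaluated here can be sketched as below. The parameter names and default values are illustrative placeholders, not the paper's tuned settings.

```python
def fixed_exponential_pacing(step, starting_fraction=0.1,
                             inc=1.9, step_length=100):
    """Fraction of the difficulty-sorted training data available for
    sampling at a given optimization step: starts small, grows
    geometrically every `step_length` steps, and is capped at 1."""
    return min(1.0, starting_fraction * inc ** (step // step_length))

# During training, each mini-batch would be drawn uniformly from the
# easiest fixed_exponential_pacing(step) fraction of the sorted data.
print(fixed_exponential_pacing(0))     # initial fraction
print(fixed_exponential_pacing(1000))  # eventually the full dataset
```

Single-step pacing is the degenerate case with one jump from the starting fraction to the full dataset; varied pacing lets the step lengths differ.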
Theoretical Contributions
A theoretical analysis explores how CL alters the optimization landscape of neural networks. The work posits that an ideal curriculum, a sampling distribution that correlates positively with the utility of the optimal hypothesis, makes the landscape steeper without changing the location of the global minimum. This suggests that CL can guide optimization toward the global optimum more quickly, benefiting both convergence rate and final model accuracy.
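Schematically, and in our own notation rather than the paper's, the claim can be written as follows: a curriculum replaces the uniform empirical distribution U over the training set with a weighted distribution P, and an ideal curriculum preserves the minimizer while steepening the objective around it.

```latex
% Training objective under a curriculum distribution P versus the
% uniform empirical distribution U over the training set:
\mathcal{L}_P(\theta) = \mathbb{E}_{X \sim P}\big[\ell_\theta(X)\big],
\qquad
\mathcal{L}_U(\theta) = \mathbb{E}_{X \sim U}\big[\ell_\theta(X)\big].
% An ideal curriculum chooses P (weights correlating positively with
% the utility of the optimal hypothesis) so that
\arg\min_\theta \mathcal{L}_P(\theta) = \arg\min_\theta \mathcal{L}_U(\theta),
% while the gradients of \mathcal{L}_P are larger in magnitude near the
% minimum, i.e. the landscape is steeper but the optimum does not move.
```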
Implications and Future Directions
The implications of this research extend beyond theoretical insights into practical applications across deep learning paradigms. By systematically structuring training data, CL can provide more efficient learning paths, particularly in scenarios requiring rapid adaptation or when resources are constrained.
The theoretical exploration also suggests connections with other dynamic sampling frameworks, such as Self-Paced Learning and hard example mining, marking these as areas ripe for further study and optimization. Future work may tailor curricula to the intricacies of specific datasets or network architectures, extending the utility of CL across a broader range of problem domains.
In conclusion, this paper advances the understanding of how pedagogical strategies can be systematically applied to machine learning, offering new avenues for improving model training in both speed and final outcome, and a more nuanced picture of how curriculum-based approaches affect neural network performance.