- The paper’s main contribution is the Teacher-Student Curriculum Learning (TSCL) framework, which automates task selection through a dynamic Teacher to improve learning efficiency.
- It formulates curriculum design as a POMDP and uses multi-armed-bandit-inspired algorithms to adapt the curriculum to the Student’s progress and to counteract forgetting.
- Experimental results show that TSCL matches or outperforms manual curricula and uniform sampling on tasks such as decimal-number addition and Minecraft navigation.
Teacher-Student Curriculum Learning: An Analytical Overview
The paper "Teacher-Student Curriculum Learning" introduces Teacher-Student Curriculum Learning (TSCL), a framework that automates curriculum learning through a Teacher-Student paradigm: the Teacher dynamically selects subtasks for the Student so as to maximize learning efficiency.
Background and Context
Traditional curriculum learning gradually increases task complexity so that a model masters simpler tasks before advancing to harder ones. The strategy has proven useful in domains such as video games, robotics, and language processing, but it requires intensive manual effort to define the hierarchy and progression of tasks. TSCL automates this step: a Teacher monitors the Student's learning progress and adjusts the curriculum dynamically, favoring tasks on which the Student is improving fastest or showing signs of forgetting.
Methodology and Framework
The TSCL framework is formalized as a partially observable Markov decision process (POMDP) in which the Teacher's actions are task choices. The Teacher selects tasks based on the slope of the Student's learning curve, treating performance drops on a task as a signal of forgetting. Two POMDP formulations are introduced: a "Simple" formulation for reinforcement learning, where the Student trains on one task at a time, and a "Batch" formulation for supervised learning.
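As a concrete sketch of the progress signal, the slope of the Student's recent learning curve can be estimated with a least-squares fit over a window of scores. The window size and the fitting details below are illustrative assumptions, not specifics taken from the paper:

```python
def learning_progress(scores, window=10):
    """Estimate learning progress as the slope of a linear
    least-squares fit over the most recent `scores` (a window
    of the Student's learning curve on one task).

    Sketch only: window size and fit are assumptions."""
    recent = list(scores)[-window:]
    n = len(recent)
    if n < 2:
        return 0.0  # not enough observations to measure a trend
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(recent) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, recent))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den  # slope: >0 improving, <0 forgetting
```

A steadily improving task yields a positive slope, a flat curve yields zero, and a degrading (forgotten) task yields a negative slope.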
The authors adapt several algorithms from the non-stationary multi-armed bandit setting: Online, Naive, Window, and Sampling. Each estimates learning progress from recent changes in per-task scores and counteracts forgetting by keeping the Student engaged with tasks whose performance is degrading. The Sampling algorithm stands out: it resembles Thompson sampling, drawing from a buffer of recent rewards per task and thereby managing exploration without an explicit exploration hyperparameter.
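A minimal sketch of the Sampling teacher's selection rule, assuming a per-task buffer of recent progress rewards. The buffer size and the treatment of not-yet-observed tasks are assumptions for illustration:

```python
import random
from collections import deque

class SamplingTeacher:
    """Sketch of the Sampling teacher: keep a buffer of recent
    progress rewards per task; to pick the next task, draw one
    buffered reward per task and choose the task whose draw has
    the largest absolute value (Thompson-sampling-like)."""

    def __init__(self, n_tasks, buffer_size=10):
        self.buffers = [deque(maxlen=buffer_size) for _ in range(n_tasks)]

    def update(self, task, reward):
        # Reward = observed change in the Student's score on `task`.
        self.buffers[task].append(reward)

    def choose_task(self):
        draws = []
        for i, buf in enumerate(self.buffers):
            # Assumption: tasks never tried get an infinite draw,
            # forcing the Teacher to try every task at least once.
            value = abs(random.choice(buf)) if buf else float("inf")
            draws.append((value, i))
        return max(draws)[1]
```

Using the absolute value of the drawn reward means both fast-improving tasks and rapidly degrading (being-forgotten) tasks attract the Teacher's attention.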
Experimentation and Results
The framework is evaluated on two tasks: decimal-number addition with an LSTM in a supervised learning setting, and a navigation task in Minecraft under reinforcement learning. The results are insightful:
- Decimal Addition: TSCL algorithms surpass both uniform sampling and previously published manual curricula, for both one-dimensional and two-dimensional parameterizations of task difficulty. In particular, using the absolute value of the progress reward proves important for handling the model's forgetting.
- Minecraft Navigation: The automated curriculum rivals a manually designed curriculum and significantly surpasses uniform sampling, enabling the agent to solve a complex navigation task far more efficiently.
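The role of the absolute value noted above can be illustrated with a toy comparison; the progress numbers here are made up for illustration:

```python
# Toy per-task progress estimates (made-up numbers): positive means
# the Student is improving, negative means it is forgetting the task.
progress = {"1-digit": 0.01, "2-digit": 0.05, "3-digit": -0.20}

# Greedy choice on signed progress never revisits the degrading task...
signed_pick = max(progress, key=progress.get)

# ...while choosing by absolute progress steers practice back to it.
abs_pick = max(progress, key=lambda t: abs(progress[t]))
```

With the signed criterion the Teacher keeps picking "2-digit"; with the absolute-value criterion it returns to "3-digit", where performance is collapsing.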
The paper shows that TSCL not only matches but often surpasses labor-intensive hand-designed curricula, illustrating its potential to streamline machine learning workflows.
Implications and Future Directions
TSCL has important practical implications: it reduces the manual burden of designing learning curricula by producing training sequences that adapt to the model's learning progress. Future work could extend the framework to settings where task parameterization is continuous rather than discrete, or where tasks are generated dynamically. Integrating TSCL with Student algorithms that feature intrinsic motivation or advanced exploration strategies might yield further improvements. In conclusion, TSCL represents an important step forward in curriculum learning, leveraging adaptive task selection to optimize learning efficiency.