
Progressive Neural Networks (1606.04671v4)

Published 15 Jun 2016 in cs.LG

Abstract: Learning to solve complex sequences of tasks--while both leveraging transfer and avoiding catastrophic forgetting--remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.

Citations (2,288)

Summary

  • The paper introduces a novel architecture that prevents catastrophic forgetting by using separate network columns with lateral connections.
  • It demonstrates significant performance improvements over finetuning and pretraining in diverse reinforcement learning tasks such as Atari games and 3D labyrinths.
  • A new transfer analysis using Average Fisher Sensitivity offers practical insights into effectively combining prior knowledge with new task learning.

Progressive Neural Networks: A Step Towards Continual Learning in Reinforcement Learning

The paper "Progressive Neural Networks" presents a novel architectural approach for addressing two pivotal challenges in continual learning: the ability to leverage transfer learning effectively and to avoid catastrophic forgetting. The architecture, termed "Progressive Neural Networks," is evaluated in various reinforcement learning (RL) domains, demonstrating its superiority over traditional transfer learning methods like finetuning and pretraining.

Core Contributions

The contributions highlighted by the authors are threefold:

  1. Architectural Innovation: The combination and practical application of several existing techniques to form the progressive network architecture is a significant advancement. Unlike finetuning, progressive networks maintain a pool of previously trained models that new models can access via lateral connections. This approach ensures that learned knowledge is not discarded but retained and utilized effectively.
  2. Empirical Validation: The paper extensively evaluates this architecture on complex RL tasks, including Atari games and 3D maze environments. The results underscore the efficacy of progressive networks in handling sequences of tasks without succumbing to catastrophic forgetting.
  3. Transfer Analysis: A novel analytical framework based on Fisher Information is introduced, providing insights into the mechanisms and effectiveness of knowledge transfer across different layers and tasks.

Architectural Overview

Progressive neural networks mitigate catastrophic forgetting by introducing new neural network columns for each task sequentially, freezing the parameters of previously trained columns. Lateral connections enable new columns to access and combine features from prior columns, facilitating knowledge transfer. Formally, the representation in column $k$ at layer $i$, denoted $h_i^{(k)}$, is computed as follows:

$$h_i^{(k)} = f\left( W_i^{(k)} h_{i-1}^{(k)} + \sum_{j<k} U_i^{(k:j)} h_{i-1}^{(j)} \right),$$

where $W_i^{(k)}$ and $U_i^{(k:j)}$ are the weight matrices for the current and lateral connections, respectively.
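To make the column-and-lateral-connection mechanics concrete, below is a minimal two-column sketch in PyTorch that follows the update above. The fully connected layers, layer sizes, and ReLU activations are illustrative assumptions; the paper's RL experiments use convolutional columns and non-linear adapters on the lateral connections.

```python
import torch
import torch.nn as nn


def mlp_layers(sizes):
    """Stack of linear layers; non-linearities are applied in the forward passes below."""
    return nn.ModuleList([nn.Linear(sizes[i - 1], sizes[i]) for i in range(1, len(sizes))])


class ProgressiveNet(nn.Module):
    """Two-column progressive network sketch: column 1 is frozen, column 2 is trained.

    Implements h_i^{(2)} = f(W_i^{(2)} h_{i-1}^{(2)} + U_i^{(2:1)} h_{i-1}^{(1)}).
    """

    def __init__(self, sizes):
        super().__init__()
        self.col1 = mlp_layers(sizes)  # previously trained column (frozen)
        self.col2 = mlp_layers(sizes)  # new column for the current task
        # One lateral projection U_i^{(2:1)} per layer above the first.
        self.laterals = nn.ModuleList(
            [nn.Linear(sizes[i - 1], sizes[i], bias=False) for i in range(2, len(sizes))]
        )
        for p in self.col1.parameters():  # freezing old columns prevents forgetting
            p.requires_grad_(False)

    def forward(self, x):
        # Forward pass of the frozen column, keeping its per-layer activations.
        with torch.no_grad():
            h1 = [torch.relu(self.col1[0](x))]
            for layer in self.col1[1:]:
                h1.append(torch.relu(layer(h1[-1])))

        # New column: its own weights plus lateral input from the same depth of column 1.
        h2 = torch.relu(self.col2[0](x))
        for i in range(1, len(self.col2)):
            pre = self.col2[i](h2) + self.laterals[i - 1](h1[i - 1])
            # f is ReLU for hidden layers; the final layer is left linear (e.g. action logits).
            h2 = torch.relu(pre) if i < len(self.col2) - 1 else pre
        return h2


# Example usage: 8-dim observations, two hidden layers, 4 action logits.
net = ProgressiveNet([8, 64, 64, 4])
out = net(torch.randn(16, 8))  # only column 2 and the laterals receive gradients
```

Because the first column is frozen, training on the new task can only adjust the new column's weights and the lateral projections, which is what makes the architecture immune to forgetting while still letting it reuse earlier features.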

The architecture's design inherently supports transfer learning by allowing the reuse, adaptation, and enhancement of previously learned features. This compositionality boosts the model's capacity to handle diverse and possibly adversarial tasks.

Experimental Domains and Results

Pong Variants

The initial experiments involved modified versions of the Atari game Pong. The variants included visual and control-level changes such as added noise, different background colors, and flipped orientations. Progressive networks outperformed both the finetuning and pretraining baselines, achieving mean transfer scores of 209% compared with 181% for finetuning.

Atari Suite

The approach was further tested on a broader set of Atari games characterized by varied visuals and gameplay mechanics, with experiments spanning sequences of different source and target game pairs. Progressive networks exhibited positive transfer in most cases, succeeding on 8 of 12 target tasks compared with 5 of 12 for finetuning. Additionally, including more columns improved performance further, reinforcing the architecture's capacity for constructive transfer.

3D Labyrinth

In the 3D Labyrinth domain, progressive networks demonstrated their robustness. The domain features tasks that require collecting various items in a maze, under both dense and sparse reward structures. Progressive networks maintained their advantage, which was particularly noticeable in challenging tasks with complex game elements; results indicated mean scores of 491% for progressive networks versus 235% for the baseline.

Transfer Analysis

The efficacy and mechanisms of feature transfer were analyzed using an Average Fisher Sensitivity (AFS) measure, which uses the diagonal of the Fisher information matrix to estimate how strongly the learned policy depends on each column's features at each layer, revealing where transfer occurs across layers and columns. Findings revealed that positive transfer was most effective when the network judiciously combined old and new features rather than relying solely on previously learned ones. This balance is crucial for optimizing performance and avoiding negative transfer effects.
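As a rough illustration of how such a per-column sensitivity could be computed, the sketch below estimates a diagonal Fisher term from gradients of the sampled actions' log-probability with respect to each column's per-layer activations, then normalizes across columns so that each layer's sensitivities sum to one. The `log_prob_fn` interface and the averaging details are assumptions made for this example and do not reproduce the paper's exact AFS procedure.

```python
import torch


def average_fisher_sensitivity(log_prob_fn, activations):
    """Illustrative AFS-style estimate (not the paper's exact definition).

    log_prob_fn: hypothetical callable mapping the nested activation list to the
                 log-probability of the sampled actions for a batch of states.
    activations: nested list acts[k][i] of tensors (column k, layer i), each with
                 requires_grad=True and connected to the policy's computation graph.
    Returns a tensor afs[k, i]: the share of layer-i sensitivity carried by column k.
    """
    n_columns, n_layers = len(activations), len(activations[0])
    flat = [activations[k][i] for k in range(n_columns) for i in range(n_layers)]

    # Diagonal Fisher estimate: mean squared gradient of the action log-prob
    # with respect to each (column, layer) activation.
    grads = torch.autograd.grad(log_prob_fn(activations).sum(), flat)
    fisher = torch.stack([g.pow(2).mean() for g in grads]).reshape(n_columns, n_layers)

    # Normalize across columns so the sensitivities at each layer sum to one.
    return fisher / fisher.sum(dim=0, keepdim=True).clamp_min(1e-12)
```

In this reading, a column whose share at a given layer stays near zero contributes little to the new policy at that depth, while comparable shares across columns indicate the blend of old and new features that the analysis highlights as beneficial.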

Implications and Future Work

The implications of progressive neural networks are both theoretical and practical. The framework provides a robust method for continual learning, crucial for developing more adaptive and intelligent agents. However, the architecture does introduce scalability concerns due to the potential growth in the number of parameters with additional tasks. The authors suggest methods like online compression and pruning to address this.

Future research could focus on:

  • Scalability: Efficiently managing the growth in parameters as more tasks are introduced.
  • Unsupervised Task Identification: Developing mechanisms for dynamically identifying tasks and corresponding columns without explicit task labels.
  • Enhanced Lateral Connections: Investigating more sophisticated lateral connection strategies to further improve transfer learning capabilities.

In conclusion, progressive neural networks represent a significant advancement in the field of continual learning, addressing critical challenges in transfer learning and catastrophic forgetting. The empirical results and analytical insights provided in this work establish a solid foundation for future exploration and enhancement of continual learning architectures.