Selective Experience Replay for Lifelong Learning: An Expert Overview
The paper "Selective Experience Replay for Lifelong Learning" by David Isele and Akansel Cosgun introduces a novel approach to addressing catastrophic forgetting in deep reinforcement learning (DRL) when agents are tasked with learning multiple tasks sequentially. The phenomenon of catastrophic forgetting is a significant obstacle in lifelong machine learning, where a system is expected to retain and leverage knowledge across various tasks over an extended period. The authors propose a method that enhances the traditional experience replay mechanism by selectively storing experiences, thereby ensuring the retention and consolidation of knowledge over the agent’s lifetime.
Motivation and Background
In reinforcement learning with deep neural networks, learning tasks sequentially without forgetting previously acquired ones is especially difficult because the distribution of experiences shifts as the agent moves from task to task. Traditional workarounds, such as keeping a separate model per task or using hierarchical action structures, face limitations in scalability and memory efficiency. Isele and Cosgun seek to overcome these limitations with a selective experience replay strategy that augments the standard First-In-First-Out (FIFO) replay buffer.
Methodology
The core innovation of the paper is a dual-buffer system: a short-term FIFO buffer that captures recent experience, and a long-term episodic memory that judiciously selects and stores a subset of experiences for replay throughout training. Four strategies for selecting which experiences to retain are explored (two of them are sketched in code after the list):
- Surprise: prioritizes experiences with a high temporal difference (TD) error, i.e., a large gap between the network's prediction and its bootstrapped target, which signals novelty or uncertainty.
- Reward: retains experiences associated with high-magnitude reward, so that especially valuable (or costly) outcomes are not forgotten.
- Distribution Matching: uses reservoir sampling so that the long-term memory approximately matches the global training distribution of experiences.
- Coverage Maximization: maintains wide coverage of the state space, for example by favoring experiences with few close neighbors already in memory, which preserves variety and avoids overfitting to recent tasks or frequently visited states.
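To make the dual-buffer idea concrete, below is a minimal Python sketch of a short-term FIFO buffer paired with a long-term episodic memory populated by reservoir sampling, which is the mechanism behind the distribution matching strategy. The class and method names (SelectiveReplayBuffer, add, sample) and the buffer sizes are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

class SelectiveReplayBuffer:
    """Sketch of a dual-buffer replay memory: a short-term FIFO buffer plus a
    long-term episodic memory filled by reservoir sampling (the distribution
    matching strategy). All names and sizes are illustrative assumptions."""

    def __init__(self, fifo_capacity=10_000, long_term_capacity=50_000, seed=0):
        self.fifo = deque(maxlen=fifo_capacity)   # short-term, recency-based
        self.long_term = []                        # long-term episodic memory
        self.long_term_capacity = long_term_capacity
        self.num_seen = 0                          # total experiences observed
        self.rng = random.Random(seed)

    def add(self, experience):
        """Store one (state, action, reward, next_state, done) tuple."""
        self.fifo.append(experience)
        self.num_seen += 1
        if len(self.long_term) < self.long_term_capacity:
            self.long_term.append(experience)
        else:
            # Reservoir sampling: every experience seen so far ends up in the
            # long-term memory with equal probability capacity / num_seen, so
            # the memory approximates the global distribution of experience.
            j = self.rng.randrange(self.num_seen)
            if j < self.long_term_capacity:
                self.long_term[j] = experience

    def sample(self, batch_size):
        """Draw part of the batch from recent experience and part from the
        long-term memory, so training sees the current task alongside a
        distribution-matched summary of earlier tasks."""
        half = batch_size // 2
        recent = self.rng.sample(list(self.fifo), min(half, len(self.fifo)))
        old = self.rng.sample(self.long_term,
                              min(batch_size - len(recent), len(self.long_term)))
        return recent + old
```

The even split between the two buffers in sample() is an arbitrary choice for illustration; the paper's contribution is the selection rule that decides what enters the long-term memory, not a particular mixing ratio.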
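Coverage maximization can be sketched in a similar spirit: when the long-term memory is full, evict the stored experience whose state is most "crowded", i.e., has the most neighbors within some radius, so sparsely covered regions of the state space are preserved. The function name, distance metric, and radius below are illustrative assumptions; a practical implementation would operate on the agent's own state representation.

```python
import numpy as np

def add_with_coverage_maximization(memory, new_state, capacity, radius=1.0):
    """Illustrative sketch of coverage-maximization selection (not the
    authors' code): when memory is full, replace the stored state with the
    most neighbors within `radius`, keeping sparsely covered regions.
    `memory` is a list of 1-D numpy state vectors."""
    if len(memory) < capacity:
        memory.append(new_state)
        return memory

    states = np.stack(memory + [new_state])               # shape (N+1, d)
    # Pairwise Euclidean distances between all candidate states.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # Neighbor count = how many other states fall within `radius`.
    neighbor_counts = (dists < radius).sum(axis=1) - 1    # exclude self
    crowded = int(np.argmax(neighbor_counts))

    if crowded < len(memory):
        memory[crowded] = new_state   # replace the most crowded stored state
    # else: the new state itself is the most crowded candidate and is dropped
    return memory
```

This brute-force version is quadratic in memory size per insertion; a realistic implementation would rely on an approximate nearest-neighbor index or sampled distance estimates.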
Results and Analysis
The empirical evaluation spans several domains, including autonomous driving tasks at intersections, a grid-world navigation task, and a lifelong-learning variant of the MNIST classification problem. The experiments show that with an unlimited-capacity buffer, replaying all past experience enables knowledge transfer without forgetting; however, unbounded memory is impractical for agents expected to learn over a lifetime.
Under a constrained memory budget, the distribution matching and coverage maximization strategies mitigate forgetting, whereas the surprise- and reward-based strategies still suffer degraded performance on previously learned tasks. Notably, distribution matching delivers the most stable performance, approaching that of the unlimited buffer. In settings where tasks differ in importance or receive disproportionate amounts of training, however, coverage maximization may offer an advantage because it maintains a broader representation of the state space.
Implications and Future Directions
The proposed selective experience replay system addresses a pivotal concern in designing continuously learning DRL agents. By demonstrating a feasible way to prevent catastrophic forgetting, this work lays a foundation for more robust and scalable lifelong learning systems. The implications extend beyond reinforcement learning and could inform training methodologies for neural networks more broadly, particularly in environments with non-stationary data distributions and tight resource constraints.
Future work could explore adaptive mechanisms that dynamically choose the selection strategy best suited to the task characteristics or the system's evolving objectives. Integrating selective replay with memory compression techniques, or hybridizing the selection criteria themselves, might yield further insights and improvements.
In conclusion, Isele and Cosgun's research offers a significant step toward realizing agents capable of seamless long-term learning, contributing valuable knowledge to the discourse on lifelong learning in artificial intelligence.