
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models (1507.00814v3)

Published 3 Jul 2015 in cs.AI, cs.LG, and stat.ML

Abstract: Achieving efficient and scalable exploration in complex domains poses a major challenge in reinforcement learning. While Bayesian and PAC-MDP approaches to the exploration problem offer strong formal guarantees, they are often impractical in higher dimensions due to their reliance on enumerating the state-action space. Hence, exploration in complex domains is often performed with simple epsilon-greedy methods. In this paper, we consider the challenging Atari games domain, which requires processing raw pixel inputs and delayed rewards. We evaluate several more sophisticated exploration strategies, including Thompson sampling and Boltzmann exploration, and propose a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics. By parameterizing our learned model with a neural network, we are able to develop a scalable and efficient approach to exploration bonuses that can be applied to tasks with complex, high-dimensional state spaces. In the Atari domain, our method provides the most consistent improvement across a range of games that pose a major challenge for prior methods. In addition to raw game-scores, we also develop an AUC-100 metric for the Atari Learning domain to evaluate the impact of exploration on this benchmark.

Citations (485)

Summary

  • The paper presents a novel exploration strategy that uses prediction errors from deep models as exploration bonuses.
  • It employs deep autoencoders to compress high-dimensional states, enabling scalable exploration in challenging Atari domains.
  • Empirical results demonstrate faster convergence and enhanced performance compared to traditional RL approaches.

Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models

The paper "Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models" addresses the challenge of efficient exploration in reinforcement learning (RL) within complex domains, specifically focusing on Atari games as a testbed. Traditional methods like PAC-MDP algorithms and Bayesian approaches often become impractical in high-dimensional spaces, necessitating exploration strategies that can scale effectively.

Key Contributions

The authors propose a novel exploration strategy that leverages deep predictive models to assign exploration bonuses. This method aims to drive exploration in RL by using learned representations of state dynamics, rather than relying on the enumeration of state-action pairs, which is computationally expensive in large-scale problems.

Methodology

The method assigns exploration bonuses based on the disagreement between a learned model’s predictions and the actual outcomes. States are first encoded into a lower-dimensional space with a deep neural network autoencoder; a dynamics model then operates on these encodings to predict the encoding of the next state. Large prediction errors indicate novel or poorly understood states, which therefore receive larger exploration bonuses.
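The sketch below illustrates this pipeline under stated assumptions: it uses PyTorch, a fully connected autoencoder on flattened frames, and a one-step dynamics model over the learned codes. The class names, layer sizes, and the `prediction_error` helper are illustrative, not taken from the paper, which trains on raw Atari frames.

```python
import torch
import torch.nn as nn

class StateAutoencoder(nn.Module):
    """Compresses a flattened frame into a low-dimensional code (illustrative sizes)."""
    def __init__(self, state_dim=84 * 84, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                                     nn.Linear(512, state_dim))

    def forward(self, state):
        code = self.encoder(state)
        return self.decoder(code), code

class DynamicsModel(nn.Module):
    """Predicts the next state's code from the current code and a one-hot action."""
    def __init__(self, code_dim=128, n_actions=18):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_dim + n_actions, 256), nn.ReLU(),
                                 nn.Linear(256, code_dim))

    def forward(self, code, action_onehot):
        return self.net(torch.cat([code, action_onehot], dim=-1))

def prediction_error(autoencoder, dynamics, state, action_onehot, next_state):
    """Mean squared error between predicted and actual next-state codes: the novelty signal."""
    with torch.no_grad():
        _, code = autoencoder(state)
        _, next_code = autoencoder(next_state)
        predicted_next_code = dynamics(code, action_onehot)
    return ((predicted_next_code - next_code) ** 2).mean(dim=-1)
```

In this setup the autoencoder is trained to reconstruct observed frames and the dynamics model is fit online to the agent's transitions; the per-transition error then feeds the bonus defined in the next section.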

Exploration Bonus Calculation

An augmented reward function incorporates these bonuses:

$$R_{\text{Bonus}}(s, a) = R(s, a) + \beta N(s, a)$$

where $N(s, a)$ represents the novelty of a state-action pair, determined by the predictive model’s error.
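A minimal sketch of this augmented reward is shown below, assuming the novelty term has already been derived from the dynamics model's prediction error; the coefficient `beta` is an illustrative constant, not the value used in the paper, and the paper's exact mapping from raw error to $N(s, a)$ (including any normalization or decay) is not reproduced here.

```python
def augmented_reward(reward, novelty, beta=0.05):
    """R_Bonus(s, a) = R(s, a) + beta * N(s, a); beta is an illustrative constant."""
    return reward + beta * novelty
```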

Empirical Evaluation

The paper evaluates the proposed method on 14 challenging Atari games. The results indicate that using exploration bonuses significantly enhances learning speed and stability compared to traditional methods like epsilon-greedy, Boltzmann exploration, and Thompson sampling. The authors introduce the AUC-100 benchmark, which provides a normalized measure of learning progress, further demonstrating the advantages of their approach.
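The precise normalization behind AUC-100 is defined in the paper; the sketch below shows one plausible way to compute a normalized area-under-the-learning-curve score with the trapezoidal rule, where `reference_score` is an assumed normalizing constant (for example, a baseline's final score) rather than the paper's exact choice.

```python
import numpy as np

def auc_100(scores, reference_score):
    """Normalized area under a learning curve over a fixed 100-epoch budget (illustrative)."""
    scores = np.asarray(scores[:100], dtype=float)
    area = np.trapz(scores, dx=1.0)                 # trapezoidal area under the curve
    reference_area = reference_score * (len(scores) - 1)  # flat curve at the reference score
    return area / reference_area
```

A metric of this form rewards methods that reach high scores early rather than only at the end of training, which is why it is useful for comparing exploration strategies.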

Quantitatively, the method achieved state-of-the-art performance on several games, excelling in particular on titles where prior RL algorithms lagged well behind human play. Improvements were observed both in convergence speed and in final performance.

Implications and Future Directions

The proposed approach highlights the potential for using deep learning to streamline exploration in RL, offering a scalable solution for high-dimensional state spaces. The integration of model-based exploration bonuses into standard RL frameworks suggests broader applications beyond the scope of Atari games.

One noted limitation is the method’s implicit assumption of deterministic dynamics: in stochastic settings, irreducible prediction error can be mistaken for novelty. Future research could refine the model to distinguish stochastic from genuinely novel dynamics, thereby improving the robustness of the exploration signal.

Additionally, extending the framework to directly incorporate learned dynamics models into the policy learning process could enhance learning efficiency beyond model-free methods, opening avenues for more rapid RL development in complex real-world scenarios.

Overall, the paper contributes a significant step forward in designing efficient, scalable exploration strategies that leverage deep learning, laying the groundwork for more sophisticated exploration methodologies in reinforcement learning.