Insights from the Large-Scale Study of Curiosity-Driven Learning
The paper presents a comprehensive study of curiosity-driven learning, focusing on environments where reinforcement learning (RL) agents are trained without any extrinsic rewards. This research is a significant exploration of intrinsic motivation, using curiosity as the sole reward signal to guide RL agents, and it could reshape our understanding of how to train agents in complex environments.
Key Contributions
- Large-Scale Evaluation: The paper evaluates curiosity-driven learning across 54 benchmark environments, including the widely used Atari suite. It investigates agents' ability to operate using an intrinsic reward based purely on prediction error as the curiosity signal. The results show that these agents make substantial progress in environments traditionally reliant on extrinsic rewards, often behaving in ways that align with the game score even though they never observe it.
- Feature Space Analysis: The paper thoroughly explores the effect of different feature spaces (random features, learned features, and raw pixels) on the success of curiosity-driven agents. It finds that random features often perform surprisingly well, but learned features generalize better to novel scenarios such as unseen levels of Super Mario Bros.
- Intrinsic Reward Dynamics: By defining the intrinsic reward as the prediction error of a forward-dynamics model in the chosen feature space, the paper demonstrates an effective mechanism for curiosity-driven exploration: the agent is drawn toward transitions it cannot yet predict. This lets agents navigate and interact with dynamic environments without any pre-defined extrinsic reward signal (see the sketch after this list).
- Empirical Demonstration: The researchers empirically demonstrate the broad applicability of curiosity-driven learning across domains such as video games, physics simulations, and 3D navigation. In one striking instance, an agent progresses through 11 levels of Super Mario Bros. guided purely by curiosity.
- Understanding Performance Limits: While the approach shows promise, the paper also reveals limitations in stochastic environments. Random elements (e.g., a noisy TV) remain unpredictable no matter how long the agent watches them, so the prediction error never falls and the agent can fixate on them, suggesting areas for further research to address this failure mode.
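To make the reward concrete, here is a minimal sketch of the core idea: embed observations with a fixed, randomly initialized CNN (the "random features" variant compared in the paper) and use the forward model's prediction error as the curiosity reward. The class name, layer sizes, and the choice of PyTorch are illustrative assumptions, not the authors' released code; swapping in a trained encoder would give the learned-features variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CuriosityReward(nn.Module):
    """Curiosity as forward-dynamics prediction error in a feature space.

    A minimal sketch, not the authors' released code: the encoder is a small
    CNN with frozen random weights (the "random features" variant).
    """

    def __init__(self, obs_channels: int, n_actions: int, feat_dim: int = 128):
        super().__init__()
        # Random-feature encoder: weights are fixed and never trained.
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, feat_dim),
        )
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        # Forward model: predicts next-state features from current features + action.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )
        self.n_actions = n_actions

    def forward(self, obs, action, next_obs):
        """obs/next_obs: (B, C, H, W) floats; action: (B,) long.
        Returns per-transition intrinsic rewards and the forward-model loss."""
        phi = self.encoder(obs)
        phi_next = self.encoder(next_obs).detach()  # targets receive no gradient
        a = F.one_hot(action, num_classes=self.n_actions).float()
        phi_pred = self.forward_model(torch.cat([phi, a], dim=1))
        # Prediction error in feature space serves as the intrinsic reward;
        # its mean is the loss used to train the forward model.
        error = 0.5 * (phi_pred - phi_next).pow(2).sum(dim=1)
        return error.detach(), error.mean()
```

In a training loop, the detached error would stand in for the environment reward fed to the policy optimizer (the paper uses PPO), while its mean is minimized so the forward model keeps improving.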
Implications and Future Directions
The findings have significant implications for both the theoretical understanding and practical application of RL. The demonstrated ability of agents to perform well on intrinsic rewards alone suggests new methodologies for training in expansive, unguided environments without task-specific rewards, potentially reducing the need for laborious reward shaping.
Future developments could focus on overcoming the identified limitations of curiosity-driven models in stochastic settings. Enhancements might involve mechanisms that distinguish useful from detrimental unpredictability in the environment. Moreover, combining intrinsic rewards with sparse extrinsic rewards (as sketched below) could further improve learning in complex tasks.
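For instance, one simple way to combine the two signals is a weighted sum, with the curiosity term kept on a stable scale by a running standard-deviation estimate. This is a hypothetical sketch: the class name, the weight `beta`, and the normalization scheme are assumptions for illustration, not taken from the paper.

```python
import numpy as np

class RewardCombiner:
    """Sketch of mixing a sparse extrinsic reward with a curiosity bonus.

    Hypothetical helper: the curiosity term is divided by a running estimate
    of its standard deviation so its scale stays stable as the forward model
    improves, then added to the task reward with weight `beta`.
    """

    def __init__(self, beta: float = 0.01):
        self.beta = beta
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # Welford accumulators

    def __call__(self, r_extrinsic: float, r_intrinsic: float) -> float:
        # Update the running standard deviation of the intrinsic reward.
        self.count += 1
        delta = r_intrinsic - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r_intrinsic - self.mean)
        std = np.sqrt(self.m2 / self.count) if self.count > 1 else 1.0
        return r_extrinsic + self.beta * r_intrinsic / (std + 1e-8)
```

In practice the weight `beta` would need tuning per environment, since it controls how strongly exploration is favored over exploitation of the sparse task reward.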
As AI continues to develop, frameworks utilizing intrinsic rewards could open up possibilities for versatile, scalable applications in areas like autonomous exploration and learning in unfamiliar terrains. This paper sets a promising foundation for future studies to build upon in adapting curiosity-based models to broader, more challenging environments.
In conclusion, the paper's exploration of the capabilities and boundaries of curiosity-driven learning enriches AI research by providing critical insight into alternative learning paradigms that prioritize intrinsic motivation over traditional extrinsic rewards.