An Analysis of "Episodic Curiosity through Reachability"
The paper "Episodic Curiosity through Reachability" presents a novel approach to handling the sparse reward problem inherent in reinforcement learning (RL) applications in real-world environments. Sparse rewards pose a significant challenge for RL algorithms, which often struggle to explore and learn effectively when feedback from the environment is limited or delayed. The authors address this issue by introducing a curiosity-driven method that leverages episodic memory to enhance exploration via a novel concept of reachability-based novelty.
Methodological Innovation
The central innovation of this paper lies in its definition of a curiosity bonus computed from an episodic memory. The agent stores past observations in memory and assesses the novelty of the current observation by how reachable it is from those stored observations, where reachability refers to the number of environment steps required to transition between them. By rewarding only observations that lie a significant number of steps beyond previously encountered states, the method encourages exploration of genuinely novel states and avoids what the authors term "couch-potato" behavior, in which an agent fixates on inherently unpredictable stimuli, such as a screen showing random images, rather than actually exploring the environment.
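To make the bonus concrete, the following Python sketch illustrates how such a reachability-based bonus could be computed. It is a minimal illustration rather than the authors' exact implementation; the names (`reachability`, `ALPHA`, `BETA`, `NOVELTY_THRESHOLD`) and the use of a simple maximum as the aggregator over memory are assumptions made here for clarity.

```python
# Minimal sketch of a reachability-based curiosity bonus (illustrative, not the paper's code).
# `reachability(a, b)` is assumed to return a score in [0, 1] that is high when
# observation b is predicted to be reachable from a within a few environment steps.

ALPHA = 1.0               # scales the bonus (illustrative value)
BETA = 0.5                # shift term; low aggregated reachability yields a positive bonus
NOVELTY_THRESHOLD = 0.0   # store the observation in memory only if its bonus exceeds this


def curiosity_bonus(current_embedding, memory, reachability):
    """Return (bonus, updated_memory) for the current observation embedding."""
    if not memory:
        # An empty memory means everything is novel; store and reward the observation.
        return ALPHA * BETA, [current_embedding]

    # Score reachability of the current embedding from every stored embedding,
    # then aggregate (the maximum here; a high percentile is another natural choice).
    scores = [reachability(stored, current_embedding) for stored in memory]
    aggregated = max(scores)

    # Observations that are hard to reach from everything in memory are novel and
    # earn a large bonus; easily reachable observations earn little or none.
    bonus = ALPHA * (BETA - aggregated)

    if bonus > NOVELTY_THRESHOLD:
        memory = memory + [current_embedding]
    return bonus, memory
```

In this sketch, only observations judged hard to reach from memory are both rewarded and stored, which is what discourages the agent from loitering near states it has effectively already visited.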
The episodic curiosity module proposed in the paper combines parametric components, namely an embedding network and a comparator network, with non-parametric components, namely an episodic memory buffer and a reward bonus estimation function. The embedding network maps observations into a latent space, and the comparator network estimates how reachable one embedded observation is from another.
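As a rough illustration of the two trainable components, the sketch below shows one plausible shape for the embedding and comparator networks in PyTorch. The layer sizes, class names, and the choice of PyTorch are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class EmbeddingNetwork(nn.Module):
    """Maps an image observation to a compact embedding (illustrative architecture)."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # LazyLinear infers its input size from the first forward pass.
        self.fc = nn.LazyLinear(embedding_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(obs))


class ComparatorNetwork(nn.Module):
    """Predicts whether two embedded observations are reachable from one another
    within a small number of environment steps."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * embedding_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, e1: torch.Tensor, e2: torch.Tensor) -> torch.Tensor:
        # Output is a reachability probability in (0, 1), usable as the
        # `reachability` callable in the bonus sketch above.
        return torch.sigmoid(self.net(torch.cat([e1, e2], dim=-1)))
```

In the paper, these networks are trained in a self-supervised fashion on pairs of observations labeled by whether they occurred within a small number of steps of each other in the same trajectory, so no task-specific labels are required.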
Empirical Results
The paper demonstrates the efficacy of the proposed method through a series of experiments in visually rich 3D environments, the DeepMind Lab and ViZDoom platforms, as well as a locomotion task in MuJoCo. The proposed method, referred to as Episodic Curiosity (EC), is benchmarked against ICM (Intrinsic Curiosity Module), a state-of-the-art curiosity-based method, and shows notable improvements. In particular, EC converges faster and explores more robustly in sparse-reward settings; for example, it outperforms ICM on the "Sparse + Doors" task in DeepMind Lab, achieving higher success rates and covering a larger area.
Additionally, the EC method is shown to be effective in scenarios with no task-specific rewards, purely driven by curiosity bonuses. Notably, EC leads to more meaningful exploration patterns compared to ICM, as evidenced by the increased area coverage in the "No Reward" task.
Theoretical and Practical Implications
The paper's approach to modeling curiosity via reachability has several implications for both the theoretical understanding and practical application of RL. Theoretically, this work provides a new perspective on how novelty can be operationalized in RL, suggesting that the exploration-exploitation balance can be more finely tuned by incorporating past experience stored in episodic memory. Practically, the proposed method can be applied to complex, high-dimensional environments where traditional RL methods falter due to sparse rewards. This has potential applications in robotics, autonomous navigation, and other domains where agents must learn from limited extrinsic feedback.
Future Directions
Looking ahead, a promising direction is policy design that uses the contents of episodic memory not only as a source of reward bonuses but also as an input to the agent's decision-making. Such extensions could enable more efficient few-shot learning and exploration in novel tasks, moving RL agents toward more human-like learning capabilities.
In conclusion, the "Episodic Curiosity through Reachability" paper presents a compelling advancement in curiosity-driven exploration in RL. By integrating memory and reachability, the authors offer a robust solution to the sparse reward problem, paving the way for more adaptive and intelligent RL systems.