Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
The paper "RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments" addresses the persistent challenge in reinforcement learning (RL) related to efficient exploration in environments where extrinsic rewards are sparse. This research introduces a novel approach to intrinsic motivation, termed Rewarding Impact-Driven Exploration (RIDE), which is designed to enhance the exploration capabilities of agents, particularly in procedurally-generated environments where conventional techniques fall short.
Problem Setting
Reinforcement learning traditionally relies on extrinsic rewards provided by the environment, but in many realistic scenarios these rewards are sparse. In such settings, an agent may need a very large number of episodes before it stumbles on any meaningful feedback. Intrinsic motivation has therefore been proposed as a way to supply additional reward signals that drive the agent to actively explore new states. However, existing intrinsic-motivation methods often struggle in procedurally-generated environments, where the same state is rarely visited twice and where the common assumption of a single, fixed state space no longer holds.
Innovative Approach
The paper proposes RIDE, an intrinsic reward that encourages the agent to take actions that substantially change its learned state representation, thereby steering exploration toward transitions that actually affect the environment. Unlike prediction-error-based bonuses, whose magnitude shrinks as the agent's models improve, RIDE's incentive does not vanish over the course of training: the reward is computed from the difference between the learned representations of consecutive states, so the agent can always be rewarded for finding high-impact transitions, even late in training and in previously unseen levels.
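To make the mechanism concrete, the sketch below computes a RIDE-style intrinsic reward as the L2 distance between consecutive state embeddings, discounted (as in the paper) by the square root of an episodic visitation count of the next state so the agent cannot farm reward by ping-ponging between two states. The embedding network `StateEmbedding`, its layer sizes, and the dictionary-based count are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
from collections import defaultdict

class StateEmbedding(nn.Module):
    """Illustrative embedding network phi(s); the real architecture is task-dependent."""
    def __init__(self, obs_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def ride_intrinsic_reward(phi, obs, next_obs, episodic_counts, next_state_key):
    """RIDE-style bonus: the 'impact' of a transition in embedding space,
    discounted by how often the resulting state has been visited this episode."""
    with torch.no_grad():
        impact = torch.norm(phi(next_obs) - phi(obs), p=2).item()
    episodic_counts[next_state_key] += 1                      # N_ep(s_{t+1})
    return impact / (episodic_counts[next_state_key] ** 0.5)  # ||phi(s')-phi(s)|| / sqrt(N_ep)

# Usage sketch: reset the counts at the start of every episode.
# episodic_counts = defaultdict(int)
# r_int = ride_intrinsic_reward(phi, obs, next_obs, episodic_counts, next_obs_hash)
```

In grid-worlds the count key can simply be a hash of the observation; high-dimensional observations would require an approximate counting scheme instead of an exact dictionary.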
Experiments and Results
RIDE is evaluated on a range of tasks in MiniGrid, a procedurally-generated grid-world benchmark. It outperforms existing exploration methods such as count-based exploration, Random Network Distillation (RND), and the Intrinsic Curiosity Module (ICM) in sample efficiency and in its ability to solve the harder tasks. RIDE is particularly effective on the more complex procedurally-generated tasks, where these baselines learn slowly or fail to make meaningful progress.
The paper also analyzes how the intrinsic reward is distributed across different types of actions. RIDE assigns larger rewards to actions that meaningfully change the environment, such as interacting with controllable objects, which contributes to its robust exploration behavior.
Implications and Future Directions
The introduction of RIDE opens pathways for improving RL agents, especially in domains that demand extensive exploration of the state space, such as robotics and navigation in unfamiliar terrain. Practically, RIDE's design departs from both state-density-estimation bonuses and traditional prediction-error rewards, offering faster convergence and a sustained exploration signal.
Theoretically, RIDE underscores the value of representation learning as an auxiliary objective, not merely for improving the policy but also as the backbone of the intrinsic reward computation. Future research could integrate RIDE with meta-learning frameworks or extend its principles to hierarchical RL architectures, potentially advancing capabilities in complex multi-task settings.
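As a concrete illustration of this point, the sketch below trains the state embedding with forward- and inverse-dynamics auxiliary losses, the standard mechanism (borrowed from curiosity-driven exploration) for ensuring the representation captures aspects of the environment the agent can influence. The network sizes, the discrete-action assumption, and the loss weighting `beta` are illustrative choices, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicsAuxiliary(nn.Module):
    """Forward and inverse dynamics heads used to shape the embedding phi(s);
    the intrinsic reward itself is computed from phi, not from these prediction errors."""
    def __init__(self, emb_dim: int, num_actions: int):
        super().__init__()
        self.num_actions = num_actions
        # Predicts phi(s') from phi(s) and the action taken.
        self.forward_model = nn.Sequential(
            nn.Linear(emb_dim + num_actions, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )
        # Predicts the action from phi(s) and phi(s').
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * emb_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def losses(self, emb, next_emb, actions):
        a_onehot = F.one_hot(actions, self.num_actions).float()
        # Forward loss trains only the forward head here (an illustrative choice);
        # the inverse loss is what grounds phi in features the agent can control.
        pred_next = self.forward_model(torch.cat([emb.detach(), a_onehot], dim=-1))
        fwd_loss = F.mse_loss(pred_next, next_emb.detach())
        pred_action_logits = self.inverse_model(torch.cat([emb, next_emb], dim=-1))
        inv_loss = F.cross_entropy(pred_action_logits, actions)
        return fwd_loss, inv_loss

# Hypothetical combined auxiliary objective, optimized jointly with the embedding:
# aux_loss = beta * fwd_loss + (1.0 - beta) * inv_loss
```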
In sum, RIDE represents a notable step toward more adaptable and efficient exploration strategies in model-free RL, with potential for broad applicability and impact as RL is applied to increasingly complex domains.