- The paper introduces an exploration bonus derived from the successor representation, using its norm as an indicator of state novelty.
- It shows that the substochastic successor representation implicitly counts state visits, giving the approach a theoretical grounding for count-based exploration.
- The integration with deep RL architectures and auxiliary tasks, like next observation prediction, stabilizes learning and improves sample efficiency.
Count-Based Exploration with the Successor Representation
The paper "Count-Based Exploration with the Successor Representation" by Marlos C. Machado, Marc G. Bellemare, and Michael Bowling presents a novel approach to exploration in reinforcement learning (RL). This approach leverages the successor representation (SR) to form theoretically grounded algorithms that extend from tabular settings to environments that require function approximation. The primary insight is that the norm of the SR can act as an exploration bonus, encouraging agents to sample less frequently visited states.
Exploration in Reinforcement Learning
Exploration is a fundamental component of RL, where agents must learn optimal strategies through trial-and-error interactions with the environment. Traditional methods often rely on random exploration, which proves inefficient in domains with sparse rewards. This paper seeks to address these shortcomings by introducing an exploration bonus derived from the successor representation, which inherently captures state visitation frequency.
Theoretical Underpinning and Empirical Evaluation
The successor representation generalizes between states according to the similarity of their successors and implicitly captures the environment's transition dynamics, making it a natural signal for shaping exploration. The authors empirically demonstrate that the norm of the SR, as it is learned through temporal-difference learning, is an effective indicator of state novelty. Furthermore, the substochastic successor representation (SSR) admits a more tractable theoretical analysis, which reveals that the SSR implicitly counts state visits.
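As a concrete illustration, the sketch below (with assumed hyperparameters and an assumed choice of norm, not the paper's exact ones) learns a tabular SR with TD(0) and reads the inverse of its row norm as a novelty bonus:

```python
import numpy as np

# Minimal tabular sketch: learn the SR with TD(0) and use 1/||psi(s)|| as a
# novelty signal. Hyperparameters and the l1 norm are illustrative assumptions.
n_states, gamma, lr, beta = 6, 0.95, 0.1, 0.05
psi = np.zeros((n_states, n_states))   # SR estimate, one row per state

def td_update(s, s_next):
    """psi(s) <- psi(s) + lr * (e_s + gamma * psi(s_next) - psi(s))."""
    target = np.eye(n_states)[s] + gamma * psi[s_next]
    psi[s] += lr * (target - psi[s])

def novelty_bonus(s):
    """Rows of frequently visited states accumulate mass, so the bonus decays."""
    return beta / max(np.linalg.norm(psi[s], 1), 1e-8)
```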
Counting state visits is central to provably efficient exploration: count-based bonuses are among the best-understood ways to drive an agent toward under-explored states. The paper builds on this through the SSR, presenting ESSR, a model-based algorithm that implicitly estimates state visit counts and performs comparably to algorithms with formal sample-efficiency guarantees.
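The SSR itself can be built directly from empirical transition counts, as in the hedged sketch below; the 3-state example and variable names are purely illustrative, while the substochastic construction from counts follows the paper's description:

```python
import numpy as np

# Sketch of the substochastic successor representation (SSR) from empirical
# transition counts. The toy counts are illustrative; the denominator n(s)+1
# (rather than n(s)) is what makes the transition matrix substochastic.
gamma = 0.9
counts = np.array([
    [0, 10, 5],   # state 0: visited often
    [8, 0, 7],    # state 1: visited often
    [1, 0, 0],    # state 2: visited rarely
], dtype=float)

n_s = counts.sum(axis=1)                       # per-state visit counts n(s)
p_hat = counts / (n_s + 1.0)[:, None]          # rows sum to n(s)/(n(s)+1) < 1
ssr = np.linalg.inv(np.eye(3) - gamma * p_hat) # \hat{Psi} = (I - gamma \hat{P})^{-1}

row_norms = ssr.sum(axis=1)                    # l1 norm of each (non-negative) SSR row
print(row_norms)  # the rarely visited state 2 has the smallest norm, so a bonus
                  # inversely related to the norm pushes the agent toward it
```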
Practical Implications and Deep RL
One significant advancement in this work is the application of SR-based exploration bonuses in deep reinforcement learning. Under function approximation, successor features, a generalization of the SR, allow the same framework to scale to large environments such as Atari 2600 video games. The proposed deep RL algorithm, DQN$_e^{\text{MMC}}$+SR, achieves performance comparable to the state of the art under sample-complexity constraints, improving upon established baselines without resorting to domain-specific density models.
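A rough sketch of how these pieces could fit together at the level of the learning target is given below; the mixing weight, bonus scale, and use of the $\ell_2$ norm are illustrative assumptions rather than the paper's exact constants:

```python
import torch

# Hedged sketch in the spirit of DQN_e^MMC + SR: augment the reward with an
# SR-norm bonus and mix the one-step target with the Monte Carlo return.
def intrinsic_reward(successor_features: torch.Tensor, beta: float = 0.025) -> torch.Tensor:
    """Bonus inversely proportional to the norm of the successor features."""
    return beta / successor_features.norm(p=2, dim=-1).clamp_min(1e-6)

def mmc_target(r_ext, psi_next, q_next_max, mc_return, gamma=0.99, mmc_weight=0.1):
    """Mixed Monte Carlo target: blend bootstrapped and observed returns."""
    r = r_ext + intrinsic_reward(psi_next)   # bonus-augmented reward
    one_step = r + gamma * q_next_max        # standard one-step DQN target
    return (1.0 - mmc_weight) * one_step + mmc_weight * mc_return
```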
The architecture combines the value function, successor features, and an auxiliary task of predicting the next observation, which stabilizes learning and enhances exploratory behavior. The auxiliary task ensures that meaningful representations are learned even before significant reward is observed, so the exploration bonus computed from the learned features remains informative from the start of training.
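The sketch below illustrates this kind of architecture: a shared encoder feeds a Q-value head, a successor-feature head, and a next-observation decoder. The layer sizes and the deconvolutional decoder are assumptions for illustration, not the paper's exact network:

```python
import torch.nn as nn

# Schematic network with a shared Atari-style encoder and three heads:
# Q-values, successor features psi(s), and next-observation reconstruction.
class SRExplorationNet(nn.Module):
    def __init__(self, n_actions: int, feat_dim: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(              # 4x84x84 stacked frames in
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, feat_dim), nn.ReLU(),
        )
        self.q_head = nn.Linear(feat_dim, n_actions)   # value function
        self.sr_head = nn.Linear(feat_dim, feat_dim)   # successor features psi(s)
        self.decoder = nn.Sequential(                  # auxiliary task: next frame
            nn.Linear(feat_dim, 64 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (64, 7, 7)),
            nn.ConvTranspose2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 8, stride=4),    # 1x84x84 predicted frame
        )

    def forward(self, obs):
        phi = self.encoder(obs)
        return self.q_head(phi), self.sr_head(phi), self.decoder(phi)
```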
Conclusion and Future Directions
This research highlights the versatility and effectiveness of the SR as a basis for exploration bonuses in reinforcement learning. The theoretically justified algorithms built on the substochastic representation are a step toward exploration strategies that generalize beyond tabular settings. Combined with function approximation and deep learning architectures, they open new avenues for exploration techniques that are broadly applicable across RL problems.
Further inquiry into the relationship between learned representations and exploration, potentially leveraging more advanced auxiliary tasks, remains an intriguing avenue for future work. Additionally, formalizing PAC-MDP guarantees for SSR-based exploration would solidify its standing within theoretical RL.