- The paper introduces the ϕ-pseudocount, a generalization of state visit counts that efficiently guides exploration in reinforcement learning.
- It pairs linear function approximation with feature-based representations to reward the discovery of novel states, outperforming ϵ-greedy exploration in sparse-reward settings.
- Results on Atari games such as Montezuma's Revenge demonstrate its effectiveness and point toward future research on exploration with non-linear function approximation.
Count-Based Exploration in Feature Space for Reinforcement Learning
In reinforcement learning (RL), efficient exploration is pivotal, particularly in high-dimensional state-action spaces. The paper "Count-Based Exploration in Feature Space for Reinforcement Learning" introduces a novel exploration strategy built on a count-based approach in feature space, advancing beyond traditional exploration techniques that fall short in such environments due to scalability challenges.
Theoretical Foundations and Methodology
Traditional count-based approaches struggle to scale in large state-action spaces because they maintain a separate visit count for every state-action pair. The proposed method circumvents this by leveraging feature-based representations, which generalize counts across similar states. The key construct is the ϕ-pseudocount: a generalization of the state visit count derived from a visit-density model over the feature space. States whose feature activations are rare under this model receive low pseudocounts and are therefore treated as more uncertain, and this uncertainty is what guides exploration.
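As a rough illustration of the idea, the sketch below computes a pseudocount from a factorized visit-density model over binary features, using the densities before and after a visit is recorded. The class name `PhiPseudocount` and the simple Laplace-smoothed per-feature estimator are assumptions made for this example, not the paper's exact density model.

```python
import numpy as np

class PhiPseudocount:
    """Sketch: pseudocounts from a factorized visit-density model over
    binary feature vectors phi(s) in {0, 1}^d. Illustrative only."""

    def __init__(self, num_features):
        self.t = 0                            # number of states observed so far
        self.counts = np.zeros(num_features)  # per-feature activation counts

    def _density(self, phi, counts, t):
        # Product of independent per-feature Bernoulli estimates,
        # Laplace-smoothed so probabilities stay strictly inside (0, 1).
        p_on = (counts + 1.0) / (t + 2.0)
        per_feature = np.where(phi > 0, p_on, 1.0 - p_on)
        return np.prod(per_feature)

    def pseudocount(self, phi):
        """Pseudocount N_hat for the state with feature vector phi."""
        rho = self._density(phi, self.counts, self.t)                  # density before the visit
        rho_next = self._density(phi, self.counts + phi, self.t + 1)   # density after recording it
        # N_hat = rho * (1 - rho') / (rho' - rho); rare feature patterns give small N_hat.
        return rho * (1.0 - rho_next) / max(rho_next - rho, 1e-12)

    def update(self, phi):
        """Record a visit to the state with feature vector phi."""
        self.counts += phi
        self.t += 1
```

Because the density factorizes over features, states that share many common features with previously visited states receive large pseudocounts even if they were never visited themselves, which is exactly the generalization that raw state counts lack.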
The ϕ-Exploration-Bonus algorithm builds on this by rewarding the agent for visiting less familiar regions of the feature space rather than of the raw state space. Because it employs linear function approximation (LFA), the algorithm integrates cleanly with scalable RL methods: an exploration bonus computed from the pseudocount is added to the environment reward, incentivizing the agent to explore areas of higher uncertainty in line with the optimism-in-the-face-of-uncertainty paradigm.
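A minimal sketch of how such a bonus might be folded into a linear TD update is shown below, assuming a bonus of the form β/√N̂ added to the reward and a plain one-step Sarsa update. The function name, hyperparameter values, and the small stabilizing constant in the denominator are assumptions for this example, not the paper's exact algorithm.

```python
import numpy as np

def sarsa_update_with_bonus(theta, phi_sa, phi_sa_next, reward, pseudocount,
                            beta=0.05, gamma=0.99, alpha=0.01):
    """One linear Sarsa update on a bonus-augmented reward.

    theta:        weights of the linear value estimate, Q(s, a) = theta . phi(s, a)
    phi_sa:       feature vector of the current state-action pair
    phi_sa_next:  feature vector of the next state-action pair
    pseudocount:  N_hat for the visited state, e.g. from PhiPseudocount above
    """
    # Optimism bonus: larger for states whose features have rarely been seen.
    bonus = beta / np.sqrt(pseudocount + 0.01)
    augmented_reward = reward + bonus

    # Standard linear Sarsa TD update on the augmented reward.
    q_sa = theta @ phi_sa
    q_next = theta @ phi_sa_next
    td_error = augmented_reward + gamma * q_next - q_sa
    return theta + alpha * td_error * phi_sa
```

The agent's policy and value updates are otherwise unchanged; the bonus simply reshapes the reward so that uncertain regions of feature space look temporarily more attractive.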
Results and Contributions
The algorithm was evaluated on high-dimensional RL benchmarks from the Arcade Learning Environment (ALE). The ϕ-Exploration-Bonus method achieved near state-of-the-art performance, with significant improvements over ϵ-greedy exploration, particularly in sparse-reward games such as Montezuma's Revenge and Venture.
In games with dense, well-shaped reward signals, such as Q*bert and Frostbite, the exploration bonus still improved over traditional methods, though to a lesser extent. In Freeway, however, performance was sensitive to the exploration bonus coefficient, indicating that this hyperparameter may require task-specific tuning.
Implications and Future Directions
The paper paves the way for future exploration strategies that can efficiently utilize task-relevant features for estimating novelty and guiding exploration. While the current implementation focuses on environments amenable to linear approximations, extending this concept to non-linear function approximation remains an open area of research. Such an extension could potentially harness the expressive power of deep neural networks without the prohibitive computational overhead traditionally associated with count-based methods in high-dimensional settings.
The implications of this research are twofold: it offers a computationally efficient way to manage exploration in vast environments, and it underscores the utility of feature-based generalization in RL. Future developments could explore hybrid models incorporating non-linear approximators, potentially broadening the scope and applicability of ϕ-Exploration-Bonus to more complex and dynamic RL tasks.
In conclusion, the proposed ϕ-Exploration-Bonus algorithm provides a valuable contribution to exploration strategies in reinforcement learning, benefiting from its simplicity and computational efficiency. It stands as a promising avenue for ongoing research, particularly in developing exploration strategies that remain effective as task complexity scales.