Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills
The paper "Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills" addresses a critical problem in reinforcement learning (RL), specifically the challenge of discovering useful skills in the absence of task-specific rewards. The authors propose a novel framework called Explore, Discover and Learn (EDL), which leverages information-theoretic methods to discover skills that provide comprehensive coverage of the state space, overcoming the limitations observed in existing methods.
Central Contributions and Findings
- Theoretical Critique of Existing Methods: The paper critiques existing skill-discovery approaches that optimize mutual-information-based objectives. Through theoretical analysis and empirical evidence, it shows that these methods tend to discover options with poor state coverage: the mutual information objective is evaluated only over states visited by the current policy, so nothing rewards exploration, and the algorithms prematurely commit to behaviors induced by the initial, often random, policy. The first block after this list sketches the objective and this failure mode.
- Introduction of EDL: EDL is proposed as a solution to this limitation. Unlike existing approaches, EDL separates exploration, skill discovery, and skill learning into distinct stages. The agent first explores the environment to obtain a fixed distribution over states — in the paper, either an oracle uniform distribution over reachable states or one produced by an exploration method such as State Marginal Matching. A skill discovery stage then fits a variational autoencoder (VAE) that encodes states into latent skill codes, and a final skill learning stage trains an RL policy to execute each skill, rewarding it for reaching states that are likely under the VAE decoder (see the code sketch after this list).
- Empirical Validation: The paper includes extensive experiments on controlled environments such as 2D mazes to validate the effectiveness of EDL. The results indicate that EDL successfully discovers state-covering skills across various challenging environments, including those with bottleneck states where existing methods fail.
- Handling of State and Skill Dependencies: EDL is robust to changes in the initial state distribution, a critical advantage over existing methods, whose skills depend heavily on the distribution of states visited early in training. By decoupling exploration from skill discovery, EDL can also incorporate priors about which behaviors are likely to be relevant, improving its applicability to user-defined tasks.
- Skill Interpolation: An interesting observation is that EDL produces a meaningful latent space of skills: interpolating between learned skill codes yields new, previously unseen behaviors. This property could enhance the adaptive capabilities of future RL systems (the code sketch after this list includes a small interpolation helper).
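To make the critique in the first bullet concrete, here is the form of objective that methods in this family optimize. This is a generic sketch using the reverse decomposition of mutual information popularized by methods such as DIAYN; the notation is illustrative rather than taken verbatim from the paper:

```latex
I(S; Z) \;=\; \mathcal{H}(Z) - \mathcal{H}(Z \mid S)
\;\;\geq\;\; \mathbb{E}_{z \sim p(z),\; s \sim \pi_z}\big[\, \log q_\phi(z \mid s) - \log p(z) \,\big]
```

Here $q_\phi(z \mid s)$ is a learned skill discriminator and the expectation runs over states visited by the skill-conditioned policy $\pi_z$ itself. The bound can therefore be increased simply by making the states the policy already visits easy to discriminate; nothing in the objective rewards visiting new states, which is the mechanism behind the poor coverage and premature commitment the paper identifies.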
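The staged pipeline and the interpolation observation can be illustrated with a short sketch of the discovery and learning phases. This is not the authors' implementation: the network sizes, the fixed-variance Gaussian decoder, and the helper names (`Encoder`, `Decoder`, `edl_reward`, `interpolate_skills`) are assumptions made for illustration, and the exploration stage is assumed to have already produced a buffer of visited states.

```python
import torch
import torch.nn as nn

# --- Stage 2: skill discovery -------------------------------------------
# A small VAE trained on states gathered during the exploration stage.
# Latent codes z act as skills; the decoder maps a skill to the region of
# state space that skill should cover.

STATE_DIM, LATENT_DIM = 2, 2  # e.g. (x, y) positions in a 2D maze

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
        self.mu = nn.Linear(64, LATENT_DIM)
        self.log_std = nn.Linear(64, LATENT_DIM)

    def forward(self, s):
        h = self.net(s)
        return self.mu(h), self.log_std(h)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, STATE_DIM))

    def forward(self, z):
        return self.net(z)  # mean of a fixed-variance Gaussian over states

def vae_loss(encoder, decoder, states):
    """Standard VAE objective: reconstruction plus KL to a unit Gaussian prior."""
    mu, log_std = encoder(states)
    z = mu + log_std.exp() * torch.randn_like(mu)   # reparameterization trick
    recon = decoder(z)
    recon_loss = ((recon - states) ** 2).sum(dim=-1)
    kl = 0.5 * (mu ** 2 + (2 * log_std).exp() - 2 * log_std - 1).sum(dim=-1)
    return (recon_loss + kl).mean()

# --- Stage 3: skill learning ---------------------------------------------
# The skill-conditioned policy pi(a | s, z) is trained with an intrinsic
# reward that is high when the visited state is likely under the decoder for
# the sampled skill. With a fixed-variance Gaussian decoder this reduces (up
# to a constant) to negative squared distance from the decoded skill centroid.

def edl_reward(decoder, state, z):
    with torch.no_grad():
        target = decoder(z)
    return -((state - target) ** 2).sum(dim=-1)

# Interpolating two learned skills: decoding a convex combination of their
# latent codes yields a new, intermediate behavior target.
def interpolate_skills(decoder, z_a, z_b, alpha=0.5):
    with torch.no_grad():
        return decoder(alpha * z_a + (1 - alpha) * z_b)
```

In a full pipeline the VAE would be fit to the exploration buffer with `vae_loss`, and `edl_reward` would serve as the intrinsic reward inside any standard RL algorithm for the skill-conditioned policy; `interpolate_skills` reflects the observation that blending two skill codes in latent space produces intermediate, previously unseen behavior targets.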
Implications and Future Directions
- Enhanced Exploration Mechanisms: EDL’s approach could inspire advancements in exploration strategies, particularly in complex environments where exploration is inherently difficult due to sparse rewards or deceptive pathways.
- Broader Application: The emphasis on state coverage and the ability to leverage priors are particularly beneficial for learning behaviors in vast state spaces and complex domains such as robotics, navigation, and autonomous systems.
- Potential for Meta-RL: There is potential to integrate EDL within the meta-reinforcement learning framework, where it could not only discover individual skills but also automate the discovery of task hierarchies without human intervention.
- Exploration of State Embeddings: Future work might explore improved embedding techniques for states, possibly employing learned metrics that reflect transition complexity within the environment, thus enhancing skill utility and transferability.
The paper thus provides a promising avenue for unsupervised skill discovery while addressing fundamental limitations in current methodologies, laying the groundwork for scalable and adaptable RL solutions.