Discovering and Achieving Goals via World Models (2110.09514v1)

Published 18 Oct 2021 in cs.LG, cs.AI, cs.CV, cs.RO, and stat.ML

Abstract: How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning across four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/

Summary

The paper introduces LEXA, a dual-policy framework that separates exploration from goal achievement using latent world models.
The explorer policy employs ensemble-based epistemic uncertainty to drive discovery of novel goals via imagined rollouts.
The achiever policy learns to reach the states the explorer discovers; together, the two policies outperform existing unsupervised goal-reaching methods on a new benchmark of 40 robotic manipulation and locomotion tasks.

Overview of "Discovering and Achieving Goals via World Models"

This paper presents an approach to unsupervised reinforcement learning (RL), the Latent Explorer Achiever (LEXA), which enables artificial agents to autonomously discover and achieve many tasks in complex visual environments. The authors decompose the challenge into two problems: goal discovery and goal achievement. LEXA learns a world model from image inputs and uses imagined rollouts in that model to train two policies: an explorer that discovers new goals and an achiever that practices reaching them.

Methodological Contributions

The authors propose LEXA, an agent that learns to solve diverse goal-reaching tasks without any external supervision. The key innovation is to split goal-conditioned RL into two policies that share a learned world model: one for exploration and one for goal achievement. Both policies are trained on imagined trajectories in the model's latent space, so neither requires a task-specific reward function.
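To make "training in imagination" concrete, here is a minimal sketch of an imagined rollout: a policy is unrolled inside a latent dynamics model without taking any real environment steps. The linear dynamics, the dimensions, and the names `latent_step` and `imagine` are hypothetical stand-ins; LEXA's actual world model is learned from images and is far richer than this toy.

```python
import numpy as np

# Hypothetical stand-ins: LEXA learns its latent dynamics from images; here a
# fixed linear map plays that role so the rollout mechanics can run on their own.
rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM = 8, 2
A = 0.9 * np.eye(LATENT_DIM)                               # hypothetical latent transition
B = rng.normal(scale=0.1, size=(LATENT_DIM, ACTION_DIM))   # hypothetical action effect


def latent_step(z, a):
    """Predict the next latent state from the current latent and action (no images decoded)."""
    return A @ z + B @ a


def imagine(policy, z0, horizon):
    """Unroll a policy inside the model for `horizon` steps; no environment steps are taken."""
    zs, acts = [z0], []
    z = z0
    for _ in range(horizon):
        a = policy(z)
        z = latent_step(z, a)
        acts.append(a)
        zs.append(z)
    return np.stack(zs), np.stack(acts)


def random_policy(z):
    """Placeholder policy that ignores the latent and samples a random action."""
    return rng.uniform(-1.0, 1.0, size=ACTION_DIM)


# Usage: imagine 15 steps from a starting latent.
z_traj, a_traj = imagine(random_policy, np.ones(LATENT_DIM), horizon=15)
print(z_traj.shape, a_traj.shape)  # (16, 8) (15, 2)
```

Because each step only queries the model, many such trajectories can be generated cheaply, and both policies can be trained on them instead of on real interaction.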

Explorer Policy: Prior methods explore by reaching previously visited states; LEXA's explorer instead plans, through foresight, to discover surprising states it has not yet seen. It measures surprise as the disagreement among an ensemble of models over predicted future states, a proxy for epistemic uncertainty, and seeks out states where that disagreement is high, extending exploration beyond the agent's current experience.
Achiever Policy: The achiever is a goal-conditioned policy that learns to reach the states the explorer has discovered. It is conditioned on goal images and trained entirely on imagined trajectories generated by the world model rather than by relabeling past experience in hindsight, so it practices reaching goals through foresight.
Evaluation Benchmark: The paper introduces a new, challenging benchmark of 40 test tasks spanning four robotic manipulation and locomotion domains, including tasks that require interacting with multiple objects. LEXA substantially outperforms existing unsupervised methods on this benchmark.
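The explorer's and achiever's training signals can be sketched as reward functions over latent states. This is a hedged illustration, not the paper's code: ensemble disagreement is measured here as the variance of the members' next-step predictions averaged over dimensions (one common choice), and the achiever's reward is assumed to be the cosine similarity between the current and goal latents. The function names and array shapes are hypothetical.

```python
import numpy as np


def disagreement_reward(ensemble_preds):
    """Explorer reward: disagreement among ensemble predictions of the next step.

    ensemble_preds has shape (K, D): K ensemble members, each predicting a
    D-dimensional next-step feature. Variance across members, averaged over
    dimensions, serves as the epistemic-uncertainty proxy (an assumed choice).
    """
    return float(np.var(ensemble_preds, axis=0).mean())


def cosine_goal_reward(z, z_goal, eps=1e-8):
    """Achiever reward (assumed form): cosine similarity of current and goal latents."""
    return float(z @ z_goal / (np.linalg.norm(z) * np.linalg.norm(z_goal) + eps))


rng = np.random.default_rng(0)
# Members that agree (a familiar state) versus members that diverge (a novel state).
agree = np.full((5, 16), 0.5)
diverge = rng.normal(size=(5, 16))
print(disagreement_reward(agree), disagreement_reward(diverge) > 0)  # 0.0 True

goal = rng.normal(size=16)
print(round(cosine_goal_reward(goal, goal), 4))  # 1.0
```

In a full system, the explorer would maximize a disagreement reward along imagined trajectories, while the achiever would maximize a goal-reaching reward toward goals drawn from states the explorer has reached.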

Empirical Results

The experiments show that LEXA outperforms prior unsupervised goal-reaching methods both on established benchmarks and on the new, more challenging benchmark. Separating the explorer and achiever roles lets the agent explore more broadly, and after the unsupervised phase it solves test tasks specified as goal images zero-shot, without any additional learning. This includes tasks that require interacting with multiple objects in sequence. To demonstrate scalability and generality, the authors also train a single agent across four distinct environments.

Implications and Future Directions

The research has implications for both practice and theory. Practically, LEXA's architecture suggests a path toward training agents that perform a broad range of tasks without human-designed reward functions or task-specific guidance. This could make such agents useful in settings like robotics, where dynamic and unpredictable environments may make conventional training approaches less effective.

Theoretically, LEXA's results suggest that imagined rollouts and latent-space exploration can support learning many tasks without task-specific supervision. They motivate further work on more efficient and generalizable model architectures and on imagination-based policy training algorithms.

Future work could extend the framework with natural-language goal specifications, broadening the range of tasks an agent can be asked to achieve. Validating LEXA in real-world settings would also test its scalability and generalization, laying the groundwork for more autonomous and versatile AI systems.
