Learning Latent Dynamics for Planning from Pixels
In the paper "Learning Latent Dynamics for Planning from Pixels," Hafner et al. introduce the Deep Planning Network (PlaNet), a purely model-based reinforcement learning (MBRL) agent that plans in a learned latent space. The agent learns the environment dynamics directly from image observations and then performs online planning efficiently within that compact latent space, without training an explicit policy network.
Core Contributions
- Recurrent State Space Model (RSSM): The latent dynamics model combines deterministic and stochastic transition components. This hybrid design overcomes the limitations of purely deterministic or purely stochastic models: the deterministic path lets the model reliably retain information across many time steps, while the stochastic path captures uncertainty over the multiple possible futures. A minimal sketch of such a transition cell appears after this list.
- Latent Overshooting: The authors propose latent overshooting, a multi-step variational objective that regularizes multi-step predictions in latent space. It addresses a shortcoming of the standard variational bound, which only trains one-step predictions, and improves long-term prediction accuracy while remaining cheap to compute because the multi-step predictions never need to be decoded into images. A sketch of the regularizer follows the RSSM example below.
- Planning Performance and Efficiency: PlaNet plans in latent space with the Cross Entropy Method (CEM), a population-based trajectory optimizer (sketched after this list). Empirically, PlaNet reaches final performance comparable to or better than strong model-free algorithms such as D4PG while using substantially fewer environment interactions. For instance, it outperforms A3C within comparable training time while using approximately 200 times fewer episodes. These results validate the efficacy of planning within a compact latent space learned from image observations.
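Below is a minimal sketch of an RSSM-style transition cell in PyTorch. The layer sizes, the choice of a GRU cell for the deterministic path, and the names (`RSSMCell`, `prior_net`, `post_net`) are illustrative assumptions rather than the exact architecture from the paper; the point is how a deterministic recurrent state and a sampled Gaussian state are combined, and how the prior (no observation) and the posterior (conditioned on an observation embedding) share the same deterministic path.

```python
# Illustrative RSSM-style cell; sizes and layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RSSMCell(nn.Module):
    def __init__(self, stoch_dim=30, deter_dim=200, action_dim=6,
                 embed_dim=1024, hidden=200):
        super().__init__()
        # Deterministic path: a recurrent cell carries information across steps.
        self.pre_rnn = nn.Linear(stoch_dim + action_dim, hidden)
        self.rnn = nn.GRUCell(hidden, deter_dim)
        # Stochastic path: Gaussian state predicted from the deterministic state.
        self.prior_net = nn.Linear(deter_dim, 2 * stoch_dim)
        # Posterior additionally conditions on the encoded observation.
        self.post_net = nn.Linear(deter_dim + embed_dim, 2 * stoch_dim)

    def _split(self, stats):
        mean, std = stats.chunk(2, dim=-1)
        return mean, F.softplus(std) + 0.1  # keep the std positive

    def forward(self, prev_stoch, prev_deter, action, obs_embed=None):
        # 1. Deterministic update from the previous latent state and action.
        x = F.relu(self.pre_rnn(torch.cat([prev_stoch, action], dim=-1)))
        deter = self.rnn(x, prev_deter)
        # 2. Prior over the stochastic state (used when imagining without images).
        prior_mean, prior_std = self._split(self.prior_net(deter))
        if obs_embed is None:
            stoch = prior_mean + prior_std * torch.randn_like(prior_std)
            return stoch, deter, (prior_mean, prior_std), None
        # 3. Posterior that also sees the current observation embedding.
        post_mean, post_std = self._split(
            self.post_net(torch.cat([deter, obs_embed], dim=-1)))
        stoch = post_mean + post_std * torch.randn_like(post_std)
        return stoch, deter, (prior_mean, prior_std), (post_mean, post_std)
```

During planning, only the prior branch is used, so imagined rollouts stay entirely in latent space and never require decoding images.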
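To make latent overshooting concrete, here is a hedged sketch of the multi-step KL regularizer under simplifying assumptions: the model is rolled forward from posterior means rather than samples, the `transition(state, action)` callable (returning the parameters of the Gaussian prior over the next latent state) is hypothetical, and details from the paper such as per-distance weighting and gradient stopping are omitted.

```python
# Hedged sketch of a latent-overshooting style regularizer.
import torch
from torch.distributions import Normal, kl_divergence


def latent_overshooting_kl(post_means, post_stds, actions, transition,
                           max_distance=3, beta=1.0):
    """Average KL between filtering posteriors and multi-step priors.

    post_means, post_stds: (T, state_dim) posterior parameters from the encoder.
    actions: (T, action_dim) actions taken in the sequence.
    transition: hypothetical callable (state, action) -> (prior_mean, prior_std).
    """
    T = post_means.shape[0]
    total_kl, count = 0.0, 0
    for start in range(T - 1):
        # Open-loop rollout of the learned dynamics from the posterior at `start`.
        state = post_means[start]
        for d in range(1, max_distance + 1):
            t = start + d
            if t >= T:
                break
            prior_mean, prior_std = transition(state, actions[t - 1])
            # Penalize disagreement between the d-step prediction and the
            # posterior inferred from observations at time t.
            total_kl = total_kl + kl_divergence(
                Normal(post_means[t], post_stds[t]),
                Normal(prior_mean, prior_std)).sum()
            state = prior_mean  # continue predicting without observations
            count += 1
    return beta * total_kl / max(count, 1)
```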
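Finally, a compact sketch of the CEM planning loop used at decision time. The `rollout_return(initial_state, action_seq)` helper is hypothetical: it would unroll the learned latent dynamics for a candidate action sequence and sum the predicted rewards. The population size, elite count, iteration count, and horizon below are illustrative defaults, not values taken from the paper.

```python
# Illustrative cross-entropy method (CEM) planner over action sequences.
import numpy as np


def cem_plan(initial_state, rollout_return, horizon=12, action_dim=6,
             iterations=10, candidates=1000, top_k=100):
    # Start from a broad Gaussian over action sequences.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences and clip to the valid range.
        samples = np.clip(
            mean + std * np.random.randn(candidates, horizon, action_dim),
            -1.0, 1.0)
        # Evaluate each candidate by an imagined rollout in the latent model.
        returns = np.array([rollout_return(initial_state, seq) for seq in samples])
        # Refit the Gaussian to the best-performing sequences (the elites).
        elite = samples[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    # Execute only the first action; replan at the next step.
    return mean[0]
```

Because only the first action of the optimized sequence is executed and planning is repeated at every step, the agent behaves like a model-predictive controller and can react to new observations as they arrive.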
Experimental Evaluation
The paper presents comprehensive empirical evaluations across six challenging continuous control tasks using the DeepMind Control Suite. These tasks include:
- Cartpole Swing Up
- Reacher Easy
- Cheetah Run
- Finger Spin
- Cup Catch
- Walker Walk
Key findings include:
- The PlaNet agent significantly outperforms A3C trained from proprioceptive states, while using only a fraction of the episodes.
- On tasks such as Cheetah Run, PlaNet not only achieves higher final performance than D4PG trained from pixels but also does so with markedly higher data efficiency.
- Modeling both deterministic and stochastic transitions is crucial, with the RSSM showing the most robust and consistent performance across tasks.
Implications and Future Directions
The paper addresses several critical aspects of MBRL, particularly in image-based domains. The use of latent space planning contributes to improved data efficiency and scalable performance across complex control tasks. The implications extend to various applications within robotics, autonomous systems, and beyond, where learning dynamics from high-dimensional sensory inputs is vital.
Future Work:
- Exploring hierarchical models for temporal abstraction could reduce the dependence on fixed action repeat parameters.
- Integrating a learned value function to summarize rewards beyond the planning horizon could significantly improve long-term decision-making.
- Investigating gradient-based planning methods could further enhance computational efficiency.
- Extending the model to handle environments with higher visual diversity without sacrificing performance will be an essential next step.
Summary
Hafner et al.'s work on PlaNet represents a significant advance in model-based reinforcement learning, demonstrating effective latent-space planning from pixel observations. By innovating on latent dynamics modeling and planning, this research lays a solid foundation for future work aimed at closing the gap between model-based and model-free learning in visually complex environments.