Planning with Goal-Conditioned Policies
Recent work on combining model-based and model-free reinforcement learning (RL) seeks to address complex, temporally extended decision-making problems. The paper "Planning with Goal-Conditioned Policies" presents an approach that leverages goal-conditioned policies to enable planning in environments with high-dimensional observations, such as images, where direct model-based approaches often face significant hurdles.
Overview
The central problem addressed by this research is how to plan and perform tasks that require sequential decision-making and long-term reasoning without learning an explicit state-transition model. Model-free RL methods excel at learning from raw, low-level inputs but struggle with long-horizon tasks. Conversely, model-based methods require accurate dynamics models, which are often difficult to learn, particularly in environments with high-dimensional observations.
The proposed solution is to use goal-conditioned RL to learn goal-reaching policies, which are then utilized in a model-based planning context. This approach integrates the strengths of both model-based and model-free methodologies by abstracting the planning process to focus on which goals to achieve rather than the specifics of state transitions.
Core Contributions
- Goal-Conditioned Policies for Temporal Abstraction: The research introduces goal-conditioned policies that allow the planner to focus on higher-level subgoals rather than low-level details. By conditioning policies on specific goals and using their corresponding value functions, the approach facilitates temporal abstraction, thus enabling the planner to navigate complex tasks by composing simpler sub-tasks.
- Latent Variable Model for State Representation: The paper proposes a latent variable model to compactly represent valid states in environments with complex observations such as images. This model provides an abstraction layer that lets the planner operate on a reduced-dimensionality representation of the state space, greatly simplifying the planning process (a minimal sketch of such a model follows this list).
- Latent Embeddings for Abstracted Planning (LEAP): The integration of goal-conditioned policies and latent state representations culminates in the LEAP method, which combines model-free RL for reaching short-horizon subgoals with model-based planning over subgoals in the learned latent space to solve long-horizon tasks (a planning sketch follows the representation sketch below).
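To make the latent state representation concrete, here is a minimal sketch of a variational autoencoder over image observations, written in PyTorch. The architecture, layer sizes, 48x48 input resolution, and class name `StateVAE` are illustrative assumptions, not the paper's exact model; the point is only that planning happens over the low-dimensional latent code rather than raw pixels.

```python
# Hedged sketch: a small VAE that maps image observations to a compact
# latent "state" usable by the planner. Sizes and names are illustrative.
import torch
import torch.nn as nn

class StateVAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.LazyLinear(latent_dim)      # mean of q(z | s)
        self.fc_logvar = nn.LazyLinear(latent_dim)  # log-variance of q(z | s)
        # Crude decoder back to pixel space, enough to train a reconstruction loss.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 48 * 48 * 3), nn.Sigmoid())

    def encode(self, obs):
        h = self.encoder(obs)
        return self.fc_mu(h), self.fc_logvar(h)

    def sample(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, obs):
        mu, logvar = self.encode(obs)
        z = self.sample(mu, logvar)
        recon = self.decoder(z).view(obs.shape[0], 3, 48, 48)
        return recon, mu, logvar

# Usage: encode a batch of observations into latent states for planning.
# obs = torch.rand(8, 3, 48, 48); mu, _ = StateVAE().encode(obs)
```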
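And here is a hedged sketch of the planning step: a cross-entropy-method (CEM) search over a short sequence of latent subgoals, where each candidate sequence is scored by a learned goal-conditioned value function that estimates how reachable one latent state is from another. The function names (`plan_subgoals`, `value_fn`), population sizes, and the toy value function are assumptions for illustration; the paper's full objective also keeps subgoals near the latent prior (i.e., on the manifold of valid states), which this sketch omits.

```python
# Hedged sketch of LEAP-style subgoal planning with CEM. The goal-conditioned
# value function value_fn(z_a, z_b) is assumed to be learned separately and
# to return higher values when z_b is easily reachable from z_a.
import numpy as np

def plan_subgoals(z_start, z_goal, value_fn, num_subgoals=3, latent_dim=16,
                  iters=10, pop=1000, elite_frac=0.05):
    """Return a (num_subgoals, latent_dim) array of latent subgoals."""
    mu = np.zeros((num_subgoals, latent_dim))
    sigma = np.ones((num_subgoals, latent_dim))
    n_elite = max(1, int(pop * elite_frac))

    for _ in range(iters):
        # Sample candidate subgoal sequences from the current distribution.
        cand = mu + sigma * np.random.randn(pop, num_subgoals, latent_dim)

        # Score each sequence by summed reachability of consecutive segments:
        # start -> g1 -> ... -> gK -> final goal.
        scores = np.zeros(pop)
        for i in range(pop):
            waypoints = [z_start, *cand[i], z_goal]
            scores[i] = sum(value_fn(a, b)
                            for a, b in zip(waypoints[:-1], waypoints[1:]))

        # Refit the sampling distribution to the highest-scoring sequences.
        elite = cand[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6

    return mu  # mean of the final elite distribution, used as the plan

if __name__ == "__main__":
    # Toy value function: negative latent distance stands in for the learned one.
    toy_value = lambda a, b: -np.linalg.norm(a - b)
    subgoals = plan_subgoals(np.zeros(16), np.ones(16), toy_value)
    print(subgoals.shape)  # (3, 16)
```

At execution time, the goal-conditioned policy would be handed the first planned subgoal, run for a short horizon, and then the planner would re-plan from the new state.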
Experimental Evaluation
The proposed method was evaluated on complex, multi-stage robotic tasks requiring non-greedy behavior, demonstrating significant improvements over traditional model-free and planning-based approaches. In environments such as image-based robot navigation and manipulation, LEAP outperformed baseline methods by a substantial margin, highlighting its effectiveness in managing temporally extended sequences and high-dimensional inputs.
Implications and Future Directions
The implications of this research are considerable for various domains requiring long-horizon planning and decision-making under uncertainty. The use of latent spaces for state abstraction not only improves planning efficiency but also hints at the potential application of these techniques in other high-dimensional problem domains, such as autonomous driving or large-scale strategy games.
Future work could explore improved exploration strategies to further enhance the efficacy of goal-conditioned policies. Additionally, extending the framework to multi-agent environments could broaden its applicability.
In conclusion, the paper presents a compelling argument for leveraging RL to build effective abstractions that can be used to simplify and enhance planning processes in high-dimensional and complex environments. The synergy of model-free goal-conditioned learning with model-based planning marks a promising direction for future research in reinforcement learning and planning integration.