Diffused Task-Agnostic Milestone Planner (2312.03395v1)

Published 6 Dec 2023 in cs.RO, cs.AI, and cs.LG

Abstract: Addressing decision-making problems using sequence modeling to predict future trajectories has shown promising results in recent years. In this paper, we take a step further to leverage the sequence predictive method in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method that utilizes a diffusion-based generative sequence model to plan a series of milestones in a latent space and has an agent follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control. Furthermore, our approach exploits the generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark.


Summary

  • The paper introduces DTAMP as a novel planning method that utilizes diffusion models to create latent milestones for guiding agents in complex tasks.
  • It leverages goal-conditioned imitation learning and classifier-free diffusion guidance to overcome tuning challenges of traditional offline RL methods.
  • Experimental evaluations on D4RL and CALVIN benchmarks demonstrate significant improvements in efficiency and performance on long-horizon, sparse-reward tasks.

Overview

The paper introduces the Diffused Task-Agnostic Milestone Planner (DTAMP), a planning method that uses diffusion models to generate a sequence of intermediate goals, termed "milestones," in a latent space to guide an agent toward accomplishing a given task. The approach is suited to problems with long planning horizons, sparse rewards, and decision-making across multiple tasks, and it shows significant improvements in efficiency and effectiveness over existing methods on several benchmarks.
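To make the high-level procedure concrete, here is a minimal, hypothetical sketch of the plan-then-follow loop in Python; `encoder`, `planner`, `actor`, `env.get_goal`, and the milestone-switching threshold are illustrative placeholders rather than the paper's actual interfaces.

```python
import numpy as np

def run_episode(env, encoder, planner, actor, num_milestones=8, max_steps=500):
    """Illustrative plan-then-follow loop: plan latent milestones once,
    then have a goal-conditioned actor track them one by one."""
    obs = env.reset()
    goal = env.get_goal()                      # assumed goal observation/image
    z_start = encoder(obs)                     # latent of the current observation
    z_goal = encoder(goal)                     # latent of the desired outcome
    # Diffusion planner denoises a sequence of latent milestones
    # connecting the current latent to the goal latent.
    milestones = planner(z_start, z_goal, num_milestones)
    k = 0                                      # index of the milestone being tracked
    for _ in range(max_steps):
        z = encoder(obs)
        # Advance to the next milestone once the current one is roughly reached.
        if np.linalg.norm(z - milestones[k]) < 0.1 and k < num_milestones - 1:
            k += 1
        action = actor(z, milestones[k])       # goal-conditioned policy
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```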

Method Development

DTAMP is designed to avoid the instability and tuning complexity of offline reinforcement learning (RL) methods that rely on bootstrapping or temporal-difference learning. Instead, it employs goal-conditioned imitation learning, which requires no bootstrapping and introduces fewer hyperparameters. An encoder extracts control-relevant features from high-dimensional observations so that milestones can be represented compactly as low-dimensional latent vectors; the encoder is trained jointly with a goal-conditioned actor and critic.
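As a rough illustration of this joint training, the sketch below assumes the actor is trained by goal-conditioned behavior cloning on hindsight-relabeled future states and the critic regresses a temporal distance between latents; the exact losses, module interfaces, and batch fields are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def train_step(batch, encoder, actor, critic, optimizer):
    """One hypothetical joint update. `batch` holds observations `obs`,
    actions `act`, future observations `goal_obs` sampled from the same
    trajectory, and the number of steps `dt` separating them."""
    z = encoder(batch["obs"])                  # latent of the current observation
    z_goal = encoder(batch["goal_obs"])        # latent of a relabeled future state

    # Goal-conditioned imitation: reproduce the action that led toward the goal.
    pred_act = actor(z, z_goal)
    actor_loss = F.mse_loss(pred_act, batch["act"])

    # Critic regresses the temporal distance between the two latents,
    # giving the representation control-relevant structure (no bootstrapping).
    pred_dt = critic(z, z_goal)
    critic_loss = F.mse_loss(pred_dt, batch["dt"].float())

    loss = actor_loss + critic_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```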

Planning with Diffusion Models

The diffusion model in DTAMP generates a series of milestones by gradually denoising a latent sequence, learned from offline data, into a trajectory leading to the goal state. To encourage efficient trajectories, the planner applies classifier-free diffusion guidance, conditioning generation on minimizing the temporal distance between successive milestones so that shorter paths are preferred. The generative flexibility of the planner also allows it to produce diverse trajectories, which aids decision-making in multi-task settings.
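The following sketch shows one classifier-free-guided denoising step over a sequence of latent milestones; the model interface, conditioning format, guidance weight, and noise-schedule constants are placeholders for illustration only, not the paper's settings.

```python
import torch

@torch.no_grad()
def guided_denoise_step(model, x_t, t, cond, w=2.0):
    """One classifier-free-guidance step over a milestone sequence x_t of
    shape (batch, num_milestones, latent_dim). `cond` carries the start/goal
    latents (and, following the paper's idea, a request for a plan with
    minimal temporal distance); `model` predicts the noise eps."""
    eps_cond = model(x_t, t, cond)             # conditioned noise prediction
    eps_uncond = model(x_t, t, None)           # unconditioned noise prediction
    # Guidance: push the sample toward sequences consistent with the condition.
    eps = (1.0 + w) * eps_cond - w * eps_uncond

    # Standard DDPM-style posterior mean with toy schedule constants.
    beta_t, alpha_t, alpha_bar_t = 1e-2, 0.99, 0.5
    mean = (x_t - beta_t / (1.0 - alpha_bar_t) ** 0.5 * eps) / alpha_t ** 0.5
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + beta_t ** 0.5 * noise
```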

Performance Evaluation

Evaluated extensively on D4RL benchmark tasks and the challenging CALVIN vision-based manipulation benchmark, DTAMP shows marked improvements, handling long-horizon, sparse-reward tasks well without any bootstrapping, and achieves state-of-the-art performance on CALVIN. It also performs strongly in multi-goal settings, with minimal degradation relative to single-goal settings. Notably, the paper points out that DTAMP simplifies inference and requires less computation than conventional sequence-modeling methods, making it suitable for real-time control.

Challenges and Future Directions

The approach assumes that the tasks represented in the offline data are relevant to the new tasks the agent must perform, which limits its ability to tackle entirely unseen tasks. Also, while DTAMP functions efficiently without frequent replanning, the paper notes that performance could be further improved by developing efficient replanning strategies, suggesting a direction for future research. Overall, DTAMP is presented as a robust and versatile framework that advances task-agnostic planning for complex decision-making problems.