Diffused Task-Agnostic Milestone Planner (2312.03395v1)

Published 6 Dec 2023 in cs.RO, cs.AI, and cs.LG

Abstract: Addressing decision-making problems using sequence modeling to predict future trajectories has shown promising results in recent years. In this paper, we take a step further to leverage the sequence predictive method in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method that utilizes a diffusion-based generative sequence model to plan a series of milestones in a latent space and has an agent follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control. Furthermore, our approach exploits the generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark.


Summary

  • The paper introduces DTAMP as a novel planning method that utilizes diffusion models to create latent milestones for guiding agents in complex tasks.
  • It leverages goal-conditioned imitation learning and classifier-free diffusion guidance to overcome tuning challenges of traditional offline RL methods.
  • Experimental evaluations on D4RL and CALVIN benchmarks demonstrate significant improvements in efficiency and performance on long-horizon, sparse-reward tasks.

Overview

The paper introduces the Diffused Task-Agnostic Milestone Planner (DTAMP), a planning method that uses diffusion models to generate a sequence of intermediate goals, termed "milestones," in a latent space to guide an agent toward accomplishing a given task. The approach is suited to problems with long planning horizons, sparse rewards, and decision-making across multiple tasks, and it shows significant improvements in efficiency and effectiveness over existing methods on several benchmarks.
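To make the high-level procedure concrete, here is a minimal, hypothetical sketch of the plan-then-follow loop in Python; `encoder`, `planner`, `actor`, `env.get_goal`, and the milestone-switching threshold are illustrative placeholders rather than the paper's actual interfaces.

```python
import numpy as np

def run_episode(env, encoder, planner, actor, num_milestones=8, max_steps=500):
    """Illustrative plan-then-follow loop: plan latent milestones once,
    then have a goal-conditioned actor track them one by one."""
    obs = env.reset()
    goal = env.get_goal()                      # assumed goal observation/image
    z_start = encoder(obs)                     # latent of the current observation
    z_goal = encoder(goal)                     # latent of the desired outcome
    # Diffusion planner denoises a sequence of latent milestones
    # connecting the current latent to the goal latent.
    milestones = planner(z_start, z_goal, num_milestones)
    k = 0                                      # index of the milestone being tracked
    for _ in range(max_steps):
        z = encoder(obs)
        # Advance to the next milestone once the current one is roughly reached.
        if np.linalg.norm(z - milestones[k]) < 0.1 and k < num_milestones - 1:
            k += 1
        action = actor(z, milestones[k])       # goal-conditioned policy
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```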

Method Development

DTAMP is designed to avoid the instability and tuning complexity of offline reinforcement learning (RL) methods that rely on bootstrapping or temporal-difference learning. Instead, it employs goal-conditioned imitation learning, which requires no bootstrapping and introduces fewer hyperparameters. An encoder extracts control-relevant features from high-dimensional observations so that milestones can be represented compactly as low-dimensional latent vectors; the encoder is trained jointly with a goal-conditioned actor and critic.
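As a rough illustration of this joint training, the sketch below assumes the actor is trained by goal-conditioned behavior cloning on hindsight-relabeled future states and the critic regresses a temporal distance between latents; the exact losses, module interfaces, and batch fields are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def train_step(batch, encoder, actor, critic, optimizer):
    """One hypothetical joint update. `batch` holds observations `obs`,
    actions `act`, future observations `goal_obs` sampled from the same
    trajectory, and the number of steps `dt` separating them."""
    z = encoder(batch["obs"])                  # latent of the current observation
    z_goal = encoder(batch["goal_obs"])        # latent of a relabeled future state

    # Goal-conditioned imitation: reproduce the action that led toward the goal.
    pred_act = actor(z, z_goal)
    actor_loss = F.mse_loss(pred_act, batch["act"])

    # Critic regresses the temporal distance between the two latents,
    # giving the representation control-relevant structure (no bootstrapping).
    pred_dt = critic(z, z_goal)
    critic_loss = F.mse_loss(pred_dt, batch["dt"].float())

    loss = actor_loss + critic_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```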

Planning with Diffusion Models

The diffusion model in DTAMP generates a series of milestones by gradually denoising a latent sequence, learned from offline data, into a trajectory leading to the goal state. To encourage efficient trajectories, the planner applies classifier-free diffusion guidance, conditioning generation on minimizing the temporal distance between successive milestones so that shorter paths are preferred. The generative flexibility of the planner also allows it to produce diverse trajectories, which aids decision-making in multi-task settings.
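The following sketch shows one classifier-free-guided denoising step over a sequence of latent milestones; the model interface, conditioning format, guidance weight, and noise-schedule constants are placeholders for illustration only, not the paper's settings.

```python
import torch

@torch.no_grad()
def guided_denoise_step(model, x_t, t, cond, w=2.0):
    """One classifier-free-guidance step over a milestone sequence x_t of
    shape (batch, num_milestones, latent_dim). `cond` carries the start/goal
    latents (and, following the paper's idea, a request for a plan with
    minimal temporal distance); `model` predicts the noise eps."""
    eps_cond = model(x_t, t, cond)             # conditioned noise prediction
    eps_uncond = model(x_t, t, None)           # unconditioned noise prediction
    # Guidance: push the sample toward sequences consistent with the condition.
    eps = (1.0 + w) * eps_cond - w * eps_uncond

    # Standard DDPM-style posterior mean with toy schedule constants.
    beta_t, alpha_t, alpha_bar_t = 1e-2, 0.99, 0.5
    mean = (x_t - beta_t / (1.0 - alpha_bar_t) ** 0.5 * eps) / alpha_t ** 0.5
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + beta_t ** 0.5 * noise
```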

Performance Evaluation

Evaluated extensively on D4RL benchmark tasks and the challenging CALVIN vision-based manipulation benchmark, DTAMP shows marked improvements, handling long-horizon, sparse-reward tasks well without any bootstrapping, and achieves state-of-the-art performance on CALVIN. It also performs strongly in multi-goal settings, with minimal degradation relative to single-goal settings. Notably, the paper points out that DTAMP simplifies inference and requires less computation than conventional sequence-modeling methods, making it suitable for real-time control.

Challenges and Future Directions

The approach assumes that the tasks represented in the offline data are relevant to the new tasks the agent must perform, which limits its ability to tackle entirely unseen tasks. Also, while DTAMP functions efficiently without frequent replanning, the paper notes that performance could be further improved by developing efficient replanning strategies, suggesting a direction for future research. Overall, DTAMP is presented as a robust and versatile framework that advances task-agnostic planning for complex decision-making problems.