Diffused Task-Agnostic Milestone Planner (2312.03395v1)
Abstract: Addressing decision-making problems by using sequence models to predict future trajectories has shown promising results in recent years. In this paper, we take a step further and leverage sequence prediction in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method that uses a diffusion-based generative sequence model to plan a series of milestones in a latent space and has an agent follow those milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to perform long-term planning and vision-based control efficiently. Furthermore, our approach exploits the generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method on offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark.
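The abstract describes using a diffusion model to generate a sequence of milestones in a latent space. As a rough illustration of that general idea only, the sketch below runs a generic DDPM-style reverse process over a horizon of latent vectors and conditions the plan by clamping the first and last milestones to given start and goal latents. This is not the authors' implementation: the linear noise schedule, the endpoint-clamping conditioning, the function names (`sample_milestones`, `standard_normal_eps_predictor`), and the closed-form noise predictor (a hypothetical stand-in for a trained network and for encoded observations) are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def standard_normal_eps_predictor(x_t: List[Vector], t: int, alpha_bar: float) -> List[Vector]:
    """Hypothetical stand-in for a trained noise-prediction network.

    If milestone latents were distributed as N(0, I), the exact expected noise
    given x_t would be sqrt(1 - alpha_bar) * x_t. A real planner would use a
    learned model here; the step index t is unused by this closed form.
    """
    scale = math.sqrt(1.0 - alpha_bar)
    return [[scale * v for v in z] for z in x_t]


def sample_milestones(start: Sequence[float], goal: Sequence[float], horizon: int,
                      num_steps: int = 50,
                      eps_model: Callable[[List[Vector], int, float], List[Vector]]
                      = standard_normal_eps_predictor,
                      seed: int = 0) -> List[Vector]:
    """Sample `horizon` latent milestones whose endpoints are clamped to start/goal."""
    if horizon < 2:
        raise ValueError("horizon must be at least 2 to hold a start and a goal")
    if len(start) != len(goal):
        raise ValueError("start and goal latents must have the same dimension")
    rng = random.Random(seed)
    dim = len(start)

    betas = linear_beta_schedule(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Begin the reverse process from pure Gaussian noise.
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(horizon)]

    for t in reversed(range(num_steps)):
        # Assumed conditioning: overwrite the endpoints with the known latents.
        x[0], x[-1] = list(start), list(goal)
        eps = eps_model(x, t, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        new_x = []
        for z, e in zip(x, eps):
            # DDPM posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
            mean = [(zi - coef * ei) / math.sqrt(alphas[t]) for zi, ei in zip(z, e)]
            if t > 0:
                sigma = math.sqrt(betas[t])  # one common choice of reverse variance
                mean = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            new_x.append(mean)
        x = new_x

    x[0], x[-1] = list(start), list(goal)
    return x
```

For example, `sample_milestones([0.0, 0.0], [1.0, -1.0], horizon=5)` returns five 2-dimensional latent milestones that begin at the start latent and end at the goal latent; in a full system, a goal-conditioned policy would then be trained to reach each milestone in turn. Clamping endpoints is only one way to condition a diffusion planner; guidance-based conditioning is another.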