Efficient Planning with Latent Diffusion (2310.00311v1)

Published 30 Sep 2023 in cs.LG and cs.AI

Abstract: Temporal abstraction and efficient planning pose significant challenges in offline reinforcement learning, particularly in domains involving temporally extended tasks and delayed sparse rewards. Existing methods typically plan in the raw action space and can be inefficient and inflexible. Latent action spaces offer a more flexible paradigm, capturing only the actions within the behavior policy's support and decoupling the temporal structure between planning and modeling. However, current latent-action-based methods are limited to discrete spaces and require expensive planning. This paper presents a unified framework for continuous latent action space representation learning and planning by leveraging latent, score-based diffusion models. We establish the theoretical equivalence between planning in the latent action space and energy-guided sampling with a pretrained diffusion model, and incorporate a novel sequence-level exact sampling method. Our proposed method, $\texttt{LatentDiffuser}$, demonstrates competitive performance on low-dimensional locomotion control tasks and surpasses existing methods on higher-dimensional tasks.
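
The stated equivalence means that planning over latent action sequences amounts to sampling from the pretrained behavior prior tilted by an energy term (for example, a learned return estimate). The sketch below illustrates that general idea under explicit assumptions: `score_model`, `energy_model`, the simplified Euler-Maruyama reverse step, and all hyperparameters are hypothetical stand-ins, not the paper's actual implementation.

```python
# Minimal sketch of energy-guided reverse diffusion over latent action
# sequences. Assumes a pretrained score network `score_model(z, t)` for the
# behavior prior and a learned energy `energy_model(z, t)` (e.g., negative
# predicted return). All names and the simplified sampler are illustrative.
import torch

@torch.no_grad()
def energy_guided_sample(score_model, energy_model, shape,
                         n_steps=1000, beta=1.0, device="cpu"):
    """Draw latent sequences z ~ p(z) * exp(-beta * E(z)), up to normalization."""
    z = torch.randn(shape, device=device)  # start from pure noise
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = torch.full((shape[0],), i / n_steps, device=device)
        # Score of the behavior prior from the pretrained diffusion model.
        score = score_model(z, t)
        # Guidance: gradient of the energy w.r.t. the current latent.
        with torch.enable_grad():
            z_req = z.detach().requires_grad_(True)
            e = energy_model(z_req, t).sum()
            grad_e = torch.autograd.grad(e, z_req)[0]
        guided_score = score - beta * grad_e  # tilt toward low-energy latents
        # One simplified Euler-Maruyama step of the reverse-time SDE
        # (unit diffusion coefficient; no noise on the final step).
        noise = torch.randn_like(z) if i > 1 else torch.zeros_like(z)
        z = z + guided_score * dt + (dt ** 0.5) * noise
    return z
```

Under these assumptions, `beta` trades off pursuing low energy (high return) against staying within the support of the behavior prior captured by the diffusion model.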

Authors (1)
  1. Wenhao Li
Citations (2)
