Deep Generative Models for Decision-Making and Control (2306.08810v2)

Published 15 Jun 2023 in cs.LG and cs.AI

Abstract: Deep model-based reinforcement learning methods offer a conceptually simple approach to the decision-making and control problem: use learning for the purpose of estimating an approximate dynamics model, and offload the rest of the work to classical trajectory optimization. However, this combination has a number of empirical shortcomings, limiting the usefulness of model-based methods in practice. The dual purpose of this thesis is to study the reasons for these shortcomings and to propose solutions for the uncovered problems. Along the way, we highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, classifier-guided sampling, and image inpainting, can be reinterpreted as viable planning strategies for reinforcement learning problems.
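
As a rough illustration of the two-stage recipe the abstract describes (learn an approximate dynamics model from data, then offload planning to classical trajectory optimization), here is a minimal PyTorch-style sketch. It is not the thesis's method: the class and function names, the random-shooting planner, and the reward interface are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the thesis's implementation):
# fit a one-step dynamics model, then plan by rolling out candidate
# action sequences under that model and picking the best one.
import torch
import torch.nn as nn


class DynamicsModel(nn.Module):
    """Learned one-step model: (state, action) -> predicted next state."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def plan(model, reward_fn, state, action_dim, horizon=15, n_candidates=1000):
    """Random-shooting trajectory optimization against the learned model."""
    # Sample candidate open-loop action sequences uniformly in [-1, 1].
    actions = torch.rand(n_candidates, horizon, action_dim) * 2 - 1
    states = state.expand(n_candidates, -1)
    returns = torch.zeros(n_candidates)
    with torch.no_grad():
        for t in range(horizon):
            returns += reward_fn(states, actions[:, t])
            states = model(states, actions[:, t])  # roll out under the model
    best = returns.argmax()
    return actions[best, 0]  # execute only the first action (MPC-style)


# Hypothetical usage with a toy reward (drive the first state dimension to zero):
# model = DynamicsModel(state_dim=11, action_dim=3)
# reward = lambda s, a: -(s[:, 0] ** 2)
# action = plan(model, reward, torch.zeros(11), action_dim=3)
```

In practice, only the first action of the best candidate sequence is executed and the optimization is rerun at the next step, i.e. the standard model-predictive-control loop whose empirical shortcomings this thesis sets out to study.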
