Deep Gaussian Covariance Network with Trajectory Sampling for Data-Efficient Policy Search
Abstract: Probabilistic world models increase the data efficiency of model-based reinforcement learning (MBRL) by guiding the policy with their epistemic uncertainty, improving exploration and the acquisition of new samples. Moreover, the uncertainty-aware learning procedures of probabilistic approaches yield robust policies that are less sensitive to noisy observations than uncertainty-unaware solutions. We propose combining trajectory sampling with deep Gaussian covariance networks (DGCNs) as a data-efficient solution to MBRL problems in an optimal control setting. We compare trajectory sampling against density-based approximation for uncertainty propagation using three different probabilistic world models: Gaussian processes, Bayesian neural networks, and DGCNs. Using four well-known test environments, we provide empirical evidence that our method improves sample efficiency over other combinations of uncertainty propagation methods and probabilistic models. Throughout our tests, we place particular emphasis on the robustness of the learned policies with respect to noisy initial states.
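The core idea contrasted in the abstract can be illustrated with a minimal sketch of trajectory sampling: instead of propagating a single approximated density through the world model (as in moment matching), a set of particles is simulated forward, each drawing its own sample from the model's predictive distribution at every step. The `toy_probabilistic_model` and linear policy below are hypothetical stand-ins for a learned model (GP, BNN, or DGCN) and a learned controller, not the paper's actual implementation.

```python
import numpy as np

def toy_probabilistic_model(state, action):
    # Hypothetical stand-in for a learned probabilistic world model:
    # returns a predictive mean and standard deviation for the next
    # state given the current (state, action) pair.
    mean = state + 0.1 * action - 0.01 * state
    std = 0.05 * (1.0 + np.abs(state))
    return mean, std

def trajectory_sampling(model, init_state, policy, horizon, n_particles, seed=0):
    # Propagate epistemic uncertainty by simulating particles:
    # every particle samples its own next state from the model's
    # predictive distribution, rather than fitting one density.
    rng = np.random.default_rng(seed)
    particles = np.tile(init_state, (n_particles, 1)).astype(float)
    trajectories = [particles.copy()]
    for _ in range(horizon):
        actions = policy(particles)
        mean, std = model(particles, actions)
        particles = rng.normal(mean, std)  # one draw per particle
        trajectories.append(particles.copy())
    # Shape: (horizon + 1, n_particles, state_dim)
    return np.stack(trajectories)

# Usage: 50 particles on a 1-D state over 20 steps with a linear policy.
traj = trajectory_sampling(
    toy_probabilistic_model,
    init_state=np.array([1.0]),
    policy=lambda s: -0.5 * s,
    horizon=20,
    n_particles=50,
)
```

The spread of the particle cloud at each step gives a Monte Carlo estimate of the predictive state distribution, which a policy-search objective can then average over.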