
Deep Gaussian Covariance Network with Trajectory Sampling for Data-Efficient Policy Search

Published 23 Mar 2024 in cs.LG and stat.ML (arXiv:2403.15908v1)

Abstract: Probabilistic world models increase the data efficiency of model-based reinforcement learning (MBRL) by guiding the policy with their epistemic uncertainty, improving exploration and the acquisition of new samples. Moreover, the uncertainty-aware learning procedures of probabilistic approaches lead to robust policies that are less sensitive to noisy observations than uncertainty-unaware solutions. We propose to combine trajectory sampling with a deep Gaussian covariance network (DGCN) for a data-efficient solution to MBRL problems in an optimal control setting. We compare trajectory sampling with density-based approximation for uncertainty propagation using three different probabilistic world models: Gaussian processes, Bayesian neural networks, and DGCNs. We provide empirical evidence, using four well-known test environments, that our method improves sample efficiency over other combinations of uncertainty propagation methods and probabilistic models. During our tests, we place particular emphasis on the robustness of the learned policies with respect to noisy initial states.


Summary

  • The paper presents a novel integration of deep Gaussian covariance networks and trajectory sampling that significantly enhances sample efficiency in model-based reinforcement learning.
  • It demonstrates that the proposed method outperforms traditional Gaussian process and ensemble approaches, especially under noisy initial conditions.
  • Experimental results on benchmarks such as Inverted Pendulum Swing Up (IPSU) and Continuous Mountain Car (CMC) confirm the method's competitiveness and its potential for robust, data-efficient policy search.

Exploring Sample Efficiency and Robust Policy Learning in MBRL via Deep Gaussian Covariance Networks and Trajectory Sampling

Introduction

The quest for data-efficient solutions in Model-Based Reinforcement Learning (MBRL) has led to the exploration of probabilistic world models, which leverage epistemic uncertainty to guide exploration and policy improvement and thereby produce robust policies from fewer samples. This paper introduces a methodology that combines Deep Gaussian Covariance Networks (DGCN) with trajectory sampling for efficient policy search in an MBRL setting. The approach is evaluated against other probabilistic models under different uncertainty propagation methods, with a focus on sample efficiency and policy robustness in noisy environments.

Probabilistic Models in Focus

  • Gaussian Process Regression (GP): Serves as the foundational probabilistic model, incorporating both aleatoric and epistemic uncertainty into its predictions. The implementation relies on widely used stationary kernels such as the squared exponential and Matérn kernels.
  • Deep Gaussian Covariance Network (DGCN): Extends the GP framework by employing neural networks to predict kernel parameters as functions of the input, thus accommodating non-stationary data. The model also handles heteroscedastic uncertainty, a notable advance over standard stationary GP models.
  • Probabilistic Neural Networks (PNN): Represented in this study by ensemble-based models (E-PNN), these capture predictive distributions that combine aleatoric and epistemic uncertainty, providing a competitive alternative to DGCN in MBRL applications.
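The DGCN idea of predicting kernel parameters with a network can be sketched with a Gibbs (non-stationary squared-exponential) kernel whose lengthscale is the output of a small MLP. This is a minimal illustration, not the paper's implementation: the MLP weights are fixed rather than trained, and all names below are assumptions.

```python
import numpy as np

def lengthscale_net(x, w1, b1, w2, b2):
    """Tiny MLP mapping inputs to positive, input-dependent lengthscales.
    In a DGCN this network would be trained jointly with the GP likelihood;
    here the weights stay fixed, purely for illustration."""
    h = np.tanh(x @ w1 + b1)
    return np.exp(h @ w2 + b2).ravel()  # exp keeps lengthscales positive

def gibbs_kernel(x1, x2, l1, l2):
    """Non-stationary squared-exponential (Gibbs) kernel with
    input-dependent lengthscales l1 = l(x1), l2 = l(x2)."""
    s = l1[:, None] ** 2 + l2[None, :] ** 2
    prefactor = np.sqrt(2.0 * l1[:, None] * l2[None, :] / s)
    sqdist = (x1[:, None, 0] - x2[None, :, 0]) ** 2
    return prefactor * np.exp(-sqdist / s)

def gp_predict(x_train, y_train, x_test, l_fn, noise=1e-2):
    """Standard GP posterior, but with the non-stationary kernel above."""
    l_tr, l_te = l_fn(x_train), l_fn(x_test)
    K = gibbs_kernel(x_train, x_train, l_tr, l_tr) + noise * np.eye(len(x_train))
    Ks = gibbs_kernel(x_test, x_train, l_te, l_tr)
    Kss = gibbs_kernel(x_test, x_test, l_te, l_te)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)  # predictive mean and variance

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(1, 8)), np.zeros(8)
w2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros(1)
l_fn = lambda x: lengthscale_net(x, w1, b1, w2, b2)

x_tr = np.linspace(-3, 3, 20).reshape(-1, 1)
y_tr = np.sin(x_tr).ravel()
mean, var = gp_predict(x_tr, y_tr, np.array([[0.5]]), l_fn)
```

Because the lengthscale varies with the input, such a model can be locally smooth in some regions and rapidly varying in others, which a single stationary kernel cannot express.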

Uncertainty Propagation Methods

Two principal methods for propagating uncertainty were assessed:

  • Density-based Uncertainty Propagation: Utilizes approximations (e.g., moment matching) to propagate uncertainty through models, a method traditionally associated with GP models and requiring Gaussian distribution assumptions.
  • Trajectory Sampling: Offers a more flexible and potentially more accurate alternative for non-linear and complex dynamics, eschewing the need for Gaussian approximations by directly sampling trajectories according to the probabilistic model's output.
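The trajectory-sampling idea can be sketched as particle propagation: each particle's next state is drawn from the model's predictive distribution, so no Gaussian assumption on the state distribution is ever imposed. The toy linear-Gaussian model and proportional policy below are assumptions for illustration only, standing in for a learned world model and policy.

```python
import numpy as np

def propagate_trajectories(model, policy, x0, horizon, n_particles, rng):
    """Trajectory sampling: push a set of particles through a probabilistic
    dynamics model, drawing each next state from the model's predictive
    distribution instead of moment-matching a Gaussian at every step."""
    states = np.repeat(x0[None, :], n_particles, axis=0)
    trajs = [states]
    for _ in range(horizon):
        actions = policy(states)
        mean, var = model(states, actions)  # predictive mean and variance
        states = mean + np.sqrt(var) * rng.normal(size=mean.shape)
        trajs.append(states)
    return np.stack(trajs)  # shape: (horizon + 1, n_particles, state_dim)

def toy_model(s, a):
    """Stand-in for a learned probabilistic world model (illustrative)."""
    return 0.9 * s + 0.1 * a, np.full_like(s, 1e-2)

policy = lambda s: -0.5 * s  # simple proportional feedback policy
rng = np.random.default_rng(0)
trajs = propagate_trajectories(toy_model, policy, np.array([1.0]), 10, 50, rng)
```

A policy's expected return is then a plain Monte-Carlo average over the particles, which is what makes the method agnostic to the shape of the propagated state distribution.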

Deep Gaussian Covariance Network with Trajectory Sampling (DGCNTS)

A novel framework termed DGCNTS was proposed, integrating DGCN with trajectory sampling within an MBRL setup. Unlike traditional methods that might rely on ensemble models or restrictive uncertainty propagation techniques, DGCNTS leverages the best of both worlds: the robust uncertainty modeling capabilities of DGCN and the flexibility of trajectory sampling. This combination seeks to enhance sample efficiency and policy robustness, particularly under noisy initial states.
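The surrounding policy-search loop alternates between collecting real experience, refitting the world model, and improving the policy on model rollouts. The sketch below keeps only that structure: the probabilistic model and trajectory-sampling optimizer are replaced by a least-squares fit and an analytic controller so the example stays short, and every name in it is an assumption rather than the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mbrl_policy_search(env_step, reset, fit_model, optimize_policy,
                       n_iters, episode_len):
    """Schematic PILCO/PETS-style loop: collect data with the current
    policy, refit the (here: deterministic) world model, improve policy."""
    data = []
    policy = lambda s: rng.normal()          # random initial policy
    for _ in range(n_iters):
        s = reset()
        for _ in range(episode_len):         # 1) collect real experience
            a = policy(s)
            s_next = env_step(s, a)
            data.append((s, a, s_next))
            s = s_next
        model = fit_model(data)              # 2) refit the world model
        policy = optimize_policy(model)      # 3) improve the policy
    return policy

def env_step(s, a):                          # "true" dynamics, unknown to agent
    return 0.9 * s + 0.2 * a + 0.01 * rng.normal()

def reset():
    return 1.0

def fit_model(data):
    """Least-squares fit of s' ~ A*s + B*a (stand-in for a probabilistic model)."""
    X = np.array([[s, a] for s, a, _ in data])
    y = np.array([s_next for _, _, s_next in data])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def optimize_policy(model):
    A, B = model
    return lambda s: -(A / B) * s            # drive the predicted next state to 0

policy = mbrl_policy_search(env_step, reset, fit_model, optimize_policy,
                            n_iters=3, episode_len=20)
```

In DGCNTS the model-fitting step trains the DGCN instead, and the policy-improvement step evaluates candidate policies on sampled particle trajectories rather than through an analytic controller.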

Experimental Insights

Empirical tests were conducted across four benchmarks: Inverted Pendulum Swing Up (IPSU), Continuous Mountain Car (CMC), Inverted Double Pendulum (IDP), and Pendulum (P). Key findings include:

  • Sample Efficiency: DGCNTS exhibited pronounced improvements in sample efficiency across most tested environments when compared to ensemble-based PNNs and GP models utilizing density-based uncertainty propagation.
  • Robustness to Initial State Variability: Policies derived using DGCNTS demonstrated superior robustness against noisy initial conditions, underscoring the method's practicality in real-world applications where such conditions are prevalent.
  • Comparison with Existing Methods: The performance of DGCNTS was consistently competitive or superior to existing probabilistic models combined with alternative uncertainty propagation methods, highlighting its effectiveness in navigating MBRL problems.

Future Directions

Given the promising results of DGCNTS, several avenues for future research emerge:

  • Scalability and Complexity: Extending the DGCNTS framework to more complex tasks and higher-dimensional state-action spaces remains an open challenge.
  • Exploration Mechanisms: Integrating exploration strategies that leverage epistemic uncertainty could further improve the data efficiency of DGCNTS.
  • Model Predictive Control (MPC): The applicability of DGCNTS in MPC settings warrants exploration, potentially broadening its utility in MBRL.

Conclusion

The introduction of DGCNTS represents a significant stride toward resolving the sample efficiency challenges in MBRL. By marrying DGCN's robust uncertainty handling with the flexible trajectory sampling approach, this method sets a new precedent for developing efficient and robust policies under conditions of uncertainty. As MBRL continues to evolve, techniques like DGCNTS will undoubtedly play a pivotal role in enhancing the practicality and applicability of MBRL solutions across a wide array of tasks and environments.
