
Model-Based Reinforcement Learning via Stochastic Hybrid Models (2111.06211v3)

Published 11 Nov 2021 in eess.SY, cs.LG, and cs.SY

Abstract: Optimal control of general nonlinear systems is a central challenge in automation. Enabled by powerful function approximators, data-driven approaches to control have recently successfully tackled challenging applications. However, such methods often obscure the structure of dynamics and control behind black-box over-parameterized representations, thus limiting our ability to understand closed-loop behavior. This paper adopts a hybrid-system view of nonlinear modeling and control that lends an explicit hierarchical structure to the problem and breaks down complex dynamics into simpler localized units. We consider a sequence modeling paradigm that captures the temporal structure of the data and derive an expectation-maximization (EM) algorithm that automatically decomposes nonlinear dynamics into stochastic piecewise affine models with nonlinear transition boundaries. Furthermore, we show that these time-series models naturally admit a closed-loop extension that we use to extract local polynomial feedback controllers from nonlinear experts via behavioral cloning. Finally, we introduce a novel hybrid relative entropy policy search (Hb-REPS) technique that incorporates the hierarchical nature of hybrid models and optimizes a set of time-invariant piecewise feedback controllers derived from a piecewise polynomial approximation of a global state-value function.
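The abstract compresses three technical ingredients: an EM procedure that breaks nonlinear dynamics into stochastic piecewise affine modes, behavioral cloning of local polynomial feedback controllers, and a hybrid variant of relative entropy policy search (REPS). As a rough illustration of the first ingredient, here is a minimal hard-assignment EM sketch in NumPy. It is not the paper's algorithm: it alternates mode relabeling with per-mode affine regression, and it omits the stochastic transition model, the nonlinear mode boundaries, and the soft posteriors that the paper's EM computes. All names (`em_piecewise_affine`, the toy system) are illustrative.

```python
import numpy as np

def em_piecewise_affine(x, u, n_modes=2, n_iters=50, seed=0):
    """Hard-assignment EM sketch: fit x_{t+1} ~= [x_t, u_t, 1] @ W_k
    with K affine modes, alternating mode relabeling (E-step-like)
    with per-mode least squares (M-step-like)."""
    rng = np.random.default_rng(seed)
    T = x.shape[0] - 1
    Phi = np.hstack([x[:-1], u[:T], np.ones((T, 1))])  # features [x_t, u_t, 1]
    Y = x[1:]                                          # targets x_{t+1}
    d = Phi.shape[1]
    z = rng.integers(n_modes, size=T)                  # random initial labels
    W = [np.zeros((d, Y.shape[1])) for _ in range(n_modes)]
    for _ in range(n_iters):
        for k in range(n_modes):                       # refit each mode
            idx = z == k
            if idx.sum() >= d:                         # enough points to fit
                W[k], *_ = np.linalg.lstsq(Phi[idx], Y[idx], rcond=None)
        # reassign each timestep to its best-predicting mode
        err = np.stack([((Y - Phi @ W[k]) ** 2).sum(axis=1)
                        for k in range(n_modes)])
        z_new = err.argmin(axis=0)
        if np.array_equal(z_new, z):
            break                                      # labels converged
        z = z_new
    return W, z

# toy usage: a scalar system that switches between two affine regimes
rng = np.random.default_rng(1)
x, u = np.zeros((201, 1)), rng.normal(size=(200, 1))
for t in range(200):
    a, b = (0.9, 0.5) if x[t, 0] < 0 else (0.5, -0.3)
    x[t + 1] = a * x[t] + b * u[t] + 0.01 * rng.normal()
W, z = em_piecewise_affine(x, u, n_modes=2)
```

For the policy-search ingredient, Hb-REPS builds on the standard REPS weighting of Peters et al. (2010): samples are reweighted by exp(A/η), with the temperature η found by minimizing a convex dual under a relative-entropy (KL) bound ε. A sketch of the flat, non-hybrid version of that weighting follows; the paper's hybrid extension applies such updates per mode, which this sketch does not attempt.

```python
import numpy as np
from scipy.optimize import minimize

def reps_weights(advantages, epsilon=0.1):
    """Flat REPS weighting: minimize the dual
    g(eta) = eta * epsilon + eta * log mean exp(A / eta)
    and weight samples proportionally to exp(A / eta*)."""
    A = advantages - advantages.max()      # shift for numerical stability
    def dual(log_eta):                     # optimize log(eta) so eta > 0
        eta = np.exp(log_eta[0])
        return eta * epsilon + eta * np.log(np.mean(np.exp(A / eta)))
    eta = np.exp(minimize(dual, x0=[0.0], method="Nelder-Mead").x[0])
    w = np.exp(A / eta)
    return w / w.sum()                     # normalized sample weights
```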
