FORESEE: Prediction with Expansion-Compression Unscented Transform for Online Policy Optimization (2209.12644v2)
Abstract: Propagating state distributions through a generic, uncertain nonlinear dynamical model is known to be intractable and usually calls for numerical or analytical approximations. We introduce a method for state prediction, called the Expansion-Compression Unscented Transform, and use it to solve a class of online policy optimization problems. Our algorithm propagates a finite number of sigma points through a state-dependent distribution; representing the resulting distribution requires more sigma points at each time step, an operation we call expansion. To keep the algorithm scalable, we pair the expansion operation with a compression operation based on moment matching, so that the number of sigma points remains constant across predictions over multiple time steps. The method's performance is empirically shown to be comparable to Monte Carlo sampling at a much lower computational cost. Under state and control input constraints, the state prediction is then used together with a proposed variant of constrained gradient descent to update the policy parameters online in a receding-horizon fashion. The framework is implemented as a differentiable computational graph for policy training. We showcase it on a quadrotor stabilization task as part of a benchmark comparison in safe-control-gym, and on optimizing the parameters of a Control Barrier Function based controller in a leader-follower problem.
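To make the expansion and compression operations concrete, the sketch below steps a set of sigma points through one prediction. It is a minimal NumPy illustration of the idea stated in the abstract, not the paper's implementation: the names (`stochastic_dynamics`, `expansion_compression_step`), the unscented scaling parameter `kappa`, and the choice of a standard 2n+1-point sigma set are assumptions made for the example.

```python
import numpy as np

def generate_sigma_points(mean, cov, kappa=1.0):
    """Standard unscented transform: 2n+1 sigma points and weights for N(mean, cov)."""
    n = mean.shape[0]
    S = np.linalg.cholesky((n + kappa) * cov)  # columns are the sigma directions
    pts = [mean] + [mean + S[:, i] for i in range(n)] + [mean - S[:, i] for i in range(n)]
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return np.array(pts), weights

def moment_match(points, weights):
    """Compress a weighted point set to its first two moments (mean, covariance)."""
    mean = weights @ points
    centered = points - mean
    cov = (weights[:, None] * centered).T @ centered
    return mean, cov

def expansion_compression_step(points, weights, stochastic_dynamics, kappa=1.0):
    """One prediction step: expand every sigma point through the state-dependent
    distribution given by `stochastic_dynamics(x) -> (mean, cov)`, then compress
    the expanded set back to 2n+1 points via moment matching."""
    expanded_pts, expanded_w = [], []
    for x, w in zip(points, weights):
        mu, cov = stochastic_dynamics(x)                       # state-dependent next-state distribution
        child_pts, child_w = generate_sigma_points(mu, cov, kappa)
        expanded_pts.append(child_pts)
        expanded_w.append(w * child_w)                         # weights multiply along the prediction tree
    expanded_pts = np.vstack(expanded_pts)
    expanded_w = np.concatenate(expanded_w)
    mean, cov = moment_match(expanded_pts, expanded_w)         # compression by moment matching
    return generate_sigma_points(mean, cov, kappa)

# Usage on a toy noisy linear system x_{t+1} ~ N(0.9 x_t, 0.01 I):
dyn = lambda x: (0.9 * x, 0.01 * np.eye(x.shape[0]))
pts, w = generate_sigma_points(np.zeros(2), np.eye(2))
for _ in range(10):
    pts, w = expansion_compression_step(pts, w, dyn)
print(moment_match(pts, w))  # predicted mean and covariance after 10 steps
```

Iterating `expansion_compression_step` over a horizon keeps the representation at 2n+1 points per step rather than letting it grow geometrically under expansion alone, which is the scalability argument the abstract makes; implemented with differentiable operations, the same loop can serve as the computational graph for policy-gradient updates.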
- F. Amadio, A. Dalla Libera, R. Antonello, D. Nikovski, R. Carli, and D. Romeres, “Model-based policy search using Monte Carlo gradient estimation with real systems application,” IEEE Transactions on Robotics, vol. 38, no. 6, pp. 3879–3898, 2022.
- A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2016.
- O. Andersson, F. Heintz, and P. Doherty, “Model-based reinforcement learning in continuous environments using real-time constrained optimization,” in Proc. of the AAAI Conference on Artificial Intelligence, vol. 29, no. 1, 2015.
- P. T. Boggs and J. W. Tolle, “Sequential quadratic programming,” Acta Numerica, vol. 4, pp. 1–51, 1995.
- K. Chatzilygeroudis, R. Rama, R. Kaushik, D. Goepp, V. Vassiliades, and J.-B. Mouret, “Black-box data-efficient policy search for robotics,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2017, pp. 51–58.
- R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 3387–3395.
- M. Deisenroth and C. E. Rasmussen, “PILCO: A model-based and data-efficient approach to policy search,” in Proceedings of the 28th International Conference on Machine Learning, 2011, pp. 465–472.
- M. P. Deisenroth, D. Fox, and C. E. Rasmussen, “Gaussian processes for data-efficient learning in robotics and control,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 408–423, 2013.
- M. P. Deisenroth, G. Neumann, J. Peters, et al., “A survey on policy search for robotics,” Foundations and Trends® in Robotics, vol. 2, no. 1–2, pp. 1–142, 2013.
- C. Dohrmann and R. Robinett, “Efficient sequential quadratic programming implementations for equality-constrained discrete-time optimal control,” Journal of Optimization Theory and Applications, vol. 95, pp. 323–346, 1997.
- D. Ebeigbe, T. Berry, M. M. Norton, A. J. Whalen, D. Simon, T. Sauer, and S. J. Schiff, “A generalized unscented transformation for probability distributions,” arXiv preprint, 2021.
- Y. Gal, R. McAllister, and C. E. Rasmussen, “Improving PILCO with Bayesian neural network dynamics models,” in Data-efficient machine learning workshop, ICML, vol. 4, no. 34, 2016, p. 25.
- M. Ghavamzadeh and Y. Engel, “Bayesian policy gradient algorithms,” Advances in Neural Information Processing Systems, vol. 19, 2006.
- L. Hewing, J. Kabzan, and M. N. Zeilinger, “Cautious model predictive control using Gaussian process regression,” IEEE Transactions on Control Systems Technology, vol. 28, no. 6, pp. 2736–2743, 2019.
- S. J. Julier and J. K. Uhlmann, “New extension of the Kalman filter to nonlinear systems,” in Signal Processing, Sensor Fusion, and Target Recognition VI, vol. 3068. SPIE, 1997, pp. 182–193.
- S. Kamthe and M. Deisenroth, “Data-efficient reinforcement learning with probabilistic model predictive control,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2018, pp. 1701–1710.
- S. Levine and P. Abbeel, “Learning neural network policies with guided policy search under unknown dynamics,” Advances in Neural Information Processing Systems, vol. 27, 2014.
- X.-W. Liu, “Global convergence on an active set SQP for inequality constrained optimization,” Journal of Computational and Applied Mathematics, vol. 180, no. 1, pp. 201–211, 2005.
- A. Nemirovski and A. Shapiro, “Convex approximations of chance constrained programs,” SIAM Journal on Optimization, vol. 17, no. 4, pp. 969–996, 2007.
- A. O’Hagan, “Monte Carlo is fundamentally unsound,” The Statistician, pp. 247–249, 1987.
- N. Ozaki, S. Campagnola, R. Funase, and C. H. Yam, “Stochastic differential dynamic programming with unscented transform for low-thrust trajectory design,” Journal of Guidance, Control, and Dynamics, vol. 41, no. 2, pp. 377–387, 2018.
- Y. Pan and E. Theodorou, “Probabilistic differential dynamic programming,” Advances in Neural Information Processing Systems, vol. 27, 2014.
- H. Parwana and D. Panagou, “Recursive feasibility guided optimal parameter adaptation of differential convex optimization policies for safety-critical systems,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 6807–6813.
- S. Roberts, “SafePILCO: A software tool for safe and data-efficient policy synthesis,” in Quantitative Evaluation of Systems: 17th International Conference, QEST 2020, Vienna, Austria, vol. 12289. Springer, 2020, p. 18.
- E. Schulz, M. Speekenbrink, and A. Krause, “A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions,” Journal of Mathematical Psychology, vol. 85, pp. 1–16, 2018.
- S. Stanton, K. A. Wang, and A. G. Wilson, “Model-based policy gradients with entropy exploration through sampling,” ICML Generative Modeling and Model-Based Reasoning for Robotics and AI Workshop.
- C. Sun, D.-K. Kim, and J. P. How, “FISAR: Forward invariant safe reinforcement learning with a deep neural network-based optimizer,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2021, pp. 10617–10624.
- E. Theodorou, Y. Tassa, and E. Todorov, “Stochastic differential dynamic programming,” in Proc. of the 2010 American Control Conference, 2010, pp. 1125–1132.
- A. L. Tits, “Feasible sequential quadratic programming,” Encyclopedia of Optimization, 2009.
- R. Turner and C. E. Rasmussen, “Model based learning of sigma points in unscented Kalman filtering,” Neurocomputing, vol. 80, pp. 47–53, 2012.
- B. Van Niekerk, A. Damianou, and B. Rosman, “Online constrained model-based reinforcement learning,” arXiv preprint arXiv:2004.03499, 2020.
- J. Vinogradska, B. Bischoff, J. Achterhold, T. Koller, and J. Peters, “Numerical quadrature for probabilistic policy search,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 42, no. 1, pp. 164–175, 2018.
- Z. Yuan, A. W. Hall, S. Zhou, L. Brunke, M. Greeff, J. Panerati, and A. P. Schoellig, “Safe-control-gym: A unified benchmark suite for safe learning-based control and reinforcement learning in robotics,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 11142–11149, 2022. Open source code at https://github.com/utiasDSL/safe-control-gym.
- H. Zhang, Z. Li, and A. Clark, “Model-based reinforcement learning with provable safety guarantees via control barrier functions,” in IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 792–798.