Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control
Abstract: Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. In practice, this integral is approximated by quadrature rules: weighted sums of utility values evaluated at states sampled in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of computational method -- in this case, the quadrature rule -- can significantly impact control performance. This impact is traced back to the fact that computational errors introduced in the PEV stage affect the convergence behavior of policy iteration, which in turn affects the learned controller. To elucidate how computation impacts control, we draw a parallel between IntRL's policy iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation. In this light, the computational error in PEV manifests as an extra error term in each iteration of Newton's method, with an upper bound proportional to the computational error. We further demonstrate that when the utility function resides in a reproducing kernel Hilbert space (RKHS), the optimal quadrature is achieved by Bayesian quadrature with the RKHS-inducing kernel function. We prove that the local convergence rates of IntRL using the trapezoidal rule and Bayesian quadrature with a Matérn kernel are $O(N^{-2})$ and $O(N^{-b})$, respectively, where $N$ is the number of evenly spaced samples and $b$ is the Matérn kernel's smoothness parameter. These theoretical findings are validated on two canonical control tasks.
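To make the abstract's comparison concrete, below is a minimal sketch (not the paper's code) contrasting the trapezoidal rule with Bayesian quadrature (BQ) using a Matérn-3/2 kernel for the kind of utility integral that arises in IntRL's PEV stage. The utility signal `util`, the length-scale `rho`, and the fine-grid computation of the kernel-mean vector are illustrative assumptions; closed forms for the kernel mean exist for the Matérn family.

```python
# Sketch: trapezoidal rule vs. Bayesian quadrature with a Matern-3/2 kernel
# for approximating an integral of a utility signal from N evenly spaced samples.
import numpy as np

def matern32(a, b, rho=0.3):
    """Matern kernel with smoothness nu = 3/2 and (assumed) length-scale rho."""
    r = np.abs(a[:, None] - b[None, :])
    s = np.sqrt(3.0) * r / rho
    return (1.0 + s) * np.exp(-s)

def bq_weights(nodes, lo=0.0, hi=1.0, n_fine=10_000):
    """BQ weights w = K(X, X)^{-1} z, where z_i = int_lo^hi k(t, t_i) dt.

    z is computed here on a fine grid for simplicity of illustration."""
    t_fine = np.linspace(lo, hi, n_fine)
    z = np.trapz(matern32(t_fine, nodes), t_fine, axis=0)
    K = matern32(nodes, nodes) + 1e-10 * np.eye(len(nodes))  # jitter for stability
    return np.linalg.solve(K, z)

# Hypothetical utility along a trajectory on [0, 1].
util = lambda t: np.exp(-t) * np.sin(8.0 * t) ** 2
t_ref = np.linspace(0.0, 1.0, 200_001)
truth = np.trapz(util(t_ref), t_ref)  # high-resolution reference value

for N in (8, 16, 32, 64):
    t = np.linspace(0.0, 1.0, N)          # evenly spaced samples, as in the paper
    trap = np.trapz(util(t), t)           # trapezoidal-rule estimate
    bq = bq_weights(t) @ util(t)          # Bayesian-quadrature estimate
    print(f"N={N:3d}  trap err={abs(trap - truth):.2e}  BQ err={abs(bq - truth):.2e}")
```

Running the loop over increasing N makes the two error decay rates visible empirically: the trapezoidal error shrinks roughly as $N^{-2}$, while the BQ error decays at the kernel-determined rate when the integrand lies in the corresponding RKHS.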