Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach (2309.10831v4)
Abstract: In this paper we propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, so that it regulates state and parameter uncertainties resulting from modeling mismatches and noisy sensor measurements; and (ii) overcoming the computational intractability of stochastic optimal control. We approach both objectives by using reinforcement learning to compute the stochastic optimal control law. On one hand, we avoid the curse of dimensionality that prohibits the direct solution of the stochastic dynamic programming equation. On the other hand, the resulting stochastic optimal control reinforcement learning agent admits caution and probing, that is, optimal online exploration and exploitation. Unlike a fixed balance of exploration and exploitation, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated. We conclude the paper with a numerical simulation illustrating how a Linear Quadratic Regulator designed under the certainty equivalence assumption may lead to poor performance and filter divergence, whereas our proposed approach is stabilizing, achieves acceptable performance, and is computationally convenient.
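To make the certainty-equivalence baseline that the abstract critiques concrete, here is a minimal sketch for a scalar linear system x_{k+1} = a x_k + b u_k: the LQR gain is computed by iterating the discrete-time Riccati equation around an *estimated* input gain, ignoring its uncertainty. All numbers, names, and the scalar setting are illustrative assumptions, not taken from the paper.

```python
def lqr_gain(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati equation
    p = q + a^2 p - (a b p)^2 / (r + b^2 p) to a fixed point,
    then return the feedback gain k such that u = -k * x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# Certainty equivalence: the controller is designed around the point
# estimate b_hat, while the plant evolves with the true gain b_true.
# (Hypothetical values chosen so the mismatch is moderate.)
a, b_true, b_hat, q, r = 1.0, 0.5, 0.4, 1.0, 0.1
k = lqr_gain(a, b_hat, q, r)

x = 1.0
for _ in range(50):
    x = a * x + b_true * (-k * x)  # closed loop under the true plant
```

With this moderate mismatch the closed-loop pole a - b_true * k still lies inside the unit circle, so the state decays; the paper's point is that under larger estimation errors a certainty-equivalent design carries no mechanism to probe for better estimates or to act cautiously against the remaining uncertainty.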