Reinforcement Learning Design for Quickest Change Detection
Abstract: The field of quickest change detection (QCD) concerns the design and analysis of algorithms that estimate, in real time, the time at which an important event takes place, and that identify properties of the post-change behavior. It is shown in this paper that approaches based on reinforcement learning (RL) can be built on any "surrogate information state" that is adapted to the observations. Hence we are left to choose both the surrogate information state process and the algorithm. For the former, it is argued that there are many choices available, based on a rich theory of asymptotic statistics for QCD. Two approaches to RL design are considered. (i) Stochastic gradient descent based on an actor-critic formulation. Theory is largely complete for this approach: the algorithm is unbiased, and will converge to a local minimum. However, it is shown that the variance of the stochastic gradients can be very large, necessitating commensurately long run times. (ii) Q-learning algorithms based on a version of the projected Bellman equation. It is shown that the algorithm is stable, in the sense of bounded sample paths, and that a solution to the projected Bellman equation exists under mild conditions. Numerical experiments illustrate these findings, and provide a roadmap for algorithm design in more general settings.
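To make the notion of a statistic "adapted to the observations" concrete, the sketch below implements the classical CUSUM recursion, which is one candidate of the kind the abstract alludes to and is not stated in the abstract itself. The Gaussian mean-shift model, the function names `gaussian_llr` and `cusum_stopping_time`, and the threshold value are illustrative assumptions rather than the paper's design.

```python
def gaussian_llr(y, mu0=0.0, mu1=1.0, sigma=1.0):
    """Log-likelihood ratio log(f1(y)/f0(y)) for a Gaussian mean shift.

    Assumed model (not from the paper): N(mu0, sigma^2) before the change,
    N(mu1, sigma^2) after it.
    """
    return (mu1 - mu0) * (y - 0.5 * (mu0 + mu1)) / sigma**2


def cusum_stopping_time(observations, threshold, llr=gaussian_llr):
    """Run the CUSUM recursion X_{n+1} = max(0, X_n + llr(Y_{n+1})).

    The statistic X_n depends only on observations up to time n, so it is
    adapted to the observations. Returns (alarm_time, path), where alarm_time
    is the 1-based index of the first observation at which X_n >= threshold,
    or None if the threshold is never reached.
    """
    x = 0.0
    path = []
    for n, y in enumerate(observations, start=1):
        x = max(0.0, x + llr(y))
        path.append(x)
        if x >= threshold:
            return n, path
    return None, path


# Hypothetical data: five pre-change samples followed by five post-change samples.
observations = [0.0] * 5 + [2.0] * 5
alarm_time, path = cusum_stopping_time(observations, threshold=4.0)
```

In an RL formulation, a statistic such as `path[n]` could play the role of the state fed to a stopping policy, with the fixed threshold rule replaced by a learned one. That substitution is a design choice for illustration only.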