Information divergences of Markov chains and their applications (2312.04863v1)
Abstract: In this paper, we introduce several new information divergences on the space of transition matrices of finite Markov chains which measure the discrepancy between two Markov chains. These divergences offer natural generalizations of classical information-theoretic divergences, such as the $f$-divergences and the R\'enyi divergence between probability measures, to the context of finite Markov chains. We begin by deriving fundamental properties of these divergences, notably giving Markov chain versions of Pinsker's inequality and the Chernoff information. We then utilize these notions in a few applications. First, we investigate the binary hypothesis testing problem for Markov chains, where the newly defined R\'enyi divergence between Markov chains and its geometric interpretation play an important role in the analysis. Second, we propose and analyze information-theoretic (Ces\`aro) mixing times and ergodicity coefficients, along with spectral bounds on these notions in the reversible setting. Examples of the random walk on the hypercube, as well as the connections between the critical height of the low-temperature Metropolis-Hastings chain and these proposed ergodicity coefficients, are highlighted.
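The abstract does not spell out the paper's exact definitions, but one standard candidate for a KL-type divergence between two Markov chains (an assumption here, not necessarily the paper's construction) is the KL divergence rate: the stationary-distribution-weighted average of the row-wise KL divergences between the transition matrices. A minimal sketch, assuming both chains live on the same finite state space and $Q(x,y) > 0$ wherever $P(x,y) > 0$:

```python
import numpy as np

def stationary_dist(P):
    """Stationary distribution of a transition matrix P:
    the left eigenvector for eigenvalue 1, normalized to sum to 1."""
    w, v = np.linalg.eig(P.T)
    i = np.argmin(np.abs(w - 1.0))
    pi = np.real(v[:, i])
    return pi / pi.sum()

def kl_rate(P, Q):
    """KL divergence rate between chains P and Q:
    sum_x pi_P(x) * KL(P(x, .) || Q(x, .)).
    Assumes Q(x, y) > 0 whenever P(x, y) > 0."""
    pi = stationary_dist(P)
    d = 0.0
    for x in range(P.shape[0]):
        for y in range(P.shape[1]):
            if P[x, y] > 0:
                d += pi[x] * P[x, y] * np.log(P[x, y] / Q[x, y])
    return d

# Example: a lazy two-state chain versus the fair coin-flip chain.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
Q = np.array([[0.5, 0.5],
              [0.5, 0.5]])
print(kl_rate(P, P))  # 0.0, as a divergence should satisfy
print(kl_rate(P, Q))  # ≈ 0.3096
```

This quantity is nonnegative and vanishes iff the chains share the same transition kernel on the support of $\pi_P$, mirroring the behavior of KL divergence between probability measures that the paper generalizes.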