A rate-distortion framework for MCMC algorithms: geometry and factorization of multivariate Markov chains (2404.12589v2)
Abstract: We introduce a framework rooted in a rate distortion problem for Markov chains, and show how a suite of commonly used Markov Chain Monte Carlo (MCMC) algorithms are specific instances within it, where the target stationary distribution is controlled by the distortion function. Our approach offers a unified variational view on the optimality of algorithms such as Metropolis-Hastings, Glauber dynamics, the swapping algorithm and Feynman-Kac path models. Along the way, we analyze factorizability and geometry of multivariate Markov chains. Specifically, we demonstrate that induced chains on factors of a product space can be regarded as information projections with respect to a particular divergence. This perspective yields Han–Shearer type inequalities for Markov chains as well as applications in the context of large deviations and mixing time comparison. Finally, to demonstrate the significance of our framework, we propose a new projection sampler based on the swapping algorithm that provably accelerates the mixing time by multiplicative factors related to the number of temperatures and the dimension of the underlying state space.
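As a concrete reference point for one of the algorithms named in the abstract, the following is a minimal sketch of the textbook Metropolis-Hastings transition matrix on a finite state space. It is not the paper's rate-distortion formulation; the function name `metropolis_hastings_kernel`, the three-state target `pi`, and the uniform proposal `Q` are illustrative assumptions chosen only to show that the resulting chain leaves the target distribution stationary.

```python
def metropolis_hastings_kernel(pi, Q):
    """Build the Metropolis-Hastings transition matrix for target pi and proposal Q.

    pi: list of strictly positive target probabilities (assumed to sum to 1).
    Q:  row-stochastic proposal matrix given as a list of lists.
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and Q[x][y] > 0:
                # Accept a proposed move x -> y with probability
                # min(1, pi(y) Q(y, x) / (pi(x) Q(x, y))).
                accept = min(1.0, (pi[y] * Q[y][x]) / (pi[x] * Q[x][y]))
                P[x][y] = Q[x][y] * accept
        # Rejected proposals keep the chain at x (diagonal is still 0 here).
        P[x][x] = 1.0 - sum(P[x])
    return P


# Hypothetical example: three states, uniform proposal over the other two states.
pi = [0.5, 0.3, 0.2]
Q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
P = metropolis_hastings_kernel(pi, Q)

# One step of the chain started from pi; stationarity means pi_next equals pi.
pi_next = [sum(pi[x] * P[x][y] for x in range(3)) for y in range(3)]
```

Here `pi_next` agrees with `pi` up to floating-point error, and `P` satisfies detailed balance, `pi[x] * P[x][y] == pi[y] * P[y][x]`, which is the reversibility property the classical construction is designed to enforce.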