
Perturbations of Markov Chains (2404.10251v1)

Published 16 Apr 2024 in stat.ME and math.PR

Abstract: This chapter surveys progress on three related topics in perturbations of Markov chains: the motivating question of when and how "perturbed" MCMC chains are developed, the theoretical problem of how perturbation theory can be used to analyze such chains, and finally the question of how the theoretical analyses can lead to practical advice.


Summary

  • The paper presents a rigorous framework for quantifying biases in perturbed MCMC methods using metrics like the Wasserstein distance.
  • It demonstrates how algorithmic approximations enable practical MCMC implementations for complex, high-dimensional models.
  • The work bridges theoretical insights with practical applications by outlining perturbation strategies that preserve convergence to target distributions.

Analysis and Implications of Perturbed Markov Chains

The chapter on perturbed Markov chains by Rudolf, Smith, and Quiroz examines the integral role computational approximations play in Markov chain Monte Carlo (MCMC) methods, particularly under the constraints of modern computational resources. The text describes how practical needs often force perturbations of idealized MCMC algorithms, and it develops the theoretical foundation for analyzing these modifications. Perturbations arise from various strategies for approximating otherwise computationally intractable models, such as substituting exact likelihoods with surrogate models or reduced-fidelity calculations.

Key Insights into Perturbation Theory

The chapter shows how perturbation theory measures the deviation a perturbation introduces into an MCMC process. The central idea is to quantify bias by contrasting the actual, perturbed chain with its idealized counterpart; perturbation theory then supplies the guardrails ensuring that this deviation does not materially distort the ideal target distribution.

In treating algorithms that are not asymptotically exact, the authors highlight that practical computational constraints often lead to methods that approximate, rather than precisely reproduce, the target distribution. The standard assumption in MCMC is that the chain eventually converges to the target; under algorithmic perturbation that convergence can fail, so the resulting bias must be explicitly controlled and evaluated.
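To make the contrast concrete, the following minimal sketch (an illustration under assumed settings, not code from the chapter) runs a random-walk Metropolis chain for a standard normal target alongside a "noisy" variant whose acceptance ratio is perturbed by Gaussian noise; the target, proposal scale, and noise level `noise_sd` are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Standard normal target; a stand-in for an intractable posterior.
    return -0.5 * x**2

def mh_chain(n_steps, noise_sd=0.0, step=1.0):
    """Random-walk Metropolis. With noise_sd > 0, every evaluation of the
    log-acceptance ratio is corrupted, mimicking a noisy/approximate
    likelihood, so the chain targets a perturbed distribution."""
    x, samples = 0.0, np.empty(n_steps)
    for i in range(n_steps):
        prop = x + step * rng.normal()
        log_ratio = (log_target(prop) - log_target(x)
                     + noise_sd * rng.normal())
        if np.log(rng.uniform()) < log_ratio:
            x = prop
        samples[i] = x
    return samples

exact = mh_chain(100_000)
noisy = mh_chain(100_000, noise_sd=0.5)
# The noisy chain's stationary variance is typically inflated relative
# to the exact target's variance of 1.
print("exact mean/var:", exact.mean(), exact.var())
print("noisy mean/var:", noisy.mean(), noisy.var())
```

Comparing the two empirical variances gives a direct, if crude, estimate of the bias the perturbation introduces.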

Core Methodological Contributions

The chapter outlines a taxonomy of perturbed Markov chains, emphasizing situations where these methods prove useful. Specific focal areas include:

  • Approximation of Computationally Intractable Targets: For complex models, such as those defined through the solutions of computationally expensive partial differential equations (PDEs), perturbations via methods like subsampling and divide-and-conquer make computation feasible (see the sketch after this list).
  • Algorithm Approximations: Practical implementations often approximate ideal algorithms like Hamiltonian Monte Carlo (HMC) through numerical integrators, inducing perturbations.
  • Implicit Regularization and Tempering: Perturbed methods can lead to advantageous modifications of the target distribution, such as induced regularization effects.
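Both the subsampling and discretization themes appear in a single well-known algorithm: stochastic gradient Langevin dynamics (SGLD) replaces the full-data gradient with a minibatch estimate and drops the Metropolis correction of an exact Langevin scheme. The sketch below is a minimal illustration for the posterior of a Gaussian mean; the synthetic data, batch size, and step size are assumptions made for the example, not settings from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: N observations from N(theta_true, 1).
N, theta_true = 10_000, 2.0
data = theta_true + rng.normal(size=N)

def sgld(n_steps=5_000, batch=100, eps=1e-4):
    """SGLD for the posterior of a Gaussian mean under a flat prior.
    Two perturbations relative to an ideal sampler: the minibatch
    gradient estimate and the unadjusted Euler discretization."""
    theta, trace = 0.0, np.empty(n_steps)
    for i in range(n_steps):
        idx = rng.choice(N, size=batch, replace=False)
        # Unbiased estimate of the full-data log-posterior gradient.
        grad = (N / batch) * np.sum(data[idx] - theta)
        theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
        trace[i] = theta
    return trace

trace = sgld()
# The SGLD average is close to the exact posterior mean, but the
# trace's spread is inflated by gradient noise and discretization error.
print("SGLD estimate of posterior mean:", trace[2_000:].mean())
print("exact posterior mean:           ", data.mean())
```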

Mathematical Framework and Application

The authors develop a robust mathematical foundation for analyzing MCMC perturbations. They employ Wasserstein distances to obtain precise bounds on the deviation between the exact and perturbed chains, and they also treat the classical total variation distance, noting that it misses the fine-grained geometric distinctions Wasserstein metrics capture.
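In one dimension the 1-Wasserstein distance between two samples is easy to estimate, which gives a quick feel for the metric the analysis relies on. The snippet below (an illustration with an assumed shift and scale for the "perturbed" draws, not the chapter's machinery) compares draws from an exact target with draws from a slightly biased approximation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)

# Exact target N(0, 1) versus a perturbed approximation N(0.1, 1.1);
# the shift and variance inflation stand in for algorithmic bias.
exact = rng.normal(0.0, 1.0, size=50_000)
perturbed = rng.normal(0.1, np.sqrt(1.1), size=50_000)

# Empirical 1-Wasserstein distance between the samples. Unlike total
# variation, it measures how far probability mass moves, not merely
# whether it moved.
print("W1 estimate:", wasserstein_distance(exact, perturbed))
```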

The chapter works through illustrative mathematical results, using simple instances to demonstrate the theory, such as theorems that bound the distance between stationary distributions in terms of the distance between the perturbed and unperturbed transition kernels. Emphasis is placed on practical utility: translating these bounds into insights about algorithmic performance.
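On a finite state space such a bound is easy to probe numerically. The sketch below (an assumed toy construction, not a result reproduced from the chapter) perturbs a random ergodic kernel and compares the total variation distance between the two stationary distributions with the worst-case row-wise distance between the kernels, the quantity in which Mitrophanov-type bounds scale.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a finite transition matrix: the left
    eigenvector associated with the Perron eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

rng = np.random.default_rng(3)

# A random ergodic kernel P and a small perturbation P_tilde of it.
P = rng.uniform(size=(5, 5))
P /= P.sum(axis=1, keepdims=True)
E = rng.uniform(-1.0, 1.0, size=(5, 5))
E -= E.mean(axis=1, keepdims=True)        # rows of E sum to zero
P_tilde = np.clip(P + 0.01 * E, 0.0, None)
P_tilde /= P_tilde.sum(axis=1, keepdims=True)

tv_pi = 0.5 * np.abs(stationary(P) - stationary(P_tilde)).sum()
kernel_gap = 0.5 * np.abs(P - P_tilde).sum(axis=1).max()
print("TV(pi, pi_tilde):      ", tv_pi)
print("max-row TV(P, P_tilde):", kernel_gap)
```

For a uniformly ergodic chain, bounds of this type state that the first quantity is controlled by a constant, depending on the chain's convergence rate, times the second.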

Practical and Theoretical Implications

Perturbations of MCMC algorithms have tangible implications for both practice and theory. Practically, they make MCMC feasible on real-world, high-dimensional problems by reducing computational cost. Theoretically, a deeper understanding of perturbations enables more informed design and analysis of algorithms, potentially leading to new frameworks that maintain accuracy while economizing on computation.

Future Directions and Challenges

The survey opens avenues for further work on the efficacy of perturbed MCMC methods, including adaptive schemes in which perturbations self-correct or adjust dynamically based on runtime diagnostics. Questions also remain about how increasingly sophisticated approximations used within MCMC, such as gradient estimates, interact with one another. Finally, extensions of coupling arguments and sharper bounding techniques could deepen the understanding of perturbed MCMC dynamics in emerging AI applications.

In summary, the chapter establishes the necessity and utility of perturbation theory in the effective application of MCMC methods, emphasizing both the nuanced distinctions and the high-impact insights this viewpoint brings to contemporary stochastic computation. That approximations can be admitted without significant sacrifice of accuracy or convergence marks out an essential landscape for researchers to further probe and refine MCMC methodologies.