Dimension-free Relaxation Times of Informed MCMC Samplers on Discrete Spaces (2404.03867v1)

Published 5 Apr 2024 in stat.CO, math.PR, and stat.ML

Abstract: Convergence analysis of Markov chain Monte Carlo methods in high-dimensional statistical applications is increasingly recognized as important. In this paper, we develop general mixing time bounds for Metropolis-Hastings algorithms on discrete spaces by building upon and refining some recent theoretical advancements in Bayesian model selection problems. We establish sufficient conditions for a class of informed Metropolis-Hastings algorithms to attain relaxation times that are independent of the problem dimension. These conditions are grounded in high-dimensional statistical theory and allow for possibly multimodal posterior distributions. We obtain our results through two independent techniques: the multicommodity flow method and single-element drift condition analysis; we find that the latter yields a tighter mixing time bound. Our results and proof techniques are readily applicable to a broad spectrum of statistical problems with discrete parameter spaces.
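
The informed Metropolis-Hastings samplers analyzed in the paper choose among neighboring states with probabilities weighted by the target distribution, rather than uniformly. As a rough illustration only, the sketch below implements one such sampler on the hypercube {0,1}^p with the locally balanced weight h(t) = sqrt(t) in the spirit of Zanella (2020), whose proposals this line of work builds on; the toy target log_pi, the per-coordinate penalty of 3, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def log_pi(x, signal):
    # Toy target on {0,1}^p: each coordinate matching `signal`
    # contributes 0 to the log-density, each mismatch contributes -3.
    return -3.0 * np.sum(x != signal)

def flip_deltas(x, signal):
    # log pi(flip_i(x)) - log pi(x) for every coordinate i:
    # flipping a matching coordinate costs -3, fixing a mismatch gains +3.
    return np.where(x == signal, -3.0, 3.0)

def informed_mh(p=50, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    signal = rng.integers(0, 2, size=p)   # location of the toy mode
    x = rng.integers(0, 2, size=p)        # arbitrary starting state
    for _ in range(n_iter):
        # Informed proposal: weight each single-flip neighbor y of x by
        # h(pi(y)/pi(x)) with the locally balanced choice h(t) = sqrt(t).
        d = flip_deltas(x, signal)
        w = np.exp(0.5 * d)
        q_fwd = w / w.sum()
        i = rng.choice(p, p=q_fwd)
        y = x.copy()
        y[i] ^= 1
        # Reverse proposal probability q(x | y) for the MH correction.
        w_rev = np.exp(0.5 * flip_deltas(y, signal))
        q_rev = w_rev[i] / w_rev.sum()
        log_accept = (log_pi(y, signal) - log_pi(x, signal)
                      + np.log(q_rev) - np.log(q_fwd[i]))
        if np.log(rng.uniform()) < log_accept:
            x = y
    return x, signal

if __name__ == "__main__":
    x, signal = informed_mh()
    print("mismatched coordinates after sampling:", int(np.sum(x != signal)))
```

Note that scoring every single-flip neighbor costs p target evaluations per iteration; this per-step cost is the price of informed proposals, whose relaxation times the paper shows can, under suitable conditions, be free of the dimension p.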
