Dimension-free Relaxation Times of Informed MCMC Samplers on Discrete Spaces (2404.03867v1)
Abstract: The convergence analysis of Markov chain Monte Carlo methods in high-dimensional statistical applications has attracted increasing attention. In this paper, we develop general mixing time bounds for Metropolis-Hastings algorithms on discrete spaces by building upon and refining recent theoretical advances in Bayesian model selection. We establish sufficient conditions under which a class of informed Metropolis-Hastings algorithms attains relaxation times that are independent of the problem dimension. These conditions are grounded in high-dimensional statistical theory and accommodate possibly multimodal posterior distributions. We obtain our results via two independent techniques, the multicommodity flow method and single-element drift condition analysis, and find that the latter yields a tighter mixing time bound. Our results and proof techniques are readily applicable to a broad spectrum of statistical problems with discrete parameter spaces.
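To make concrete the kind of sampler the abstract refers to, here is a minimal sketch of an informed (locally balanced) Metropolis-Hastings step on the hypercube {0,1}^p, where single-coordinate flips are proposed with weights g(π(y)/π(x)) for the balancing function g(t) = √t. The function names and the toy usage below are our own illustration, not code from the paper.

```python
import numpy as np


def flip_logratio(log_pi, x, j):
    """log pi(y) - log pi(x) where y flips coordinate j of x."""
    y = x.copy()
    y[j] = 1 - y[j]
    return log_pi(y) - log_pi(x)


def informed_mh(log_pi, x0, n_iters, rng, g=np.sqrt):
    """Informed one-flip Metropolis-Hastings on {0, 1}^p.

    Each neighbor y of x (a single-coordinate flip) is proposed with
    probability proportional to g(pi(y)/pi(x)); g(t) = sqrt(t) is the
    locally balanced choice, which keeps proposal weights moderate.
    """
    x = x0.copy()
    p = len(x)
    samples = []
    for _ in range(n_iters):
        # Informed proposal: weight each flip by g of the target ratio.
        logr = np.array([flip_logratio(log_pi, x, j) for j in range(p)])
        w = g(np.exp(logr))
        probs = w / w.sum()
        j = rng.choice(p, p=probs)
        y = x.copy()
        y[j] = 1 - y[j]
        # Reverse-move proposal probability q(y -> x), needed for detailed balance.
        logr_y = np.array([flip_logratio(log_pi, y, k) for k in range(p)])
        w_y = g(np.exp(logr_y))
        q_rev = w_y[j] / w_y.sum()
        log_acc = (log_pi(y) - log_pi(x)) + np.log(q_rev) - np.log(probs[j])
        if np.log(rng.uniform()) < log_acc:
            x = y
        samples.append(x.copy())
    return np.array(samples)
```

As a toy usage, one can target π(x) ∝ exp(−β·d(x, x*)) for a Hamming distance d and a fixed mode x*; the informed weights steer the chain toward x* much faster than uniform flip proposals would.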