Scalability of Metropolis-within-Gibbs schemes for high-dimensional Bayesian models (2403.09416v1)
Abstract: We study general coordinate-wise MCMC schemes (such as Metropolis-within-Gibbs samplers), which are commonly used to fit Bayesian non-conjugate hierarchical models. We relate their convergence properties to those of the corresponding (potentially not implementable) Gibbs sampler through the notion of conditional conductance. This allows us to study the performance of popular Metropolis-within-Gibbs schemes for non-conjugate hierarchical models in high-dimensional regimes where both the number of datapoints and the number of parameters increase. Under random data-generating assumptions, we establish dimension-free convergence results, which are in close accordance with numerical evidence. Applications to Bayesian models for binary regression with unknown hyperparameters and for discretely observed diffusions are also discussed. Motivated by such statistical applications, we also provide auxiliary results of independent interest on approximate conductances and perturbation of Markov operators.
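As a concrete illustration of the schemes in question (a minimal sketch, not code from the paper), the snippet below implements a deterministic-scan Metropolis-within-Gibbs sampler in Python: each coordinate of a Gibbs sweep is updated with a univariate random-walk Metropolis step, the natural fallback when the full conditionals are known only up to proportionality. The function name, the Gaussian proposal, and the fixed step size `step` are illustrative assumptions.

```python
import numpy as np

def metropolis_within_gibbs(logpi, x0, n_iter, step=0.5, seed=None):
    """Deterministic-scan Metropolis-within-Gibbs with Gaussian random-walk
    proposals: each coordinate is updated by a univariate Metropolis step
    targeting its full conditional, so each sweep leaves exp(logpi) invariant."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    samples = np.empty((n_iter, d))
    lp = logpi(x)  # cache current log-density
    for t in range(n_iter):
        for i in range(d):  # sweep over coordinates
            prop = x.copy()
            prop[i] += step * rng.standard_normal()  # symmetric proposal
            lp_prop = logpi(prop)
            # Metropolis accept/reject (proposal symmetry cancels in the ratio)
            if np.log(rng.random()) < lp_prop - lp:
                x, lp = prop, lp_prop
        samples[t] = x
    return samples

# Toy usage: a standard Gaussian target in d = 10 dimensions.
if __name__ == "__main__":
    logpi = lambda x: -0.5 * float(x @ x)
    draws = metropolis_within_gibbs(logpi, x0=np.zeros(10), n_iter=2000, seed=1)
    print(draws.mean(axis=0))  # should be close to zero
```

Since each coordinate update leaves its full conditional invariant, the whole sweep preserves the joint target; the paper's conditional-conductance machinery quantifies how much replacing exact conditional draws with such Metropolis steps degrades convergence relative to the exact Gibbs sampler.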
- Amit, Y. (1996). Convergence properties of the Gibbs sampler for perturbations of Gaussians. The Annals of Statistics 24(1), 122–140.
- Andrieu, C., A. Lee, S. Power, and A. Q. Wang (2022). Explicit convergence bounds for Metropolis Markov chains: isoperimetry, spectral gaps and profiles. arXiv preprint arXiv:2211.08959.
- Ascolani, F. and G. Zanella. Dimension-free mixing times of Gibbs samplers for Bayesian hierarchical models. Ann. Statist. In press.
- Belloni, A. and V. Chernozhukov (2009). On the computational complexity of MCMC-based estimators in large samples. The Annals of Statistics 37(4), 2011–2055.
- Besag, J. and P. J. Green (1993). Spatial statistics and Bayesian computation. Journal of the Royal Statistical Society Series B: Statistical Methodology 55(1), 25–37.
- Beskos, A., O. Papaspiliopoulos, and G. O. Roberts (2006). Retrospective exact simulation of diffusion sample paths with applications. Bernoulli 12(6), 1077–1098.
- Beskos, A., N. Pillai, G. O. Roberts, J. M. Sanz-Serna, and A. M. Stuart (2013). Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli 19(5), 1501–1534.
- Biswas, N., P. E. Jacob, and P. Vanetti (2019). Estimating convergence of Markov chains with L-lag couplings. Advances in Neural Information Processing Systems 32.
- Bobkov, S. G. and C. Houdré (1997). Isoperimetric constants for product probability measures. The Annals of Probability 25(1), 184–205.
- Brooks, S., A. Gelman, G. L. Jones, and X.-L. Meng (Eds.) (2011). Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC.
- Caprio, R. and A. M. Johansen (2023). A calculus for Markov chain Monte Carlo: studying approximations in algorithms. arXiv preprint arXiv:2310.03853.
- Casella, G. and E. I. George (1992). Explaining the Gibbs Sampler. Am. Stat. 46, 167–174.
- Chlebicka, I., K. Łatuszyński, and B. Miasojedow (2023). Solidarity of Gibbs Samplers: the spectral gap. arXiv preprint arXiv:2304.02109.
- Dalalyan, A. S. (2017). Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities. J. R. Stat. Soc. Ser. B. 79, 651–676.
- Diaconis, P., K. Khare, and L. Saloff-Coste (2008). Gibbs Sampling, Exponential Families and Orthogonal Polynomials. Stat. Sci. 23, 151–178.
- Diaconis, P., K. Khare, and L. Saloff-Coste (2010). Stochastic alternating projections. Illinois Journal of Mathematics 54(3), 963–979.
- Durmus, A. and E. Moulines (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27, 1551–1587.
- Dwivedi, R., Y. Chen, M. J. Wainwright, and B. Yu (2019). Log-concave sampling: Metropolis-Hastings algorithms are fast! J. Mach. Learn. Res. 20, 1–42.
- Flegal, J. M., J. Hughes, D. Vats, N. Dai, K. Gupta, and U. Maji. mcmcse: Monte Carlo Standard Errors for MCMC. R package.
- Gelfand, A. E., S. K. Sahu, and B. P. Carlin (1995). Efficient parametrisations for normal linear mixed models. Biometrika 82(3), 479–488.
- Gelfand, A. E. and A. F. Smith (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85(410), 398–409.
- Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (3rd ed.). CRC Press.
- Gelman, A. and J. L. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
- Gilks, W. R. and P. Wild (1992). Adaptive Rejection Sampling for Gibbs Sampling. J. R. Stat. Soc. Ser. C 41, 337–348.
- Gong, L. and J. M. Flegal (2015). A Practical Sequential Stopping Rule for High-Dimensional Markov Chain Monte Carlo. J. Comput. Graph. Stat. 25, 684–700.
- Green, P. J., K. Łatuszyński, M. Pereyra, and C. P. Robert (2015). Bayesian computation: a summary of the current state, and samples backwards and forwards. Stat. Comput. 25, 835–862.
- Hoffman, M. D. and A. Gelman (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15(1), 1593–1623.
- Jarner, S. F. and E. Hansen (2000). Geometric ergodicity of Metropolis algorithms. Stochastic Processes and their Applications 85(2), 341–361.
- Jin, Z. and J. P. Hobert (2022). Dimension free convergence rates for Gibbs samplers for Bayesian linear mixed models. Stoch. Process. Their Appl. 148, 25–67.
- Johnson, A. A., G. L. Jones, and R. C. Neath (2013). Component-wise Markov chain Monte Carlo: Uniform and geometric ergodicity under mixing and composition. Statistical Science 28(3), 360–375.
- Jones, G. L., G. O. Roberts, and J. S. Rosenthal (2014). Convergence of conditional Metropolis-Hastings samplers. Advances in Applied Probability 46(2), 422–445.
- Kamatani, K. (2014). Local consistency of Markov chain Monte Carlo methods. Ann. Inst. Stat. Math. 66(1), 63–74.
- Kastner, G. and S. Frühwirth-Schnatter (2014). Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis 76, 408–423.
- Khare, K. and J. P. Hobert (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. The Annals of Statistics 39(5), 2585–2606.
- Khare, K. and H. Zhou (2009). Rates of convergence of some multivariate Markov chains with polynomial eigenfunctions. Ann. Appl. Probab. 19(2), 737–777.
- Levin, D. A. and Y. Peres (2017). Markov chains and mixing times, Volume 107. American Mathematical Society.
- Livingstone, S. and G. Zanella (2022). The Barker proposal: combining robustness and efficiency in gradient-based MCMC. Journal of the Royal Statistical Society Series B: Statistical Methodology 84(2), 496–523.
- Lovász, L. and M. Simonovits (1993). Random Walks in a Convex Body and an Improved Volume Algorithm. Random Struct. and Alg. 4, 359–412.
- Madras, N. and D. Randall (2002). Markov chain decomposition for convergence rate analysis. Ann. Appl. Probab. 12(2), 581–606.
- Martin, G. M., D. T. Frazier, and C. P. Robert. Computing Bayes: From Then ‘Til Now. Stat. Sci. In press.
- Neath, R. C. and G. L. Jones (2009). Variable-at-a-time implementations of Metropolis-Hastings. arXiv preprint arXiv:0903.0664.
- Negrea, J., J. Yang, H. Feng, D. M. Roy, and J. H. Huggins (2022). Statistical inference with stochastic gradient algorithms. arXiv preprint arXiv:2207.
- Nickl, R. and S. Wang (2024). On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms. Journal of the European Mathematical Society.
- Papaspiliopoulos, O., G. O. Roberts, and G. Zanella (2020). Scalable inference for crossed random effects models. Biometrika 107, 25–40.
- Papaspiliopoulos, O., G. O. Roberts, and M. Sköld (2003). Non-Centered Parameterizations for Hierarchical Models and Data Augmentation (with discussion). In Bayesian Statistics 7 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.), pp. 307–326. Oxford University Press.
- Papaspiliopoulos, O., G. O. Roberts, and O. Stramer (2013). Data augmentation for diffusions. Journal of Computational and Graphical Statistics 22(3), 665–688.
- Papaspiliopoulos, O., G. O. Roberts, and M. Sköld (2007). A General Framework for the Parametrization of Hierarchical Models. Statistical Science 22(1), 59–73.
- Papaspiliopoulos, O., T. Stumpf-Fétizon, and G. Zanella (2021). Scalable computation for Bayesian hierarchical models. arXiv preprint arXiv:2103.10875.
- Polson, N. G. and G. O. Roberts (1994). Bayes factors for discrete observations from diffusion processes. Biometrika 81(1), 11–26.
- Qin, Q. and J. P. Hobert (2019). Convergence complexity analysis of Albert and Chib’s algorithm for Bayesian probit regression. Ann. Statist. 47, 2320–2347.
- Qin, Q. and J. P. Hobert (2022). Wasserstein-based methods for convergence complexity analysis of MCMC with applications. Ann. Appl. Probab. 32, 124–166.
- Qin, Q. and G. L. Jones (2022). Convergence rates of two-component MCMC samplers. Bernoulli 28(2), 859–885.
- Qin, Q., N. Ju, and G. Wang (2023). Spectral gap bounds for reversible hybrid Gibbs chains. arXiv preprint arXiv:2312.12782.
- Qin, Q. and G. Wang (2022). Spectral Telescope: Convergence Rate Bounds for Random-Scan Gibbs Samplers Based on a Hierarchical Structure. arXiv preprint arXiv:2208.11299.
- Roberts, G. O. and J. S. Rosenthal (1997). Geometric ergodicity and hybrid Markov chains. Electron. Comm. Probab. 2, 13–25.
- Roberts, G. O. and J. S. Rosenthal (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B 60, 255–268.
- Roberts, G. O. and J. S. Rosenthal (2001). Markov Chains and De-Initializing Processes. Scand. J. Stat. 28, 489–504.
- Roberts, G. O. and S. K. Sahu (1997). Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler. J. R. Stat. Soc. Ser. B 59, 291–317.
- Roberts, G. O. and S. K. Sahu (2001). Approximate predetermined convergence properties of the Gibbs sampler. Journal of Computational and Graphical Statistics 10(2), 216–229.
- Roberts, G. O. and O. Stramer (2001). On inference for partially observed nonlinear diffusion models using the Metropolis–Hastings algorithm. Biometrika 88(3), 603–621.
- Rosenthal, J. S. (1995). Minorization Conditions and Convergence Rates for Markov Chain Monte Carlo. J. Am. Stat. Assoc 90, 558–566.
- Smith, A. F. and G. O. Roberts (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Methodological) 55(1), 3–23.
- Stan Development Team (2024). RStan: the R interface to Stan. R package version 2.32.5.
- Tang, R. and Y. Yang (2022). On the Computational Complexity of Metropolis-Adjusted Langevin Algorithms for Bayesian Posterior Sampling. arXiv preprint arXiv:2206.06491.
- Tong, X. T., M. Morzfeld, and Y. M. Marzouk (2020). MALA-within-Gibbs samplers for high-dimensional distributions with sparse conditional structure. SIAM Journal on Scientific Computing 42(3), A1765–A1788.
- Van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press.
- Wu, K., S. Schmidler, and Y. Chen (2022). Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling. J. Mach. Learn. Res. 23, 1–63.
- Yang, J. and J. S. Rosenthal (2022). Complexity results for MCMC derived from quantitative bounds. Ann. Appl. Probab. 33, 1459–1500.
- Yu, Y. and X. L. Meng (2011). To center or not to center: That is not the question: an Ancillarity–Sufficiency Interweaving Strategy (ASIS) for boosting MCMC efficiency. Journal of Computational and Graphical Statistics 20(3), 531–570.