Scalability of Metropolis-within-Gibbs schemes for high-dimensional Bayesian models (2403.09416v1)
Abstract: We study general coordinate-wise MCMC schemes (such as Metropolis-within-Gibbs samplers), which are commonly used to fit Bayesian non-conjugate hierarchical models. We relate their convergence properties to those of the corresponding (potentially not implementable) Gibbs sampler through the notion of conditional conductance. This allows us to study the performance of popular Metropolis-within-Gibbs schemes for non-conjugate hierarchical models in high-dimensional regimes where both the number of datapoints and the number of parameters increase. Under random data-generating assumptions, we establish dimension-free convergence results, which are in close accordance with numerical evidence. Applications to Bayesian models for binary regression with unknown hyperparameters and for discretely observed diffusions are also discussed. Motivated by such statistical applications, we also provide auxiliary results of independent interest on approximate conductances and perturbation of Markov operators.
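As a concrete illustration of the schemes in question (a minimal sketch, not code from the paper), the snippet below implements a deterministic-scan Metropolis-within-Gibbs sampler in Python: each coordinate of a Gibbs sweep is updated with a univariate random-walk Metropolis step, the natural fallback when the full conditionals are known only up to proportionality. The function name, the Gaussian proposal, and the fixed step size `step` are illustrative assumptions.

```python
import numpy as np

def metropolis_within_gibbs(logpi, x0, n_iter, step=0.5, seed=None):
    """Deterministic-scan Metropolis-within-Gibbs with Gaussian random-walk
    proposals: each coordinate is updated by a univariate Metropolis step
    targeting its full conditional, so each sweep leaves exp(logpi) invariant."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    samples = np.empty((n_iter, d))
    lp = logpi(x)  # cache current log-density
    for t in range(n_iter):
        for i in range(d):  # sweep over coordinates
            prop = x.copy()
            prop[i] += step * rng.standard_normal()  # symmetric proposal
            lp_prop = logpi(prop)
            # Metropolis accept/reject (proposal symmetry cancels in the ratio)
            if np.log(rng.random()) < lp_prop - lp:
                x, lp = prop, lp_prop
        samples[t] = x
    return samples

# Toy usage: a standard Gaussian target in d = 10 dimensions.
if __name__ == "__main__":
    logpi = lambda x: -0.5 * float(x @ x)
    draws = metropolis_within_gibbs(logpi, x0=np.zeros(10), n_iter=2000, seed=1)
    print(draws.mean(axis=0))  # should be close to zero
```

Since each coordinate update leaves its full conditional invariant, the whole sweep preserves the joint target; the paper's conditional-conductance machinery quantifies how much replacing exact conditional draws with such Metropolis steps degrades convergence relative to the exact Gibbs sampler.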
- Amit, Y. (1996). Convergence properties of the Gibbs sampler for perturbations of Gaussians. The Annals of Statistics 24(1), 122–140.
- Andrieu, C., A. Lee, S. Power, and A. Q. Wang (2022). Explicit convergence bounds for Metropolis Markov chains: isoperimetry, spectral gaps and profiles. arXiv preprint arXiv:2211.08959.
- Ascolani, F. and G. Zanella. Dimension-free mixing times of Gibbs samplers for Bayesian hierarchical models. Ann. Statist. In press.
- Belloni, A. and V. Chernozhukov (2009). On the computational complexity of MCMC-based estimators in large samples. The Annals of Statistics 37(4), 2011–2055.
- Besag, J. and P. J. Green (1993). Spatial statistics and Bayesian computation. Journal of the Royal Statistical Society Series B: Statistical Methodology 55(1), 25–37.
- Beskos, A., O. Papaspiliopoulos, and G. O. Roberts (2006). Retrospective exact simulation of diffusion sample paths with applications. Bernoulli 12(6), 1077–1098.
- Beskos, A., N. Pillai, G. O. Roberts, J. M. Sanz-Serna, and A. M. Stuart (2013). Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli 19(5), 1501–1534.
- Biswas, N., P. E. Jacob, and P. Vanetti (2019). Estimating convergence of Markov chains with L-lag couplings. Advances in Neural Information Processing Systems 32.
- Bobkov, S. G. and C. Houdré (1997). Isoperimetric constants for product probability measures. The Annals of Probability 25(1), 184–205.
- Brooks, S., A. Gelman, G. L. Jones, and X.-L. Meng (Eds.) (2011). Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC.
- Caprio, R. and A. M. Johansen (2023). A calculus for Markov chain Monte Carlo: studying approximations in algorithms. arXiv preprint arXiv:2310.03853.
- Casella, G. and E. I. George (1992). Explaining the Gibbs Sampler. Am. Stat. 46, 167–174.
- Chlebicka, I., K. Łatuszyński, and B. Miasojedow (2023). Solidarity of Gibbs Samplers: the spectral gap. arXiv preprint arXiv:2304.02109.
- Dalalyan, A. S. (2017). Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities. J. R. Stat. Soc. Ser. B. 79, 651–676.
- Diaconis, P., K. Khare, and L. Saloff-Coste (2008). Gibbs Sampling, Exponential Families and Orthogonal Polynomials. Stat. Sci. 23, 151–178.
- Diaconis, P., K. Khare, and L. Saloff-Coste (2010). Stochastic alternating projections. Illinois Journal of Mathematics 54(3), 963–979.
- Durmus, A. and E. Moulines (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27, 1551–1587.
- Dwivedi, R., Y. Chen, M. J. Wainwright, and B. Yu (2019). Log-concave sampling: Metropolis-Hastings algorithms are fast! J. Mach. Learn. Res. 20, 1–42.
- Flegal, J. M., J. Hughes, D. Vats, N. Dai, K. Gupta, and U. Maji. mcmcse: Monte Carlo Standard Errors for MCMC. R package.
- Gelfand, A. E., S. K. Sahu, and B. P. Carlin (1995). Efficient parametrisations for normal linear mixed models. Biometrika 82(3), 479–488.
- Gelfand, A. E. and A. F. Smith (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85(410), 398–409.
- Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (3rd ed.). CRC Press.
- Gelman, A. and J. L. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
- Gilks, W. R. and P. Wild (1992). Adaptive Rejection Sampling for Gibbs Sampling. J. R. Stat. Soc. Ser. C 41, 337–348.
- Gong, L. and J. M. Flegal (2015). A Practical Sequential Stopping Rule for High-Dimensional Markov Chain Monte Carlo. J. Comput. Graph. Stat. 25, 684–700.
- Green, P. J., K. Łatuszyński, M. Pereyra, and C. P. Robert (2015). Bayesian computation: a summary of the current state, and samples backwards and forwards. Stat. Comput. 25, 835–862.
- Hoffman, M. D. and A. Gelman (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15(1), 1593–1623.
- Jarner, S. F. and E. Hansen (2000). Geometric ergodicity of Metropolis algorithms. Stochastic Processes and their Applications 85(2), 341–361.
- Jin, Z. and J. P. Hobert (2022). Dimension free convergence rates for Gibbs samplers for Bayesian linear mixed models. Stoch. Process. Their Appl. 148, 25–67.
- Johnson, A. A., G. L. Jones, and R. C. Neath (2013). Component-wise Markov chain Monte Carlo: Uniform and geometric ergodicity under mixing and composition. Statistical Science 28(3), 360–375.
- Jones, G. L., G. O. Roberts, and J. S. Rosenthal (2014). Convergence of conditional Metropolis-Hastings samplers. Advances in Applied Probability 46(2), 422–445.
- Kamatani, K. (2014). Local consistency of Markov chain Monte Carlo methods. Ann. Inst. Stat. Math. 66(1), 63–74.
- Kastner, G. and S. Frühwirth-Schnatter (2014). Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis 76, 408–423.
- Khare, K. and J. P. Hobert (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. The Annals of Statistics 39(5), 2585–2606.
- Khare, K. and H. Zhou (2009). Rates of convergence of some multivariate Markov chains with polynomial eigenfunctions. Ann. Appl. Probab. 19(2), 737–777.
- Levin, D. A. and Y. Peres (2017). Markov chains and mixing times, Volume 107. American Mathematical Society.
- Livingstone, S. and G. Zanella (2022). The Barker proposal: combining robustness and efficiency in gradient-based MCMC. Journal of the Royal Statistical Society Series B: Statistical Methodology 84(2), 496–523.
- Lovász, L. and M. Simonovits (1993). Random Walks in a Convex Body and an Improved Volume Algorithm. Random Struct. and Alg. 4, 359–412.
- Madras, N. and D. Randall (2002). Markov chain decomposition for convergence rate analysis. Ann. Appl. Probab. 12(2), 581–606.
- Martin, G. M., D. T. Frazier, and C. P. Robert. Computing Bayes: From Then ‘Til Now. Stat. Sci. In press.
- Neath, R. C. and G. L. Jones (2009). Variable-at-a-time implementations of Metropolis-Hastings. arXiv preprint arXiv:0903.0664.
- Negrea, J., J. Yang, H. Feng, D. M. Roy, and J. H. Huggins (2022). Statistical inference with stochastic gradient algorithms. arXiv preprint arXiv:2207.
- Nickl, R. and S. Wang (2024). On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms. Journal of the European Mathematical Society.
- Papaspiliopoulos, O., G. O. Roberts, and G. Zanella (2020). Scalable inference for crossed random effects models. Biometrika 107, 25–40.
- Papaspiliopoulos, O., G. O. Roberts, and M. Sköld (2003). Non-Centered Parameterizations for Hierarchical Models and Data Augmentation (with discussion). In Bayesian Statistics 7 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.), pp. 307–326. Oxford University Press.
- Papaspiliopoulos, O., G. O. Roberts, and O. Stramer (2013). Data augmentation for diffusions. Journal of Computational and Graphical Statistics 22(3), 665–688.
- Papaspiliopoulos, O., G. O. Roberts, and M. Sköld (2007). A General Framework for the Parametrization of Hierarchical Models. Statistical Science 22(1), 59–73.
- Papaspiliopoulos, O., T. Stumpf-Fétizon, and G. Zanella (2021). Scalable computation for Bayesian hierarchical models. arXiv preprint arXiv:2103.10875.
- Polson, N. G. and G. O. Roberts (1994). Bayes factors for discrete observations from diffusion processes. Biometrika 81(1), 11–26.
- Qin, Q. and J. P. Hobert (2019). Convergence complexity analysis of Albert and Chib’s algorithm for Bayesian probit regression. Ann. Statist. 47, 2320–2347.
- Qin, Q. and J. P. Hobert (2022). Wasserstein-based methods for convergence complexity analysis of MCMC with applications. Ann. Appl. Probab. 32, 124–166.
- Qin, Q. and G. L. Jones (2022). Convergence rates of two-component MCMC samplers. Bernoulli 28(2), 859–885.
- Qin, Q., N. Ju, and G. Wang (2023). Spectral gap bounds for reversible hybrid Gibbs chains. arXiv preprint arXiv:2312.12782.
- Qin, Q. and G. Wang (2022). Spectral Telescope: Convergence Rate Bounds for Random-Scan Gibbs Samplers Based on a Hierarchical Structure. arXiv preprint arXiv:2208.11299.
- Roberts, G. O. and J. S. Rosenthal (1997). Geometric ergodicity and hybrid Markov chains. Electron. Comm. Probab. 2, 13–25.
- Roberts, G. O. and J. S. Rosenthal (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B 60, 255–268.
- Roberts, G. O. and J. S. Rosenthal (2001). Markov Chains and De-Initializing Processes. Scand. J. Stat. 28, 489–504.
- Roberts, G. O. and S. K. Sahu (1997). Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler. J. R. Stat. Soc. Ser. B 59, 291–317.
- Roberts, G. O. and S. K. Sahu (2001). Approximate predetermined convergence properties of the Gibbs sampler. Journal of Computational and Graphical Statistics 10(2), 216–229.
- Roberts, G. O. and O. Stramer (2001). On inference for partially observed nonlinear diffusion models using the Metropolis–Hastings algorithm. Biometrika 88(3), 603–621.
- Rosenthal, J. S. (1995). Minorization Conditions and Convergence Rates for Markov Chain Monte Carlo. J. Am. Stat. Assoc 90, 558–566.
- Smith, A. F. and G. O. Roberts (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Methodological) 55(1), 3–23.
- Stan Development Team (2024). RStan: the R interface to Stan. R package version 2.32.5.
- Tang, R. and Y. Yang (2022). On the Computational Complexity of Metropolis-Adjusted Langevin Algorithms for Bayesian Posterior Sampling. arXiv preprint arXiv:2206.06491.
- Tong, X. T., M. Morzfeld, and Y. M. Marzouk (2020). MALA-within-Gibbs samplers for high-dimensional distributions with sparse conditional structure. SIAM Journal on Scientific Computing 42(3), A1765–A1788.
- Van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press.
- Wu, K., S. Schmidler, and Y. Chen (2022). Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling. J. Mach. Learn. Res. 23, 1–63.
- Yang, J. and J. S. Rosenthal (2022). Complexity results for MCMC derived from quantitative bounds. Ann. Appl. Probab. 33, 1459–1500.
- Yu, Y. and X. L. Meng (2011). To center or not to center: That is not the question: an Ancillarity–Sufficiency Interweaving Strategy (ASIS) for boosting MCMC efficiency. Journal of Computational and Graphical Statistics 20(3), 531–570.