Collapsed Gibbs Sampler

Updated 3 April 2026

The collapsed Gibbs sampler is an MCMC method that analytically marginalizes (integrates out) latent variables to reduce posterior dependence in hierarchical Bayesian models.
By reducing conditional dependence, it improves chain mixing and convergence, often requiring fewer iterations than standard Gibbs sampling.
Variants such as the partially collapsed Gibbs sampler trade improved sampling efficiency against additional per-iteration computational cost in high-dimensional applications.

A collapsed Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for sampling from the joint posterior distribution of a hierarchical Bayesian model in which one or more sets of latent variables are analytically integrated ("collapsed") out of the conditional distributions used in the sampling cycle. This partial or full marginalization reduces conditional dependence among the remaining variables, often accelerating the mixing of the Markov chain and improving sampling efficiency. Algorithmic variants are termed fully collapsed or partially collapsed Gibbs samplers (PCGS), depending on the degree and organization of the marginalization. The method underpins widely used inference strategies for high-dimensional Bayesian models and is supported by a substantial body of theory (Amrouche et al., 2021, Kuo et al., 2024, Mak et al., 11 Jan 2026).

1. Theoretical Foundations and Formal Operator Perspective

The classical Gibbs sampler alternates draws from the full conditional distributions of each variable or block, so its Markov kernel is a composition of conditional replacement operators. Each such operator can be viewed as an $L^2$-orthogonal projection or, at the level of densities, as a minimization of Kullback–Leibler (KL) divergence (an I-projection) (Kuo et al., 2024). In the collapsed Gibbs sampler, the kernel is constructed by replacing certain full conditionals with marginal conditionals in which subsets of variables have been integrated out. The operator perspective unifies standard, blocked, and collapsed/partially collapsed variants by treating each update as an I-projection onto a constraint set defined by a conditional or marginal distribution (Kuo et al., 2024).

For a partition of the variables $X = (X_1,\ldots,X_d)$ and a collection of conditional distributions $f_{a_i \mid b_i}$ with supports $C_i = a_i \cup b_i$, collapsed updates sequentially replace the corresponding conditional of the current density, using a marginalization (integration) operator $M_{b_i}$ and the new conditional $T_i$. Provided the ordering of updates is "permissible" (i.e., the conditioning sets nest appropriately), the Iterative Conditional Replacement (ICR) algorithm converges monotonically in KL divergence, even for incompatible or incomplete conditional families; when the conditionals are compatible, the limit is the joint distribution they specify (Kuo et al., 2024).
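When the conditionals are compatible, replacing the conditional of $X_1$ given $X_2$ in the current density is exactly the distribution-level effect of one Gibbs update. The sketch below illustrates this in the spirit of conditional replacement; it is not a reproduction of the ICR algorithm. It assumes a hypothetical 3×3 joint pmf `PI` and tracks $\mathrm{KL}(p_t \,\|\, \pi)$ after each replacement. Because each replacement applies a Markov kernel that leaves `PI` invariant, the data-processing inequality makes this sequence non-increasing.

```python
import numpy as np

# Hypothetical joint pmf pi(x1, x2) on a 3x3 grid (rows: x1, columns: x2).
PI = np.array([[0.20, 0.05, 0.05],
               [0.05, 0.20, 0.05],
               [0.05, 0.05, 0.30]])
F1_GIVEN_2 = PI / PI.sum(axis=0, keepdims=True)  # f(x1 | x2): each column sums to 1
F2_GIVEN_1 = PI / PI.sum(axis=1, keepdims=True)  # f(x2 | x1): each row sums to 1


def kl(p, q):
    """KL(p || q) for strictly positive pmfs of the same shape."""
    return float(np.sum(p * np.log(p / q)))


def replace_x1_given_x2(p):
    """Keep the x2-marginal of p and swap in the conditional f(x1 | x2)."""
    return F1_GIVEN_2 * p.sum(axis=0, keepdims=True)


def replace_x2_given_x1(p):
    """Keep the x1-marginal of p and swap in the conditional f(x2 | x1)."""
    return F2_GIVEN_1 * p.sum(axis=1, keepdims=True)


p = np.full((3, 3), 1.0 / 9.0)  # arbitrary strictly positive starting density
kl_history = [kl(p, PI)]
for _ in range(50):
    p = replace_x1_given_x2(p)
    kl_history.append(kl(p, PI))
    p = replace_x2_given_x1(p)
    kl_history.append(kl(p, PI))

print(f"initial KL = {kl_history[0]:.4f}, final KL = {kl_history[-1]:.2e}")
```

Starting from the uniform density, the recorded KL values decrease monotonically toward numerical zero.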

2. Construction, Partial Collapsing, and Algorithmic Variants

The canonical construction of a partially collapsed Gibbs sampler proceeds as follows (Dyk et al., 2013):

Conditioning reduction: Replace a full-conditional update with a marginal-conditional or partial block update, integrating out selected variables from the conditional.
Step permutation: Reorder the sampling steps so that any quantity drawn as a by-product of a reduced-conditioning step is redrawn before any later step conditions on it.
Trimming: Remove the now-redundant intermediate draws of those quantities; because they are never conditioned upon, removing them does not change the transition kernel of the remaining quantities, so invariance and convergence properties are retained.
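The three steps can be checked numerically on a Gaussian target, where every conditional update is linear and the state covariance can be propagated exactly. The sketch below assumes a hypothetical trivariate normal target `SIGMA` for $(X, Y, Z)$. The parent sampler draws each variable from its full conditional. Reducing the conditioning of the $X$-step to $p(X \mid Y)$ (that is, drawing $(X, Z^\star)$ jointly given $Y$), permuting so the $Z$-step follows immediately, and trimming $Z^\star$ yields the schedule `pcgs`. Keeping the original order after the reduction (`invalid`) makes the $Y$-step condition on a stale $Z$.

```python
import numpy as np

# Hypothetical target: (X, Y, Z) ~ N(0, SIGMA), components indexed 0, 1, 2.
SIGMA = np.array([[1.0, 0.5, 0.8],
                  [0.5, 1.0, 0.3],
                  [0.8, 0.3, 1.0]])


def conditional_step(i, cond):
    """Draw x_i ~ p(x_i | x_cond) under SIGMA, written as x <- A x + e, e ~ N(0, Q)."""
    beta = SIGMA[i, cond] @ np.linalg.inv(SIGMA[np.ix_(cond, cond)])
    A = np.eye(3)
    A[i, :] = 0.0
    A[i, cond] = beta
    Q = np.zeros((3, 3))
    Q[i, i] = SIGMA[i, i] - beta @ SIGMA[cond, i]  # conditional variance
    return A, Q


def stationary_cov(schedule, sweeps=200):
    """Propagate the state covariance through repeated sweeps of a schedule."""
    steps = [conditional_step(i, cond) for i, cond in schedule]
    cov = np.eye(3)
    for _ in range(sweeps):
        for A, Q in steps:
            cov = A @ cov @ A.T + Q
    return cov


schedules = {
    # Parent sampler: every variable drawn from its full conditional.
    "parent": [(0, [1, 2]), (1, [0, 2]), (2, [0, 1])],
    # Reduce X | Y, Z to X | Y, permute the Z-step forward, trim Z*.
    "pcgs": [(0, [1]), (2, [0, 1]), (1, [0, 2])],
    # Reduce X | Y, Z to X | Y but keep the original order: Y sees a stale Z.
    "invalid": [(0, [1]), (1, [0, 2]), (2, [0, 1])],
}
errors = {name: float(np.abs(stationary_cov(s) - SIGMA).max())
          for name, s in schedules.items()}
for name, err in errors.items():
    print(f"{name:8s} max |stationary cov - SIGMA| = {err:.2e}")
```

With this target, `parent` and `pcgs` reproduce `SIGMA` to numerical precision, while `invalid` does not, because $X$ and $Z$ are not conditionally independent given $Y$.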

A standard two-block Gibbs sampler alternates between $\theta \sim p(\theta \mid \phi)$ and $\phi \sim p(\phi \mid \theta)$. A fully collapsed scheme integrates $\phi$ out and samples $\theta \sim p(\theta) = \int p(\theta, \phi)\, d\phi$ directly; a partially collapsed scheme integrates out only a subset of variables at each step. If certain steps are intractable, Metropolis–Hastings (MH) updates can be introduced within the PCGS architecture; however, this must be done carefully, using the conditioning-reduction–permutation–trimming framework, to preserve stationarity (Dyk et al., 2013). Arbitrary use of MH in a reduced-conditional update can change the stationary distribution of the chain.
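A minimal illustration, assuming a standard bivariate normal target for $(\theta, \phi)$ with correlation $\rho$ (a hypothetical choice that makes every conditional and marginal available in closed form): under the standard sampler the $\theta$-chain is an AR(1) process with coefficient $\rho^2$, whereas collapsing $\phi$ yields independent draws from $p(\theta) = N(0, 1)$.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D chain."""
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x @ x))


def standard_gibbs(rho, n, rng):
    """Two-block Gibbs for a standard bivariate normal (theta, phi) with correlation rho."""
    sd = np.sqrt(1.0 - rho**2)  # conditional s.d. of either block
    theta = phi = 0.0
    draws = np.empty(n)
    for t in range(n):
        theta = rng.normal(rho * phi, sd)  # theta ~ p(theta | phi)
        phi = rng.normal(rho * theta, sd)  # phi ~ p(phi | theta)
        draws[t] = theta
    return draws


def collapsed(n, rng):
    """phi integrated out: theta ~ p(theta) = N(0, 1), drawn directly each iteration."""
    return rng.standard_normal(n)


rng = np.random.default_rng(0)
rho, n = 0.95, 20_000
ac_gibbs = lag1_autocorr(standard_gibbs(rho, n, rng))
ac_collapsed = lag1_autocorr(collapsed(n, rng))
print(f"lag-1 autocorrelation: Gibbs {ac_gibbs:.3f} (theory {rho**2:.4f}), "
      f"collapsed {ac_collapsed:.3f}")
```

With $\rho = 0.95$ the standard chain's lag-1 autocorrelation is close to $\rho^2 = 0.9025$, while the collapsed draws are essentially uncorrelated.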

3. Statistical and Computational Consequences

Improved Mixing and Spectral Properties

Collapsing variables analytically generally reduces posterior dependence between successive Markov updates, leading to improved chain mixing and faster convergence. Empirical diagnostics, for example the Brooks–Gelman multivariate potential scale reduction factor (MPSRF), show that PCGS samplers can converge in more than an order of magnitude fewer iterations than their non-collapsed counterparts (e.g., 1,250 iterations for PCGS vs. 36,000 for classical Gibbs in nonnegative sparse restoration) (Amrouche et al., 2021). The kernel of a collapsed sampler typically contracts in KL divergence at least as quickly as its parent chain (Dyk et al., 2013, Kuo et al., 2024), although this is not guaranteed for every blocking or collapsing scheme (Mak et al., 11 Jan 2026).
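The MPSRF is the multivariate generalization of the Gelman–Rubin potential scale reduction factor (PSRF). As a simplified sketch, the function below computes only the univariate PSRF from several chains; values close to 1 are consistent with, but do not prove, convergence. The chains here are hypothetical stand-ins generated directly rather than by a sampler.

```python
import numpy as np


def psrf(chains):
    """Univariate Gelman-Rubin PSRF for an (m, n) array of m chains of length n."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled estimate of the target variance
    return float(np.sqrt(var_hat / W))


rng = np.random.default_rng(1)
mixed = rng.standard_normal((4, 2000))                  # four chains on the same law
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain stuck elsewhere
print(f"PSRF mixed = {psrf(mixed):.3f}, PSRF stuck = {psrf(stuck):.3f}")
```

A chain stuck in a different region inflates the between-chain variance and pushes the PSRF well above 1.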

A general principle termed the "solidarity principle of the spectral gap" holds: every cycle and mixture of Gibbs steps (including blocked and collapsed Gibbs) inherits a spectral gap from the corresponding full Gibbs sampler, guaranteeing geometric ergodicity when the parent chain is gapped. However, different collapsed or blocked schemes may not inherit a gap from each other, and in some pathological configurations convergence can qualitatively degrade (Mak et al., 11 Jan 2026).

Complexity and Implementation

A collapsed or PCGS sweep is typically more computationally expensive than a standard Gibbs sweep, owing to analytic integration, potentially denser matrix operations (e.g., Cholesky factorizations for marginalized updates), or more complex proposal mechanisms. However, the reduction in integrated autocorrelation time, and the resulting gain in effective sample size (ESS) per unit of wall-clock time, often more than compensates for the additional computational burden.
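Whether the extra per-sweep cost pays off is commonly judged by ESS per second, where ESS $= n/\tau$ and $\tau$ is the integrated autocorrelation time. The sketch below assumes a simple truncation rule (sum autocorrelations until the first negative one); Geyer's initial-sequence estimators are more robust in practice. An AR(1) series stands in, hypothetically, for a slowly mixing sampler, because its true $\tau = (1+\phi)/(1-\phi)$ is known.

```python
import time

import numpy as np


def integrated_autocorr_time(x, max_lag=1000):
    """tau = 1 + 2 * sum_k rho_k, summing until the first negative autocorrelation."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = x @ x
    tau = 1.0
    for k in range(1, min(max_lag, n - 1)):
        rho_k = (x[:-k] @ x[k:]) / denom
        if rho_k < 0:
            break
        tau += 2.0 * rho_k
    return tau


def ess_per_second(x, seconds):
    """Effective sample size n / tau divided by the wall-clock time spent sampling."""
    return len(x) / integrated_autocorr_time(x) / seconds


def ar1(phi, n, rng):
    """Stationary AR(1) chain x_t = phi * x_{t-1} + e_t (stand-in for a sticky sampler)."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


rng = np.random.default_rng(2)
start = time.perf_counter()
chain = ar1(0.9, 100_000, rng)
elapsed = time.perf_counter() - start
tau_ar1 = integrated_autocorr_time(chain)
tau_iid = integrated_autocorr_time(rng.standard_normal(100_000))
print(f"tau (AR(1), phi=0.9) = {tau_ar1:.1f}  [theory (1+phi)/(1-phi) = 19]")
print(f"tau (iid) = {tau_iid:.2f}, ESS/s for the AR(1) chain = "
      f"{ess_per_second(chain, elapsed):.0f}")
```

Comparing ESS per second rather than raw iteration counts puts a cheap but sticky sampler and an expensive but well-mixing collapsed sampler on the same footing.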

4. Typical Applications and Model Structures

Collapsed and partially collapsed Gibbs samplers are foundational in several Bayesian model families:

Sparse spike-and-slab models: In high-dimensional regression, PCGS efficiently samples inclusion indicators and local scale variables, integrating out regression coefficients and breaking strong dependence between variables, e.g., in spike-and-slab models for nonnegative deconvolution using generalized hyperbolic priors (Amrouche et al., 2021) and Bayesian variable selection with Laplace-type slabs (Chung, 10 Jan 2026).
Latent structure models (topic models, DPMMs): For latent Dirichlet allocation and Dirichlet process mixtures, fully collapsed Gibbs updates integrate out the Dirichlet-distributed proportions and the component parameters, leaving only the discrete labels to resample, which improves mixing especially in clustered data (Zhang et al., 2016, Khoufache et al., 2023).
Mixed-effects and hierarchical models: In multilevel Gaussian random-effects models, integrating out global means or fixed effects produces block-wise updates for random effects with provable mixing-time bounds under mild structural assumptions (Ghosh et al., 2021, Ekvall et al., 2019).
Modern hierarchical models: PCGS methods are used in settings such as ordinal quantile regression, where collapsing latent scale weights out of the scale update step yields a significant reduction in autocorrelation and Markov chain dependence (Grabski et al., 2019).
Blind inverse problems: In contexts with intractable posterior structures (e.g., blind image deblurring using diffusion priors), collapsed samplers alternate analytic marginalizations and approximate updates (e.g., Langevin diffusion or "trimming" of latent chains), showing high empirical efficiency (Murata et al., 2023).
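The collapsed indicator update for spike-and-slab selection can be sketched for a deliberately simplified conjugate model, not the generalized hyperbolic or Laplace-slab models cited above: $y \sim N(X_\gamma \beta_\gamma, \sigma^2 I)$ with known $\sigma^2$, a Gaussian slab $\beta_j \sim N(0, \tau^2)$, and independent Bernoulli inclusion indicators. Integrating $\beta$ out gives $y \mid \gamma \sim N(0, \sigma^2 I + \tau^2 X_\gamma X_\gamma^\top)$, so each indicator is resampled from its conditional given the other indicators without ever sampling coefficients. All function names, data, and hyperparameters are hypothetical.

```python
import numpy as np


def log_marginal(y, X, gamma, sigma2=1.0, tau2=1.0):
    """log p(y | gamma) with beta integrated out: y ~ N(0, sigma2 I + tau2 X_g X_g^T)."""
    n = len(y)
    Xg = X[:, gamma.astype(bool)]
    cov = sigma2 * np.eye(n) + tau2 * (Xg @ Xg.T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(cov, y))


def collapsed_spike_slab(y, X, sweeps=200, prior_incl=0.5, seed=0):
    """Collapsed Gibbs over inclusion indicators only; returns inclusion frequencies."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=int)
    counts = np.zeros(p)
    prior_log_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    for _ in range(sweeps):
        for j in range(p):
            g1, g0 = gamma.copy(), gamma.copy()
            g1[j], g0[j] = 1, 0
            log_odds = prior_log_odds + log_marginal(y, X, g1) - log_marginal(y, X, g0)
            prob_in = 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # overflow-safe logistic
            gamma[j] = int(rng.random() < prob_in)
        counts += gamma
    return counts / sweeps


rng = np.random.default_rng(1)
X = rng.standard_normal((100, 6))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(100)
pip = collapsed_spike_slab(y, X)
print("posterior inclusion frequencies:", np.round(pip, 2))
```

Each update here evaluates two marginal likelihoods at $O(n^3)$ cost; practical implementations typically reuse or update factorizations instead of recomputing them.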
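For latent Dirichlet allocation with symmetric Dirichlet priors (`alpha` on document-topic proportions, `beta` on topic-word distributions), integrating out both sets of proportions leaves a conditional for each token's topic label $z_i$ that depends only on count statistics excluding that token: $p(z_i = k \mid z_{-i}, w) \propto (n_{dk}^{-i} + \alpha)\,(n_{k w_i}^{-i} + \beta) / (n_k^{-i} + V\beta)$, with $V$ the vocabulary size. The following is a minimal, unoptimized sketch of this standard update; the toy corpus and all names are hypothetical.

```python
import numpy as np


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        sweeps=200, seed=0):
    """Collapsed Gibbs for LDA: proportions are integrated out, only labels z are sampled."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # tokens in document d assigned to topic k
    n_kw = np.zeros((n_topics, vocab_size))  # occurrences of word w assigned to topic k
    n_k = np.zeros(n_topics)                 # tokens assigned to topic k
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove token i to obtain the "all other tokens" counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Unnormalized p(z_i = k | z_-i, words).
                weights = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=weights / weights.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw


# Hypothetical toy corpus: two groups of documents with disjoint vocabularies.
docs = [[0, 1, 2, 0, 1, 2]] * 5 + [[3, 4, 5, 3, 4, 5]] * 5
z, n_dk, n_kw = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=6)
print("topic-word counts:\n", n_kw)
```

Only the integer labels and their counts are stored; the topic and document proportions can be estimated afterwards from the counts if needed.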

5. Convergence Theory and Operator Analysis

Operator-theoretic analyses reveal that each collapsed step acts as an I-projection; successive compositions yield monotone convergence in KL to the intersection of the constraint sets defined by the collapsed conditionals (Kuo et al., 2024). The approach sidesteps the need to verify irreducibility and aperiodicity of the overall chain, as each step is metrically contractive when the updating cycle is permissible.

For Gaussian and many exponential-family models, contraction arguments combined with minorization conditions yield explicit geometric-ergodicity rates, often independent of the problem dimensionality under mild regularity conditions (e.g., for Bayesian VARs and two-factor random effects) (Ekvall et al., 2019, Ghosh et al., 2021). In the non-Gaussian or non-conjugate regime, expectation propagation or partial analytic integration can be used to approximate collapsed steps, providing a tunable runtime–accuracy tradeoff (Aicher et al., 2018); because the collapsed step is then approximate, the chain in general targets an approximation to the exact posterior.

6. Limitations, Extensions, and Design Considerations

While collapsed and PCGS methods offer substantial mixing acceleration, analytic marginalization is not always tractable, especially in non-conjugate or highly complex hierarchical models (Aicher et al., 2018). Implementation can also be considerably more involved: it may require reversible-jump (RJ) proposals with delicate hand-tuning, Cholesky updates, or careful pre-estimation of parameters (such as the hyperparameters of generalized hyperbolic priors) (Amrouche et al., 2021).
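As one illustration of such Cholesky updates, assuming a conjugate spike-and-slab model with a Gaussian slab: including one more predictor $x_j$ changes the marginal covariance $\sigma^2 I + \tau^2 X_\gamma X_\gamma^\top$ by the rank-one term $\tau^2 x_j x_j^\top$, so its Cholesky factor can be updated in $O(n^2)$ instead of refactorized in $O(n^3)$. The sketch uses the standard rank-one update; the source does not state which scheme the cited implementation uses, and all names and data below are hypothetical.

```python
import numpy as np


def chol_rank_one_update(L, x):
    """Return lower-triangular L' with L' L'^T = L L^T + x x^T, in O(n^2) operations."""
    L = np.array(L, dtype=float, copy=True)
    x = np.array(x, dtype=float, copy=True)
    n = len(x)
    for k in range(n):
        r = np.hypot(L[k, k], x[k])
        c = r / L[k, k]
        s = x[k] / L[k, k]
        L[k, k] = r
        if k + 1 < n:
            L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
            x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L


rng = np.random.default_rng(3)
n = 50
X_active = rng.standard_normal((n, 3))
x_new = rng.standard_normal(n)
sigma2, tau2 = 1.0, 1.0
# Marginal covariance sigma2 I + tau2 X_g X_g^T before adding the new column.
cov = sigma2 * np.eye(n) + tau2 * X_active @ X_active.T
L_old = np.linalg.cholesky(cov)
L_new = chol_rank_one_update(L_old, np.sqrt(tau2) * x_new)
L_ref = np.linalg.cholesky(cov + tau2 * np.outer(x_new, x_new))
print("max abs difference from full refactorization:", np.abs(L_new - L_ref).max())
```

Removing a predictor corresponds to a rank-one downdate, which needs a separate (and numerically more delicate) routine.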

Not all blockings or partial collapses are beneficial: certain choices can eliminate geometric ergodicity or worsen mixing, as demonstrated by explicit counterexamples in block-update Markov chains (Mak et al., 11 Jan 2026). Consequently, empirical or theoretical validation of spectral properties for the specific block/collapsing scheme is recommended.
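One way to carry out such a check numerically, assuming a Gaussian target: every conditional draw is then linear in the current state, the conditional mean after one sweep is $B x$ for a matrix $B$ determined by the update schedule, and the geometric convergence rate of a deterministic sweep is governed by the spectral radius of $B$. The sketch below compares a full-conditional sweep with a blocked $(X, Z) \mid Y$ schedule for a hypothetical trivariate target `SIGMA`; it is a diagnostic under these assumptions, not a general test for non-Gaussian models.

```python
import numpy as np

# Hypothetical Gaussian target for (X, Y, Z), components indexed 0, 1, 2.
SIGMA = np.array([[1.0, 0.5, 0.8],
                  [0.5, 1.0, 0.3],
                  [0.8, 0.3, 1.0]])


def mean_update_matrix(schedule):
    """Matrix B with E[x_next | x] = B x for one sweep of linear-Gaussian conditional draws."""
    B = np.eye(3)
    for i, cond in schedule:
        step = np.eye(3)
        step[i, :] = 0.0
        step[i, cond] = SIGMA[i, cond] @ np.linalg.inv(SIGMA[np.ix_(cond, cond)])
        B = step @ B  # later steps act on the output of earlier ones
    return B


def convergence_rate(schedule):
    """Spectral radius of the mean-update matrix for one sweep."""
    return float(np.max(np.abs(np.linalg.eigvals(mean_update_matrix(schedule)))))


schedules = {
    "full Gibbs": [(0, [1, 2]), (1, [0, 2]), (2, [0, 1])],
    "blocked (X, Z) | Y": [(0, [1]), (2, [0, 1]), (1, [0, 2])],
}
rates = {name: convergence_rate(s) for name, s in schedules.items()}
for name, rate in rates.items():
    print(f"{name:20s} rate per sweep = {rate:.4f}")
```

For this target the blocked schedule has rate $5/18 \approx 0.278$, the squared canonical correlation between $Y$ and $(X, Z)$, while the full-conditional sweep converges more slowly.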

Extensions include distributed versions for federated learning with DPMMs (Khoufache et al., 2023), structured sparsity (e.g., group slabs), and intractable cases addressed via approximate integration (e.g., EP-based approximate collapse) (Aicher et al., 2018).

7. Comparative Performance and Practical Guidelines

Empirical studies report that collapsed and PCGS methods substantially reduce the number of MCMC iterations required for convergence, often at a per-iteration computational cost comparable to that of standard methods (Amrouche et al., 2021, Grabski et al., 2019, Khoufache et al., 2023). In ultra-high-dimensional settings with sparse true solutions, collapsed samplers that use random-scan schemes with data-driven proposal weights target the exact posterior with memory and computation requirements far lower than those of full-sweep Gibbs, while maintaining ergodicity and the correct stationary distribution (Chung, 10 Jan 2026).

In practice, the following guidelines emerge:

Identify analytic marginalizations that reduce conditional dependence and are still computationally feasible.
Ensure the update schedule respects the permissible cycle condition for operator contraction.
For models with intractable marginalized steps, consider approximate integration or expectation propagation within the Gibbs cycle.
When incorporating MH updates inside a collapsed step, use the conditioning-reduction–permutation–trimming framework to preserve stationarity (Dyk et al., 2013).
Block or collapse only to the extent that mixing gains outweigh the added complexity, and confirm that the chosen scheme does not lose geometric ergodicity.

The collapsed Gibbs sampler and its partially collapsed descendants remain a cornerstone of efficient MCMC for hierarchical and latent-structure models, supported by a broad spectrum of theoretical and computational advances (Dyk et al., 2013, Amrouche et al., 2021, Kuo et al., 2024, Mak et al., 11 Jan 2026).