Variational Bagging for Robust Bayesian Inference
- Variational Bagging is an ensemble method that integrates bootstrap resampling with variational inference to correct under-dispersion and recover parameter dependence.
- It provides valid asymptotic uncertainty quantification and shows strong empirical calibration across a range of models, from classical parametrics to deep learning.
- The approach leverages parallel computing for computational efficiency while providing theoretical guarantees such as Bernstein–von Mises results and minimax contraction rates.
Variational Bagging (VB Bagging) is an ensemble-enhanced variational Bayes methodology designed to deliver robust, well-calibrated Bayesian uncertainty quantification. It systematically integrates bootstrap resampling with variational inference to produce a bagged variational posterior that corrects both the under-dispersion and the loss of parameter dependence typical of mean-field variational approximations. Delivered as a black-box wrapper around standard variational Bayes routines, VB Bagging achieves valid asymptotic uncertainty quantification, enhances robustness under model misspecification, recovers dependence among parameters, and retains computational efficiency through parallelization. Its theoretical guarantees encompass Bernstein–von Mises (BvM) type results and minimax posterior contraction rates, and it exhibits strong empirical calibration across a variety of model classes, from classical parametrics to deep learning architectures (Fan et al., 25 Nov 2025).
1. Algorithmic Framework
VB Bagging operates by repeatedly resampling the dataset and applying variational inference independently to each bootstrap replicate. The resulting variational posteriors are then averaged, forming the bagged variational posterior.
Procedure
- Data and Model: Let $X_1, \dots, X_n$ be i.i.d. samples from $P_0$, modeled as $X_i \sim p_\theta$, $\theta \in \Theta$, possibly with latent variables $z_{1:n}$.
- Variational Family: Denoted $\mathcal{Q}$, typically mean-field.
- Bootstrap Replicates: Use $B$ replicates, each of size $m$ (often $m = n$).
Step-by-step:
- For $b = 1, \dots, B$:
- (a) Draw a bootstrap sample $X^{(b)}$ of size $m$ with replacement from the observed data.
- (b) Solve the variational inference problem on $X^{(b)}$: $\hat{q}_b = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(\theta, z) \,\|\, \pi(\theta, z \mid X^{(b)})\big)$, or equivalently maximize the ELBO $\mathbb{E}_q[\log p(X^{(b)}, \theta, z)] - \mathbb{E}_q[\log q(\theta, z)]$.
- (c) Marginalize over the latent variables $z$ to obtain $\hat{q}_b(\theta)$.
- Form the bagged variational posterior by averaging: $\hat{q}^{\mathrm{bag}}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \hat{q}_b(\theta)$.
Means, variances, and samples from $\hat{q}^{\mathrm{bag}}$ are obtained via mixture sampling: pick $b$ uniformly from $\{1, \dots, B\}$, then sample $\theta$ from $\hat{q}_b$.
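The loop above maps directly onto code. Below is a minimal Python/NumPy sketch; the `fit_vb` callable is a placeholder for any black-box variational routine that returns a Gaussian (mean, covariance) approximation of the parameter posterior on a given dataset, so its name and signature are illustrative assumptions rather than the paper's interface.

```python
import numpy as np

def vb_bagging(data, fit_vb, B=30, m=None, seed=None):
    """Fit VB on B bootstrap replicates; return the mixture components."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    m = n if m is None else m                 # common default: m = n
    components = []
    for _ in range(B):
        idx = rng.integers(0, n, size=m)      # (a) resample with replacement
        mean_b, cov_b = fit_vb(data[idx])     # (b)+(c) variational fit, marginalized to theta
        components.append((mean_b, cov_b))
    return components                         # bagged posterior = uniform mixture

def sample_bagged(components, size, seed=None):
    """Mixture sampling: pick a replicate uniformly, then sample from its q_b."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(size):
        mean_b, cov_b = components[rng.integers(len(components))]
        draws.append(rng.multivariate_normal(mean_b, cov_b))
    return np.asarray(draws)
```

Posterior means, variances, and credible sets of the bagged posterior are then estimated directly from the draws returned by `sample_bagged`.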
2. Mean-Field Variational Family and Its Limitations
The mean-field variational family is defined as $\mathcal{Q}_{\mathrm{MF}} = \big\{ q(\theta) = \prod_{j=1}^{d} q_j(\theta_j) \big\}$.
This product structure enables efficient variational inference algorithms such as coordinate ascent variational inference; however, it enforces independence among parameters, removing their dependence structure. Under classical BvM results for mean-field VB, the variational posterior captures only marginal variance information, converging to a Gaussian $N(\hat{\theta}_n, n^{-1} V_0)$, where $V_0$ is diagonal due to the mean-field assumption. This leads to approximate posteriors that are axis-aligned and under-dispersed, failing to capture the true joint uncertainty.
VB Bagging introduces a "sandwich" covariance correction, restoring parameter dependence through an additional covariance term that is generally nonzero in the off-diagonals and recovers covariance structure lost under mean-field approximations.
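This effect can be seen in closed form on a correlated Gaussian target: the reverse-KL-optimal fully factorized Gaussian has component precisions equal to the diagonal of the target precision matrix, which shrinks the marginal variances and discards all correlation. A short numeric illustration (the covariance values are made up):

```python
import numpy as np

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])            # true posterior covariance (illustrative)
Lambda = np.linalg.inv(Sigma)             # true posterior precision

mf_var = 1.0 / np.diag(Lambda)            # mean-field marginal variances
print("true marginal variances:", np.diag(Sigma))   # [1.0, 1.0]
print("mean-field variances:   ", mf_var)           # [0.36, 0.36]
print("true correlation:", Sigma[0, 1], " mean-field correlation: 0.0")
```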
3. Theoretical Guarantees
Bernstein–von Mises Theorem for VB Bagging
Under standard regularity and asymptotic conditions, draws from the bagged variational posterior $\hat{q}^{\mathrm{bag}}$ satisfy a BvM-type limit: suitably centered and scaled, they are asymptotically Gaussian with a limiting covariance composed of two terms. The first (diagonal) term accounts for VB under-dispersion; the second, "sandwich" term is typically non-diagonal and ensures recovery of parameter dependence (Fan et al., 25 Nov 2025).
Marginal Variance Correction
For correct model specification and $m = n$, the sandwich structure allows extraction of both the off-diagonal entries and the correct scaling of the marginal variances: halving the diagonal of the combined covariance retrieves the Fisher-information covariance $I(\theta_0)^{-1}$.
Asymptotic Coverage and Contraction
For models without latent variables and $m = n$, any nominal $(1-\alpha)$ credible ellipsoid of the bagged variational posterior attains asymptotic frequentist coverage of at least $1 - \alpha$, demonstrating that variational bagging does not asymptotically undercover (Corollary 3.3 in (Fan et al., 25 Nov 2025)).
Posterior contraction for general models (possibly nonparametric) holds at essentially the same Hellinger rate as the ordinary posterior, up to a logarithmic factor, for any diverging sequence $M_n \to \infty$ (Theorem 3.4, (Fan et al., 25 Nov 2025)).
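The non-undercoverage property can be checked empirically on a toy example. The sketch below uses a scalar Gaussian-mean model, where the per-replicate "VB" posterior has a closed form, simulates many datasets, and records how often the 95% credible interval of the bagged posterior covers the truth; all settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, sigma, n, B, reps = 0.0, 1.0, 200, 20, 500
hits = 0
for _ in range(reps):
    x = rng.normal(theta0, sigma, size=n)
    draws = []
    for _ in range(B):
        xb = x[rng.integers(0, n, size=n)]              # bootstrap replicate (m = n)
        # closed-form Gaussian "VB" posterior for the mean on this replicate
        draws.append(rng.normal(xb.mean(), sigma / np.sqrt(n), size=200))
    draws = np.concatenate(draws)                       # draws from the bagged posterior
    lo, hi = np.quantile(draws, [0.025, 0.975])         # nominal 95% credible interval
    hits += (lo <= theta0 <= hi)
print("empirical coverage of the nominal 95% interval:", hits / reps)
```

The observed coverage should come out at or above the nominal 95% level, consistent with the corollary.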
4. Empirical Evaluation Across Model Classes
VB Bagging has been empirically validated on a range of models:
- Two-Dimensional Gaussian Mean: On a bivariate Gaussian model with correlated covariance $\Sigma$, mean-field VB yields axis-aligned (zero-correlation) credible ellipses; VB Bagging recovers the correct orientation and 95% region, closely matching HMC reference posteriors (see the sketch after this list).
- Finite Mixture Models under Misspecification: With heavy-tailed t-mixtures/double-exponential true data and Gaussian mixture working models, VB intervals are too narrow; both BayesBag and VB Bagging deliver interval coverages matching the true sampling distribution.
- Sparse Linear Regression: For the spike-and-slab linear model under Gaussian and t-errors, VB Bagging matches or outperforms both MCMC Bayes and regular BayesBag in terms of relative squared error, with pronounced superiority under misspecification.
- Deep Neural Networks: For various regression architectures with heavy-tailed errors, mean-field VB under-covers (e.g., 93% for nominal 95% intervals); VB Bagging restores empirical coverage to the nominal level.
- Variational Autoencoders (VAE): The BVAE, implementing VB Bagging on the encoder, reconstructs low-dimensional manifolds (e.g., Swiss-roll, 1D curves) more sharply and closer to the true manifold than standard VAEs, with the effect especially visible in the 1D examples (Fan et al., 25 Nov 2025).
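A rough sketch of the two-dimensional Gaussian-mean comparison referenced in the first bullet, under illustrative settings (not the paper's exact configuration): with known correlated $\Sigma$ and a flat prior, the exact posterior on the mean is $N(\bar{x}, \Sigma/n)$, the mean-field fit drops the correlation entirely, and the bagged mixture of mean-field fits regains a clearly nonzero correlation through the spread of the bootstrap centers.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])                  # known, correlated covariance
n, B = 200, 30
x = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)

# Mean-field VB covariance for the mean: diagonal, built from the precision diagonal.
mf_cov = np.diag(1.0 / np.diag(np.linalg.inv(Sigma))) / n

draws = []
for _ in range(B):
    xb = x[rng.integers(0, n, size=n)]           # bootstrap replicate (m = n)
    draws.append(rng.multivariate_normal(xb.mean(axis=0), mf_cov, size=300))
draws = np.vstack(draws)                         # draws from the bagged posterior

print("exact posterior correlation:", Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1]))
print("mean-field VB correlation:   0.0 (diagonal by construction)")
print("VB Bagging correlation:     ", np.corrcoef(draws.T)[0, 1])
```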
5. Implementation Considerations and Practical Guidance
Bootstrap Size ($m$)
- For trusted model specification, $m = n$ is recommended, enabling recovery of off-diagonal structure and correct scaling of the diagonals.
- Under model misspecification, correct marginal variance matching requires tuning $m$ by comparing the observed variance of a functional of $\theta$ under plain VB with that under VB Bagging run at $m = n$.
Number of Bootstrap Replicates ($B$)
- BayesBag commonly uses up to $B = 100$ replicates, which is computationally intensive with MCMC.
- VB Bagging requires fewer replicates: around $B = 30$ yields stable estimates, and as few as roughly $10$ suffice for many use cases.
Computational Aspects
- Cost is approximately $B$ times that of a single VB run (linear in $B$), with perfect parallelism across replicates.
- ELBO (or evidence-gap) monitoring on each replicate is essential; non-convergent replicates should be dropped or re-run.
- Between-replicate variance of key functionals can be used to assess calibration of credible intervals.
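The points above suggest an orchestration pattern like the following, using the standard-library process pool; `fit_vb_on_replicate` is a toy stand-in (a closed-form Gaussian fit for a scalar mean) where a real application would run its VB optimizer and report whether the ELBO trace converged, so that non-convergent replicates can be dropped or re-run.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def fit_vb_on_replicate(args):
    """Toy stand-in: closed-form Gaussian 'VB' fit for a scalar mean model.
    A real application would run its VB optimizer here and report whether
    the ELBO trace converged."""
    data, idx = args
    xb = data[idx]
    mean, var = xb.mean(), xb.var(ddof=1) / len(xb)
    converged = bool(np.isfinite(mean) and np.isfinite(var))
    return (mean, var), converged

def parallel_vb_bagging(data, B=30, max_workers=4, seed=0):
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    jobs = [(data, rng.integers(0, n, size=n)) for _ in range(B)]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fit_vb_on_replicate, jobs))
    # keep only replicates whose optimization converged; drop or re-run the rest
    return [params for params, ok in results if ok]

if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=500)
    components = parallel_vb_bagging(x, B=20)
    print(f"kept {len(components)} of 20 replicates")
```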
6. Summary and Outlook
Variational Bagging constitutes a theoretically grounded, practically convenient enhancement to variational Bayes approaches. It achieves:
- Valid asymptotic uncertainty quantification (BvM limit with full sandwich covariance correction)
- Robustness to model misspecification, avoiding asymptotic under-coverage
- Recovery of parameter dependence lost under mean-field approximations
- Contraction rates equivalent to the ordinary Bayesian posterior (modulo logarithmic factors)
- Straightforward implementation atop existing VB routines, leveraging averaging and parallelism
VB Bagging is directly applicable to a wide range of Bayesian modeling frameworks, including high-dimensional, nonparametric, and deep learning settings, delivering improved coverage and credible set calibration without substantially increased computational cost (Fan et al., 25 Nov 2025).