Variational Bagging for Robust Bayesian Inference

Updated 26 November 2025
  • Variational Bagging is an ensemble method that integrates bootstrap resampling with variational inference to correct under-dispersion and recover parameter dependence.
  • It guarantees valid asymptotic uncertainty quantification and strong empirical calibration across various models, from classical parametrics to deep learning.
  • The approach leverages parallel computing for computational efficiency while providing theoretical guarantees such as Bernstein–von Mises results and minimax contraction rates.

Variational Bagging (VB Bagging) is an ensemble-enhanced variational Bayes methodology designed to deliver robust, well-calibrated Bayesian uncertainty quantification. It systematically integrates bootstrap resampling with variational inference to produce a bagged variational posterior that corrects both the under-dispersion and loss of parameter dependence typical in mean-field variational approximations. Delivered as a black-box wrapper around standard variational Bayes routines, VB Bagging achieves valid asymptotic uncertainty quantification, enhances robustness under model misspecification, recovers dependence among parameters, and retains computational efficiency through parallelization. Its theoretical guarantees encompass Bernstein–von Mises (BvM) type results, minimax posterior contraction rates, and strong empirical calibration in a variety of model classes, from classical parametrics to deep learning architectures (Fan et al., 25 Nov 2025).

1. Algorithmic Framework

VB Bagging operates by repeatedly resampling the dataset and applying variational inference independently to each bootstrap replicate. The resulting variational posteriors are then averaged, forming the bagged variational posterior.

Procedure

  • Data and Model: Let $X_1,\ldots,X_n \in \mathcal{X}$ be i.i.d. samples from $P_0$, modeled as $p(x \mid \theta)$ with $\theta \in \Theta \subset \mathbb{R}^d$, possibly with latent variables $Z$.
  • Variational Family: Denoted $\mathcal{Q}$, typically mean-field.
  • Bootstrap Replicates: Use $B$ replicates, each of size $M$ (often $M \sim n$).

Step-by-step:

  1. For $b = 1, \ldots, B$:

    • (a) Draw a bootstrap sample $X_{(b)}^*$ of size $M$ with replacement from the observed data.
    • (b) Solve the variational inference problem on $X_{(b)}^*$:

    $$q_b^*(\theta, Z_{1:M}^*) = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\bigl[q(\theta, Z_{1:M}^*)\,\big\|\,\pi(\theta, Z_{1:M}^* \mid X_{(b)}^*)\bigr],$$

    or equivalently maximize

    $$\mathrm{ELBO}_{(b)}(q) = \mathbb{E}_q\bigl[\log p(X_{(b)}^*, Z_{1:M}^*, \theta)\bigr] - \mathbb{E}_q\bigl[\log q(\theta, Z_{1:M}^*)\bigr].$$

    • (c) Marginalize over $Z$ to obtain $q_b^*(\theta)$.

  2. Form the bagged variational posterior by averaging:

$$q_{VB}(\theta) = \frac{1}{B}\sum_{b=1}^B q_b^*(\theta).$$

Means, variances, and samples from $q_{VB}$ are obtained via mixture sampling: pick $b \sim \mathrm{Uniform}\{1, \ldots, B\}$, then sample from $q_b^*(\theta)$.
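
The procedure above is a black-box wrapper around any existing VB routine. Below is a minimal Python sketch, assuming a hypothetical user-supplied `fit_vb(data)` function (any standard variational inference implementation) that returns a fitted object exposing a `.sample(size)` method for the marginal variational posterior $q_b^*(\theta)$; the names and interface are illustrative, not taken from the paper.

```python
# Minimal sketch of the VB Bagging wrapper. `fit_vb(data)` is a hypothetical
# black-box VB routine returning an object with a .sample(size) method that
# draws theta from the fitted marginal variational posterior q_b*(theta).
import numpy as np

def vb_bagging(data, fit_vb, B=20, M=None, seed=None):
    """Run VB independently on B bootstrap replicates of `data`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    M = n if M is None else M              # default bootstrap size: M = n (c = 1)
    fits = []
    for _ in range(B):
        idx = rng.integers(0, n, size=M)   # resample with replacement
        fits.append(fit_vb(data[idx]))     # one independent VB run per replicate
    return fits

def sample_bagged(fits, size, seed=None):
    """Mixture sampling from q_VB: pick a replicate uniformly, then sample from it."""
    rng = np.random.default_rng(seed)
    picks = rng.integers(0, len(fits), size=size)
    return np.stack([fits[b].sample(1)[0] for b in picks])
```

Posterior means, variances, and credible intervals can then be read off the pooled draws.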

2. Mean-Field Variational Family and Its Limitations

The mean-field variational family is defined as:

$$\mathcal{Q}_{MF} = \left\{ q(\theta, Z_{1:n}) = \prod_{j=1}^d q_{\theta_j}(\theta_j)\prod_{i=1}^n q_{Z_i}(Z_i) \right\}.$$

This product structure enables efficient variational inference algorithms such as coordinate ascent variational inference; however, it enforces independence among parameters, removing dependence structure. Under classical BvM results for mean-field VB, the variational posterior captures only the marginal variances:

$$\sqrt{n}\,(\theta - \theta_0) - \Delta_n \rightsquigarrow N(0, \tilde V_{vb}^{-1}),$$

where $\tilde V_{vb}$ is diagonal due to the mean-field assumption. This leads to approximate posteriors that are axis-aligned and under-dispersed, failing to capture the true joint uncertainty.
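
As a concrete illustration of this under-dispersion (not taken from the paper), consider a correlated bivariate Gaussian target: the KL-optimal fully factorized Gaussian approximation has component precisions equal to the diagonal of the target precision matrix, so its variances fall below the true marginals whenever correlation is present.

```python
# Mean-field under-dispersion on a correlated 2-D Gaussian "posterior".
# For a Gaussian target N(mu, Sigma), the KL-optimal factorized Gaussian
# has precisions diag(Sigma^{-1}), hence variances 1 / (Sigma^{-1})_{jj}.
import numpy as np

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])              # true posterior covariance (corr = 0.8)
Lambda = np.linalg.inv(Sigma)               # precision matrix

mf_var = 1.0 / np.diag(Lambda)              # mean-field (axis-aligned) variances
print("true marginal variances:", np.diag(Sigma))   # [1.0, 1.0]
print("mean-field variances   :", mf_var)           # [0.36, 0.36] -- under-dispersed
```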

VB Bagging introduces a "sandwich" covariance correction, restoring parameter dependence by including the term

$$V_{vb}^{-1} D_{vb} V_{vb}^{-1},$$

which is generally nonzero in the off-diagonals and recovers covariance structures lost under mean-field approximations.

3. Theoretical Guarantees

Bernstein–von Mises Theorem for VB Bagging

Let

$$V_{vb}^0 = -\mathbb{E}_{P_0}\bigl[\nabla^2 \ell_{vb}(X \mid \theta_0)\bigr], \qquad D_{vb}^0 = \mathbb{E}_{P_0}\bigl[\nabla \ell_{vb}(X \mid \theta_0)\, \nabla \ell_{vb}(X \mid \theta_0)^\top\bigr],$$

and $c = \lim M/n \in (0, \infty)$. For draws $\theta^\dagger \sim q_{VB}$, under standard regularity and asymptotic conditions,

$$\sqrt{n}\,(\theta^\dagger - \theta_0) - \Delta_n \mid X_{1:n} \;\overset{d}{\longrightarrow}\; N\!\left(0,\; \frac{1}{c}\tilde V_{vb}^{-1} + \frac{1}{c}V_{vb}^{-1} D_{vb} V_{vb}^{-1}\right).$$

The first (diagonal) term accounts for VB under-dispersion; the second, "sandwich" term is typically non-diagonal and ensures recovery of parameter dependence (Fan et al., 25 Nov 2025).
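
The limiting covariance suggests a plug-in estimator. The sketch below is illustrative only, assuming hypothetical per-observation functions `score(x, theta)` and `hessian(x, theta)` for the VB log-likelihood $\ell_{vb}$ evaluated at a consistent estimate, and the diagonal covariance `Vtilde_inv` reported by the mean-field fit.

```python
# Plug-in estimate of the BvM covariance (1/c) * (Vtilde^{-1} + V^{-1} D V^{-1}).
# `score` and `hessian` are hypothetical per-observation gradient / Hessian
# functions of the VB log-likelihood, evaluated at a consistent estimate theta_hat.
import numpy as np

def bvm_covariance(data, theta_hat, score, hessian, Vtilde_inv, c=1.0):
    scores = np.stack([score(x, theta_hat) for x in data])      # shape (n, d)
    hess = np.stack([hessian(x, theta_hat) for x in data])      # shape (n, d, d)
    V = -hess.mean(axis=0)                                      # estimate of V_vb
    D = scores.T @ scores / len(data)                           # estimate of D_vb
    V_inv = np.linalg.inv(V)
    sandwich = V_inv @ D @ V_inv                                # restores off-diagonals
    return (Vtilde_inv + sandwich) / c
```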

Marginal Variance Correction

For correct model specification and $M=n$ ($c=1$), the sandwich structure allows extraction of both the off-diagonals and the correct scaling of marginal variances. The combined covariance structure is

$$2\cdot \text{diag}(V_{vb}^{-1}) + \text{offdiag}(V_{vb}^{-1}),$$

and halving the diagonal retrieves the Fisher information covariance $V_{vb}^{-1}$.
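
A tiny numeric check of this diagonal correction (illustrative numbers, not from the paper):

```python
# The combined covariance has doubled diagonals and correct off-diagonals;
# halving the diagonal recovers V_vb^{-1}.
import numpy as np

V_inv = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                          # target V_vb^{-1}
diag = np.diag(np.diag(V_inv))
combined = 2 * diag + (V_inv - diag)                    # [[4.0, 0.5], [0.5, 2.0]]
corrected = combined - np.diag(np.diag(combined)) / 2   # halve the diagonal
print(np.allclose(corrected, V_inv))                    # True
```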

Asymptotic Coverage and Contraction

For models without latent variables and $M=n$, any nominal $1-\alpha$ credible ellipsoid

$$C_{n,\alpha} = \bigl\{\theta : n(\theta-\hat\theta_{MLE})^\top \widehat{\Sigma}^{-1}(\theta - \hat\theta_{MLE}) \leq r_{n,1-\alpha}^2\bigr\}$$

satisfies

$$\lim P_0(\theta_0 \in C_{n,\alpha}) \geq 1-\alpha,$$

demonstrating that variational bagging does not asymptotically undercover (Corollary 3.3 in (Fan et al., 25 Nov 2025)).
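
As a reading of this guarantee in code (not the paper's implementation), the ellipsoid membership test might be written as follows, taking $r_{n,1-\alpha}^2$ to be the $\chi^2_d$ quantile and `Sigma_hat` an estimated covariance, e.g. computed from bagged posterior draws.

```python
# Membership test for the nominal (1 - alpha) credible ellipsoid C_{n, alpha}.
import numpy as np
from scipy.stats import chi2

def in_credible_ellipsoid(theta, theta_mle, Sigma_hat, n, alpha=0.05):
    d = len(theta_mle)
    r2 = chi2.ppf(1 - alpha, df=d)                 # r^2_{n, 1-alpha} as chi^2_d quantile
    diff = np.asarray(theta) - np.asarray(theta_mle)
    return n * diff @ np.linalg.solve(Sigma_hat, diff) <= r2
```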

Posterior contraction for general models (possibly nonparametric) holds at essentially the same Hellinger rate $\epsilon_n$ as the ordinary posterior (up to a $\sqrt{\log n}$ factor): $\mathbb{E}_{P_0}\bigl[\, q_{VB}\bigl(H^2(P_\theta, P_0) \geq M_n\,\epsilon_n^2\,\log n\bigr) \bigr] \to 0$ for any diverging sequence $M_n \to \infty$ (Theorem 3.4, (Fan et al., 25 Nov 2025)).

4. Empirical Evaluation Across Model Classes

VB Bagging has been empirically validated on a range of models:

  • Two-Dimensional Gaussian Mean: On $X_i \sim N(\mu, \Sigma)$ with correlated $\Sigma$, mean-field VB yields axis-aligned (zero-correlation) ellipses; VB Bagging recovers the correct orientation and 95% region, closely matching HMC reference posteriors.
  • Finite Mixture Models under Misspecification: With heavy-tailed t-mixtures/double-exponential true data and Gaussian mixture working models, VB intervals are too narrow; both BayesBag and VB Bagging deliver interval coverages matching the true sampling distribution.
  • Sparse Linear Regression: For the spike-and-slab linear model under Gaussian and t-errors, VB Bagging matches or outperforms both MCMC Bayes and regular BayesBag in terms of relative squared error, with pronounced superiority under misspecification.
  • Deep Neural Networks: For various regression architectures with heavy-tailed errors, mean-field VB under-covers (e.g., 93% for nominal 95% intervals); VB Bagging restores empirical coverage to the nominal level.
  • Variational Autoencoders (VAE): The BVAE, implementing VB Bagging on the encoder, reconstructs low-dimensional manifolds (e.g., Swiss-roll, 1D curves) more sharply and closer to the true manifold than standard VAEs, observable especially in 1D examples (Fan et al., 25 Nov 2025).

5. Implementation Considerations and Practical Guidance

Bootstrap Size ($M$)

  • For trusted model specification, $M=n$ ($c=1$) is recommended, enabling recovery of off-diagonal structure and correct scaling of diagonals.
  • Under model misspecification, correct marginal variance matching requires setting

$$\hat M^* = \frac{\tilde v_n^*}{\tilde v_n^* - \tilde v_n}\, n,$$

where $\tilde v_n$ (VB) and $\tilde v_n^*$ (VB Bagging, $M=n$) are the observed variances of a functional of $\theta$; a small helper implementing this rule is sketched after this list.
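
A minimal helper for this rule, with hypothetical argument names (`v_vb`, `v_bag` for the two observed variances from pilot runs), might look as follows; it is a sketch of the formula above, not code from the paper.

```python
# Data-driven bootstrap size: M_hat* = v_bag / (v_bag - v_vb) * n,
# where v_vb and v_bag are the observed variances of the same scalar
# functional of theta under plain VB and under VB Bagging with M = n.
def optimal_bootstrap_size(v_vb, v_bag, n):
    if v_bag <= v_vb:
        raise ValueError("bagged variance must exceed the plain VB variance")
    return v_bag / (v_bag - v_vb) * n
```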

Number of Bootstrap Replicates ($B$)

  • BayesBag commonly uses $B \sim 50$–$100$ (computationally intensive with MCMC).
  • VB Bagging requires fewer replicates; $B \approx 20$–$30$ yields stable estimates, with as few as $B = 5$–$10$ sufficient for many use cases.

Computational Aspects

  • Cost is approximately $B$ times a single VB run (linear in $n$), with perfect parallelism across replicates (see the sketch after this list).
  • ELBO (or evidence-gap) monitoring on each replicate is essential; non-convergent replicates should be dropped or re-run.
  • Between-replicate variance of key functionals can be used to assess calibration of credible intervals.
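
Because replicates are independent, they map directly onto embarrassingly parallel execution. The sketch below is illustrative, reusing the hypothetical `fit_vb` interface from Section 1 and assuming each fit exposes a `converged` flag set by ELBO monitoring; it uses joblib and drops non-convergent replicates.

```python
# Parallel VB Bagging with ELBO-based screening of replicates.
import numpy as np
from joblib import Parallel, delayed

def _fit_one(data, fit_vb, M, seed):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(data), size=M)       # one bootstrap replicate
    return fit_vb(data[idx])

def parallel_vb_bagging(data, fit_vb, B=20, M=None, n_jobs=-1, seed=0):
    data = np.asarray(data)
    M = len(data) if M is None else M
    fits = Parallel(n_jobs=n_jobs)(
        delayed(_fit_one)(data, fit_vb, M, seed + b) for b in range(B)
    )
    # Keep only replicates whose ELBO optimization converged.
    return [f for f in fits if getattr(f, "converged", True)]
```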

6. Summary and Outlook

Variational Bagging constitutes a theoretically grounded, practically convenient enhancement to variational Bayes approaches. It achieves:

  • Valid asymptotic uncertainty quantification (BvM limit with full sandwich covariance correction)
  • Robustness to model misspecification, avoiding asymptotic under-coverage
  • Recovery of parameter dependence lost under mean-field approximations
  • Contraction rates equivalent to the ordinary Bayesian posterior (modulo logarithmic factors)
  • Straightforward implementation atop existing VB routines, leveraging averaging and parallelism

VB Bagging is directly applicable to a wide range of Bayesian modeling frameworks, including high-dimensional, nonparametric, and deep learning settings, delivering improved coverage and credible set calibration without substantially increased computational cost (Fan et al., 25 Nov 2025).
