Variational Bayes with Intractable Likelihood (VBIL)
- VBIL is a variational inference method that approximates Bayesian posteriors when the likelihood is intractable by using unbiased estimators.
- It employs an augmented space with auxiliary variables to modify the ELBO, ensuring scalable and efficient gradient estimation.
- Algorithmic variants like MLMC and RQMC reduce estimator variance, making VBIL effective for state-space, latent variable, and large-scale models.
Variational Bayes with Intractable Likelihood (VBIL) refers to a class of variational inference methodologies designed to perform Bayesian posterior approximation when the likelihood function is inaccessible either due to analytic intractability or excessive computational cost, but can be estimated unbiasedly (typically by simulation, particle filters, or subsampling). VBIL generalizes classical variational Bayes (VB), enabling scalable and efficient inference for complex models such as state-space models, latent variable models, generalized linear mixed models, and intractable or simulator-based Bayesian models, including those encountered in approximate Bayesian computation (ABC) (Tran et al., 2015, Gunawan et al., 2017, He et al., 2021, Zens et al., 15 Sep 2025).
1. Conceptual Foundations and Model Setup
VBIL operates in settings where the posterior density

$$\pi(\theta \mid y) \propto p(\theta)\, p(y \mid \theta)$$

is intractable because the likelihood $p(y \mid \theta)$ cannot be evaluated pointwise. Instead, an unbiased estimator $\widehat{p}_N(y \mid \theta)$ is available, such that $\mathbb{E}\big[\widehat{p}_N(y \mid \theta)\big] = p(y \mid \theta)$ (Tran et al., 2015).
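As a concrete illustration of such an estimator, the sketch below uses a hypothetical Poisson random-effects model (not an example from the cited papers): the marginal likelihood $p(y \mid \theta) = \mathbb{E}_{b \sim N(0,\tau^2)}\big[\mathrm{Poisson}(y \mid e^{\theta + b})\big]$ has no closed form, but averaging the conditional likelihood over simulated random effects is unbiased for it.

```python
import numpy as np
from scipy.stats import poisson

def p_hat(y, theta, tau=1.0, n_sim=200, rng=None):
    """Unbiased Monte Carlo estimate of the intractable likelihood
    p(y | theta) = E_b[ Poisson(y | exp(theta + b)) ],  b ~ N(0, tau^2).
    Averaging the conditional likelihood over simulated random effects gives
    an estimator whose expectation is exactly p(y | theta)."""
    rng = np.random.default_rng() if rng is None else rng
    b = rng.normal(0.0, tau, size=n_sim)              # simulate the latent random effect
    return poisson.pmf(y, mu=np.exp(theta + b)).mean()
```

Increasing `n_sim` reduces the variance of the resulting log-likelihood estimate at linear simulation cost, a trade-off that recurs throughout VBIL.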
The central aim is to approximate $\pi(\theta \mid y)$ by a tractable family $q_\lambda(\theta)$, parameterized by $\lambda$, using a variational objective. The evidence lower bound (ELBO) used in standard VB,

$$\mathcal{L}(\lambda) = \mathbb{E}_{q_\lambda}\!\left[\log p(\theta) + \log p(y \mid \theta) - \log q_\lambda(\theta)\right],$$

needs modification, since $\log p(y \mid \theta)$ is unavailable. VBIL offers unbiased or variance-controlled approximate solutions for these settings.
To address the intractable likelihood, VBIL operates on an augmented space. Writing $z = \log \widehat{p}_N(y \mid \theta) - \log p(y \mid \theta)$ for the noise in the log-likelihood estimate, the augmented target is

$$\pi_N(\theta, z) \propto p(\theta)\, p(y \mid \theta)\, e^{z}\, g_N(z \mid \theta),$$

where $g_N(z \mid \theta)$ is the distribution of the log-likelihood noise and $z$ is the auxiliary (randomization) variable (Tran et al., 2015). Because $\mathbb{E}[e^{z} \mid \theta] = \mathbb{E}[\widehat{p}_N(y \mid \theta)] / p(y \mid \theta) = 1$ by unbiasedness, the $\theta$-marginal of this augmented target is exactly the posterior $\pi(\theta \mid y)$, so the augmentation introduces no additional approximation error. The variational family then becomes $q_\lambda(\theta, z) = q_\lambda(\theta)\, g_N(z \mid \theta)$.
2. Variational Objectives and Gradient Estimation
The variational optimization proceeds by maximizing the ELBO, now defined on the augmented space:

$$\mathcal{L}(\lambda) = \mathbb{E}_{q_\lambda(\theta)\, g_N(z \mid \theta)}\!\left[\log p(\theta) + \log \widehat{p}_N(y \mid \theta) - \log q_\lambda(\theta)\right].$$

This ensures that only unbiased likelihood estimators are required, and all expectations can be approximated via Monte Carlo (Tran et al., 2015, Gunawan et al., 2017).
Gradient-based optimization is necessary for large-scale or non-conjugate problems. The canonical approach employs the score function (REINFORCE) estimator:

$$\nabla_\lambda \mathcal{L}(\lambda) = \mathbb{E}\!\left[\nabla_\lambda \log q_\lambda(\theta)\, h_\lambda(\theta, z)\right],$$

with $h_\lambda(\theta, z) = \log p(\theta) + \log \widehat{p}_N(y \mid \theta) - \log q_\lambda(\theta)$, and the expectation taken over $q_\lambda(\theta)\, g_N(z \mid \theta)$. Variance reduction strategies include subtracting fitted control variates and employing natural gradients, $\widetilde{\nabla}_\lambda \mathcal{L} = I_F(\lambda)^{-1} \nabla_\lambda \mathcal{L}(\lambda)$, utilizing the Fisher information $I_F(\lambda)$ of $q_\lambda$ (Tran et al., 2015, Gunawan et al., 2017).
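A minimal sketch of this estimator, assuming a one-dimensional Gaussian variational family $q_\lambda(\theta) = N(\mu, e^{2\omega})$, a standard normal prior on $\theta$, and a leave-one-out baseline as a simple stand-in for a fitted control variate; the function names and the trivial "estimator" in the usage line are illustrative assumptions, not the cited implementations.

```python
import numpy as np
from scipy.stats import norm

def vbil_score_gradient(lam, log_p_hat, n_mc=100, rng=None):
    """Score-function (REINFORCE) estimate of the augmented-space ELBO gradient
    for q_lambda(theta) = N(mu, exp(2*omega)), lam = (mu, omega), with a standard
    normal prior on theta. `log_p_hat(theta)` must return the log of an unbiased
    likelihood estimate. A leave-one-out baseline keeps the estimator unbiased."""
    rng = np.random.default_rng() if rng is None else rng
    mu, omega = lam
    sd = np.exp(omega)
    theta = rng.normal(mu, sd, size=n_mc)                          # theta ~ q_lambda
    h = np.array([norm.logpdf(t, 0.0, 1.0) + log_p_hat(t)          # log prior + log p_hat
                  - norm.logpdf(t, mu, sd) for t in theta])        # - log q_lambda
    score = np.stack([(theta - mu) / sd**2,                        # d log q / d mu
                      (theta - mu)**2 / sd**2 - 1.0], axis=1)      # d log q / d omega
    baseline = (h.sum() - h) / (n_mc - 1)                          # leave-one-out control variate
    return (score * (h - baseline)[:, None]).mean(axis=0)

# Sanity check with an exactly tractable "estimator" (one N(theta, 1) observation y = 1):
grad = vbil_score_gradient((0.0, 0.0), lambda t: norm.logpdf(1.0, t, 1.0))
```

A full implementation would additionally fit per-coordinate control-variate coefficients and premultiply by $I_F(\lambda)^{-1}$ to obtain the natural gradient.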
Alternative unbiased gradient schemes for more complex likelihood-free models use multilevel Monte Carlo (MLMC) telescoping decompositions and randomized quasi-Monte Carlo (RQMC) (He et al., 2021). These approaches achieve unbiased ELBO and gradient estimation even in highly nested or simulator-based settings.
3. Algorithmic Implementations and Computational Structure
A typical VBIL algorithm involves the following loop (Tran et al., 2015, Gunawan et al., 2017, He et al., 2021):
- Draw variational samples $\theta^{(s)} \sim q_\lambda(\theta)$, $s = 1, \dots, S$.
- For each $\theta^{(s)}$, draw the auxiliary variable (i.e., compute $\log \widehat{p}_N(y \mid \theta^{(s)})$) or generate a simulation/proxy for the likelihood.
- Compute stochastic gradients of the ELBO with control variates and natural gradients as appropriate.
- Update $\lambda$ using gradient ascent with an adaptive or Robbins–Monro step size.
- Continue until convergence criteria are met, typically monitoring the ELBO's moving average.
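The following is a minimal, self-contained sketch of this loop for a toy Poisson random-effects model with a one-dimensional Gaussian variational family; the model, prior, step-size rule, and gradient clipping are illustrative assumptions rather than the tuned implementations of the cited papers.

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(0)

# Toy data: y_i ~ Poisson(exp(theta_true + b_i)), b_i ~ N(0, tau^2);
# the marginal likelihood p(y | theta) has no closed form.
tau, theta_true = 1.0, 0.5
y = rng.poisson(np.exp(theta_true + rng.normal(0.0, tau, size=50)))

def log_p_hat(theta, n_sim=200):
    """Log of an unbiased estimate of p(y | theta), simulating the random effects."""
    b = rng.normal(0.0, tau, size=(n_sim, y.size))
    return np.log(poisson.pmf(y, mu=np.exp(theta + b)).mean(axis=0)).sum()

# Variational family q_lambda(theta) = N(mu, exp(2*omega)); prior theta ~ N(0, 1).
mu, omega, n_mc = 0.0, 0.0, 50

for it in range(1, 501):
    sd = np.exp(omega)
    theta = rng.normal(mu, sd, size=n_mc)                       # draw variational samples
    h = np.array([norm.logpdf(t, 0.0, 1.0) + log_p_hat(t)       # log prior + log p_hat
                  - norm.logpdf(t, mu, sd) for t in theta])     # - log q_lambda
    score = np.stack([(theta - mu) / sd**2,                     # d log q / d mu
                      (theta - mu)**2 / sd**2 - 1.0], axis=1)   # d log q / d omega
    baseline = (h.sum() - h) / (n_mc - 1)                       # leave-one-out control variate
    grad = (score * (h - baseline)[:, None]).mean(axis=0)       # score-function ELBO gradient
    grad = np.clip(grad, -10.0, 10.0)                           # guard against noisy outliers
    step = 0.5 / (10 + it)                                      # Robbins-Monro step size
    mu, omega = mu + step * grad[0], omega + step * grad[1]

print(f"q(theta) = N({mu:.3f}, {np.exp(omega):.3f}^2)")
```

In practice one would also track a moving average of `h.mean()` (a noisy ELBO estimate) as the convergence criterion described above.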
For models with intractable predictive integrals or chance constraints, as in stochastic convex design problems, a variational family $q_\lambda(\theta)$ is posited for the uncertain model parameters (often Gaussian or a log-concave exponential family, for convexity preservation), and expectation and feasibility computations are performed by Monte Carlo reparameterization sampling (Jaiswal et al., 2020).
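For instance, under a Gaussian variational approximation a constraint probability of the form $P_{q_\lambda}\big(g(x, \theta) \le 0\big)$ can be estimated by reparameterization sampling, as in the sketch below; the constraint function, threshold, and variable names are illustrative assumptions, not those of Jaiswal et al. (2020).

```python
import numpy as np

def constraint_prob(x, mu, sd, n_draws=10_000, rng=None):
    """Reparameterized Monte Carlo estimate of P_q(g(x, theta) <= 0) under
    q_lambda(theta) = N(mu, sd^2), using theta = mu + sd * eps, eps ~ N(0, 1).
    Here g(x, theta) = theta * x - 1 is a purely illustrative constraint."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(n_draws)
    theta = mu + sd * eps                      # reparameterization trick
    return np.mean(theta * x - 1.0 <= 0.0)

# A design x is declared (approximately) feasible if the estimated
# constraint probability exceeds a target level, e.g. 0.95:
feasible = constraint_prob(x=0.5, mu=1.0, sd=0.2) >= 0.95
```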
For large datasets, unbiased gradient estimates can be obtained using mini-batch or subsampling strategies, and with MapReduce for massive or distributed data (Gunawan et al., 2017). VBILL (Variational Bayes with Intractable Log-Likelihood) further exploits availability of unbiased log-likelihood gradient estimators, streamlining reparameterization gradients and enabling scalable variational inference for big data and panel data (Gunawan et al., 2017).
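A minimal sketch of the subsampling idea: for conditionally independent observations, a mini-batch rescaled by $n/m$ gives an unbiased estimate of the full-data log-likelihood (and of its gradient in $\theta$); the Gaussian model below is an illustrative assumption, not the construction of Gunawan et al. (2017).

```python
import numpy as np
from scipy.stats import norm

def minibatch_loglik_hat(theta, y, batch_size=100, rng=None):
    """Unbiased estimate of the full-data log-likelihood sum_i log p(y_i | theta)
    from a uniform random mini-batch, rescaled by n / m.
    Illustrated for the simple model y_i ~ N(theta, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    n = y.size
    idx = rng.choice(n, size=batch_size, replace=False)   # subsample without replacement
    return (n / batch_size) * norm.logpdf(y[idx], loc=theta, scale=1.0).sum()
```

Because the log-likelihood enters the ELBO linearly, plugging such an estimator (or its gradient in $\theta$) into a reparameterization-gradient scheme leaves the overall gradient estimator unbiased.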
MLMC-based VBIL (e.g., "unbiased MLMC-based variational Bayes") constructs an unbiased estimator of both the log-likelihood and its gradient via a multilevel antithetic telescoping sum, optionally with RQMC for variance and cost reductions (He et al., 2021).
4. Theoretical Properties and Consistency
VBIL approximations are characterized by the following properties:
- For models where the true posterior is sharply peaked and regularity conditions are satisfied, the variational posterior contracts to the true parameter, and the VBIL-optimal solution set converges to the Bayes or maximum likelihood solution as the sample size $n \to \infty$ (Jaiswal et al., 2020).
- For chance-constrained design, the variationally approximated feasible set remains convex if the variational density is log-concave and the constraints are convex in the decision variable (Jaiswal et al., 2020). Moreover, as the sample size grows, the solution set of the VBIL problem converges to the true solution set, both in objective value and in Hausdorff distance.
- Finite-sample error bounds can be derived: for any decision that is not truly feasible, the probability that the variational approximation declares it feasible is bounded above by a term proportional to the KL-divergence concentration rate and vanishes as the sample size grows (Jaiswal et al., 2020).
- In model selection tasks with latent regression models, a marginal likelihood approximation based on mean-field variational posteriors (Variational Bayes Criterion, VBC) achieves asymptotic consistency: the probability of selecting the true model converges to one as the sample size $n \to \infty$ under standard smoothness, identifiability, and contraction properties (Zens et al., 15 Sep 2025).
5. Applications and Empirical Performance
VBIL finds broad applications in scenarios where traditional MCMC is computationally prohibitive or intractable:
- State-space models: Likelihoods are estimated by particle filters, enabling inference for time series or dynamical systems (Tran et al., 2015); a minimal particle-filter sketch follows this list.
- Approximate Bayesian computation: Likelihoods are replaced by simulation and kernel-based density estimation, with unbiased proxies used in the variational objective (Tran et al., 2015, He et al., 2021).
- Latent variable and mixed models: Marginal likelihoods over latent variables are approximated using mean-field variational posteriors, supporting scalable Bayesian variable selection and model averaging (Zens et al., 15 Sep 2025).
- Large-scale datasets: Data subsampling and MapReduce schemes enable scalable variational inference for datasets otherwise infeasible for exact or MCMC methods, delivering 10–50x speedups and convergence within minutes for millions of observations (Gunawan et al., 2017).
- Chance-constrained stochastic design: Allows tractable and convex approximations to infeasible design sets under parameter uncertainty, while maintaining finite-sample error guarantees (Jaiswal et al., 2020).
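For the state-space case, the sketch below gives a minimal bootstrap particle filter returning an unbiased likelihood estimate for a linear-Gaussian AR(1) state-space model; the model, the function name `pf_loglik`, and all parameter names are illustrative assumptions, not the examples used in the cited papers.

```python
import numpy as np
from scipy.stats import norm

def pf_loglik(y, phi, sigma_x, sigma_y, n_particles=200, rng=None):
    """Bootstrap particle filter estimate of log p(y_{1:T} | phi, sigma_x, sigma_y)
    for the state-space model
        x_t = phi * x_{t-1} + sigma_x * eps_t,   y_t = x_t + sigma_y * eta_t,
    with |phi| < 1. exp of the returned value is an unbiased likelihood estimate."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, sigma_x / np.sqrt(1 - phi**2), size=n_particles)  # stationary init
    loglik = 0.0
    for y_t in y:
        x = phi * x + sigma_x * rng.normal(size=n_particles)        # propagate particles
        logw = norm.logpdf(y_t, loc=x, scale=sigma_y)               # incremental weights
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())                              # log mean unnormalized weight
        x = x[rng.choice(n_particles, n_particles, p=w / w.sum())]  # multinomial resampling
    return loglik
```

Such an estimator can be plugged directly into the VBIL gradient loop sketched earlier; increasing `n_particles` lowers the variance of the log-likelihood estimate at linear cost.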
Empirically, VBIL-based approaches often achieve accuracy comparable to tailored pseudo-marginal or SMC MCMC samplers, but with far lower computational cost. In model selection tasks, the approximate variational Bayes criterion is nearly identical in accuracy to full mean-field VB and consistent in variable selection, while reducing runtime by 95–99% in high-dimensional or large-sample latent regression settings (Zens et al., 15 Sep 2025).
6. Advances: MLMC, RQMC, and Algorithmic Variants
Recent developments have addressed intrinsic variance and bias limitations encountered in naïve Monte Carlo gradient estimators for likelihood-free inference:
- Unbiased MLMC gradient estimators construct telescoping antithetic sums over increasing sample sizes, yielding single-term unbiased estimates of nested log-likelihoods and achieving finite variance and expected cost (He et al., 2021); a minimal single-term sketch follows this list.
- Randomized quasi-Monte Carlo (RQMC), when compatible with the model structure, can accelerate Monte Carlo convergence rates beyond the standard $O(N^{-1/2})$ for gradient and ELBO estimation, substantially reducing computational demand. Theoretical complexity is rigorously controlled under mild smoothness assumptions, and empirical results confirm improved ELBOs and posterior accuracy relative to standard Monte Carlo estimation and plain VBIL (He et al., 2021).
- Subsampling and distributed algorithms exploit data independence and variance reduction via control variates for robust, scalable VBIL implementations suitable for modern massive datasets (Gunawan et al., 2017).
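To illustrate the single-term idea, the sketch below debiases a sequence of increasingly accurate (but individually biased) log-of-mean estimates by drawing a random level and weighting the coupled level increment by its inverse probability; the function names, the toy log-of-mean target, and the geometric level distribution are illustrative assumptions, not the tuned choices of He et al. (2021).

```python
import numpy as np

def log_mean_estimate(samples):
    """Biased plug-in estimate log( mean(samples) ) of log E[X]."""
    return np.log(samples.mean())

def unbiased_log_expectation(draw, n0=8, p_geo=0.6, rng=None):
    """Single-term unbiased MLMC estimator of log E[X], where `draw(n)` returns n
    i.i.d. copies of X. Level l uses n0 * 2**l samples; for l >= 1 the increment
    couples the fine estimate with the average of two coarse estimates built from
    the two halves of the same draws (antithetic coupling)."""
    rng = np.random.default_rng() if rng is None else rng
    level = rng.geometric(p_geo) - 1                   # P(level = l) = p_geo * (1 - p_geo)**l
    prob = p_geo * (1 - p_geo) ** level
    x = draw(n0 * 2 ** level)
    if level == 0:
        delta = log_mean_estimate(x)                   # base term
    else:
        half = x.size // 2
        fine = log_mean_estimate(x)
        coarse = 0.5 * (log_mean_estimate(x[:half]) + log_mean_estimate(x[half:]))
        delta = fine - coarse                          # antithetic level increment
    return delta / prob                                # inverse-probability weighting debiases

# Usage: X = exp(Z), Z ~ N(0, 1), so log E[X] = 0.5 exactly.
rng = np.random.default_rng(0)
draw = lambda n: np.exp(rng.standard_normal(n))
estimates = [unbiased_log_expectation(draw, rng=rng) for _ in range(2000)]
print(np.mean(estimates))   # fluctuates around 0.5, the exact value of log E[X]
```

In the RQMC variant, the inner draws can additionally be generated from randomized quasi-Monte Carlo point sets (e.g., scrambled Sobol sequences, available as scipy.stats.qmc.Sobol with scramble=True) to further reduce the variance of each level increment.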
These algorithmic extensions generalize the VBIL paradigm, allowing unbiased and variance-efficient variational inference in a wide class of intractable or likelihood-free Bayesian models.
7. Limitations and Practical Recommendations
VBIL's practical performance is sensitive to the variance of the likelihood estimator used. Asymptotic variance of the VBIL estimator scales linearly with the variance of the log-likelihood estimator $\log \widehat{p}_N(y \mid \theta)$, which is milder than the exponential sensitivity in pseudo-marginal MCMC, permitting the use of moderately noisy likelihoods for computational efficiency (Tran et al., 2015). Step-size schedules, Monte Carlo sample sizes, control variate fitting, and natural-gradient computation require careful tuning for stability and optimal convergence (Gunawan et al., 2017, He et al., 2021). In model selection with latent regression, fixing latent densities at null-model VB fits dramatically accelerates computation, with only minor first-order bias in marginal cases (Zens et al., 15 Sep 2025).
In summary, VBIL and its algorithmic descendants constitute a general, robust framework for variational Bayesian inference in complex models with intractable likelihoods, offering scalable accuracy, theoretical guarantees, and broad practical utility across latent variable models, stochastic design, likelihood-free inference, and large-scale data analytics.