
Variational Lower Bound and Negative ELBO

Updated 3 April 2026
  • Variational Lower Bound (Negative ELBO) is a core metric that quantifies the gap between approximate and true posteriors in probabilistic models.
  • It is decomposed into KL divergence and reconstruction terms, enabling clear rate–distortion trade-offs in variational autoencoders and related models.
  • Recent advances leverage gradient estimators, semi-implicit methods, and geometric interpretations to enhance optimization, diagnostics, and convergence analysis.

A variational lower bound, typically called the Evidence Lower Bound (ELBO), is a central objective in variational inference for probabilistic models, particularly in variational autoencoders (VAEs) and Bayesian latent variable models. The negative ELBO—often referred to as the variational free energy—serves as a minimization target and quantifies the gap between an approximate posterior and the true posterior. In modern research, the development and analysis of negative ELBO-based objectives have become essential for model design, gradient estimator theory, and understanding information-theoretic properties of generative models.

1. Formal Definition and Standard Decomposition

The ELBO for observed data $x$ and latent variable $z$, under a model $p_\theta(x, z)$ and variational distribution $q_\phi(z)$ (or $q_\phi(z|x)$ for amortized inference), is defined as

$$\mathrm{ELBO}(\phi) = \mathbb{E}_{q_\phi(z)}\left[\log p_\theta(x, z) - \log q_\phi(z)\right].$$

Its negative form, minimized in practice, is

$$-\mathrm{ELBO}(\phi) = \mathbb{E}_{q_\phi(z)}\left[\log q_\phi(z) - \log p_\theta(x, z)\right].$$

Minimizing $-\mathrm{ELBO}$ is equivalent to minimizing the Kullback–Leibler divergence to the true posterior:

$$-\mathrm{ELBO}(\phi) = \mathrm{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) - \log p_\theta(x),$$

where $\log p_\theta(x)$ is constant with respect to $\phi$ (Yin et al., 2018, Chérief-Abdellatif, 2018). In amortized settings (e.g., VAEs), one commonly writes

$$-\mathrm{ELBO}(\theta, \phi; x) = \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z)\big) - \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big].$$

This decomposition tracks rate (KL term) and distortion (reconstruction term) as in rate–distortion theory (Alemi et al., 2017, Lastras, 2019).
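For a one-dimensional Gaussian toy model (all distributions, parameter values, and function names below are illustrative, not taken from the cited papers), the rate plus distortion form can be checked against a direct Monte Carlo estimate of $\mathbb{E}_{q}[\log q(z) - \log p_\theta(x,z)]$:

```python
import numpy as np

def neg_elbo_closed_form(x, m, s):
    """Negative ELBO for the toy model p(z)=N(0,1), p(x|z)=N(z,1),
    q(z|x)=N(m, s^2), decomposed as rate (KL) plus distortion."""
    rate = 0.5 * (s**2 + m**2 - 1.0 - np.log(s**2))                   # KL(q || p(z))
    distortion = 0.5 * np.log(2 * np.pi) + 0.5 * ((x - m)**2 + s**2)  # -E_q[log p(x|z)]
    return rate + distortion

def neg_elbo_monte_carlo(x, m, s, n=200_000, seed=0):
    """Monte Carlo estimate of E_q[log q(z) - log p(x, z)] for the same model."""
    rng = np.random.default_rng(seed)
    z = m + s * rng.standard_normal(n)
    log_q = -0.5 * np.log(2 * np.pi * s**2) - (z - m)**2 / (2 * s**2)
    log_prior = -0.5 * np.log(2 * np.pi) - z**2 / 2
    log_lik = -0.5 * np.log(2 * np.pi) - (x - z)**2 / 2
    return np.mean(log_q - log_prior - log_lik)
```

Both routes compute the same quantity; the closed-form version is what a Gaussian VAE loss actually implements, while the sampled version makes the definition above concrete.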

2. Information-Theoretic Interpretations and Structural Decomposition

At stationary points, for a wide class of models, the ELBO can be expressed as a sum (and differences) of entropies:

$$\mathrm{ELBO} = \frac{1}{N}\sum_{n} H\big[q_\phi(z|x^{(n)})\big] - H\big[p_\theta(z)\big] - \frac{1}{N}\sum_{n} \mathbb{E}_{q_\phi(z|x^{(n)})}\, H\big[p_\theta(x|z)\big].$$

For standard exponential family models (notably Gaussian VAEs), this reduces to a tractable, closed-form expression in terms of parameterized variances and means (Damm et al., 2020, Lücke et al., 2022). This entropy-sum characterization enables efficient, variance-free evaluation of the ELBO at convergence, and provides principled diagnostics for phenomena such as posterior collapse. The negative ELBO, $-\mathrm{ELBO}$, thus admits an interpretation as the aggregate mismatch and compression cost imposed by approximate inference.
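Under diagonal-Gaussian assumptions the variance-free evaluation is a few lines. The sketch below (function names and interface are illustrative) computes the entropy-sum expression; per the cited results it coincides with the ELBO only at stationary points, and is otherwise just a diagnostic quantity:

```python
import numpy as np

def gaussian_entropy(var):
    """Entropy (in nats) of a diagonal Gaussian with variance vector `var`."""
    var = np.atleast_1d(var)
    return 0.5 * np.sum(np.log(2 * np.pi * np.e * var))

def elbo_entropy_sum(encoder_vars, prior_var, decoder_vars):
    """Entropy-sum expression: average encoder entropy, minus prior entropy,
    minus average decoder entropy. Equals the ELBO at stationary points
    for Gaussian VAEs (per Damm et al., 2020); no sampling is needed."""
    enc = np.mean([gaussian_entropy(v) for v in encoder_vars])
    dec = np.mean([gaussian_entropy(v) for v in decoder_vars])
    return enc - gaussian_entropy(prior_var) - dec
```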

3. Advanced Variational Families and Sandwich Bounds

Generative models increasingly employ variational families for which the marginal density $q_\phi(z)$ is intractable. Semi-implicit variational inference (SIVI) constructs two-level mixtures

$$q_\phi(z) = \int q(z|\psi)\, q_\phi(\psi)\, d\psi,$$

where both $q(z|\psi)$ and $q_\phi(\psi)$ are reparameterizable but the marginal $q_\phi(z)$ is not necessarily tractable (Yin et al., 2018). The SIVI framework defines Monte Carlo-based lower and upper bounds that sandwich the true ELBO,

$$\underline{\mathcal{L}}_K \le \mathrm{ELBO} \le \overline{\mathcal{L}}_K,$$

where $K$ is the number of auxiliary samples. Both bounds converge monotonically to the true ELBO as $K \to \infty$. SIVI provides a surrogate objective whose gradient can be estimated stochastically without bias, and which, for finite $K$, is always a valid lower bound. This approach generalizes to doubly semi-implicit settings, where both prior and variational distributions are semi-implicit mixtures, preserving the sandwich property (Molchanov et al., 2018).
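A minimal sketch of the lower-bound surrogate, in the spirit of SIVI but not the papers' implementation (the toy model, Gaussian mixing family, and all names are illustrative): the marginal $q_\phi(z)$ is never evaluated, only the explicit conditional, averaged over the generating mixing draw plus $K$ fresh ones.

```python
import numpy as np

def sivi_lower_bound(x, tau, s, K, n=20_000, seed=0):
    """Surrogate lower bound for the semi-implicit family
    q(z) = ∫ N(z; psi, s^2) N(psi; 0, tau^2) dpsi, under the toy model
    p(z) = N(0,1), p(x|z) = N(z,1). Larger K tightens the bound;
    K -> infinity recovers the true ELBO."""
    rng = np.random.default_rng(seed)
    psi0 = tau * rng.standard_normal(n)          # mixing draws that generate z
    z = psi0 + s * rng.standard_normal(n)
    log_joint = -np.log(2 * np.pi) - 0.5 * z**2 - 0.5 * (x - z)**2
    # density surrogate: average the explicit conditional over psi0 and K fresh draws
    psis = np.vstack([psi0[None, :], tau * rng.standard_normal((K, n))])
    log_c = -0.5 * np.log(2 * np.pi * s**2) - (z[None, :] - psis)**2 / (2 * s**2)
    m = log_c.max(axis=0)
    log_q_hat = m + np.log(np.mean(np.exp(log_c - m), axis=0))  # log-mean-exp
    return float(np.mean(log_joint - log_q_hat))
```

With `tau=0.8, s=0.6` the implicit marginal happens to equal the prior, so the exact ELBO is available in closed form and the monotone tightening in $K$ can be observed directly.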

4. Gradient Estimators and Optimization Strategies

Gradient-based optimization of the negative ELBO typically employs two classes of estimators:

  • Score-function (REINFORCE) estimators: Use the identity

$$\nabla_\phi\, \mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{q_\phi(z)}\big[f(z)\, \nabla_\phi \log q_\phi(z)\big],$$

but suffer from high variance (Dib, 2020).

  • Reparameterization (pathwise) estimators: For $z = g_\phi(\epsilon)$ with $\epsilon \sim p(\epsilon)$, leverage

$$\nabla_\phi\, \mathbb{E}_{q_\phi(z)}[f(z)] = \mathbb{E}_{p(\epsilon)}\big[\nabla_\phi f\big(g_\phi(\epsilon)\big)\big].$$

When available, reparameterization yields lower-variance, unbiased estimates.
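The variance gap between the two estimators is easy to see numerically. A minimal sketch on the toy target $f(z) = z^2$ with $q = \mathcal{N}(\mu, 1)$ (chosen for illustration; here the true gradient $\partial_\mu\,\mathbb{E}[z^2] = 2\mu$ is known exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.5, 100_000
eps = rng.standard_normal(n)
z = mu + eps                              # z ~ N(mu, 1), true gradient is 2*mu = 3

# Score-function estimator: f(z) * d/dmu log q(z), with d/dmu log q(z) = (z - mu)
score_samples = z**2 * (z - mu)

# Reparameterization estimator: d/dmu f(mu + eps) = 2 * (mu + eps)
reparam_samples = 2 * z

print(score_samples.mean(), score_samples.var())    # unbiased, high variance
print(reparam_samples.mean(), reparam_samples.var())  # unbiased, low variance
```

Both sample means converge to 3, but the score-function samples have an order of magnitude more variance, which is the practical motivation for pathwise gradients.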

Variance reduction can be achieved by deterministic quasi-Monte Carlo or quantization schemes; for instance, Quantized Variational Inference (QVI) replaces Monte Carlo with optimal cubature over quantized support points, yielding zero-variance (but biased) gradients. The bias decays polynomially with the number of quantization points, and Richardson extrapolation can further reduce bias (Dib, 2020).
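QVI itself uses optimal quantization of the variational distribution; as a simpler stand-in that illustrates the deterministic-cubature idea (the function name is illustrative, and this is not the QVI algorithm), Gauss-Hermite quadrature replaces Monte Carlo samples for a Gaussian family:

```python
import numpy as np

def expectation_gauss_hermite(f, mu, sigma, n_points):
    """Deterministic estimate of E_{z~N(mu, sigma^2)}[f(z)] via Gauss-Hermite
    cubature: zero variance by construction, with bias (for non-polynomial f)
    that shrinks as n_points grows."""
    x, w = np.polynomial.hermite.hermgauss(n_points)   # nodes/weights for ∫ e^{-x^2} f
    z = mu + np.sqrt(2.0) * sigma * x                  # change of variables to N(mu, sigma^2)
    return np.sum(w * f(z)) / np.sqrt(np.pi)
```

Repeated calls return the identical value, which is exactly the zero-variance (but possibly biased) trade-off described above.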

For the VR-IWAE class of bounds, the choice of estimator (reparameterized or doubly-reparameterized) affects how the signal-to-noise ratio (SNR) scales with the number of importance samples $N$ and with the model class (Daudel et al., 2024). In high-dimensional regimes, importance weight collapse may nullify SNR gains unless $N$ is exponentially large in the latent dimension $d$.
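Weight collapse is easy to reproduce: the effective sample size of self-normalized importance weights between two slightly offset Gaussians decays rapidly with dimension. A sketch under assumed toy distributions (all names and parameter values illustrative):

```python
import numpy as np

def effective_sample_size(d, n=1000, shift=0.5, seed=0):
    """ESS = 1 / sum(w_bar^2) of self-normalized importance weights for
    target N(0, I_d) under proposal N(shift*1, I_d). As d grows, the
    log-weight variance grows linearly and a few samples dominate."""
    rng = np.random.default_rng(seed)
    z = shift + rng.standard_normal((n, d))
    log_w = -0.5 * np.sum(z**2, axis=1) + 0.5 * np.sum((z - shift)**2, axis=1)
    log_w -= log_w.max()                 # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    return 1.0 / np.sum(w**2)
```

In one dimension most of the 1000 samples contribute; by $d = 50$ the ESS collapses to a handful, mirroring the exponential-in-$d$ sample requirement noted above.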

5. Extensions: Rate–Distortion, Thermodynamic, and Discrete Variants

Interpreting the negative ELBO through the lens of rate–distortion theory, minimizing $-\mathrm{ELBO}$ corresponds to minimizing the sum $R + D$, where the rate $R = \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z)\big)$ measures information encoding cost and the distortion $D = -\mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]$ measures reconstruction error (Lastras, 2019, Alemi et al., 2017). This framework clarifies trade-offs and motivates alternative objectives, e.g., enforcing minimum mutual information or rate lower bounds ("free bits") to prevent posterior collapse (Alemi et al., 2017). Thermodynamic Variational Objectives (TVO) further generalize the ELBO via path integration over interpolations between the variational posterior and the model joint, yielding tighter bounds via Riemann sum approximations (Masrani et al., 2019).
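The "free bits" modification mentioned above can be sketched in a few lines; the clamp value `lam` and the function name are illustrative, not from the cited papers:

```python
import numpy as np

def free_bits_kl(kl_per_dim, lam):
    """Rate floor ('free bits'): each latent dimension's KL contribution is
    clamped below at lam nats, so dimensions already under the floor exert
    no gradient pressure toward further collapse."""
    return np.sum(np.maximum(kl_per_dim, lam))
```

Replacing the raw KL term with this clamped version in the negative ELBO is one common guard against posterior collapse.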

For graphical models with discrete latent variables, entropy and expectations under expressive distributions (e.g., selective-SPNs) can be computed exactly, circumventing the limitations of sampling-based estimators and enabling direct optimization of the negative ELBO (Shih et al., 2020).
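For a single discrete latent, the exact expectations reduce to finite sums, which conveys the idea (selective-SPNs make the same computation tractable for structured, high-dimensional discrete latents); this toy evaluator and its name are illustrative:

```python
import numpy as np

def exact_neg_elbo_discrete(log_prior, log_lik, q):
    """Exact negative ELBO for one discrete latent with K states:
    sum_k q_k * (log q_k - log p(z=k) - log p(x|z=k)), with 0*log 0 := 0.
    No sampling estimator is involved."""
    q = np.asarray(q, dtype=float)
    safe_log_q = np.log(np.where(q > 0, q, 1.0))      # log q_k, guarded at q_k = 0
    terms = np.where(q > 0, q * (safe_log_q - log_prior - log_lik), 0.0)
    return float(np.sum(terms))
```

When `q` equals the exact posterior, the value reduces to $-\log p_\theta(x)$, so the bound is tight, which makes the function easy to sanity-check.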

6. Practical Implementation and Model Selection

During training, the negative ELBO serves as the objective for stochastic gradient methods. Its minimization encourages the variational family to approach the true posterior while maximizing the marginal likelihood lower bound. The variance properties of the chosen gradient estimator, the tractability of the variational family, and the potential for bound tightness (e.g., via importance weighting or surrogate bounds) directly impact practical learning dynamics (Yin et al., 2018, Dib, 2020, Daudel et al., 2024).

For model selection, penalized ELBO approaches have been shown to yield consistent estimators even under model misspecification, provided suitable prior mass conditions (Chérief-Abdellatif, 2018). Closed-form entropy decompositions enable more efficient post-training model diagnostics and facilitate interpretable control over different components of the inference objective (Damm et al., 2020, Lygerakis et al., 2024).

7. Geometric, Asymptotic, and Theoretical Perspectives

Recent work situates the negative ELBO as a Bregman divergence, specifically the divergence $B_A$ generated by the exponential family log-partition function $A(\eta)$. This geometric perspective underpins rigorous convergence bounds for gradient-based algorithms, with convergence rates governed by spectral properties of the Fisher information matrix (Bohara et al., 17 Oct 2025).
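The divergence identification can be checked on the simplest exponential family. For unit-variance Gaussians the natural parameter is the mean and the log-partition function is quadratic, so the Bregman divergence of $A$ reproduces the closed-form KL (a sketch; the symbols follow standard exponential-family conventions, not the cited paper's notation):

```python
def bregman(A, grad_A, eta_a, eta_b):
    """Bregman divergence B_A(eta_a, eta_b) = A(eta_a) - A(eta_b)
    - <grad A(eta_b), eta_a - eta_b>, scalar case."""
    return A(eta_a) - A(eta_b) - grad_A(eta_b) * (eta_a - eta_b)

# Unit-variance Gaussian: natural parameter eta = mu, log-partition A(eta) = eta^2 / 2
def A(eta):
    return eta**2 / 2

def grad_A(eta):
    return eta

def kl_gauss_unit_var(mu1, mu2):
    """KL(N(mu1,1) || N(mu2,1)) in closed form."""
    return (mu1 - mu2)**2 / 2

# KL between exponential-family members equals the Bregman divergence of A
# with swapped arguments: KL(p_eta1 || p_eta2) = B_A(eta2, eta1)
```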

In entropy-sum formulations, the negative ELBO at stationary points can be fully characterized in terms of entropies and cross-entropies of the variational distribution, the prior, and the conditional model distribution. These results extend to generalized exponential families and remain valid under broad practical conditions (finite/infinite data, deep networks, saddle or local optima) (Damm et al., 2020, Lücke et al., 2022).

