
Conjugate-Computation Variational Inference (CVI)

Updated 9 April 2026
  • CVI is a unified framework for variational inference that leverages closed-form updates for conjugate components and stochastic techniques for non-conjugate parts.
  • It formulates inference as mirror descent in the mean-parameter space, linking to natural-gradient methods and online Bayesian filtering.
  • CVI offers significant computational advantages in complex models, demonstrating faster convergence compared to fully black-box stochastic-gradient methods.

Conjugate-Computation Variational Inference (CVI) is a unified framework for variational inference (VI) in probabilistic models containing both conjugate and non-conjugate structure. CVI uses closed-form updates for conjugate components and applies stochastic-gradient or surrogate-expansion techniques where conjugacy is absent, which gives broad applicability and computational savings in models that mix the two kinds of terms. Each update is a mirror-descent step in the mean-parameter space of the variational family, which connects CVI directly to natural-gradient methods, classical variational message passing, and online Bayesian filtering algorithms (Khan et al., 2017, Khan et al., 2017).

1. Problem Formulation and Foundations

The central variational problem begins with the observed data $y$ and a model whose joint density can be factorized as

$$p(y, z) = p_{nc}(y, z)\,p_c(y, z),$$

where $z$ denotes latent variables; $p_c(y, z)$ contains terms conjugate to the approximating exponential family, and $p_{nc}(y, z)$ encompasses non-conjugate terms (Khan et al., 2017). The variational approximation is of exponential-family form,

$$q(z|\lambda) = h(z) \exp\{\langle \phi(z), \lambda \rangle - A(\lambda)\},$$

with sufficient statistics $\phi(z)$, natural parameters $\lambda$, log partition function $A$, and base measure $h(z)$. A bijection exists between natural parameters $\lambda$ and mean parameters $\mu = \mathbb{E}_q[\phi(z)] = \nabla A(\lambda)$. Optimizing the evidence lower bound (ELBO)

$$\mathcal{L}(\lambda) = \mathbb{E}_{q(z|\lambda)}[\log p(y, z) - \log q(z|\lambda)]$$

can be equivalently reparameterized as $\tilde{\mathcal{L}}(\mu) := \mathcal{L}(\lambda)$ and optimized over $\mu$.
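As a concrete illustration of the natural-to-mean bijection, here is a minimal sketch for a univariate Gaussian, assuming sufficient statistics $\phi(z) = (z, z^2)$. The function names are hypothetical and chosen only for this example.

```python
import numpy as np

def natural_to_mean(lam):
    """Map natural params (m/v, -1/(2v)) of N(m, v) to mean params (E[z], E[z^2])."""
    lam1, lam2 = lam
    v = -1.0 / (2.0 * lam2)      # variance; requires lam2 < 0
    m = lam1 * v                 # mean
    return np.array([m, m**2 + v])

def mean_to_natural(mu):
    """Inverse map: mean params (E[z], E[z^2]) back to natural params."""
    mu1, mu2 = mu
    v = mu2 - mu1**2             # variance; requires mu2 > mu1^2
    return np.array([mu1 / v, -1.0 / (2.0 * v)])

def log_partition(lam):
    """Log partition function A(lambda) of the univariate Gaussian."""
    lam1, lam2 = lam
    return -lam1**2 / (4.0 * lam2) - 0.5 * np.log(-2.0 * lam2)
```

The mean parameters coincide with the gradient of `log_partition`, which can be checked numerically with finite differences.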

For fully conjugate models, classical variational message passing or coordinate ascent yields closed-form solutions. Non-conjugate factors introduce intractable expectation terms. These are commonly handled by local analytic bounds or black-box stochastic gradient estimators, which either ignore the remaining conjugate structure or converge slowly (Khan et al., 2017).

2. CVI Algorithmic Structure and Mean-Parameter Mirror Descent

The core innovation in CVI is the application of mirror descent in the mean-parameter space rather than stochastic gradient in the natural-parameter space. For mean parameters $\mu$, the update takes the form

$$\mu_{t+1} = \arg\max_{\mu} \; \langle \mu, \hat{\nabla}_\mu \tilde{\mathcal{L}}(\mu_t) \rangle - \frac{1}{\beta_t} B_{A^*}(\mu \,\|\, \mu_t),$$

where $\beta_t > 0$ is the step-size, $A^*$ is the convex dual of $A$, and $B_{A^*}(\mu \,\|\, \mu_t)$ is the associated Bregman divergence. The ELBO gradient decomposes additively:

$$\nabla_\mu \tilde{\mathcal{L}}(\mu) = \nabla_\mu \mathbb{E}_q[\log p_{nc}(y, z)] + \nabla_\mu \mathbb{E}_q[\log p_c(y, z) - \log q(z|\lambda)].$$

By construction, the non-conjugate gradient $\nabla_\mu \mathbb{E}_q[\log p_{nc}(y, z)]$ requires Monte Carlo or surrogate evaluation, while the conjugate part yields closed-form natural-parameter updates. The resulting update for the variational distribution is

$$q(z|\lambda_{t+1}) \propto \exp\{\langle \phi(z), \tilde{\lambda}_t \rangle\}\, p_c(y, z),$$

with

$$\tilde{\lambda}_t = (1 - \beta_t)\,\tilde{\lambda}_{t-1} + \beta_t\, \hat{\nabla}_\mu \mathbb{E}_q[\log p_{nc}(y, z)]\big|_{\mu = \mu_t}.$$

Here, $\tilde{\lambda}_t$ can be updated as a running average using the chosen step-size $\beta_t$ (Khan et al., 2017, Khan et al., 2017).
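As a sketch of why the mirror-descent step turns into a simple recursion in natural parameters, the following derivation may help. It assumes a minimal exponential family, so that $\nabla A^*$ inverts $\nabla A$, and it introduces two pieces of notation for illustration: $\eta$ for the natural parameter of the conjugate part, $p_c(y, z) \propto h(z)\exp\{\langle \phi(z), \eta \rangle\}$, and $\hat{g}_t$ for the estimate of the non-conjugate gradient at $\mu_t$.

```latex
\begin{align*}
&\text{One step:} &&
  \mu_{t+1} = \arg\max_{\mu}\; \langle \mu, \hat{\nabla}_\mu \tilde{\mathcal{L}}(\mu_t) \rangle
  - \tfrac{1}{\beta_t} B_{A^*}(\mu \,\|\, \mu_t) \\
&\text{Bregman gradient:} &&
  \nabla_\mu B_{A^*}(\mu \,\|\, \mu_t) = \nabla A^*(\mu) - \nabla A^*(\mu_t) = \lambda - \lambda_t \\
&\text{Stationarity:} &&
  \lambda_{t+1} = \lambda_t + \beta_t\, \hat{\nabla}_\mu \tilde{\mathcal{L}}(\mu_t) \\
&\text{Conjugate part:} &&
  \nabla_\mu \mathbb{E}_q[\log p_c(y, z) - \log q(z|\lambda)] = \eta - \lambda \\
&\text{Combined:} &&
  \lambda_{t+1} = (1 - \beta_t)\,\lambda_t + \beta_t\,(\eta + \hat{g}_t)
\end{align*}
```

Unrolling the last line from $\lambda_1 = \eta$ gives $\lambda_{t+1} = \eta + \tilde{\lambda}_t$, where $\tilde{\lambda}_t = (1 - \beta_t)\tilde{\lambda}_{t-1} + \beta_t \hat{g}_t$ and $\tilde{\lambda}_0 = 0$. Under these assumptions the site parameter is a running average of the non-conjugate gradients, and the conjugate part enters only by adding $\eta$.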

3. Implementation Workflow and Algorithmic Details

The principal algorithmic form for single-factor exponential families is:

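| Step | Operation |
|------|-----------|
| 1 | Compute mean params: $\mu_t = \nabla A(\lambda_t)$ |
| 2 | Estimate $\hat{\nabla}_\mu \mathbb{E}_q[\log p_{nc}(y, z)]$ at $\mu = \mu_t$ |
| 3 | Update site: $\tilde{\lambda}_t = (1 - \beta_t)\,\tilde{\lambda}_{t-1} + \beta_t\, \hat{\nabla}_\mu \mathbb{E}_q[\log p_{nc}(y, z)]$ |
| 4 | Conjugate update: $q(z \mid \lambda_{t+1}) \propto \exp\{\langle \phi(z), \tilde{\lambda}_t \rangle\}\, p_c(y, z)$ |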

In mean-field Bayesian networks, coordinate-wise mirror descent across factors recovers variational message-passing (VMP) or stochastic variational inference (SVI) for conjugate terms, with stochastic-gradient corrections for the non-conjugate terms.
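The four steps can be sketched on a toy problem: logistic regression with a single scalar weight $z$, a Gaussian prior as the conjugate part, and the logistic likelihood as the non-conjugate part. Everything concrete here is an assumption made for illustration: the data, the names `eta`, `grad_nc` and `cvi`, and the constant step-size. The non-conjugate expectations are computed with Gauss-Hermite quadrature instead of Monte Carlo so that the example is deterministic, and the gradient with respect to the mean parameters uses Bonnet's and Price's theorems for Gaussians.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical toy data: scalar weight z, labels y in {-1, +1}.
x = np.linspace(-2.0, 2.0, 20)
y = np.where(x > 0, 1.0, -1.0)
y[[3, 16]] *= -1.0                      # two label flips, so data are not separable

prior_var = 1.0
eta = np.array([0.0, -0.5 / prior_var])  # natural params of the prior N(0, prior_var)

nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)  # normalised: expectation under N(0, 1)

def to_moments(lam):
    """Natural params (m/v, -1/(2v)) -> (mean m, variance v)."""
    v = -0.5 / lam[1]
    return lam[0] * v, v

def grad_nc(lam):
    """Gradient of E_q[log p_nc] w.r.t. mean params (E[z], E[z^2])."""
    m, v = to_moments(lam)
    z = m + np.sqrt(v) * nodes                       # quadrature points under q
    a = y[:, None] * x[:, None] * z[None, :]
    d1 = (y[:, None] * x[:, None] * sigmoid(-a)).sum(axis=0)        # f'(z)
    d2 = -(x[:, None] ** 2 * sigmoid(a) * sigmoid(-a)).sum(axis=0)  # f''(z)
    e1, e2 = weights @ d1, weights @ d2              # E_q[f'], E_q[f'']
    return np.array([e1 - m * e2, 0.5 * e2])

def cvi(num_iters=500, beta=0.2):
    lam = eta.copy()                 # start from the prior
    lam_site = np.zeros(2)           # site parameter, tilde-lambda
    for _ in range(num_iters):
        g = grad_nc(lam)                                  # steps 1-2
        lam_site = (1.0 - beta) * lam_site + beta * g     # step 3: running average
        lam = eta + lam_site                              # step 4: conjugate update
    return lam
```

Calling `to_moments(cvi())` returns the approximate posterior mean and variance. Because the second component of the gradient is never positive for a log-concave likelihood, the precision stays at least as large as the prior precision, so the iterate remains a valid Gaussian throughout.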

Common step-size schedules include decaying sequences $\beta_t$, or small constant values, and are not heavily tuned in practice. Computational cost is dominated by the non-conjugate gradient estimation and the closed-form conjugate updates, which often scale linearly in data size (Khan et al., 2017).

4. Theoretical Properties and Convergence

CVI's convergence relies on standard mirror descent results. Provided $\tilde{\mathcal{L}}(\mu)$ is differentiable with Lipschitz-continuous gradient, stochastic gradient estimates $\hat{\nabla}_\mu \tilde{\mathcal{L}}(\mu_t)$ are unbiased and have bounded variance, and the dual log-partition $A^*$ is strongly convex, the iterates converge almost surely to a stationary point of the ELBO. Robbins–Monro conditions on step-sizes are sufficient:

$$\sum_{t=1}^{\infty} \beta_t = \infty, \qquad \sum_{t=1}^{\infty} \beta_t^2 < \infty.$$

This covers models in which the exponential-family structure is preserved in the variational and conjugate components, subsuming earlier algorithms including non-conjugate VMP with exact gradients and unit step-size (Khan et al., 2017).
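One family of schedules that meets the Robbins–Monro conditions is polynomial decay. The sketch below is an assumed example, not a schedule taken from the source; `step_size`, `beta0` and `kappa` are hypothetical names.

```python
def step_size(t, beta0=1.0, kappa=0.75):
    """Polynomially decaying step size beta_t = beta0 / (1 + t)**kappa.

    For 0.5 < kappa <= 1 the sum of beta_t diverges while the sum of
    beta_t**2 converges, which is what the Robbins-Monro conditions ask for.
    """
    if not 0.5 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0.5, 1]")
    return beta0 / (1.0 + t) ** kappa
```

Smaller `kappa` keeps steps larger for longer, which tends to speed up early progress at the cost of noisier late iterates.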

5. Applications and Model Classes

CVI is broadly applicable to any probabilistic model decomposable into conjugate ($p_c$) and non-conjugate ($p_{nc}$) blocks. Covered models include:

  • Gaussian-process classification,
  • Generalized linear models,
  • Kalman filters with non-Gaussian observations,
  • Gamma-factor models,
  • Poisson–gamma matrix factorization,
  • Deep exponential-family models (Khan et al., 2017).

For logistic regression, CVI encompasses both Jaakkola–Jordan surrogate bounding and Pólya–gamma data augmentation, each of which renders the logistic likelihood conditionally conjugate to a Gaussian prior. The optimal $q(z)$ in binary logistic models is Gaussian, with natural-parameter updates constructed from closed-form surrogate factors. Iterative updates alternate between closed-form Gaussian updates and auxiliary parameter optimization. In both quadratic bounding and Pólya–gamma augmentation, global parameter updates are analytically tractable, and auxiliary local updates are simple scalar functions of the second moment of the linear predictor under $q$ (Durante et al., 2017).
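The alternation between a closed-form Gaussian update and a scalar auxiliary update can be sketched with the standard Jaakkola–Jordan quadratic bound. This is a minimal illustration rather than the cited authors' code; the function name `jj_logistic_vb` and its arguments are hypothetical, and labels are assumed to be coded as 0 or 1.

```python
import numpy as np

def jj_logistic_vb(X, y, prior_mean, prior_cov, num_iters=50):
    """Variational Bayes for logistic regression with the Jaakkola-Jordan bound.

    X: (n, d) design matrix, y: (n,) labels in {0, 1}.
    Returns the mean and covariance of the Gaussian approximation.
    """
    P0 = np.linalg.inv(prior_cov)          # prior precision
    xi = np.ones(X.shape[0])               # one auxiliary parameter per observation
    for _ in range(num_iters):
        xi = np.maximum(xi, 1e-12)         # guard against division by zero
        lam_xi = np.tanh(xi / 2.0) / (4.0 * xi)
        # Global update: closed-form Gaussian from the quadratic surrogate.
        P = P0 + 2.0 * (X.T * lam_xi) @ X
        S = np.linalg.inv(P)
        m = S @ (P0 @ prior_mean + X.T @ (y - 0.5))
        # Local update: xi_i^2 is the second moment of the linear predictor.
        xi = np.sqrt(np.einsum("ij,jk,ik->i", X, S + np.outer(m, m), X))
    return m, S
```

Each pass costs one $d \times d$ inversion plus operations linear in the number of observations, and no gradient estimation is needed because the bound makes every term conjugate.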

Empirical results show that CVI converges faster than fully black-box stochastic-gradient methods and achieves at least comparable predictive performance, with reported improvements of one to two orders of magnitude in wall-clock time (Khan et al., 2017).

6. Special Cases, Variants, and Theoretical Connections

CVI generalizes and unifies several existing lines of work in VI and optimization:

  • Gaussian special case: For a Gaussian $q(z|\lambda)$, the updates mirror online Newton's method and the natural-gradient variant of VI, with extensions to diagonal precision (the "Vprop" algorithm). This can be implemented within RMSprop with minimal modification and admits a connection to extended Kalman filtering and regularized natural-gradient descent (Khan et al., 2017).
  • Coordinate-ascent VI: For fully-conjugate models, CVI reduces to classical coordinate-ascent updates.
  • Natural-gradient and online filtering view: The sequence of CVI updates aligns algebraically with natural-gradient methods under the Gauss–Newton approximation and with Bayesian state filtering in online settings.
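To make the Gaussian special case concrete, the derivation below writes out the recursion under some illustrative assumptions: the conjugate part is a Gaussian prior $\mathcal{N}(m_0, P_0^{-1})$, the approximation is $q_t = \mathcal{N}(m_t, P_t^{-1})$, and $f(z) = \log p_{nc}(y, z)$ is twice differentiable. The symbols $m_t$, $P_t$, $m_0$, $P_0$ and $f$ are introduced here for illustration only.

```latex
\begin{align*}
&\text{Natural parameters:} &&
  \lambda^{(1)} = P m, \qquad \lambda^{(2)} = -\tfrac{1}{2} P \\
&\text{Mean parameters:} &&
  \mu^{(1)} = m, \qquad \mu^{(2)} = P^{-1} + m m^\top \\
&\text{Gradients (Bonnet, Price):} &&
  \nabla_{\mu^{(1)}} \mathbb{E}_q[f] = \mathbb{E}_q[\nabla f] - \mathbb{E}_q[\nabla^2 f]\, m, \qquad
  \nabla_{\mu^{(2)}} \mathbb{E}_q[f] = \tfrac{1}{2}\, \mathbb{E}_q[\nabla^2 f] \\
&\text{Precision update:} &&
  P_{t+1} = (1 - \beta_t)\, P_t + \beta_t \left( P_0 - \mathbb{E}_{q_t}[\nabla^2 f] \right) \\
&\text{Mean update:} &&
  m_{t+1} = m_t + \beta_t\, P_{t+1}^{-1} \left( \mathbb{E}_{q_t}[\nabla f] - P_0 (m_t - m_0) \right)
\end{align*}
```

The mean update is a Newton-like step preconditioned by a running average of expected negative Hessians, which is the sense in which the Gaussian case resembles an online Newton method.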

7. Practical Considerations and Guidance

Implementing CVI requires three key components:

  • Efficient computation of natural-to-mean parameter mappings in the chosen exponential family,
  • Unbiased Monte Carlo or analytic estimation of $\nabla_\mu \mathbb{E}_q[\log p_{nc}(y, z)]$,
  • The ability to sum natural parameters for conjugate-exponential family inference.

CVI exploits closed-form updates wherever conjugacy is present, with negligible additional overhead in models with partial conjugacy. The method is robust to the choice of step-size, needs only minor tuning, and is conceptually modular. In practice, the benefit is most pronounced in large-scale or structured models with a mix of conjugate and non-conjugate factors.

By reframing non-conjugate VI as conjugate VI in an augmented or surrogate-exponential family, CVI recovers efficient variational inference machinery without sacrificing generality or convergence guarantees (Khan et al., 2017, Durante et al., 2017, Khan et al., 2017).
