Conjugate-Computation Variational Inference (CVI)
- CVI is a unified framework for variational inference that leverages closed-form updates for conjugate components and stochastic techniques for non-conjugate parts.
- It formulates inference as mirror descent in the mean-parameter space, linking to natural-gradient methods and online Bayesian filtering.
- CVI offers significant computational advantages in complex models, demonstrating faster convergence compared to fully black-box stochastic-gradient methods.
Conjugate-Computation Variational Inference (CVI) is a unified framework for variational inference (VI) in probabilistic models containing both conjugate and non-conjugate structure. CVI uses closed-form updates for conjugate components and applies stochastic-gradient or surrogate-expansion techniques where conjugacy is absent, yielding an inference method that is broadly applicable and computationally efficient in models that mix conjugate and non-conjugate terms. By casting each update as mirror descent in the mean-parameter space of the variational family, CVI balances efficiency and generality, and is directly connected to natural-gradient methods, classical variational message-passing, and online Bayesian filtering algorithms (Khan et al., 2017, Khan et al., 2017).
1. Problem Formulation and Foundations
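The central variational problem begins with the observed data $y$ and a model whose joint density can be factorized as

$$p(y, z) \propto \tilde{p}_{nc}(y, z)\, \tilde{p}_{c}(y, z),$$

where $z$ denotes latent variables; $\tilde{p}_{c}$ contains terms conjugate to the approximating exponential family, and $\tilde{p}_{nc}$ encompasses non-conjugate terms (Khan et al., 2017). The variational approximation is of exponential-family form,

$$q(z \mid \lambda) = h(z) \exp\{\langle \phi(z), \lambda \rangle - A(\lambda)\},$$

with sufficient statistics $\phi(z)$, natural parameters $\lambda$, log partition function $A(\lambda)$, and base measure $h(z)$. A bijection exists between natural parameters $\lambda$ and mean parameters $\mu = \mathbb{E}_{q}[\phi(z)]$. Optimizing the evidence lower bound (ELBO)

$$\mathcal{L}(\lambda) = \mathbb{E}_{q}[\log p(y, z) - \log q(z \mid \lambda)]$$

can be equivalently reparameterized as $\tilde{\mathcal{L}}(\mu) = \mathcal{L}(\lambda)$ and optimized over $\mu$.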
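As a concrete instance of the bijection between natural and mean parameters, the sketch below uses a univariate Gaussian with sufficient statistics $\phi(z) = (z, z^2)$. The function names are illustrative choices, not taken from the cited papers.

```python
def gaussian_nat_to_mean(lam1, lam2):
    """Natural parameters (m/v, -1/(2v)) -> mean parameters (E[z], E[z^2])."""
    if lam2 >= 0:
        raise ValueError("lam2 must be negative for a proper Gaussian")
    v = -0.5 / lam2          # variance
    m = lam1 * v             # mean
    return m, m * m + v


def gaussian_mean_to_nat(mu1, mu2):
    """Mean parameters (E[z], E[z^2]) -> natural parameters (m/v, -1/(2v))."""
    v = mu2 - mu1 * mu1      # variance = E[z^2] - E[z]^2
    if v <= 0:
        raise ValueError("mean parameters do not describe a valid Gaussian")
    return mu1 / v, -0.5 / v


# Example: mean 1.5 and variance 0.25
mu = gaussian_nat_to_mean(6.0, -2.0)
lam = gaussian_mean_to_nat(*mu)
```

The two maps are inverses of each other on the valid domain, which is what allows the ELBO to be written in terms of either parameterization.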
For fully conjugate models, classical variational message passing or coordinate ascent yields closed-form solutions. Non-conjugate factors introduce intractable expectation terms. These are commonly handled by local analytic bounds or black-box stochastic gradient estimators, at the cost of either ignoring the remaining conjugate structure or converging slowly (Khan et al., 2017).
2. CVI Algorithmic Structure and Mean-Parameter Mirror Descent
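The core innovation in CVI is the application of mirror descent in the mean-parameter space rather than stochastic gradient in the natural-parameter space. For mean parameters $\mu$, the update takes the form

$$\mu_{t+1} = \arg\max_{\mu} \; \langle \mu, \hat{\nabla}_{\mu} \tilde{\mathcal{L}}(\mu_t) \rangle - \frac{1}{\beta_t} B_{A^*}(\mu \,\|\, \mu_t),$$

where $A^*$ is the convex dual of $A$, and $B_{A^*}$ is the associated Bregman divergence. The ELBO gradient decomposes additively:

$$\nabla_{\mu} \tilde{\mathcal{L}}(\mu) = \nabla_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{nc}(y, z)] + \nabla_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{c}(y, z) - \log q(z \mid \lambda)].$$

By construction, the non-conjugate gradient $\nabla_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{nc}(y, z)]$ will require Monte Carlo or surrogate evaluation, while the conjugate part yields closed-form natural-parameter updates. The resulting update for the variational distribution is

$$q_{t+1}(z) \propto \tilde{p}_{c}(y, z)\, \exp\{\langle \phi(z), \tilde{\lambda}_t \rangle\},$$

with

$$\tilde{\lambda}_t = (1 - \beta_t)\, \tilde{\lambda}_{t-1} + \beta_t\, \hat{\nabla}_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{nc}(y, z)] \big|_{\mu = \mu_t}.$$

Here, $\tilde{\lambda}_t$ can be updated as a running average using the chosen step-size $\beta_t$ (Khan et al., 2017, Khan et al., 2017).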
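To make the link between the mirror-descent step and the running-average recursion explicit, the following derivation sketch assumes that the conjugate term can be written as $\tilde{p}_{c}(y, z) \propto h(z) \exp\{\langle \phi(z), \eta \rangle\}$ with natural parameter $\eta$, that the maximizer is an interior point, and it writes $\hat{g}_t$ for the estimate of $\nabla_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{nc}(y, z)]$ at $\mu_t$. The symbols $\eta$ and $\hat{g}_t$ are notation chosen here.

$$
\begin{aligned}
0 &= \hat{\nabla}_{\mu} \tilde{\mathcal{L}}(\mu_t) - \tfrac{1}{\beta_t}\big(\nabla A^*(\mu_{t+1}) - \nabla A^*(\mu_t)\big) && \text{(stationarity of the mirror-descent objective)} \\
\lambda_{t+1} &= \lambda_t + \beta_t\, \hat{\nabla}_{\mu} \tilde{\mathcal{L}}(\mu_t) && \text{(since } \nabla A^*(\mu) = \lambda \text{)} \\
&= (1 - \beta_t)\, \lambda_t + \beta_t\, (\eta + \hat{g}_t) && \text{(since } \nabla_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{c} - \log q] = \eta - \lambda_t \text{)} \\
&= \eta + (1 - \beta_t)\, \tilde{\lambda}_{t-1} + \beta_t\, \hat{g}_t && \text{(substituting } \lambda_t = \eta + \tilde{\lambda}_{t-1} \text{)}
\end{aligned}
$$

Under these assumptions, the mirror-descent step is a convex combination in natural-parameter space: the conjugate contribution $\eta$ is kept exactly, and only the non-conjugate site parameter is averaged over iterations.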
3. Implementation Workflow and Algorithmic Details
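The principal algorithmic form for single-factor exponential families, with $\eta$ the natural parameter of the conjugate term $\tilde{p}_{c}$, $\tilde{\lambda}_t$ the non-conjugate site parameter, and $\beta_t$ the step size, is:
| Step | Operation |
|---|---|
| 1 | Compute mean parameters: $\mu_t = \nabla_{\lambda} A(\lambda_t) = \mathbb{E}_{q_t}[\phi(z)]$ |
| 2 | Estimate $\hat{g}_t \approx \nabla_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{nc}(y, z)]$ at $\mu = \mu_t$ |
| 3 | Update site: $\tilde{\lambda}_t = (1 - \beta_t)\, \tilde{\lambda}_{t-1} + \beta_t\, \hat{g}_t$ |
| 4 | Conjugate update: $\lambda_{t+1} = \eta + \tilde{\lambda}_t$ |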
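The four steps can be sketched on a toy model. The example below is a minimal illustration under assumptions that are not from the source: a scalar latent $z$ with a Gaussian prior $\mathcal{N}(0, \text{prior\_var})$ as the conjugate term, and Poisson observations with log link, $y_i \sim \text{Poisson}(e^{z})$, as the non-conjugate term. For this likelihood the expectation $\mathbb{E}_q[y z - e^{z}] = y m - e^{m + v/2}$ is available in closed form, so the gradient is computed analytically rather than by Monte Carlo. All function names and default values are hypothetical.

```python
import math


def nat_to_mean(lam1, lam2):
    """Step 1: Gaussian natural params (m/v, -1/(2v)) -> mean params (m, m^2 + v)."""
    v = -0.5 / lam2
    m = lam1 * v
    return m, m * m + v


def noncon_grad(mu1, mu2, ys):
    """Step 2: gradient of E_q[sum_i log Poisson(y_i | exp(z))] w.r.t. (mu1, mu2)."""
    m, v = mu1, mu2 - mu1 * mu1
    rate = math.exp(m + 0.5 * v)           # E_q[exp(z)] for Gaussian q
    g_m = sum(ys) - len(ys) * rate         # derivative w.r.t. the mean m
    g_v = -0.5 * len(ys) * rate            # derivative w.r.t. the variance v
    # Chain rule from (m, v) to mean parameters (mu1, mu2) = (m, m^2 + v).
    return g_m - 2.0 * m * g_v, g_v


def cvi(ys, prior_var=1.0, beta=0.1, steps=500):
    eta = (0.0, -0.5 / prior_var)          # natural params of the N(0, prior_var) prior
    site = (0.0, 0.0)                      # non-conjugate site parameter, starts at zero
    lam = eta
    for _ in range(steps):
        mu = nat_to_mean(*lam)                                              # step 1
        g = noncon_grad(*mu, ys)                                            # step 2
        site = tuple((1 - beta) * s + beta * gi for s, gi in zip(site, g))  # step 3
        lam = tuple(e + s for e, s in zip(eta, site))                       # step 4
    return lam


lam = cvi([2, 3, 1, 4, 2])
m, second_moment = nat_to_mean(*lam)
v = second_moment - m * m
```

Because the second site component is an average of negative quantities, the precision of $q$ stays positive throughout in this example. A constant step size is used for simplicity.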
In mean-field Bayesian networks, coordinate-wise mirror descent across factors recovers variational message-passing (VMP) or stochastic variational inference (SVI) for conjugate terms, with stochastic-gradient corrections for the non-conjugate terms.
Common step-size schedules include decaying sequences such as $\beta_t = \beta_0 / (1 + t)^{\kappa}$ with $\kappa \in (1/2, 1]$, or small constant values, and are not heavily tuned in practice. Computational cost is dominated by the non-conjugate gradient estimation and closed-form conjugate updates, which often scale linearly in data size (Khan et al., 2017).
4. Theoretical Properties and Convergence
CVI's convergence relies on standard mirror descent results. Provided $\tilde{\mathcal{L}}(\mu)$ is differentiable with Lipschitz-continuous gradient, stochastic gradient estimates $\hat{\nabla}_{\mu} \tilde{\mathcal{L}}(\mu_t)$ are unbiased and have bounded variance, and the dual log-partition $A^*$ is strongly convex, the iterates converge almost surely to a stationary point of the ELBO. Robbins–Monro conditions on step-sizes are sufficient:

$$\sum_{t=1}^{\infty} \beta_t = \infty, \qquad \sum_{t=1}^{\infty} \beta_t^2 < \infty.$$
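The Robbins–Monro conditions can be illustrated numerically. The sketch below assumes a polynomially decaying schedule $\beta_0 / (1 + t)^{\kappa}$, a common choice that is used here as an assumption rather than as the schedule prescribed by the cited papers. The names `step_size` and `partial_sums` are hypothetical.

```python
def step_size(t, beta0=1.0, kappa=0.75):
    """Decaying step size beta0 / (1 + t)**kappa (illustrative schedule)."""
    return beta0 / (1.0 + t) ** kappa


def partial_sums(n, beta0=1.0, kappa=0.75):
    """Return (sum of step sizes, sum of squared step sizes) over t = 0..n-1."""
    s1 = sum(step_size(t, beta0, kappa) for t in range(n))
    s2 = sum(step_size(t, beta0, kappa) ** 2 for t in range(n))
    return s1, s2


# With kappa in (1/2, 1], the first sum keeps growing with n
# while the sum of squares levels off.
small = partial_sums(1_000)
large = partial_sums(10_000)
```

For exponents in $(1/2, 1]$ the first sum diverges and the second converges, which is the regime covered by the conditions above. A constant step size does not satisfy the second condition.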
This covers models in which the exponential-family structure is preserved in the variational and conjugate components, subsuming earlier algorithms including non-conjugate VMP with exact gradients and unit step-size (Khan et al., 2017).
5. Applications and Model Classes
CVI is broadly applicable to any probabilistic model decomposable into conjugate ($\tilde{p}_{c}$) and non-conjugate ($\tilde{p}_{nc}$) blocks. Covered models include:
- Gaussian-process classification,
- Generalized linear models,
- Kalman filters with non-Gaussian observations,
- Gamma-factor models,
- Poisson–gamma matrix factorization,
- Deep exponential-family models (Khan et al., 2017).
For logistic regression, CVI encompasses both Jaakkola–Jordan surrogate bounding and Pólya–gamma data augmentation, which render logistic likelihoods conditionally conjugate to Gaussian priors. The optimal variational factor $q^*(w)$ over the regression coefficients $w$ in binary logistic models is Gaussian, with natural-parameter updates constructed from closed-form surrogate factors. Iterative updates alternate between closed-form Gaussian updates and auxiliary parameter optimization. In both quadratic bounding and Pólya–gamma augmentation, global parameter updates are analytically tractable, and auxiliary local updates are simple scalar functions of the second moment $\mathbb{E}_{q}[(x_i^\top w)^2]$ of the linear predictor for observation $i$ (Durante et al., 2017).
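The alternation between a closed-form Gaussian update and a scalar auxiliary update can be sketched with the Jaakkola–Jordan quadratic bound. This is a minimal illustration for a single scalar coefficient with a zero-mean Gaussian prior, which is a simplifying assumption. The bound coefficient is $\lambda(\xi) = \tanh(\xi/2) / (4\xi)$, and the function names are hypothetical.

```python
import math


def jj_lambda(xi):
    """Jaakkola-Jordan coefficient tanh(xi/2) / (4 xi); its limit is 1/8 as xi -> 0."""
    return 0.125 if abs(xi) < 1e-8 else math.tanh(0.5 * xi) / (4.0 * xi)


def jj_logistic(xs, ys, prior_var=1.0, iters=100):
    """Variational Gaussian posterior (mean, variance) for a scalar logistic coefficient."""
    xi = [1.0] * len(xs)                   # auxiliary parameters, one per observation
    for _ in range(iters):
        # Closed-form Gaussian update given the current auxiliary parameters.
        prec = 1.0 / prior_var + 2.0 * sum(jj_lambda(k) * x * x for k, x in zip(xi, xs))
        v = 1.0 / prec
        m = v * sum((y - 0.5) * x for x, y in zip(xs, ys))
        # Auxiliary update: xi_i^2 is the second moment of the linear predictor x_i * w.
        xi = [abs(x) * math.sqrt(v + m * m) for x in xs]
    return m, v, xi


m, v, xi = jj_logistic([1.0, 2.0, -1.0, 0.5], [1, 1, 0, 1])
```

With vector-valued coefficients the same two steps apply, with the scalar precision replaced by a precision matrix and the second moment computed as $x_i^\top (\Sigma + m m^\top) x_i$.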
Empirical results demonstrate CVI outperforms fully black-box stochastic-gradient methods in convergence speed and achieves at least comparable predictive performance, with improvements from one to two orders of magnitude in wall-clock time (Khan et al., 2017).
6. Special Cases, Variants, and Theoretical Connections
CVI generalizes and unifies several existing lines in VI and Bayesian optimization:
- Gaussian special case: For a Gaussian approximation $q(z) = \mathcal{N}(z \mid m, \Sigma)$, the updates mirror online Newton's method and the natural-gradient variant of VI, with extensions to diagonal precision (the "Vprop" algorithm). This can be implemented within RMSprop with minimal modification and admits a connection to extended Kalman filtering and regularized natural-gradient descent (Khan et al., 2017).
- Coordinate-ascent VI: For fully-conjugate models, CVI reduces to classical coordinate-ascent updates.
- Natural-gradient and online filtering view: The sequence of CVI updates aligns algebraically with natural-gradient methods under the Gauss–Newton approximation and with Bayesian state filtering in online settings.
7. Practical Considerations and Guidance
Implementing CVI requires three key components:
- Efficient computation of natural-to-mean parameter mappings in the chosen exponential family,
- Unbiased Monte Carlo or analytic estimation of $\nabla_{\mu} \mathbb{E}_{q}[\log \tilde{p}_{nc}(y, z)]$,
- The ability to sum natural parameters for conjugate-exponential family inference.
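The third requirement, summing natural parameters, can be illustrated with Gaussian factors: multiplying Gaussian densities in the same variable corresponds to adding their natural parameters. The helper names below are hypothetical and the example is a sketch for the univariate case only.

```python
def to_natural(mean, var):
    """(mean, variance) -> Gaussian natural parameters (mean/var, -1/(2 var))."""
    return mean / var, -0.5 / var


def from_natural(lam1, lam2):
    """Gaussian natural parameters -> (mean, variance)."""
    var = -0.5 / lam2
    return lam1 * var, var


def combine(*factors):
    """Multiply Gaussian factors by summing their natural parameters component-wise."""
    return tuple(sum(component) for component in zip(*factors))


# A N(0, 1) prior combined with a Gaussian site centred at 2 with variance 1.
prior = to_natural(0.0, 1.0)
site = to_natural(2.0, 1.0)
mean, var = from_natural(*combine(prior, site))
```

In this example the combined factor has mean 1.0 and variance 0.5, the usual precision-weighted average.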
CVI exploits closed-form updates wherever conjugacy is present, with negligible additional overhead in models with partial conjugacy. The method is robust to step-size choice, needs little tuning, and is conceptually modular. In practice, the benefit is most pronounced in large-scale or structured models with a mix of conjugate and non-conjugate factors.
By reframing non-conjugate VI as conjugate VI in an augmented or surrogate-exponential family, CVI recovers efficient variational inference machinery without sacrificing generality or convergence guarantees (Khan et al., 2017, Durante et al., 2017, Khan et al., 2017).