
Mean-field Variational Inference

Updated 30 December 2025
  • Mean-field Variational Inference is a technique that approximates the full Bayesian posterior by assuming a fully factorized (product-form) distribution for computational tractability.
  • It employs coordinate-ascent (CAVI) updates to optimize the evidence lower bound (ELBO), with closed-form iterations and convergence guarantees under convexity assumptions.
  • Recent advances integrate geometric tools like gradient flows, Fokker–Planck PDEs, and interacting diffusions to provide theoretical guarantees and inspire new algorithmic variants.

Mean-field variational inference (MFVI) is a foundational technique in Bayesian inference, where the posterior distribution is approximated by restricting to fully factorized (product-form) probability measures. MFVI has emerged as the workhorse of scalable variational inference due to its algorithmic simplicity and the tractability of its coordinate-ascent updates. Recent research has provided a geometric, analytic, and computational unification of MFVI using the language of gradient flows, partial differential equations (PDEs), and interacting particle systems, placing the classical approach on a rigorous foundation and enabling new algorithmic variants (Ghosh et al., 2022). This article presents a comprehensive account of these representations, theoretical guarantees, and algorithmic implications.

1. MFVI: Formulation, Objective, and Coordinate-Ascent

Given data $x\in\mathbb{R}^n$ and latent variables $\theta\in\mathbb{R}^d$ with prior $\pi(\theta)$ and likelihood $p(x|\theta)$, the exact posterior is $p(\theta|x) = \pi(\theta)\,p(x|\theta)/Z$. The Bayesian inference problem is recast as minimizing the Kullback–Leibler divergence over all probability measures $\nu$:
$$p = \operatorname*{arg\,min}_{\nu\in\mathcal{P}(\mathbb{R}^d)} D(\nu\,\|\,p).$$
Alternatively, one may equivalently optimize the functional

$$J(\nu) = \mathbb{E}_\nu[-\log p(x,\theta)] - H(\nu),$$

where $H(\nu) = -\mathbb{E}_\nu[\log \nu]$ is the Shannon entropy. This functional is the negative evidence lower bound (ELBO).
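As a concrete check of this objective, consider a one-dimensional quadratic negative log-joint over Gaussian candidates, where the negative ELBO has a closed form and is minimized exactly at the posterior. The target parameters below are illustrative assumptions, not from the source:

```python
import math

# Negative ELBO J(nu) = E_nu[-log p] - H(nu) for an illustrative 1-D quadratic
# -log p(x, theta) = (theta - m)^2 / (2 * s^2) (normalizer dropped; m, s assumed)
# over Gaussian candidates nu = N(a, t^2); a coarse grid search recovers the
# exact posterior N(m, s^2) as the minimizer.
m, s = 0.5, 1.2

def neg_elbo(a, t):
    expected_energy = ((a - m) ** 2 + t ** 2) / (2.0 * s ** 2)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * t ** 2)
    return expected_energy - entropy

best = min(((neg_elbo(a, t), a, t)
            for a in (i / 10 for i in range(-20, 21))
            for t in (j / 10 for j in range(1, 31))), key=lambda r: r[0])
print(best[1], best[2])  # grid minimizer (0.5, 1.2): the exact posterior (m, s)
```

Because the objective separates into a term in $a$ and a term in $t$, the grid minimum coincides with the continuous minimizer here.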

MFVI restricts $\nu$ to the mean-field family
$$\mathcal{M} = \Big\{ \nu(\theta) = \prod_{i=1}^d \nu_i(\theta_i) \Big\} \subset \mathcal{P}(\mathbb{R}^d).$$
For a product-form $\nu$, the objective separates coordinate-wise up to terms independent of $\nu_i$: for each $i$,
$$J(\nu) = J_i(\nu_i; \nu_{-i}) - \sum_{j \neq i} H(\nu_j), \qquad J_i(\nu_i; \nu_{-i}) = \mathbb{E}_{\nu_i}[\Psi_i(\cdot\,; \nu_{-i})] - H(\nu_i),$$
where $\Psi_i(\theta_i; \nu_{-i}) = \mathbb{E}_{\nu_{-i}}[-\log p(x,\theta)]$; minimizing $J_i$ over $\nu_i$ therefore minimizes $J$ in that coordinate.

Coordinate-ascent variational inference (CAVI) cyclically minimizes each $J_i$ while keeping the other factors fixed:
$$\nu_i^{k} = \operatorname*{arg\,min}_{\nu_i} J_i(\nu_i; \nu_{-i}^{k-1}).$$
This yields the closed-form update
$$\nu_i(\theta_i) \propto \exp\big(-\Psi_i(\theta_i; \nu_{-i})\big),$$
and the cyclic sweeps are repeated until convergence.
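The CAVI recursion can be made concrete for a bivariate Gaussian target, where each update $\nu_i \propto \exp(-\Psi_i)$ is itself Gaussian with precision $A_{ii}$ and a mean depending linearly on the other factor's mean. The precision matrix and mean below are illustrative assumptions:

```python
# CAVI for an illustrative bivariate Gaussian target (precision A and mean mu
# are assumptions, not from the source):
# -log p(x, theta) = 0.5 * (theta - mu)^T A (theta - mu) + const.
# Each factor update nu_i = N(m_i, 1/A_ii) is the closed-form minimizer
# nu_i propto exp(-Psi_i), with Psi_i quadratic in theta_i.
A = [[2.0, 0.8], [0.8, 1.5]]   # precision matrix of the target
mu = [1.0, -2.0]               # mean of the target

m = [0.0, 0.0]                 # means of the two variational factors
for _ in range(50):            # cyclic coordinate-ascent sweeps
    m[0] = mu[0] - (A[0][1] / A[0][0]) * (m[1] - mu[1])
    m[1] = mu[1] - (A[1][0] / A[1][1]) * (m[0] - mu[0])

variances = [1.0 / A[0][0], 1.0 / A[1][1]]
print(m, variances)  # means converge to mu = [1.0, -2.0]
```

The factor variances $1/A_{ii}$ understate the true marginal variances $(A^{-1})_{ii}$, illustrating the well-known variance underestimation of mean-field approximations.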

2. Geometric Representations: Gradient Flows, PDEs, and Diffusions

Three analytic and probabilistic representations of MFVI are established (Ghosh et al., 2022):

a. Gradient Flow on Product Wasserstein Space.

Let $\mathcal{P}_2(\mathbb{R})$ denote the probability measures on $\mathbb{R}$ with finite second moment, equipped with the 2-Wasserstein metric $W_2$. The mean-field space is the product

$$\mathcal{M}_2 = \prod_{i=1}^d \mathcal{P}_2(\mathbb{R}),$$

with product metric $d^2(\nu, \mu) = \sum_{i=1}^d W_2^2(\nu_i, \mu_i)$. The MFVI energy $\Phi(\nu) = (J_1(\nu_1; \nu_{-1}), \dots, J_d(\nu_d; \nu_{-d}))$ induces a gradient flow
$$\partial_t \nu(t) = - \nabla_W \Phi(\nu(t)),$$
where, for each $i$,

$$\partial_t \nu_i(t) + \nabla_{W_2,i}\, J_i(\nu_i(t); \nu_{-i}(t)) = 0.$$

b. Fokker–Planck–Type PDEs.

Writing $\nu_i(t) = \rho_i(t, \theta_i)\, d\theta_i$, the marginal densities satisfy the coupled quasilinear parabolic PDE system
$$\partial_t \rho_i(t, \theta_i) = \partial_{\theta_i} \big[ \rho_i(t, \theta_i)\, \partial_{\theta_i} \Psi_i(\theta_i; \rho_{-i}(t)) \big] + \partial^2_{\theta_i} \rho_i(t, \theta_i).$$
This is interpreted as a continuity (transport) equation plus isotropic diffusion.

c. McKean–Vlasov Interacting Diffusion Process.

The PDE system above arises as the forward Kolmogorov equation for the interacting diffusion system
$$d\theta_i(t) = -\partial_{\theta_i} \Psi_i(\theta_i(t); \rho_{-i}(t))\, dt + \sqrt{2}\, dW_t^i,$$
where the $W^i$ are independent Brownian motions; the noise scale $\sqrt{2}$ matches the unit diffusion coefficient $\partial^2_{\theta_i}\rho_i$ in the PDE and makes $\exp(-\Psi_i)$ the stationary factor. Under sufficient regularity, the time-marginal laws of this SDE are exactly the solutions of the MFVI PDE.
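A minimal particle discretization of this interacting system (Euler–Maruyama in time, with the law $\rho_{-i}(t)$ replaced by the empirical particle cloud) can be sketched as follows. The quadratic target is an illustrative assumption, and the $\sqrt{2}$ noise scale is chosen so the stationary factors are $\propto \exp(-\Psi_i)$, consistent with the Fokker–Planck equation above:

```python
import math
import random

random.seed(0)

# Euler-Maruyama particle discretization of the McKean-Vlasov system for an
# illustrative 2-D Gaussian target (precision A and mean mu are assumptions):
# -log p(x, theta) = 0.5 * (theta - mu)^T A (theta - mu) + const.
# Each drift sees the other coordinate's law only through its mean, which is
# estimated from the particle cloud.
A = [[2.0, 0.8], [0.8, 1.5]]
mu = [1.0, -2.0]
N, h, steps = 500, 0.02, 1500          # particles, step size, horizon T = 30

th1 = [random.gauss(0.0, 1.0) for _ in range(N)]
th2 = [random.gauss(0.0, 1.0) for _ in range(N)]
for _ in range(steps):
    m1 = sum(th1) / N                  # empirical surrogate for the mean of rho_1(t)
    m2 = sum(th2) / N
    th1 = [x - h * (A[0][0] * (x - mu[0]) + A[0][1] * (m2 - mu[1]))
           + math.sqrt(2 * h) * random.gauss(0.0, 1.0) for x in th1]
    th2 = [x - h * (A[1][1] * (x - mu[1]) + A[1][0] * (m1 - mu[0]))
           + math.sqrt(2 * h) * random.gauss(0.0, 1.0) for x in th2]

mean1, mean2 = sum(th1) / N, sum(th2) / N
print(mean1, mean2)  # particle means settle near the target mean (1.0, -2.0)
```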

3. Discretized Algorithms: Proximal-JKO Scheme and CAVI Convergence

The time-discretized version of the MFVI gradient flow corresponds to a proximal-point (JKO) step in the product Wasserstein metric:
$$\nu_{h,i}^k = \operatorname*{arg\,min}_{\nu_i \in \mathcal{P}_2(\mathbb{R})} \Big\{ \tfrac{1}{2} W_2^2(\nu_i, \nu_{h,i}^{k-1}) + h\, J_i(\nu_i; \nu_{h,-i}^{k-1}) \Big\},$$
for step size $h>0$. Piecewise-constant interpolation between iterates converges (as $h\to0$) to the continuous Wasserstein gradient-flow solution [(Ghosh et al., 2022), Theorem 4.3]. The proof combines tightness of the interpolants (obtained from energy dissipation), the energy-dissipation inequality, and uniqueness via geodesic convexity.
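Restricted to one-dimensional Gaussians $N(a, s^2)$, where $W_2^2(N(a,s^2), N(a_0,s_0^2)) = (a-a_0)^2 + (s-s_0)^2$ in closed form, a single coordinate's JKO step can be solved exactly for a quadratic $\Psi_i$. This is an illustrative sketch under that assumption, not the paper's general algorithm:

```python
import math

# One coordinate's JKO step restricted to 1-D Gaussians N(a, s^2), where
# W2^2(N(a, s^2), N(a0, s0^2)) = (a - a0)^2 + (s - s0)^2 in closed form.
# Psi_i(t) = 0.5 * c * (t - m_star)^2 is an illustrative quadratic assumption, so
# J_i(a, s) = 0.5 * c * ((a - m_star)^2 + s^2) - log(s) + const.
def jko_step(a0, s0, c, m_star, h):
    # stationarity in a of 0.5*(a - a0)^2 + h * 0.5*c*(a - m_star)^2:
    a = (a0 + h * c * m_star) / (1.0 + h * c)
    # stationarity in s: (s - s0) + h*(c*s - 1/s) = 0,
    # i.e. (1 + h*c)*s^2 - s0*s - h = 0; take the positive root:
    s = (s0 + math.sqrt(s0 ** 2 + 4.0 * h * (1.0 + h * c))) / (2.0 * (1.0 + h * c))
    return a, s

a, s = 0.0, 2.0
for _ in range(400):           # repeated proximal steps with step size h = 0.05
    a, s = jko_step(a, s, c=1.5, m_star=1.0, h=0.05)
print(a, s)  # converges to the stationary point a = m_star, s = 1/sqrt(c)
```

The fixed point of this proximal map is exactly $(m_\star, 1/\sqrt{c})$, the minimizer of $J_i$, for any $h>0$.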

4. Theoretical Guarantees: PDE Limits, Global Convergence, and Geometric Conditions

The continuous-time limit of parametric or particle-based MFVI is described by a system of coupled one-dimensional parabolic PDEs,
$$\partial_t \rho_i = \partial_{\theta_i}\big[ \rho_i\, \partial_{\theta_i} \Psi_i(\theta_i; \rho_{-i}) \big] + \partial^2_{\theta_i} \rho_i, \qquad i=1,\dots,d.$$
Convergence of the time-discretized JKO/CAVI iterates to this flow, and to the associated SDE, is guaranteed under standard convexity conditions.

The key assumption is $\lambda$-convexity of $-\log p(x, \theta)$ in each variable (i.e., the negative log-joint has Hessian $\succeq \lambda I$ along each coordinate). This ensures geodesic convexity of each $J_i$ on $\mathcal{P}_2(\mathbb{R})$, and hence uniqueness and exponential contractivity of the mean-field flow:
$$W_2(\nu(t), \nu'(t)) \leq e^{-\lambda t}\, W_2(\nu(0), \nu'(0)).$$
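The contraction estimate can be checked numerically in the Gaussian restriction of the flow: for a quadratic potential with curvature $c$ (so $\lambda = c$), two Gaussian solutions approach each other at least as fast as $e^{-\lambda t}$. The potential and initial conditions below are illustrative assumptions:

```python
import math

# Gaussian restriction of the 1-D mean-field flow under an illustrative quadratic
# potential Psi(t) = 0.5 * c * (t - m_star)^2, so lambda = c. Gaussian solutions
# N(a, s^2) follow a' = -c*(a - m_star), s' = -c*s + 1/s, and
# W2(N(a1, s1^2), N(a2, s2^2))^2 = (a1 - a2)^2 + (s1 - s2)^2.
c, m_star = 1.5, 0.0
dt, steps = 1e-3, 2000                 # forward Euler to time T = 2

def step(a, s):
    return a + dt * (-c * (a - m_star)), s + dt * (-c * s + 1.0 / s)

(a1, s1), (a2, s2) = (3.0, 0.5), (-1.0, 2.0)
d0 = math.hypot(a1 - a2, s1 - s2)      # initial W2 distance
for _ in range(steps):
    a1, s1 = step(a1, s1)
    a2, s2 = step(a2, s2)
d = math.hypot(a1 - a2, s1 - s2)
print(d / d0, math.exp(-c * dt * steps))  # observed ratio stays below e^{-lambda*T}
```

The standard-deviation gap contracts strictly faster than $e^{-\lambda t}$ here, so the bound is dominated by the mean gap.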

These conditions yield both the correctness of the gradient-flow and SDE representations and the global convergence of practical algorithms (CAVI and its proximal-JKO variants) (Ghosh et al., 2022).

5. Practical Algorithmic Frameworks

The geometric perspective yields several classes of implementable MFVI algorithms:

  • Parametric MFVI: If variational factors are chosen from exponential families (e.g., Gaussian, Gaussian mixtures), the JKO coordinate-wise subproblems reduce to closed-form or tractable proximal updates.
  • Particle-Based MFVI: Each variational factor $\nu_i$ is represented empirically using particles; proximal-Wasserstein (JKO) updates are performed via interacting particle systems. In the continuous-time limit, one recovers the McKean–Vlasov SDE described above, permitting analysis of ergodicity and convergence rates.
  • Discretization and Numerical Realization: Depending on the chosen space, one may discretize either the measure dynamics (e.g., via particles or parametric surrogates) or the coupled PDEs (e.g., by finite-difference or finite-element methods). Algorithm design principles and convergence checks are inherited from the Wasserstein gradient flow literature.
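As one numerical-realization sketch (under the simplifying assumptions of a single coordinate, a frozen quadratic potential $\Psi_i(x) = x^2/2$, zero-flux boundaries, and an explicit scheme chosen for readability), the Fokker–Planck system can be discretized in conservative finite-difference form:

```python
# Conservative explicit finite-difference scheme for one coordinate's PDE
# d_t rho = d_x(rho * Psi') + d_x^2 rho, with frozen Psi(x) = x^2 / 2 (assumed)
# and zero-flux boundaries, so the discrete dynamics conserve mass and relax
# toward the stationary density rho ~ exp(-Psi), i.e. N(0, 1).
n, L = 101, 5.0
dx = 2 * L / (n - 1)
xs = [-L + i * dx for i in range(n)]
rho = [1.0 / (2 * L)] * n              # start from a flat density on [-L, L]
dt, steps = 0.002, 3000                # explicit Euler; 2*dt/dx^2 = 0.4 < 1

def psi_prime(x):
    return x                           # Psi'(x) for the assumed Psi(x) = x^2 / 2

for _ in range(steps):
    flux = [0.0] * (n + 1)             # flux rho*Psi' + d_x rho at cell interfaces
    for i in range(n - 1):
        xm = 0.5 * (xs[i] + xs[i + 1])
        flux[i + 1] = (0.5 * (rho[i] + rho[i + 1]) * psi_prime(xm)
                       + (rho[i + 1] - rho[i]) / dx)
    rho = [rho[i] + dt * (flux[i + 1] - flux[i]) / dx for i in range(n)]

mass = sum(rho) * dx
var = sum(r * x * x for r, x in zip(rho, xs)) * dx / mass
print(mass, var)                       # mass is conserved; variance relaxes to ~1
```

Because the update telescopes over interface fluxes and the boundary fluxes are zero, the discrete mass is conserved exactly, mirroring the continuity-equation structure of the continuous flow.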

6. Extensions, Connections, and Future Directions

The framework described in (Ghosh et al., 2022) enables several new theoretical and practical avenues:

  • Alternative Metric and Divergence Choices: By replacing the Wasserstein metric or regularizer (e.g., with Hellinger or Stein discrepancies), new gradient-flow and diffusion representations are derived, broadening the scope of tractable VI.
  • Accelerated and Higher-Order Schemes: The geometric viewpoint suggests importing inertial (momentum) or higher-order splitting schemes, promising faster convergence and improved exploration.
  • Weakening Convexity and Infinite-Dimensional Models: Ongoing work includes the analysis of cases with only displacement semiconvexity, extension to infinite-dimensional latent spaces (e.g., Gaussian-process models), and quantification of convergence rates that account for the model dimension $d$.
  • Rigorous Unification of MFVI Algorithms: The equivalence between coordinate ascent updates, gradient flows in product Wasserstein spaces, Fokker–Planck–type PDEs, and McKean–Vlasov diffusions provides a unified geometric and probabilistic foundation, enabling systematic derivation and analysis of MFVI algorithms and their numerical approximations.

Overall, this analytic and geometric unification of MFVI not only justifies the standard coordinate-ascent protocol but also invites importing powerful techniques from stochastic analysis, PDE theory, and optimal transport into the study and numerical implementation of scalable variational inference (Ghosh et al., 2022).

References (1)
