Denoising Diffusion Variational Inference
- DDVI is a variational inference framework that utilizes denoising diffusion processes to create highly expressive posterior approximations.
- It constructs a reverse diffusion chain optimized via an ELBO augmented with a sleep regularizer, unifying techniques from VAEs, score-based learning, and SDEs.
- DDVI has demonstrated superior performance in latent variable modeling, deep Gaussian processes, and image restoration tasks, offering scalable inference in high-dimensional settings.
Denoising Diffusion Variational Inference (DDVI) is a class of variational inference algorithms that leverage denoising diffusion processes to construct highly expressive variational distributions for posterior approximation. Rooted in, but not limited to, the theory of latent-variable models, generative modeling, and deep probabilistic architectures, DDVI defines the variational posterior as the reversal of a fixed or learnable diffusion (noising) process and optimizes an explicit evidence lower bound (ELBO) functional. Its algorithmic instantiation and statistical interpretation unify methodologies from frequentist maximum-likelihood estimation, variational autoencoders, stochastic differential equations, and score-based learning.
1. Theoretical Foundations and Motivation
The foundational motivation for DDVI arises from the challenge of approximating intractable posteriors in latent-variable models of the form

$$p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz,$$

where the true posterior $p_\theta(z \mid x) = p_\theta(x \mid z)\, p(z) / p_\theta(x)$ is analytically inaccessible. Classical variational inference (VI) approximates this posterior with simple variational families such as diagonal Gaussians or reparameterized normalizing flows, but these often lack sufficient expressivity.
DDVI eschews invertibility constraints and adversarial training instabilities (typical in normalizing flows or GAN-based VI) by employing diffusion-based constructions for the variational posterior. Specifically, DDVI introduces a sequence of auxiliary latents $z_{1:T}$ and constructs an iterative refinement via a forward diffusion (noising) process, whose time-reversal, parameterized by neural networks, defines a flexible and deep variational family for the latent $z$ (Piriyakulkij et al., 2024). In the context of denoising diffusion models (DDMs), this gives rise to a generative process for data that is compatible with both frequentist and Bayesian statistical paradigms (Chen, 21 Oct 2025).
2. Diffusion-Based Variational Posterior Construction
A prototypical DDVI process defines a forward Markov chain (the "noising" process) over the latent $z$:

$$q(z_{1:T} \mid z_0) = \prod_{t=1}^{T} q(z_t \mid z_{t-1}), \qquad q(z_t \mid z_{t-1}) = \mathcal{N}\!\left(z_t;\ \sqrt{1-\beta_t}\, z_{t-1},\ \beta_t I\right),$$

with a user-chosen noise schedule $\{\beta_t\}_{t=1}^{T}$. The reverse (denoising) chain, parameterized as

$$q_\phi(z_{0:T} \mid x) = q_\phi(z_T \mid x) \prod_{t=1}^{T} q_\phi(z_{t-1} \mid z_t, x),$$

with each $q_\phi(z_{t-1} \mid z_t, x)$ taken as a diagonal Gaussian parameterized by an encoder network, defines the variational approximation.

A sample from the variational posterior $q_\phi(z_0 \mid x)$ is generated by drawing $z_T$ from $q_\phi(z_T \mid x)$ and sequentially denoising via the learned reverse transitions down to $z_0$ (Piriyakulkij et al., 2024). This architecture enables arbitrarily deep iterative refinement, increased flexibility over flow-based VI, and compatibility with score-matching objectives for parameter learning.
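As a minimal illustration, the reverse-chain sampler can be sketched as below. It assumes hypothetical PyTorch modules `init_net(x) -> (mu, logvar)` for $q_\phi(z_T \mid x)$ and `denoise_net(z, t, x) -> (mu, logvar)` for $q_\phi(z_{t-1} \mid z_t, x)$; this is an illustrative sketch, not the reference implementation.

```python
# Minimal sketch (PyTorch) of drawing a posterior sample from a DDVI-style
# reverse chain. `init_net` and `denoise_net` are hypothetical encoder modules:
# init_net(x) -> (mu_T, logvar_T), denoise_net(z_t, t, x) -> (mu, logvar).
import torch

def sample_variational_posterior(x, init_net, denoise_net, T):
    # z_T ~ q_phi(z_T | x), a diagonal Gaussian produced by the encoder
    mu_T, logvar_T = init_net(x)
    z = mu_T + torch.exp(0.5 * logvar_T) * torch.randn_like(mu_T)
    # Sequentially denoise: z_{t-1} ~ q_phi(z_{t-1} | z_t, x) for t = T, ..., 1
    for t in reversed(range(1, T + 1)):
        t_embed = torch.full((x.shape[0],), float(t))
        mu, logvar = denoise_net(z, t_embed, x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    return z  # a sample z_0 from the diffusion-based variational posterior
```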
3. Variational Objectives: ELBO, Score-Matching, and Wake-Sleep Regularization
The central variational objective in DDVI is the ELBO, often augmented with regularization terms:
$$\mathcal{L}_{\mathrm{DDVI}}(\theta, \phi) \;=\; \underbrace{\mathbb{E}_{q_\phi(z_{0:T} \mid x)}\!\left[\log \frac{p_\theta(x, z_0)\, q(z_{1:T} \mid z_0)}{q_\phi(z_{0:T} \mid x)}\right]}_{\text{ELBO}} \;-\; \lambda\, \mathcal{L}_{\mathrm{sleep}}(\phi),$$

where $\mathcal{L}_{\mathrm{sleep}}$ is a forward-KL "sleep" regularizer inspired by the wake-sleep algorithm of Hinton et al.; it encourages $q_\phi$ to cover the prior over noise trajectories, avoiding posterior collapse or "holes" (Piriyakulkij et al., 2024). The sleep term is computed by running the forward diffusion from prior samples and matching the reverse process; $\mathcal{L}_{\mathrm{sleep}}$ is thus an independent denoising score-matching loss across the auxiliary latents.
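A minimal sketch of how such a sleep term could be estimated is given below, assuming hypothetical `prior`, `decoder`, and `denoise_net` modules and the fixed noise schedule `betas`; the actual regularizer of Piriyakulkij et al. (2024) may differ in weighting and parameterization.

```python
# Hedged sketch of the "sleep" regularizer: draw latents from the prior,
# noise them with the fixed forward chain, and fit the reverse (encoder)
# transitions to denoise that trajectory. `prior`, `decoder` (returning a
# distribution), and `denoise_net` are assumed modules, not the papers' API.
import math
import torch

def sleep_loss(prior, decoder, denoise_net, betas):
    z0 = prior.sample()                       # z_0 ~ p(z)
    x = decoder(z0).sample()                  # "dreamed" data x ~ p_theta(x | z_0)
    loss, z_prev = 0.0, z0
    for t, beta in enumerate(betas, start=1):
        # forward noising: z_t ~ N(sqrt(1 - beta_t) z_{t-1}, beta_t I)
        z_t = math.sqrt(1.0 - beta) * z_prev + math.sqrt(beta) * torch.randn_like(z_prev)
        # match the reverse transition q_phi(z_{t-1} | z_t, x) to the trajectory
        t_embed = torch.full((x.shape[0],), float(t))
        mu, logvar = denoise_net(z_t, t_embed, x)
        # Gaussian negative log-likelihood of z_{t-1} under the reverse transition
        # (up to additive constants)
        loss = loss + 0.5 * (((z_prev - mu) ** 2) / logvar.exp() + logvar).sum(dim=-1).mean()
        z_prev = z_t
    return loss
```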
For DDMs viewed as deep latent-variable models, a similar ELBO is derived based on the full Markov chain, leading to a decomposition over conditional Kullback-Leibler divergences and tractable, weighted score-matching losses. Fixing forward diffusion and variance parameters ensures computational efficiency but retains an explicit lower bound for maximum likelihood (Chen, 21 Oct 2025).
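For concreteness, this chain-level decomposition takes the familiar DDPM-style form shown below (a sketch in generic notation with data $x_0$ and diffusion latents $x_{1:T}$; the exact weighting and variance treatment differ across the cited works):

$$-\log p_\theta(x_0) \;\le\; \mathbb{E}_{q}\!\left[ D_{\mathrm{KL}}\!\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) + \sum_{t=2}^{T} D_{\mathrm{KL}}\!\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) - \log p_\theta(x_0 \mid x_1) \right].$$

Each interior KL term is between Gaussians and, for fixed forward variances, reduces to a weighted noise-prediction (score-matching) loss.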
In the context of inverse problems and image restoration, DDVI extends to variational objectives for likelihood estimation and regularization (RED-diff) by incorporating a tractable score-matching integral over the noising trajectory (Mardani et al., 2023, Cheng et al., 2024).
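As an illustration, a simplified RED-diff-style reconstruction loop might look as follows, assuming a linear forward operator `A`, a pretrained noise-prediction network `eps_net`, a cumulative schedule `alpha_bar`, and SNR-based weights `w`; this is a hedged sketch rather than the exact algorithm of Mardani et al. (2023).

```python
# Hedged sketch of a RED-diff-style objective for a linear inverse problem
# y = A x + noise: a data-fidelity term plus a diffusion-score regularizer.
# `eps_net`, `A`, `alpha_bar`, and `w` are assumed inputs, not an exact API.
import torch

def red_diff_restore(y, A, eps_net, alpha_bar, w, steps=1000, lr=0.1, sigma_y=0.05):
    mu = torch.zeros_like(A.T @ y, requires_grad=True)   # variational mean of x
    opt = torch.optim.Adam([mu], lr=lr)
    T = len(alpha_bar)
    for _ in range(steps):
        t = torch.randint(0, T, (1,)).item()
        eps = torch.randn_like(mu)
        # noised iterate along the fixed forward trajectory
        x_t = alpha_bar[t].sqrt() * mu + (1 - alpha_bar[t]).sqrt() * eps
        # diffusion-score regularizer (denoiser treated as fixed via detach)
        reg = (w[t] * (eps_net(x_t, t).detach() - eps) * mu).sum()
        # measurement-consistency (data fidelity) term
        fidelity = ((y - A @ mu) ** 2).sum() / (2 * sigma_y ** 2)
        loss = fidelity + reg
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach()
```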
4. Algorithmic Implementations: Discrete Chains and Continuous SDEs
The canonical DDVI algorithm alternates stochastic updates to variational parameters and model parameters:
- Sample minibatches of data $x$
- For each $x$, sample $z_T \sim q_\phi(z_T \mid x)$
- Sequentially denoise via $q_\phi(z_{t-1} \mid z_t, x)$ for $t = T$ down to $t = 1$, producing $z_0$
- Compute the ELBO and (optionally) the sleep regularizer; update $\theta$, $\phi$ via stochastic gradients (a sketch of the ELBO estimate used in this loop follows below)
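For concreteness, a single-sample Monte Carlo estimate of the negative ELBO used in this loop can be sketched as follows, reusing the hypothetical `init_net`, `denoise_net`, `decoder`, and `prior` modules from the sketches above; this is an illustrative reconstruction, not the published implementation.

```python
# Hedged sketch of a single-sample Monte Carlo estimate of the negative DDVI
# ELBO (wake phase). The forward noising is the fixed chain
# q(z_t | z_{t-1}) = N(sqrt(1 - beta_t) z_{t-1}, beta_t I); `decoder(z)` and
# `prior` are assumed to expose per-example log-densities via log_prob.
import math
import torch

def diag_gauss_logp(v, mu, logvar):
    # log density of a diagonal Gaussian, summed over the latent dimension
    return -0.5 * (((v - mu) ** 2) / logvar.exp() + logvar
                   + math.log(2 * math.pi)).sum(dim=-1)

def negative_elbo(x, init_net, denoise_net, decoder, prior, betas):
    T = len(betas)
    mu_T, logvar_T = init_net(x)
    z = mu_T + torch.exp(0.5 * logvar_T) * torch.randn_like(mu_T)
    log_q = diag_gauss_logp(z, mu_T, logvar_T)        # log q_phi(z_T | x)
    log_fwd = torch.zeros_like(log_q)                 # log q(z_{1:T} | z_0) terms
    for t in reversed(range(1, T + 1)):
        t_embed = torch.full((x.shape[0],), float(t))
        mu, logvar = denoise_net(z, t_embed, x)
        z_prev = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        log_q = log_q + diag_gauss_logp(z_prev, mu, logvar)   # reverse transitions
        beta = betas[t - 1]
        log_fwd = log_fwd + diag_gauss_logp(                  # forward noising term
            z, math.sqrt(1.0 - beta) * z_prev,
            torch.full_like(z, math.log(beta)))
        z = z_prev                                            # z now holds z_{t-1}
    log_joint = decoder(z).log_prob(x) + prior.log_prob(z) + log_fwd
    return -(log_joint - log_q).mean()
```

The wake-sleep update of the loop then minimizes `negative_elbo(...) + lam * sleep_loss(...)` by stochastic gradients on the encoder and decoder parameters.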
For DDMs, practical algorithms often restrict to fixed variance schedules and employ noise prediction objectives akin to those in DDPM (Ho et al.) (Chen, 21 Oct 2025). The continuous-time (SDE) limit further generalizes DDVI: the forward SDE is of the form
$$dz_t = f(z_t, t)\, dt + g(t)\, dW_t,$$
and the reverse process is parameterized using learned score networks (Chen, 21 Oct 2025, Xu et al., 2024).
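A minimal Euler-Maruyama sketch of simulating such a reverse-time SDE with a learned score network is shown below; `score_net`, `f`, and `g` are assumed callables (the score network and the drift/diffusion coefficients), and the step count is illustrative.

```python
# Hedged sketch: Euler-Maruyama simulation of the reverse-time SDE
#   dz = [f(z, t) - g(t)^2 * score(z, t)] dt + g(t) dW_bar,
# integrated backwards from t = 1 to t = 0. `score_net`, `f`, `g` are assumed.
import torch

def reverse_sde_sample(z_T, score_net, f, g, num_steps=1000):
    dt = 1.0 / num_steps
    z = z_T
    for i in range(num_steps, 0, -1):
        t = i * dt
        drift = f(z, t) - (g(t) ** 2) * score_net(z, t)
        # step backwards in time, injecting fresh Gaussian noise
        z = z - drift * dt + g(t) * (dt ** 0.5) * torch.randn_like(z)
    return z
```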
Pseudocode for DDVI core loops, including the sleep phase, can be found in (Piriyakulkij et al., 2024). In deep Gaussian processes (DGPs), the approach is extended to score-based reverse SDEs over inducing points, with explicit Girsanov-based pathwise ELBOs (Xu et al., 2024, Xu et al., 23 Sep 2025).
5. Applications in Latent Variable Modeling, Deep Gaussian Processes, and Inverse Problems
DDVI has been empirically validated in multiple settings:
- Deep latent variable models: Outperforms VAEs, normalizing flows, and adversarial posteriors in ELBO, sample quality, clustering (NMI, purity), and unsupervised visualization tasks on MNIST, CIFAR-10, and genomics data (Piriyakulkij et al., 2024).
- Deep Gaussian Processes: Discrete and SDE-based DDVI improve posterior expressivity and marginal likelihood in sparse GP frameworks; explicit variational path-space bounds enable stable and scalable learning (Xu et al., 2024, Xu et al., 23 Sep 2025).
- Real-world image denoising: DDVI with adaptive likelihood (learned noise precision), local variance rectification, and per-step variational Bayes yields state-of-the-art performance in unsupervised denoising on realistic signal-dependent noise (Cheng et al., 2024).
- Inverse problems in imaging: DDVI's RED-diff variant provides a transparent ELBO and tractable diffusion-score regularizer, outperforming sampling-based methods in inpainting and superresolution benchmarks (Mardani et al., 2023).
A summary of algorithmic settings is presented below.
| Setting | Forward/Reverse Process | Objective Structure |
|---|---|---|
| Latent variable models | Discrete Markov chain over auxiliary latents $z_{0:T}$ | ELBO + sleep (forward-KL) regularizer |
| DGPs | SDEs over inducing points | Pathwise ELBO via Girsanov theorem |
| Image denoising | Discrete chain with per-step variational Bayes | ELBO with adaptive (VB) likelihood |
| Inverse problems | SDE and RED-diff regularizer | Score-matching regularizer + data-fidelity term |
6. Extensions: Diffusion Bridge VI, Adaptive Starts, and Localized Inference
A prominent extension, Diffusion Bridge Variational Inference (DBVI), addresses limitations of DDVI in scenarios where the unconditional start of the reverse SDE is far from the complex posterior (notably in DGPs). DBVI introduces a Doob-bridged diffusion by parameterizing the initial distribution of the reverse process via an amortized neural network, conditioned on structured summaries (such as inducing inputs). This substantially reduces the posterior gap and accelerates convergence while preserving theoretical guarantees such as closed-form bridge marginals and explicit ELBOs (Xu et al., 23 Sep 2025).
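A minimal sketch of the amortized-start idea (omitting the Doob-bridge drift correction itself) could look as follows; `summary_net` is a hypothetical module mapping structured summaries such as inducing inputs to start-distribution parameters.

```python
# Hedged sketch of DBVI-style amortized initialization: instead of starting the
# reverse SDE from a fixed unconditional Gaussian, an amortized network maps a
# structured summary (e.g., inducing inputs) to the start distribution.
# `summary_net` is an assumed module; the bridged dynamics are not shown here.
import torch

def amortized_start(summary, summary_net):
    mu0, log_sigma0 = summary_net(summary)    # data-dependent start parameters
    # z_start ~ N(mu0, sigma0^2 I), fed into the (bridged) reverse process
    return mu0 + log_sigma0.exp() * torch.randn_like(mu0)
```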
Other developments include adaptive likelihoods with spatially varying noise, implemented in real-world image denoising via variational inference for noise precision, and local Gaussian rectification to compensate for pixel correlation structures (Cheng et al., 2024).
7. Statistical and Practical Considerations
Statistically, DDVI unifies variational inference with denoising diffusion formulations by interpreting the forward diffusion as a scalable surrogate for the intractable E-step of the EM algorithm. The learned reverse process (denoiser) maximizes a surrogate complete-data likelihood under the variational posterior, optimizing a valid lower bound to observed data likelihood in the frequentist regime (Chen, 21 Oct 2025). Pathwise KL decompositions, Girsanov transformations in the SDE limit, and wake-sleep regularization strengthen both theoretical guarantees and empirical robustness.
Practical guidelines include careful tuning of diffusion schedules ($\beta_t$), sleep regularizer weights ($\lambda$), SNR-based weighting in inverse problems, and exploiting parallelism across data and diffusion steps. Empirical studies consistently demonstrate that DDVI architectures outperform classical VI baselines in posterior fit, likelihood, sample quality, and convergence speed in large-scale and high-dimensional deep probabilistic models (Piriyakulkij et al., 2024, Xu et al., 23 Sep 2025, Cheng et al., 2024, Mardani et al., 2023).