
Variational Mode-Seeking Loss (VML)

Updated 19 December 2025
  • The paper introduces VML as a reverse KL divergence minimization technique that aligns diffusion model posteriors with Bayesian measurement posteriors for accurate MAP inference.
  • It details the analytical framework for linear inverse problems and integrates local VML minimization into reverse diffusion steps via the VML-MAP algorithm.
  • Empirical results show that VML-MAP achieves superior LPIPS and FID scores over baselines in image restoration tasks like inpainting, super-resolution, and deblurring.

The variational mode-seeking loss (VML) is a functional introduced to address inverse problems within the framework of diffusion models, specifically targeting efficient and accurate maximum a posteriori (MAP) inference. VML is defined as the reverse Kullback-Leibler (KL) divergence between the diffusion model's noisy posterior and the true Bayesian measurement posterior. Its minimization, performed at each step of the reverse diffusion process, consistently guides samples toward the MAP estimate, thereby providing both theoretical clarity and practical advantages in image restoration and related tasks. VML is analytically tractable in the case of linear inverse problems and underpins the VML-MAP inference algorithm, which demonstrates favorable empirical results in both computational efficiency and estimation accuracy (Gutha et al., 11 Dec 2025).

1. Formal Definition and Conceptual Underpinning

Given a pre-trained, unconditional diffusion model, let $p(x_0|x_t)$ denote the diffusion posterior: the conditional distribution over clean images $x_0$ given a noisy intermediate $x_t$ at reverse-time step $t$. The measurement posterior $p(x_0|y)$ is the true Bayesian posterior given a measurement $y$ produced by a known measurement operator (e.g., $y = A(x_0) + n$).

The variational mode-seeking loss at time $t$ is defined as the reverse KL divergence:

$$\mathrm{VML}_t(x_t) = D_{KL}\bigl(p(x_0|x_t)\,\Vert\,p(x_0|y)\bigr) = \int p(x_0|x_t) \log \frac{p(x_0|x_t)}{p(x_0|y)}\, dx_0$$

Minimizing $\mathrm{VML}_t(x_t)$ aligns the high-density mode of $p(x_0|x_t)$ with that of $p(x_0|y)$, thus iteratively steering the reverse diffusion chain toward the MAP mode of the solution space (Gutha et al., 11 Dec 2025).
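The mode-seeking behaviour of a reverse KL divergence can be illustrated numerically. The sketch below is not taken from the paper: it uses a hypothetical one-dimensional bimodal density as a stand-in for $p(x_0|y)$ and a narrow Gaussian as a stand-in for $p(x_0|x_t)$, and estimates the divergence by a Riemann sum on a grid. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated on an array."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def reverse_kl(q, p, dx):
    """Riemann-sum estimate of KL(q || p) on a uniform grid."""
    mask = q > 1e-300  # skip points where q has underflowed to (near) zero
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx)

grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]

# Hypothetical bimodal target standing in for p(x_0 | y): modes at -2 and +2.
target = 0.5 * gaussian_pdf(grid, -2.0, 0.5) + 0.5 * gaussian_pdf(grid, 2.0, 0.5)

# Narrow Gaussian standing in for p(x_0 | x_t), placed on a mode or between modes.
kl_on_mode = reverse_kl(gaussian_pdf(grid, 2.0, 0.3), target, dx)
kl_between = reverse_kl(gaussian_pdf(grid, 0.0, 0.3), target, dx)
print(f"on a mode: {kl_on_mode:.3f}, between modes: {kl_between:.3f}")
```

In this toy setting the divergence is much smaller when the narrow distribution sits on one mode than when it sits between the two, which is the behaviour the term "mode-seeking" refers to.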

2. Analytical Derivation and Closed-form for Linear Inverse Problems

Substituting the Gaussian likelihoods $p(x_t|x_0) = \mathcal{N}(x_0, \sigma_t^2 I)$ and $p(y|x_0) = \mathcal{N}(A(x_0), \sigma_y^2 I)$, and applying Bayes' rule to both posteriors, the VML admits the expansion:

$$\mathrm{VML}_t(x_t) = -\log p(x_t) - \frac{1}{2\sigma_t^2} \mathbb{E}_{p(x_0|x_t)}\bigl[\|x_t-x_0\|^{2}\bigr] + \frac{1}{2\sigma_y^2} \mathbb{E}_{p(x_0|x_t)}\bigl[\|y-A(x_0)\|^{2}\bigr] + C$$
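For readers who want the intermediate step, the expansion can be recovered as follows. This is a sketch that uses only the two Gaussian likelihoods stated above; the prior $p(x_0)$ cancels in the ratio, and $\log p(y)$ together with the Gaussian normalizers is absorbed into $C$.

```latex
\begin{aligned}
\log\frac{p(x_0|x_t)}{p(x_0|y)}
  &= \log\frac{p(x_t|x_0)\,p(x_0)/p(x_t)}{p(y|x_0)\,p(x_0)/p(y)} \\
  &= \log p(x_t|x_0) - \log p(x_t) - \log p(y|x_0) + \log p(y), \\
\log p(x_t|x_0) &= -\frac{1}{2\sigma_t^2}\|x_t - x_0\|^2 + \text{const}, \\
-\log p(y|x_0) &= \frac{1}{2\sigma_y^2}\|y - A(x_0)\|^2 + \text{const}.
\end{aligned}
```

Taking the expectation of the log-ratio under $p(x_0|x_t)$ then gives the three terms of the expansion.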

For linear inverse problems (where $A(x_0) = Hx_0$), applying Tweedie's formula and using the covariance $\mathrm{Cov}[x_0|x_t]$ yields the explicit form:

$$\begin{aligned} \mathrm{VML}_t(x_t) =& -\log p(x_t) - \frac{1}{2\sigma_t^2}\|D(x_t,t)-x_t\|^2 + \frac{1}{2\sigma_y^2}\|y-H D(x_t,t)\|^2 \\ &- \frac{1}{2\sigma_t^2}\mathrm{Tr}[\mathrm{Cov}(x_0|x_t)] + \frac{1}{2\sigma_y^2}\mathrm{Tr}[H\,\mathrm{Cov}(x_0|x_t)\,H^T] + C \end{aligned}$$

where $D(x_t,t) = \mathbb{E}[x_0|x_t]$ is the denoiser's mean (Gutha et al., 11 Dec 2025).
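The move from expectations of squared norms to a mean term plus a covariance trace rests on the standard identity $\mathbb{E}\|a - X\|^2 = \|a - \mathbb{E}X\|^2 + \mathrm{Tr}[\mathrm{Cov}(X)]$. Written out for the two expectations in the expansion (a sketch using the same symbols as the text):

```latex
\begin{aligned}
\mathbb{E}\bigl[\|x_t - x_0\|^2 \,\big|\, x_t\bigr]
  &= \|x_t - D(x_t,t)\|^2 + \mathrm{Tr}\bigl[\mathrm{Cov}(x_0|x_t)\bigr], \\
\mathbb{E}\bigl[\|y - Hx_0\|^2 \,\big|\, x_t\bigr]
  &= \|y - H D(x_t,t)\|^2 + \mathrm{Tr}\bigl[H\,\mathrm{Cov}(x_0|x_t)\,H^T\bigr].
\end{aligned}
```

Each expectation therefore contributes one squared-norm term evaluated at the denoiser mean and one trace term, with the same prefactor and sign as the expectation it came from.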

For practical inference, a simplified version drops the covariance-trace terms, justified because these terms tend to constants as $t \rightarrow 0$ and therefore do not affect the minimizer in that limit:

$$\mathrm{VML}^s_t(x_t) \approx -\log p(x_t) - \frac{1}{2\sigma_t^2}\|D(x_t,t) - x_t\|^2 + \frac{1}{2\sigma_y^2}\|y - H D(x_t,t)\|^2 + \mathrm{const}$$
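To make the simplified loss concrete, the sketch below evaluates it on a toy problem that is not from the paper: the prior is assumed to be $x_0 \sim \mathcal{N}(0, I)$, so that $p(x_t)$ and the denoiser $D(x_t,t) = x_t/(1+\sigma_t^2)$ are available in closed form. The function name `simplified_vml`, the matrix `H`, and all numeric values are illustrative assumptions.

```python
import numpy as np

def simplified_vml(x_t, y, H, sigma_t, sigma_y):
    """Simplified VML, up to a constant, for the toy prior x_0 ~ N(0, I)."""
    var_t = 1.0 + sigma_t**2                      # marginal variance of x_t
    denoised = x_t / var_t                        # D(x_t, t) = E[x_0 | x_t]
    neg_log_p = 0.5 * np.sum(x_t**2) / var_t      # -log p(x_t) without constants
    prior_term = -0.5 * np.sum((denoised - x_t) ** 2) / sigma_t**2
    data_term = 0.5 * np.sum((y - H @ denoised) ** 2) / sigma_y**2
    return neg_log_p + prior_term + data_term

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 5))   # hypothetical linear measurement operator
y = rng.normal(size=3)        # hypothetical measurement
sigma_y = 0.5

# Closed-form MAP estimate for this Gaussian toy problem.
x_map = np.linalg.solve(np.eye(5) + H.T @ H / sigma_y**2, H.T @ y / sigma_y**2)

loss_at_map = simplified_vml(x_map, y, H, sigma_t=1e-3, sigma_y=sigma_y)
loss_nearby = [
    simplified_vml(x_map + 0.1 * rng.normal(size=5), y, H, sigma_t=1e-3, sigma_y=sigma_y)
    for _ in range(5)
]
print(loss_at_map, min(loss_nearby))
```

At a small noise level the loss at the closed-form MAP estimate is lower than at randomly perturbed points, as expected if the simplified loss is minimized near the MAP solution in this toy case.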

3. Inference Algorithm: VML-MAP

The VML-MAP algorithm integrates local VML minimization into each reverse-diffusion step:

  • Initialize $x_{t_N} \sim \mathcal{N}(0, \sigma(t_N)^2 I)$, then iterate over the time steps $t_i$ from the largest, $t_N$, down to 0.
  • Perform $K$ steps of gradient descent on $\mathrm{VML}^s_{t_i}(x)$. Differentiating the simplified loss with respect to $x$ gives

$$\nabla_x\mathrm{VML}^s_t(x) = -\nabla_x\log p(x) - \frac{1}{\sigma_t^2}\bigl(J_D(x,t) - I\bigr)^T\bigl(D(x,t)-x\bigr) - \frac{1}{\sigma_y^2}J_D(x,t)^T H^T\bigl(y - H D(x,t)\bigr)$$

where $\nabla_x\log p(x)$ is approximated by the learned score model $s_\theta(x,t)$ and $J_D(x,t) = \partial D(x,t)/\partial x$ is the Jacobian of the denoiser.

  • Advance with a standard reverse-diffusion step: sample $x_{t_{i-1}} \sim \mathcal{N}(D(x,t_i), \sigma(t_{i-1})^2 I)$.
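The three steps above can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the prior is assumed to be $x_0 \sim \mathcal{N}(0, I)$, which makes the denoiser linear ($D(x,t) = a\,x$ with $a = 1/(1+\sigma_t^2)$) and its Jacobian equal to $a I$; the gradient is obtained by differentiating the simplified loss through the denoiser; and the step size is set from the toy problem's known curvature, a convenience that is not available with a learned denoiser. The function name `vml_map_toy`, the noise schedule, and all numeric values are hypothetical.

```python
import numpy as np

def vml_map_toy(y, H, sigma_y, sigmas, num_grad_steps=50, seed=0):
    """VML-MAP sketch for the toy prior x_0 ~ N(0, I) with a linear denoiser."""
    rng = np.random.default_rng(seed)
    n = H.shape[1]
    lam_max = np.linalg.eigvalsh(H.T @ H)[-1]     # largest eigenvalue of H^T H
    x = sigmas[0] * rng.normal(size=n)            # x_{t_N} ~ N(0, sigma(t_N)^2 I)
    for i, sigma_t in enumerate(sigmas):
        a = 1.0 / (1.0 + sigma_t**2)              # D(x, t) = a * x, Jacobian = a * I
        # Toy-only step size: inverse of the largest curvature of the quadratic loss.
        step = 1.0 / (a**2 * (1.0 + lam_max / sigma_y**2))
        for _ in range(num_grad_steps):
            d = a * x
            score = (d - x) / sigma_t**2          # Tweedie: grad log p(x)
            grad = (-score
                    - (a - 1.0) * (d - x) / sigma_t**2
                    - a * (H.T @ (y - H @ d)) / sigma_y**2)
            x = x - step * grad
        d = a * x
        if i + 1 < len(sigmas):
            x = d + sigmas[i + 1] * rng.normal(size=n)   # reverse-diffusion step
        else:
            x = d                                 # final clean estimate
    return x

rng = np.random.default_rng(1)
H = rng.normal(size=(3, 5)) / np.sqrt(5)          # hypothetical measurement operator
y = rng.normal(size=3)                            # hypothetical measurement
sigma_y = 0.5
sigmas = np.geomspace(10.0, 1e-3, 20)             # assumed decreasing noise schedule

x_hat = vml_map_toy(y, H, sigma_y, sigmas)
x_map = np.linalg.solve(np.eye(5) + H.T @ H / sigma_y**2, H.T @ y / sigma_y**2)
print(np.max(np.abs(x_hat - x_map)))
```

With a learned denoiser, the products with the Jacobian transpose would typically be computed by automatic differentiation rather than formed explicitly; that choice is outside the scope of this toy example.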

The total cost is approximately $N$ network calls for the denoiser evaluations in the reverse-diffusion steps plus $N \cdot K$ calls for the gradient steps, usually far fewer than the $O(1000)$ neural calls needed for some alternative posterior-sampling methods (Gutha et al., 11 Dec 2025).

4. Theoretical Properties: Mode-Seeking and MAP Consistency

As $t \rightarrow 0$, the conditional $p(x_0|x_t)$ contracts to a sharp Gaussian about $x_t$. In this limit, the asymptotic relation

$$\lim_{t\to 0}\bigl[\,\mathrm{VML}_t(x)+n\log\sigma_t\,\bigr] = -\log p(x|y) + \mathrm{const}$$

where $n$ here denotes the dimensionality of $x$, implies that $\mathrm{VML}_t$ is minimized at the MAP estimate $\arg\max_x \log p(x|y)$. The "trace of covariance" terms in the linear VML form uniformly tend to constants as $t \to 0$, so dropping them does not alter the minimizer. Thus, local VML minimization at each step naturally drives the iterates toward the MAP solution (Gutha et al., 11 Dec 2025).
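The limit can be motivated with a short calculation. The sketch below assumes, as stated above, that $p(x_0|x_t{=}x)$ is well approximated by $\mathcal{N}(x, \sigma_t^2 I_n)$ for small $t$, that $\sigma_t \to 0$ as $t \to 0$, and that $\log p(\cdot|y)$ is continuous at $x$; $\mathcal{H}$ denotes differential entropy.

```latex
\begin{aligned}
\mathrm{VML}_t(x)
  &= -\mathcal{H}\bigl[p(x_0|x_t{=}x)\bigr]
     - \mathbb{E}_{p(x_0|x_t=x)}\bigl[\log p(x_0|y)\bigr], \\
\mathcal{H}\bigl[\mathcal{N}(x, \sigma_t^2 I_n)\bigr]
  &= n\log\sigma_t + \tfrac{n}{2}\log(2\pi e), \\
\mathbb{E}_{\mathcal{N}(x,\sigma_t^2 I_n)}\bigl[\log p(x_0|y)\bigr]
  &\to \log p(x|y) \quad (t \to 0), \\
\Rightarrow\quad
\lim_{t\to 0}\bigl[\mathrm{VML}_t(x) + n\log\sigma_t\bigr]
  &= -\log p(x|y) - \tfrac{n}{2}\log(2\pi e).
\end{aligned}
```

The $n\log\sigma_t$ term diverges but does not depend on $x$, so it shifts the loss without moving its minimizer.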

5. Empirical Performance and Benchmarking

VML-MAP's efficacy was systematically validated on image restoration tasks (half-mask inpainting, $4\times$ super-resolution, and uniform deblurring) on ImageNet64, ImageNet256, FFHQ256, and latent CelebA256 datasets. Comparative methods included DDRM and ΠGDM (posterior sampling), MAPGA (MAP estimation via PF-ODE), and DAPS.

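Key results (ImageNet64, 1000-image subset):

| Task | Method | LPIPS↓ | FID↓ |
|---|---|---|---|
| Inpainting | VML-MAP | 0.146 | 38.7 |
| Inpainting | MAPGA | 0.172 | 46.3 |
| Inpainting | DDRM | 0.262 | 57.0 |
| 4× super-res | VML-MAP | 0.136 | 61.9 |
| 4× super-res | MAPGA | 0.203 | 83.9 |
| 4× super-res | DDRM | 0.235 | 78.2 |
| Deblurring | VML-MAP | | 105.5 |
| Deblurring | MAPGA | | 114.3 |
| Deblurring | DDRM | | 198.0 |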

VML-MAP exceeded baselines in both LPIPS and FID across standard and large-scale (ImageNet256/FFHQ256) benchmarks using only $\mathcal{O}(10^3)$ neural network calls. A preconditioned variant (VML-MAP$^{\rm pre}$) further improved outcomes on ill-conditioned problems. For a given computation budget (e.g., 20 reverse steps $\times$ 50 gradient updates), VML-MAP outperformed DDRM and ΠGDM, which required $500$–$1000$ diffusion steps. Qualitative assessments show VML-MAP reconstructions with sharper textures and stricter measurement consistency (Gutha et al., 11 Dec 2025).

6. Limitations and Open Extensions

VML’s practical deployment is subject to several constraints:

  • The optimization step is currently reliant on first-order gradient descent; more sophisticated (e.g., quasi-Newton) optimizers might speed up convergence but must retain computational efficiency.
  • Performance degrades as measurement noise $\sigma_y$ increases, since the measurement-fit term's influence diminishes, leading to blurred reconstructions.
  • For latent diffusion models, VML must be adapted to account for decoder nonlinearities, making optimization harder and resulting in blurrier samples compared to pixel-space implementations.

Directions for further research include the design of robust optimizers for VML minimization, effective treatment of non-linear measurement operators particularly in latent spaces, and the development of joint score-plus-posterior neural estimators to minimize the number of required gradient steps (Gutha et al., 11 Dec 2025).

References (1)
