Variational Mode-Seeking Loss (VML)

Updated 19 December 2025

The paper introduces VML as a reverse KL divergence minimization technique that aligns diffusion model posteriors with Bayesian measurement posteriors for accurate MAP inference.
It details the analytical framework for linear inverse problems and integrates local VML minimization into reverse diffusion steps via the VML–MAP algorithm.
Empirical results show that VML–MAP achieves superior LPIPS and FID scores over baselines in image restoration tasks like inpainting, super-resolution, and deblurring.

The variational mode-seeking loss (VML) is a functional introduced to address inverse problems within the framework of diffusion models, specifically targeting efficient and accurate maximum a posteriori (MAP) inference. VML is defined as the reverse Kullback-Leibler (KL) divergence between the diffusion model's noisy posterior and the true Bayesian measurement posterior. Its minimization, performed at each step of the reverse diffusion process, consistently guides samples toward the MAP estimate, thereby providing both theoretical clarity and practical advantages in image restoration and related tasks. VML is analytically tractable in the case of linear inverse problems and underpins the VML-MAP inference algorithm, which demonstrates favorable empirical results in both computational efficiency and estimation accuracy (Gutha et al., 11 Dec 2025).

1. Formal Definition and Conceptual Underpinning

Given a pre-trained, unconditional diffusion model, let $p(x_0|x_t)$ denote the diffusion posterior: the conditional distribution over clean images $x_0$ given a noisy intermediate $x_t$ at reverse-time step $t$. The measurement posterior $p(x_0|y)$ is the true Bayesian posterior under a known measurement model (e.g., $y = A(x_0) + n$, with measurement operator $A$ and noise $n$).

The variational mode-seeking loss at time $t$ is defined as the reverse KL divergence:

$\mathrm{VML}_t(x_t) = D_{KL}\bigl(p(x_0|x_t)\,\Vert\,p(x_0|y)\bigr) = \int p(x_0|x_t) \log \frac{p(x_0|x_t)}{p(x_0|y)} dx_0$

Minimizing $\mathrm{VML}_t(x_t)$ aligns the high-density mode of $p(x_0|x_t)$ with that of $p(x_0|y)$, thus iteratively steering the reverse diffusion chain toward the MAP mode of the solution space (Gutha et al., 11 Dec 2025).
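The mode-seeking behavior of a reverse KL divergence can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical bimodal 1-D target and compares $D_{KL}(q\,\Vert\,p)$ for a narrow Gaussian $q$ placed on one mode against the same $q$ placed between the modes. All names and numeric values are illustrative.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def target_pdf(x):
    """Hypothetical bimodal target: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    return 0.5 * normal_pdf(x, -3.0, 0.5) + 0.5 * normal_pdf(x, 3.0, 0.5)

def reverse_kl(q_mean, q_std, lo=-10.0, hi=10.0, step=1e-3):
    """Riemann-sum estimate of KL(q || p) with q = N(q_mean, q_std^2) and p = target."""
    total = 0.0
    n = int(round((hi - lo) / step))
    for i in range(n + 1):
        x = lo + i * step
        q = normal_pdf(x, q_mean, q_std)
        if q > 0.0:  # skip grid points where q underflows to zero
            total += q * math.log(q / target_pdf(x)) * step
    return total

kl_on_mode = reverse_kl(3.0, 0.5)   # q sits on one mode of the target
kl_between = reverse_kl(0.0, 0.5)   # q sits between the two modes
print(kl_on_mode, kl_between)
```

In this toy setting the reverse KL is far smaller when $q$ covers a single mode than when it sits in the low-density region between modes, which is the sense in which minimizing it is mode-seeking.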

2. Analytical Derivation and Closed-form for Linear Inverse Problems

Substituting the Gaussian likelihoods $p(x_t|x_0) = \mathcal{N}(x_0, \sigma_t^2 I)$ and $p(y|x_0) = \mathcal{N}(A(x_0), \sigma_y^2 I)$, and using Bayes' rule, the VML admits the expansion:

$\mathrm{VML}_t(x_t) = -\log p(x_t) - \frac{1}{2\sigma_t^2} \mathbb{E}_{p(x_0|x_t)}[\|x_t-x_0\|^{2}] + \frac{1}{2\sigma_y^2} \mathbb{E}_{p(x_0|x_t)}[\|y-A(x_0)\|^{2}] + C$

For linear inverse problems (where $A(x_0) = Hx_0$), applying Tweedie's formula and splitting each expectation into a squared-error term at the posterior mean plus a trace of the covariance $\mathrm{Cov}[x_0|x_t]$ yields the explicit form:

$\begin{aligned} \mathrm{VML}_t(x_t) =& -\log p(x_t)-\frac{1}{2\sigma_t^2}\|D(x_t,t)-x_t\|^2 \\ &+ \frac{1}{2\sigma_y^2}\|y-H D(x_t,t)\|^2 \\ &-\frac{1}{2\sigma_t^2}\mathrm{Tr}[\mathrm{Cov}(x_0|x_t)] + \frac{1}{2\sigma_y^2}\mathrm{Tr}[H \mathrm{Cov}(x_0|x_t) H^T] + C \end{aligned}$

where $D(x_t,t) = \mathbb{E}[x_0|x_t]$ is the denoiser’s mean (Gutha et al., 11 Dec 2025).
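The step from the expectation form to the explicit form rests on the identity $\mathbb{E}\|a - z\|^2 = \|a - \mathbb{E}[z]\|^2 + \mathrm{Tr}\,\mathrm{Cov}(z)$, applied once to $x_0$ and once to $Hx_0$. The following sketch checks both identities on a small, equally weighted set of hypothetical posterior samples; the sample values, `H`, `x_t`, and `y` are made up for illustration and do not come from the paper.

```python
def matvec(H, x):
    """Multiply matrix H (a list of rows) by vector x."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(vi * vi for vi in v)

def sub(a, b):
    """Elementwise difference a - b."""
    return [ai - bi for ai, bi in zip(a, b)]

def mean(vectors):
    """Componentwise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def trace_cov(vectors):
    """Trace of the population covariance: mean squared distance to the mean."""
    m = mean(vectors)
    return sum(sq_norm(sub(v, m)) for v in vectors) / len(vectors)

# Hypothetical equally weighted samples standing in for p(x_0 | x_t).
x0_samples = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0], [-1.5, 1.0]]
x_t = [0.3, 0.7]
y = [1.0]
H = [[1.0, -2.0]]  # a 1x2 linear measurement operator (illustrative)

D = mean(x0_samples)                      # plays the role of D(x_t, t) = E[x_0 | x_t]
Hx0 = [matvec(H, s) for s in x0_samples]  # samples pushed through H

# E||x_t - x_0||^2 versus ||D - x_t||^2 + Tr Cov(x_0 | x_t)
lhs_prior = sum(sq_norm(sub(x_t, s)) for s in x0_samples) / len(x0_samples)
rhs_prior = sq_norm(sub(D, x_t)) + trace_cov(x0_samples)

# E||y - H x_0||^2 versus ||y - H D||^2 + Tr[H Cov(x_0 | x_t) H^T]
lhs_meas = sum(sq_norm(sub(y, hs)) for hs in Hx0) / len(Hx0)
rhs_meas = sq_norm(sub(y, matvec(H, D))) + trace_cov(Hx0)

print(lhs_prior, rhs_prior, lhs_meas, rhs_meas)
```

The trace of the covariance of the pushed-forward samples equals $\mathrm{Tr}[H\,\mathrm{Cov}(x_0|x_t)\,H^T]$, so no explicit covariance matrix is needed for the check.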

For practical inference, a simplified version drops the covariance-trace terms, justified because these terms tend to constants as $t\rightarrow 0$ and so do not shift the minimizer:

$\mathrm{VML}^s_t(x_t) \approx -\log p(x_t) - \frac{1}{2\sigma_t^2}\|D(x_t,t) - x_t\|^2 + \frac{1}{2\sigma_y^2}\|y - H D(x_t,t)\|^2 + {\rm const}$
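To see what the simplification discards, the sketch below evaluates both the expectation form and the simplified form on a scalar toy problem in which everything is available in closed form. The Gaussian prior $x_0 \sim \mathcal{N}(0, S_0^2)$ and all constants are assumptions made for illustration, not settings from the paper. In this toy case the posterior variance does not depend on $x_t$, so the two forms differ by an amount that is the same for every $x_t$.

```python
import math

S0, H, SIGMA_Y, Y = 1.0, 1.0, 0.5, 2.0  # hypothetical toy values

def posterior_moments(x_t, sigma_t):
    """Mean and variance of p(x_0 | x_t) for x_0 ~ N(0, S0^2), x_t = x_0 + sigma_t * noise."""
    gain = S0**2 / (S0**2 + sigma_t**2)
    return gain * x_t, gain * sigma_t**2

def neg_log_marginal(x_t, sigma_t):
    """-log p(x_t) for the toy prior, where p(x_t) = N(0, S0^2 + sigma_t^2)."""
    var = S0**2 + sigma_t**2
    return 0.5 * x_t**2 / var + 0.5 * math.log(2.0 * math.pi * var)

def vml_simplified(x_t, sigma_t):
    """Simplified loss: denoiser-based terms only, additive constant omitted."""
    d, _ = posterior_moments(x_t, sigma_t)
    return (neg_log_marginal(x_t, sigma_t)
            - (d - x_t)**2 / (2.0 * sigma_t**2)
            + (Y - H * d)**2 / (2.0 * SIGMA_Y**2))

def vml_expectation_form(x_t, sigma_t):
    """Expectation form with the same constant omitted, using E[(a - x_0)^2] = (a - mean)^2 + var."""
    d, v = posterior_moments(x_t, sigma_t)
    e_prior = (x_t - d)**2 + v
    e_meas = (Y - H * d)**2 + H**2 * v
    return (neg_log_marginal(x_t, sigma_t)
            - e_prior / (2.0 * sigma_t**2)
            + e_meas / (2.0 * SIGMA_Y**2))

def gap(x_t, sigma_t):
    """Contribution of the covariance terms that the simplified loss drops."""
    return vml_expectation_form(x_t, sigma_t) - vml_simplified(x_t, sigma_t)

print(gap(0.3, 0.1), gap(-2.0, 0.1))  # same value for both x_t in this toy case
```

For a learned, non-Gaussian prior the covariance generally does depend on $x_t$ at larger noise levels, so this exact equality is a property of the toy problem rather than a general guarantee.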

3. Inference Algorithm: VML–MAP

The VML-MAP algorithm integrates local VML minimization into each reverse-diffusion step:

Initialize $x_{t_N} \sim \mathcal{N}(0, \sigma(t_N)^2 I)$, then for each time step $t_i$, from the largest $t_N$ down to 0:
Perform $K$ steps of gradient descent on $\mathrm{VML}^s_{t_i}(x)$. Differentiating the simplified loss by the chain rule gives

$\nabla_x\mathrm{VML}^s = -\nabla_x\log p(x) - \frac{1}{\sigma_t^2}(J_D - I)^T(D(x,t)-x) - \frac{1}{\sigma_y^2}J_D^T H^T(y - H D(x,t))$

where $\nabla_x\log p(x)$ is the score, approximated by the learned score model $s_\theta(x,t)$, and $J_D = \partial D(x,t)/\partial x$ is the Jacobian of the denoiser.

Advance with a standard reverse-diffusion step: sample $x_{t_{i-1}} \sim \mathcal{N}(D(x,t_i), \sigma(t_{i-1})^2 I)$.

The total number of network calls is approximately $N$ (one denoiser call per reverse step) plus $N \cdot K$ (for the gradient steps), usually much less than the $O(1000)$ neural calls needed for some alternative posterior-sampling methods (Gutha et al., 11 Dec 2025).
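The loop structure described above can be sketched on a scalar toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the prior is taken to be $x_0 \sim \mathcal{N}(0, S_0^2)$ so that the denoiser and its Jacobian are exact closed forms (a trained network and automatic differentiation would take their place in practice), the score is obtained from Tweedie's relation $\nabla_x \log p(x_t) = (D(x_t,t) - x_t)/\sigma_t^2$, and the gradient is the chain-rule derivative of the simplified loss. The noise schedule, step size, and all function names are hypothetical.

```python
import math
import random

S0, H, SIGMA_Y, Y = 1.0, 1.0, 0.5, 2.0  # hypothetical toy problem: y = H * x_0 + noise

def denoiser(x, sigma_t):
    """Exact E[x_0 | x_t] for the toy prior x_0 ~ N(0, S0^2); stands in for a trained network."""
    return x * S0**2 / (S0**2 + sigma_t**2)

def denoiser_jacobian(x, sigma_t):
    """dD/dx for the toy denoiser (a scalar here; autodiff would supply it in practice)."""
    return S0**2 / (S0**2 + sigma_t**2)

def grad_vml_simplified(x, sigma_t):
    """Chain-rule gradient of the simplified VML with respect to x."""
    d = denoiser(x, sigma_t)
    jac = denoiser_jacobian(x, sigma_t)
    score = (d - x) / sigma_t**2  # Tweedie's relation for grad log p(x_t)
    return (-score
            - (jac - 1.0) * (d - x) / sigma_t**2
            - jac * H * (Y - H * d) / SIGMA_Y**2)

def vml_map(sigmas, num_grad_steps=50, step_size=0.2, seed=0):
    """sigmas: decreasing noise levels sigma(t_N) > ... > sigma(t_1); sigma(t_0) = 0 is implied."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigmas[0])  # x_{t_N} ~ N(0, sigma(t_N)^2)
    levels = list(sigmas) + [0.0]
    for sigma_t, sigma_next in zip(levels[:-1], levels[1:]):
        for _ in range(num_grad_steps):  # K gradient-descent steps on VML^s
            x -= step_size * grad_vml_simplified(x, sigma_t)
        # Reverse step: x_{t_{i-1}} ~ N(D(x, t_i), sigma(t_{i-1})^2)
        x = denoiser(x, sigma_t) + sigma_next * rng.gauss(0.0, 1.0)
    return x

sigmas = [10.0 * (0.001 ** (i / 19)) for i in range(20)]  # 20 levels from 10 down to 0.01
x_hat = vml_map(sigmas)
x_map = H * Y * S0**2 / (H**2 * S0**2 + SIGMA_Y**2)       # closed-form MAP for this toy
print(x_hat, x_map)
```

With 20 reverse steps and 50 gradient updates each, the sketch makes about 1000 denoiser evaluations. The fixed step size is chosen to be stable for this particular quadratic toy loss; a real problem would need its own tuning.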

4. Theoretical Properties: Mode-Seeking and MAP Consistency

As $t\rightarrow0$, the conditional $p(x_0|x_t)$ contracts to a sharp Gaussian about $x_t$. In this limit, the asymptotic relation

$\lim_{t\to0}\bigl[\,\mathrm{VML}_t(x)+n\log\sigma_t\,\bigr] = -\,\log p(x|y)\;+\;\mathrm{const}$

implies that $\mathrm{VML}_t$ is minimized at the MAP estimate $\arg\max_x\log p(x|y)$; here $n$ denotes the dimensionality of $x$, not the measurement noise. The "trace of covariance" terms in the linear VML form uniformly tend to constants as $t \to 0$, so dropping them does not alter the minimizer. Thus, local VML minimization at each step naturally drives the iterates toward the MAP solution (Gutha et al., 11 Dec 2025).
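The limiting relation can be checked numerically when both posteriors are Gaussian, since the reverse KL then has a closed form. The sketch below assumes a scalar toy problem ($n = 1$) with prior $x_0 \sim \mathcal{N}(0, S_0^2)$; the constants are illustrative and not from the paper. It evaluates $\mathrm{VML}_t(x) + \log\sigma_t + \log p(x|y)$ at a small noise level and compares it with the value $-\tfrac{1}{2}(1 + \log 2\pi)$ that the Gaussian algebra gives for this case.

```python
import math

S0, H, SIGMA_Y, Y = 1.0, 1.0, 0.5, 2.0  # hypothetical toy values, dimension n = 1

def measurement_posterior():
    """Mean and variance of p(x_0 | y) for the toy Gaussian prior and likelihood."""
    var = 1.0 / (1.0 / S0**2 + H**2 / SIGMA_Y**2)
    return var * H * Y / SIGMA_Y**2, var

def vml_exact(x_t, sigma_t):
    """KL(p(x_0|x_t) || p(x_0|y)) via the closed form for two 1-D Gaussians."""
    gain = S0**2 / (S0**2 + sigma_t**2)
    m_q, v_q = gain * x_t, gain * sigma_t**2  # diffusion posterior p(x_0 | x_t)
    m_p, v_p = measurement_posterior()
    return 0.5 * (math.log(v_p / v_q) + (v_q + (m_q - m_p)**2) / v_p - 1.0)

def neg_log_posterior(x):
    """-log p(x | y) for the toy problem."""
    m_p, v_p = measurement_posterior()
    return 0.5 * (x - m_p)**2 / v_p + 0.5 * math.log(2.0 * math.pi * v_p)

def offset(x, sigma_t):
    """VML_t(x) + n*log(sigma_t) + log p(x|y); approaches an x-independent constant."""
    return vml_exact(x, sigma_t) + math.log(sigma_t) - neg_log_posterior(x)

limit = -0.5 * (1.0 + math.log(2.0 * math.pi))
print(offset(0.3, 1e-4), offset(-2.0, 1e-4), limit)
```

Because the offset does not depend on $x$ in the limit, $\mathrm{VML}_t$ and $-\log p(x|y)$ share the same minimizer there, which for this toy problem is the posterior mean returned by `measurement_posterior()`.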

5. Empirical Performance and Benchmarking

VML-MAP's efficacy was validated on three image restoration tasks (half-mask inpainting, $4\times$ super-resolution, and uniform deblurring) on the ImageNet64, ImageNet256, FFHQ256, and latent CelebA256 datasets. Comparative methods included DDRM and $\Pi$GDM (posterior sampling), MAPGA (MAP estimation via PF-ODE), and DAPS.

Key results (ImageNet64, 1000-image subset):

Task	Method	LPIPS↓	FID↓
inpainting	VML-MAP	0.146	38.7
inpainting	MAPGA	0.172	46.3
inpainting	DDRM	0.262	57.0
4× super-res	VML-MAP	0.136	61.9
4× super-res	MAPGA	0.203	83.9
4× super-res	DDRM	0.235	78.2
deblurring	VML-MAP	-	105.5
deblurring	MAPGA	-	114.3
deblurring	DDRM	-	198.0

VML-MAP exceeded baselines in both LPIPS and FID across standard and large-scale (ImageNet256/FFHQ256) benchmarks using only $\mathcal{O}(10^3)$ neural net calls. A preconditioned variant (VML-MAP$^{\rm pre}$) further improved outcomes on ill-conditioned problems. For a given computation budget (e.g., 20 reverse steps $\times$ 50 gradient updates), VML-MAP outperformed DDRM and $\Pi$GDM, which required $500$–$1000$ diffusion steps. Qualitative assessments show VML-MAP reconstructions with sharper textures and stricter measurement consistency (Gutha et al., 11 Dec 2025).

6. Limitations and Open Extensions

VML’s practical deployment is subject to several constraints:

The optimization step is currently reliant on first-order gradient descent; more sophisticated (e.g., quasi-Newton) optimizers might speed up convergence but must retain computational efficiency.
Performance degrades as measurement noise $\sigma_y$ increases, since the measurement-fit term’s influence diminishes, leading to blurred reconstructions.
For latent diffusion models, VML must be adapted to account for decoder nonlinearities, making optimization harder and resulting in blurrier samples compared to pixel-space implementations.

Directions for further research include the design of robust optimizers for VML minimization, effective treatment of non-linear measurement operators particularly in latent spaces, and the development of joint score-plus-posterior neural estimators to minimize the number of required gradient steps (Gutha et al., 11 Dec 2025).

References (1)

Mode-Seeking for Inverse Problems with Diffusion Models (2025)
