Diffusion Priors in Inverse Problems

Updated 8 February 2026
  • Diffusion priors are SDE-parameterized probabilistic models whose learned score functions capture complex geometric, semantic, and multimodal structure.
  • They integrate data-driven priors into Bayesian inverse problem formulations, combining reverse-SDE sampling with likelihood gradients for robust reconstruction.
  • They excel on sparse, noisy, or partial data, yielding higher-fidelity reconstructions than conventional hand-crafted priors.

A diffusion prior is a probabilistic model, parameterized via a diffusion process (forward SDE and learned reverse SDE), used to inject powerful, data-driven structural constraints into generative and inverse problems. Unlike traditional priors (e.g., sparsity, smoothness), diffusion priors are learned from data and are capable of capturing the intricate geometric, semantic, or multimodal structure of complex domains such as images, 3D objects, or biological assemblies.

1. Diffusion Priors: Formalism and Scoring

A diffusion prior $p(x)$ is constructed by training a stochastic differential equation (SDE)-based generative model. The forward SDE corrupts samples $x_0 \sim p_0$ into isotropic noise over time, typically

$$dx_t = f(x_t, t)\,dt + g(t)\,dW_t.$$

In the Karras–EDM formulation, $f \equiv 0$ and $g(t) = \sqrt{2t}$, so $x_t$ diffuses to Gaussian noise as $t \to T$.
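
For reference, this choice pins down the Gaussian perturbation kernel used for training below: with zero drift, the variance accumulated by time $t$ is the integral of $g(s)^2$,

$$\operatorname{Var}[x_t \mid x_0] = \left(\int_0^t g(s)^2\, ds\right) I = \left(\int_0^t 2s\, ds\right) I = t^2 I, \qquad \text{i.e. } p_{0t}(x_t \mid x_0) = \mathcal{N}(x_t;\, x_0,\, t^2 I).$$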

The reverse SDE (Anderson, 1982) moves from pure noise back to data:

$$dx_t = \left[f(x_t, t) - g(t)^2\,\nabla_{x_t}\log p_t(x_t)\right]dt + g(t)\,d\tilde{W}_t,$$

where $p_t(x_t)$ is the marginal at diffusion time $t$ and $\nabla_{x_t}\log p_t(x_t)$ is the time-dependent Stein score. The score function is estimated by training a neural network $s_\theta(x, t)$ via denoising score matching (DSM), using the Gaussian perturbation kernel $p_{0t}(x_t \mid x_0) = \mathcal{N}(x_t; x_0, t^2 I)$ and minimizing

$$L(\theta) = \mathbb{E}_{t,\, x_0,\, x_t}\!\left[\lambda(t)\,\bigl\|\nabla_{x_t}\log p_{0t}(x_t \mid x_0) - s_\theta(x_t, t)\bigr\|^2\right]$$

with the analytical score $\nabla_{x_t}\log p_{0t}(x_t \mid x_0) = -(x_t - x_0)/t^2$ (Möbius et al., 2024).
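
A minimal PyTorch sketch of this training objective follows, assuming the EDM-style kernel above. The score network `score_net`, the log-uniform time sampling, and the weighting $\lambda(t) = t^2$ are illustrative placeholders rather than the exact choices of the cited work.

```python
import math
import torch

def dsm_loss(score_net, x0, t_min=1e-3, t_max=80.0):
    """Denoising score matching with the N(x0, t^2 I) perturbation kernel."""
    batch = x0.shape[0]
    # Sample diffusion times log-uniformly in [t_min, t_max] (an illustrative heuristic).
    u = torch.rand(batch, device=x0.device)
    t = torch.exp(math.log(t_min) + u * (math.log(t_max) - math.log(t_min)))
    t_view = t.view(-1, *([1] * (x0.dim() - 1)))

    # Forward perturbation: x_t = x0 + t * eps, i.e. x_t ~ N(x0, t^2 I).
    noise = torch.randn_like(x0)
    x_t = x0 + t_view * noise

    # Analytical score of the kernel: -(x_t - x0) / t^2.
    target = -(x_t - x0) / t_view**2

    pred = score_net(x_t, t)
    # Weighting lambda(t) = t^2 keeps the regression target at unit scale.
    return (t_view**2 * (pred - target) ** 2).mean()
```

Here `score_net(x_t, t)` can be any network that accepts a batch of noisy samples together with their diffusion times.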

2. Diffusion Priors in Bayesian Inverse Problems

Given incomplete or noisy observations $y$ arising from an unknown $x$ through

$$y \approx A(x) + \eta, \qquad \eta \sim \mathcal{N}(0, \sigma^2 I),$$

the Bayesian posterior is

$$p(x \mid y) \propto p(y \mid x)\,p(x).$$

The diffusion prior $p(x)$ encodes the data manifold more expressively than classical hand-crafted priors.

To incorporate the likelihood, one defines an energy $E(x; y)$ for the forward operator and writes

$$p(y \mid x) \propto \exp[-E(x; y)],$$

allowing for flexible data terms (e.g., permutation-invariant matching of projections). Posterior sampling leverages a reverse SDE augmented by likelihood gradients:

$$\nabla_x \log \hat{p}_t(x \mid y) = \nabla_x \log p_t(x) + \zeta(t)\,\nabla_x \log p\bigl(y \mid \hat{x}_0(x, t)\bigr),$$

where $\hat{x}_0(x, t) = x + t^2\, s_\theta(x, t)$ is the denoised guess of the clean sample obtained from the learned score (Tweedie's formula for the $\mathcal{N}(x_0, t^2 I)$ kernel), and $\zeta(t)$ balances prior and data fit. The resulting "Diffusion Posterior Sampling" (DPS) iteratively applies Euler–Maruyama or other discretizations that couple the score and the data likelihood (Möbius et al., 2024).
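
As a concrete illustration, for the Gaussian observation model above the energy is $E(x; y) = \|y - A(x)\|^2 / (2\sigma^2)$ up to a constant, and its gradient can be taken through the denoised guess by automatic differentiation. The sketch below is a minimal version of this coupling, assuming a differentiable forward operator `A` and the score network from the previous section; the guidance weight `zeta` is a free parameter here, not a value prescribed by the source.

```python
import torch

def posterior_score(score_net, A, y, x, t, sigma=0.05, zeta=1.0):
    """Approximate posterior score: prior score plus weighted likelihood gradient.

    Uses the denoised guess x0_hat = x + t^2 * s_theta(x, t) and the Gaussian
    data-fit energy E(x; y) = ||y - A(x0_hat)||^2 / (2 sigma^2), differentiated
    with respect to the noisy iterate x.
    """
    x = x.detach().requires_grad_(True)
    score = score_net(x, t)

    # Denoised guess of the clean sample from the current noisy iterate.
    x0_hat = x + (t ** 2) * score

    # Gaussian data-fit energy and its gradient w.r.t. x (through the denoiser).
    energy = ((y - A(x0_hat)) ** 2).sum() / (2.0 * sigma ** 2)
    (grad_energy,) = torch.autograd.grad(energy, x)

    # grad_x log p_hat_t(x | y) = score + zeta * grad_x log p(y | x0_hat)
    #                           = score - zeta * grad_x E(x; y).
    return (score - zeta * grad_energy).detach()
```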

3. Implementation Details and Posterior Sampling

The general sampling algorithm proceeds as follows:

  1. Initialization: Sample $x_{t_0} \sim \mathcal{N}(0, t_0^2 I)$.
  2. Reverse SDE Step: For discrete times $t_0 > t_1 > \ldots > t_N = 0$, with $\Delta t = t_i - t_{i+1}$:
    • Compute prior drift using the approximate posterior score
    • Apply a second-order correction and inject noise
    • Repeat until t=0t = 0

Concretely,

$$x_{t_{i+1}} = x_{t_i} + c\left[\nabla \log \hat{p}_{t_i}(x_{t_i} \mid y) + \nabla \log \hat{p}_{t_{i+1}}(x' \mid y)\right]\Delta t + \mathcal{N}(0,\, g^2 I)$$

with $x' = x_{t_i} + t_i\,\nabla_x \log \hat{p}_{t_i}(x_{t_i} \mid y)\,\Delta t$, and $c$, $g$ as specified in (Möbius et al., 2024).

Data terms $E(x; y)$ are application-specific but always enter as gradients in the posterior score.
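
A compact sketch of this loop is given below. It follows the update rule above, but the coefficient `c_i` and the noise scale `g_i` are simple illustrative stand-ins for the schedule defined in the cited work, and `posterior_score_fn(x, t)` is assumed to wrap the posterior-score computation from Section 2.

```python
import torch

def dps_sample(posterior_score_fn, shape, ts):
    """Diffusion posterior sampling with a predictor-corrector style update.

    `ts` is a decreasing sequence of diffusion times t_0 > t_1 > ... > t_N = 0,
    and `posterior_score_fn(x, t)` returns grad_x log p_hat_t(x | y).
    """
    x = ts[0] * torch.randn(shape)  # x_{t_0} ~ N(0, t_0^2 I)

    for t_i, t_next in zip(ts[:-1], ts[1:]):
        dt = t_i - t_next

        s_i = posterior_score_fn(x, t_i)

        # Predictor: x' = x + t_i * score * dt (Euler step along the posterior score).
        x_pred = x + t_i * s_i * dt

        # Corrector: combine the scores at the current and predicted points.
        s_next = posterior_score_fn(x_pred, t_next)
        c_i = 0.5 * t_i  # illustrative stand-in for the coefficient c
        x = x + c_i * (s_i + s_next) * dt

        # Noise injection with an illustrative scale g_i; skipped at the final step.
        if t_next > 0:
            g_i = (2.0 * t_next * dt) ** 0.5
            x = x + g_i * torch.randn_like(x)

    return x
```

A geometric time schedule decreasing from $t_0$ toward zero is a common practical choice for `ts`.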

4. Application Domains and Empirical Advantages

Diffusion priors have been applied in 3D reconstruction from incomplete projections (e.g., cryo-EM), image restoration, medical imaging, and other ill-posed inverse problems:

  • Cryo-EM/ShapeNet: Reconstruction from as few as 1–5 projections and/or low-resolution traces using diffusion priors achieves RMSD reductions from 5–12 Å (ML) to 3–8 Å (DPS), and halves error metrics such as Chamfer Distance and EMD relative to classical maximum-likelihood (Möbius et al., 2024).
  • Resolution Regimes: Diffusion priors are especially beneficial with very sparse, noisy, or partial data, enabling "intermediate-resolution" reconstructions that are unattainable with hand-crafted priors.

The table below exemplifies the gains for molecular assembly and shape reconstruction:

| Regime | ML | DPS |
| --- | --- | --- |
| Single view + coarse points | 5–12 Å RMSD | 3–8 Å RMSD |
| 4–6 projections (ShapeNet chairs) | High Chamfer/EMD error | ~50% error reduction |
| Cryo-EM (proteins) | 5–12 Å RMSD | 3–8 Å RMSD |

The unified reverse SDE sampling brings together generative priors and experimental constraints, yielding improved sample quality and data fidelity, especially in strongly underdetermined settings (Möbius et al., 2024).

5. Generalization, Limitations, and Future Directions

Diffusion priors exhibit:

  • Generalization: Flexible adaptation to varied forward models and noise regimes via redefinition of the likelihood and score coupling.
  • Complex Structure Capture: Unlike simple priors, diffusion models fit non-trivial manifolds such as biologically realistic 3D forms or textile texture.
  • No closed-form likelihood: The prior $p(x)$ is only accessible through the learned score; direct evaluation of $p(x \mid y)$ is intractable, so one must sample.
  • Sampling overhead: Posterior sampling is considerably more expensive than with analytic priors, though methods such as second-order or flow-based acceleration are under active development.

Open questions include optimal score interpolation, balancing prior vs. likelihood (the schedule $\zeta(t)$), and rigorous convergence analysis of the posterior sampler.

6. Theoretical and Methodological Foundations

Diffusion priors provide a data-driven density $p(x)$ defined through the integrated reverse-time SDE, with score learning rooted in denoising score matching:

  • Posterior formulation:

$$p(x \mid y) \propto p(y \mid x)\,p(x)$$

  • Training criterion:

$$L(\theta) = \mathbb{E}_{t,\, x_0,\, x_t}\!\left[\lambda(t)\,\bigl\|\nabla_{x_t}\log p_{0t}(x_t \mid x_0) - s_\theta(x_t, t)\bigr\|^2\right]$$

  • Reverse SDE:

$$dx_t = \left[f(x_t, t) - g(t)^2\,\nabla_x \log p_t(x_t)\right]dt + g(t)\,d\tilde{W}_t$$

  • Posterior score:

$$\nabla_x \log \hat{p}_t(x \mid y) = \nabla_x \log p_t(x) + \zeta(t)\,\nabla_x \log p\bigl(y \mid \hat{x}_0(x, t)\bigr)$$

Empirically, these constructs harmonize denoiser priors with experimental likelihoods such that solutions interpolate flexibly between strong generative prior support and observed data (Möbius et al., 2024).


In summary, a diffusion prior is a data-derived, SDE-parameterized probabilistic model leveraged as the prior in a Bayesian inverse problem framework. By learning a rich score and unifying it with the forward-model likelihood in a reverse-SDE-driven sampler, diffusion priors enable high-fidelity solutions to ill-posed problems that are inaccessible to conventional priors, especially when observations are highly incomplete, noisy, or partial (Möbius et al., 2024).
