
Diffusion-Based Posterior Sampling

Updated 22 December 2025
  • Diffusion-based posterior sampling is a framework that uses probabilistic diffusion to connect tractable priors to complex Bayesian posteriors, enabling scalable uncertainty quantification.
  • The PDPS method leverages Monte Carlo sampling and Langevin dynamics to estimate posterior scores, delivering non-asymptotic error bounds and superior performance in image restoration.
  • A three-stage sampling procedure with warm-start initialization mitigates bias and ensures polynomial-time convergence for high-dimensional, multimodal inverse problems.

Diffusion-based posterior sampling encompasses a family of transport and stochastic simulation methods in which a probabilistic diffusion process connects a tractable reference density to a complex Bayesian posterior. In this context, the prior is often modeled by a data-driven score-based diffusion model, and the measurement likelihood is incorporated through Monte Carlo estimators, Langevin dynamics, or optimization-based surrogates for conditional score approximations. This methodology enables scalable uncertainty quantification and generative posterior inference for general nonlinear, noisy Bayesian inverse problems with multimodal, high-dimensional targets. Rigorous non-asymptotic error bounds now exist, together with practical plug-and-play implementation strategies. The framework is exemplified by Provable Diffusion Posterior Sampling (PDPS), which provides polynomial-time convergence and strong empirical performance in image restoration tasks (Chang et al., 8 Dec 2025).

1. Mathematical Formulation: Bayesian Posterior and Diffusion Transport

The diffusion-based posterior sampling framework considers observations $y \in \mathbb{R}^n$ generated via

$$Y = \mathcal{G}(X_0) + n, \quad n \sim \rho(n),$$

with a known forward operator $\mathcal{G}: \mathbb{R}^d \to \mathbb{R}^n$ and noise density $\rho$. The posterior distribution is

$$\pi(x_0 \mid y) \propto \exp(-\ell_y(x_0))\,\pi_0(x_0), \quad \ell_y(x_0) = -\log\rho(y - \mathcal{G}(x_0)),$$

where $\pi_0$ denotes a pretrained, data-driven prior. To construct a transport from a tractable terminal density $\pi_T$ back to the posterior, the process evolves under

$$dX_t = -X_t\,dt + \sqrt{2}\,dB_t, \quad X_0 \sim \pi_0, \quad t \in [0, T],$$

where $X_T \sim \pi_T$. The time-reversal process starting from $\pi_T$ is then

$$d\bar{X}_t = \bigl(\bar{X}_t + 2\nabla_x\log q_{T-t}(\bar{X}_t \mid y)\bigr)\,dt + \sqrt{2}\,dB_t,$$

with initial condition $\bar{X}_0 \sim \pi_T(\cdot \mid y)$ and $\bar{X}_T \sim \pi(\cdot \mid y)$.
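
Both SDEs are typically simulated with a simple Euler-Maruyama discretization. The sketch below is illustrative rather than the cited paper's implementation; it assumes a callable `posterior_score(t, x, y)` approximating $\nabla_x\log q_t(x \mid y)$, and the step size `dt` is a free parameter.

```python
import numpy as np

def forward_ou_step(x, dt, rng):
    """One Euler-Maruyama step of the forward OU process dX_t = -X_t dt + sqrt(2) dB_t."""
    return x - x * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)

def reverse_step(x_bar, t_rev, dt, T, y, posterior_score, rng):
    """One Euler-Maruyama step of the time reversal
    dXbar_t = (Xbar_t + 2 grad_x log q_{T-t}(Xbar_t | y)) dt + sqrt(2) dB_t."""
    drift = x_bar + 2.0 * posterior_score(T - t_rev, x_bar, y)
    return x_bar + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x_bar.shape)
```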

2. Posterior Score Estimation via Restricted Gaussian Oracle (RGO) and Monte Carlo

A central challenge is the estimation of the time-dependent posterior score

$$s_t(x; y) = \nabla_x\log q_t(x \mid y),$$

where $q_t(x \mid y)$ is the time-$t$ marginal of the forward process conditioned on $Y = y$. Applying a conditional Tweedie identity yields

$$s_t(x; y) = -\frac{x - \mu_t D(t, x, y)}{\sigma_t^2}, \quad D(t, x, y) = \mathbb{E}[X_0 \mid X_t = x,\, Y = y],$$

with $\mu_t = e^{-t}$ and $\sigma_t^2 = 1 - e^{-2t}$. The denoiser expectation $D(t, x, y)$ is defined under the "restricted Gaussian oracle" (RGO) density

$$p_t(x_0 \mid x, y) \propto \exp\left(-\frac{\|x - \mu_t x_0\|^2}{2\sigma_t^2} - \ell_y(x_0)\right)\pi_0(x_0).$$

Practically, $p_t(\cdot \mid x, y)$ is approximated by Monte Carlo sampling, specifically by generating samples from a Langevin SDE:

$$dX_{0,s} = \left(\nabla\log\pi_0(X_{0,s}) + \frac{\mu_t}{\sigma_t^2}\bigl(x - \mu_t X_{0,s}\bigr) - \nabla\ell_y(X_{0,s})\right)ds + \sqrt{2}\,dB_s, \quad s \in [0, S].$$

With the prior score replaced by a pretrained estimator $s_{\mathrm{prior}}$, the denoiser and score are estimated as

$$\widehat{D}_m^S(t, x, y) = \frac{1}{m}\sum_{i=1}^m X_{0,S,i}^{x, y, t}, \quad \widehat{s}_m^S(t, x, y) = -\frac{x - \mu_t\,\widehat{D}_m^S(t, x, y)}{\sigma_t^2}.$$
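
The inner Langevin/Monte Carlo estimator can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes vectorized callables `s_prior` (the pretrained prior score) and `grad_loss_y` (the gradient $\nabla\ell_y$ with the data $y$ baked in), and the chain initialization and step size `ds` are arbitrary choices.

```python
import numpy as np

def rgo_denoiser_and_score(x, t, s_prior, grad_loss_y, m, S, ds, rng):
    """Estimate D(t, x, y) = E[X_0 | X_t = x, Y = y] and the posterior score by
    running m Langevin chains targeting the RGO density p_t(x_0 | x, y)."""
    mu_t = np.exp(-t)
    sigma2_t = 1.0 - np.exp(-2.0 * t)

    # m parallel chains; starting them at the rescaled state x / mu_t is one simple choice.
    x0 = np.repeat((x / mu_t)[None, :], m, axis=0)
    for _ in range(int(S / ds)):
        drift = (s_prior(x0)                                     # approx. grad log pi_0
                 + (mu_t / sigma2_t) * (x[None, :] - mu_t * x0)  # Gaussian tether to x
                 - grad_loss_y(x0))                              # likelihood term, -grad ell_y
        x0 = x0 + drift * ds + np.sqrt(2.0 * ds) * rng.standard_normal(x0.shape)

    d_hat = x0.mean(axis=0)                      # \hat D_m^S(t, x, y)
    s_hat = -(x - mu_t * d_hat) / sigma2_t       # \hat s_m^S(t, x, y) via Tweedie
    return d_hat, s_hat
```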

3. Warm-Start Initialization and Three-Stage Sampling Procedure

Since $\pi_T(\cdot \mid y)$ may deviate from a standard Gaussian for small $T$, PDPS avoids bias by initializing the reverse chain with a "warm start." This is accomplished via an outer Langevin chain using the estimated posterior score:

$$dX_{T,u} = \widehat{s}_m^S(T, X_{T,u}, y)\,du + \sqrt{2}\,dB_u, \quad u \in [0, U],$$

ensuring $\bar{X}_0 \sim \pi_T(\cdot \mid y)$ approximately, given sufficient mixing time $U$. The complete algorithm entails: (i) a warm-start Langevin chain to sample $\bar{X}_0$, (ii) time-reversal diffusion using $\widehat{s}_m^S$ for $t \in [0, T - T_0]$, and (iii) a final scaling/denoising step.
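
The three stages can be strung together as in the following sketch, which assumes an estimated posterior score `score_hat(t, x, y)` such as the estimator of Section 2; the concrete form of the final denoising step here is an assumption based on the conditional Tweedie identity, and all step sizes are illustrative.

```python
import numpy as np

def pdps_sample(y, d, score_hat, T, T0, U, du, dt, rng):
    """Three-stage sketch: (i) warm-start Langevin at time T, (ii) reverse
    diffusion from T down to the early-stopping time T0, (iii) final denoising."""
    # (i) Warm start: outer Langevin chain targeting (approximately) pi_T(.|y).
    x = rng.standard_normal(d)
    for _ in range(int(U / du)):
        x = x + score_hat(T, x, y) * du + np.sqrt(2.0 * du) * rng.standard_normal(d)

    # (ii) Reverse diffusion: Euler-Maruyama on the time-reversed SDE over [0, T - T0].
    t_rev = 0.0
    while t_rev < T - T0:
        drift = x + 2.0 * score_hat(T - t_rev, x, y)
        x = x + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(d)
        t_rev += dt

    # (iii) Early-stop correction at the residual time T0: one deterministic step
    # using the Tweedie identity, D = (x + sigma_T0^2 * s) / mu_T0.
    mu_T0 = np.exp(-T0)
    sigma2_T0 = 1.0 - np.exp(-2.0 * T0)
    return (x + sigma2_T0 * score_hat(T0, x, y)) / mu_T0
```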

4. Non-Asymptotic Convergence and Error Bounds

PDPS delivers rigorous non-asymptotic error bounds in the 2-Wasserstein distance:

$$W_2\bigl(\pi(\cdot \mid y),\,\widehat{\pi}\bigr) \leq C_1(T_0) + C_2\int_{T_0}^{T} \mathbb{E}\bigl\|\widehat{s}_m^S(t) - s_t\bigr\|^2\,dt + C_3\,e^{-U/C_{\mathrm{LSI}}},$$

where the error decomposes into early-stopping, score-estimation, and warm-start contributions. Under the following conditions:

  • the posterior $\pi(\cdot \mid y)$ is $\alpha$-semi-log-concave with sub-Gaussian tails (constant $V_{\mathrm{SG}}$),
  • the prior score-matching error is bounded by $\varepsilon_{\mathrm{prior}}$,
  • the conditioning parameter satisfies $\kappa_y < \infty$,

for any $\varepsilon > 0$, selecting

$$T_0 \sim \sqrt{\varepsilon}, \quad U \sim \log(1/\varepsilon), \quad m \sim \varepsilon^{-2}, \quad S \sim \log(1/\varepsilon)$$

yields a final error of

$$W_2\bigl(\pi(\cdot \mid y),\,\widehat{\pi}\bigr) \leq C\,\varepsilon^{1/4}\sqrt{\log(1/\varepsilon)},$$

with a constant $C$ that is polynomial in $(\kappa_y, \varepsilon_{\mathrm{prior}}, \alpha, V_{\mathrm{SG}})$ and dimension-free.
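
As a rough illustration of these rates, the hyperparameters can be scheduled directly from a target accuracy $\varepsilon$; the proportionality constants below are placeholders, not values taken from the paper.

```python
import numpy as np

def pdps_hyperparameters(eps, c_T0=1.0, c_U=1.0, c_m=1.0, c_S=1.0):
    """Schedule matching the stated rates T0 ~ sqrt(eps), U ~ log(1/eps),
    m ~ eps^(-2), S ~ log(1/eps); the constants c_* are illustrative placeholders."""
    T0 = c_T0 * np.sqrt(eps)
    U = c_U * np.log(1.0 / eps)
    m = int(np.ceil(c_m * eps ** -2))
    S = c_S * np.log(1.0 / eps)
    return T0, U, m, S

# Example: eps = 0.01 with unit constants gives T0 = 0.1, U = S ~ 4.6, m = 10000.
```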

5. Numerical Benchmarks: Image Deblurring and Robustness

PDPS was evaluated on FFHQ $64\times64$ images under Gaussian, motion, and nonlinear (GOPRO) blur. In all three scenarios, PDPS exceeded classical total-variation (TV) regularization and the earlier DPS method in both PSNR and SSIM (reported below as PSNR / SSIM):

| Method | Gaussian blur | Motion blur | Nonlinear blur |
|--------|---------------|-------------|----------------|
| TV     | 23.95 / 0.81  | 24.65 / 0.80 | 19.70 / 0.53  |
| DPS    | 24.15 / 0.81  | 26.66 / 0.88 | 20.93 / 0.68  |
| PDPS   | 26.42 / 0.87  | 28.86 / 0.92 | 28.44 / 0.91  |

PDPS produced reconstructions with finer texture and fewer artifacts, together with pixel-wise uncertainty maps. Robustness was confirmed under cross-dataset prior mismatch, with PDPS retaining a decisive PSNR/SSIM advantage.

6. Structural Design Considerations and Practical Implications

Key insights from PDPS highlight the necessity of:

  • Small diffusion time $T$: keeps the RGO target log-concave so that inner Langevin sampling is accurate, while $T$ must remain large enough for the log-Sobolev property that underpins the warm start.
  • Plug-and-play modularity: decoupled prior learning enables generalization to arbitrary likelihoods after the prior has been trained.
  • Three-stage procedure: combining the inner Langevin chain (RGO), the outer Langevin chain (warm start), and reverse diffusion admits non-asymptotic error control even for multimodal posteriors.
  • Computational scaling: the inner Langevin time $S$ and warm-start time $U$ grow only logarithmically with the target precision, while the Monte Carlo sample size $m$ grows polynomially, keeping the sampler practical for high-fidelity posterior inference.

7. Significance: Advances Over Previous Heuristic Methods

PDPS resolves several open issues in diffusion posterior sampling by (i) eliminating heuristic score and likelihood approximations, (ii) providing theoretical guarantees for convergence and uncertainty quantification, and (iii) demonstrating empirical improvements in both accuracy and robustness. The approach generalizes to inverse problems with complex forward models and arbitrary likelihoods, subject only to regularity and semi-log-concavity assumptions, setting a new standard in Bayesian inversion.

8. Outstanding Challenges and Future Directions

Open directions include further optimizing parallelization for large-scale inverse problems, extending multi-modal uncertainty quantification in non-log-concave settings, and integrating advanced score-matching estimators for settings with challenging priors. Analysis of lower bounds and computational barriers for worst-case priors remains salient, informed by hardness results established in cryptographic complexity (Gupta et al., 20 Feb 2024). The extension of PDPS methodology to hierarchical models and simulation-based inference is another promising avenue for future research.


The PDPS framework synthesizes data-driven priors, rigorous Monte Carlo posterior score estimation, and non-asymptotic transport analysis, yielding a theoretically founded, practical, and robust Bayesian inversion sampler (Chang et al., 8 Dec 2025).
