Diffusion Posterior Sampling Overview

Updated 28 June 2026

Diffusion posterior sampling is a framework that integrates trained diffusion models with likelihood guidance to perform Bayesian posterior inference in inverse problems.
It employs a reverse diffusion process augmented with measurement gradients, yielding sharp reconstructions while managing varying noise levels.
The approach is plug-and-play, leveraging pre-trained diffusion priors to flexibly handle different operators and noise regimes without task-specific retraining.

Diffusion posterior sampling (DPS) refers to a spectrum of methodologies that utilize diffusion-based generative models to perform posterior inference in inverse problems, where the target is sampling from $p(x \mid y) \propto p(y \mid x) p(x)$. These methods combine a learned diffusion prior with explicit or implicit likelihood guidance, enabling flexible, plug-and-play solutions to imaging and other high-dimensional Bayesian inverse tasks. DPS distinguishes itself from traditional MCMC and purely variational approaches by leveraging the strong data priors encoded in modern diffusion models, and by accommodating a wide variety of degradation operators and noise statistics without task-specific retraining.

1. Core Principles and Mathematical Formulation

The foundational setup in DPS considers an unknown signal $x_0$ observed through a forward operator $H$ and additive noise $n$ as $y = H x_0 + n, \ n \sim \mathcal{N}(0, \sigma^2 I)$. The resulting posterior has the form

$p(x \mid y) \propto \exp\left(-\frac{1}{2\sigma^2} \|y - H x\|_2^2 \right) p(x),$

where $p(x)$ is implicitly represented by a pre-trained score-based diffusion model. Sampling from $p(x \mid y)$ thus requires combining the diffusion prior with a likelihood-guided correction.

DPS operates by augmenting the standard reverse-time stochastic differential equation (SDE) or discrete Markov chain of the diffusion model with a measurement-consistency term:

$x_{t-1} = x_t + \alpha_t s_\theta(x_t, t) + \rho \beta_t \nabla_x \log p(y \mid x_t) + \sqrt{\gamma_t} z_t,$

where $s_\theta$ is the trained score (denoising) network, $\alpha_t, \beta_t, \gamma_t$ are time-dependent coefficients, $\rho$ is a "guidance scale" balancing prior and data fidelity, and $z_t \sim \mathcal{N}(0, I)$ represents injected noise. For Gaussian noise $n$, the log-likelihood gradient is explicit: $\nabla_x \log p(y \mid x) = \frac{1}{\sigma^2} H^\top (y - H x)$ (Syarubany, 25 Dec 2025).
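As a quick illustration of the Gaussian case, differentiating $-\frac{1}{2\sigma^2}\|y - Hx\|_2^2$ with respect to $x$ gives $\frac{1}{\sigma^2} H^\top (y - Hx)$. The sketch below compares that closed form against central finite differences on a small random problem. The function names, problem sizes, and random operator are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def gaussian_log_likelihood(x, y, H, sigma):
    """log p(y | x) up to an additive constant, for y = H x + n, n ~ N(0, sigma^2 I)."""
    residual = y - H @ x
    return -0.5 * (residual @ residual) / sigma**2

def gaussian_likelihood_grad(x, y, H, sigma):
    """Closed-form gradient of log p(y | x) with respect to x."""
    return H.T @ (y - H @ x) / sigma**2

# Small random test problem (sizes and seed are arbitrary choices)
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 5))
x = rng.standard_normal(5)
sigma = 0.1
y = H @ x + sigma * rng.standard_normal(3)

grad = gaussian_likelihood_grad(x, y, H, sigma)

# Central finite differences along each coordinate axis as an independent check
eps = 1e-6
fd_grad = np.array([
    (gaussian_log_likelihood(x + eps * e, y, H, sigma)
     - gaussian_log_likelihood(x - eps * e, y, H, sigma)) / (2 * eps)
    for e in np.eye(5)
])
```

Because the log-likelihood is quadratic in $x$, the two gradients should agree up to floating-point rounding.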

In general, the reverse SDE or DDPM update is conditioned on both the data and the current estimate, often using Tweedie's formula for the posterior mean estimation in the latent space.
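
As a worked form of the Tweedie step, assume the common variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. The cumulative schedule $\bar\alpha_t$ is notation introduced here for illustration and is distinct from the step coefficient $\alpha_t$ in the update rule. Under that assumption, the posterior mean of the clean signal is

$\hat{x}_0(x_t) = \mathbb{E}[x_0 \mid x_t] = \frac{1}{\sqrt{\bar\alpha_t}} \left( x_t + (1 - \bar\alpha_t) \nabla_{x_t} \log p_t(x_t) \right) \approx \frac{1}{\sqrt{\bar\alpha_t}} \left( x_t + (1 - \bar\alpha_t)\, s_\theta(x_t, t) \right),$

and the intractable time-dependent likelihood is then approximated by evaluating the measurement model at this estimate, in the manner of Chung et al., 2022:

$\nabla_{x_t} \log p(y \mid x_t) \approx \nabla_{x_t} \log p(y \mid \hat{x}_0(x_t)) = -\frac{1}{2\sigma^2} \nabla_{x_t} \|y - H \hat{x}_0(x_t)\|_2^2.$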

2. Algorithmic Construction and Conditioning Strategies

A typical DPS algorithm proceeds as follows:

Initialization: Begin with $x_T \sim \mathcal{N}(0, I)$, corresponding to pure noise.
Reverse Diffusion with Likelihood Guidance: For each time step $t = T, \dots, 1$:
1. Compute the prior-driven denoising step.
2. Calculate the measurement likelihood gradient at the current $x_t$.
3. Update to $x_{t-1}$ by combining the denoising and measurement gradients with stochasticity.
Parameter Selection: The guidance scale $\rho$ and the noise standard deviation $\sigma$ are tuned to balance data fidelity and stability. Excessive $\rho$ may induce artifacts, while insufficient $\rho$ leads to under-enforced measurement consistency (Syarubany, 25 Dec 2025).
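
The three steps above can be sketched on a toy problem where everything is analytic. This is a minimal sketch under stated assumptions, not the cited implementation: the prior is an isotropic Gaussian $\mathcal{N}(\mu_0, s_0^2 I)$ whose exact score stands in for the trained network $s_\theta$, the noise schedule is a linear DDPM schedule, the update uses the DDPM ancestral discretization, and the likelihood gradient is evaluated at the Tweedie estimate of $x_0$. The name `dps_sample` and all default values are hypothetical.

```python
import numpy as np

def dps_sample(y, H, sigma, rho, T=1000, mu0=0.0, s0=1.0, seed=0):
    """Toy DPS sampler with an analytic Gaussian prior N(mu0, s0^2 I)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)            # cumulative product of the alphas

    x = rng.standard_normal(H.shape[1])  # initialization: x_T is pure noise
    for t in range(T - 1, -1, -1):
        # Exact score of the noised prior (stands in for the trained s_theta)
        var_t = abar[t] * s0**2 + 1.0 - abar[t]
        score = -(x - np.sqrt(abar[t]) * mu0) / var_t

        # Tweedie estimate of x_0 and its scalar Jacobian with respect to x_t
        x0_hat = (x + (1.0 - abar[t]) * score) / np.sqrt(abar[t])
        jac = (1.0 - (1.0 - abar[t]) / var_t) / np.sqrt(abar[t])

        # Step 1: prior-driven denoising (DDPM ancestral mean in score form)
        mean = (x + betas[t] * score) / np.sqrt(alphas[t])
        # Step 2: Gaussian likelihood gradient, chained through the Tweedie estimate
        grad = jac * (H.T @ (y - H @ x0_hat)) / sigma**2
        # Step 3: combine both terms; inject noise on every step except the last
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + rho * betas[t] * grad + np.sqrt(betas[t]) * noise
    return x

# Usage: observe the first 3 of 5 coordinates, then compare guided and unguided runs
H = np.eye(5)[:3]
y = np.array([2.0, -1.5, 1.0])
x_guided = dps_sample(y, H, sigma=0.1, rho=1.0)
x_prior = dps_sample(y, H, sigma=0.1, rho=0.0)
res_guided = np.linalg.norm(y - H @ x_guided)
res_prior = np.linalg.norm(y - H @ x_prior)
```

With `rho=0.0` the loop reduces to unconditional sampling from the prior, so comparing the two residuals gives a simple check that the guidance term is what pulls the sample toward the measurements. With a real network, the Jacobian factor would come from automatic differentiation rather than a closed form.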

Alternative conditioning mechanisms include:

Manifold-Constrained Gradient (MCG): Enforces measurement consistency by a hard projection after each step; can amplify high-frequency noise under additive noise models (Syarubany, 25 Dec 2025).
Annealed Guidance: Varies $\rho$ over diffusion time to accommodate different regularization strengths at different scales; excessive smoothness in scheduling can under-enforce consistency (Syarubany, 25 Dec 2025).
Direct Likelihood Approximations: For non-Gaussian or nonlinear measurements (e.g., Poisson or nonlinear tomography), the measurement gradient is formulated via chain rule using Jacobians of the conditional denoiser (Li et al., 2023).
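
Two of these mechanisms are simple enough to sketch directly. The first helper performs a hard data-consistency projection onto $\{x : Hx = y\}$ using the pseudo-inverse, which assumes $H$ has full row rank. It illustrates the projection idea only and is not the full MCG algorithm. The second is a hypothetical linear schedule for a time-varying guidance scale. The direction of the ramp and its endpoints are assumptions chosen for illustration.

```python
import numpy as np

def project_onto_measurements(x, y, H):
    """Nearest point to x (in Euclidean norm) that satisfies H x = y exactly."""
    return x + np.linalg.pinv(H) @ (y - H @ x)

def annealed_guidance_scale(t, T, rho_min=0.2, rho_max=1.0):
    """Hypothetical linear ramp: rho_min at t = T, rising to rho_max at t = 0."""
    frac = 1.0 - t / T
    return rho_min + (rho_max - rho_min) * frac

# Usage on a small random problem (sizes and seed are arbitrary)
rng = np.random.default_rng(1)
H = rng.standard_normal((3, 6))
x = rng.standard_normal(6)
y = rng.standard_normal(3)
x_proj = project_onto_measurements(x, y, H)
```

Because the projection enforces $Hx = y$ exactly, any noise present in $y$ is copied into the iterate, which is consistent with the noise amplification noted for hard projection above.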

3. Theoretical Properties and Practical Performance

DPS provides a flexible framework in which the pre-trained prior remains untouched for each inverse problem, allowing zero-shot application to new operators and noise statistics. Empirical ablations demonstrate:

Optimal performance at moderate guidance scales: On super-resolution with additive Gaussian noise, the best performance is achieved at $\rho = 0.95$ and $\sigma = 0.01$, with a combined metric score of $1.45231$ (Syarubany, 25 Dec 2025).
Qualitative Structure Restoration: With optimal parameters, DPS reconstructs sharp edges and coherent mid-frequency details that are suppressed in the downsampled inputs. Alternative methods either introduce oscillatory artifacts or struggle with texture fidelity.
Sensitivity to Noise and Guidance Hyperparameters: Larger $\sigma$ attenuates the measurement signal and degrades PSNR/SSIM; too small $\sigma$ can cause overfitting. Overly large $\rho$ produces instabilities and visual artifacts (Syarubany, 25 Dec 2025).

Compared with methods that use hard projection or scheduled annealing, DPS with a fixed, moderate guidance scale achieves a better trade-off between high-frequency restoration and overall image quality in standard DDPM settings.

4. Extensions and Application Domains

DPS has demonstrated competitive or superior results in:

Single and Multi-measurement Bayesian inverse problems: Including super-resolution, CT/MRI reconstruction, deblurring, phase retrieval, and inpainting (Syarubany, 25 Dec 2025, Chung et al., 2022, Li et al., 2023).
Nonlinear and Non-Gaussian Forward Models: DPS seamlessly extends to nonlinear operators and signal-dependent noise regimes (Poisson), using backpropagation through neural decoders and auxiliary chain rules for measurement gradient calculation (Li et al., 2023).
Plug-and-Play Inference: The framework requires no retraining of the diffusion model for each measurement operator or noise model; measurement-conditioning is implemented at inference time (Syarubany, 25 Dec 2025, Chung et al., 2022).
Algorithmic Variants: DPS serves as a building block for advanced posterior inference techniques such as simulation-based inference in tall-data settings (with score model aggregation), sequential/temporal inverse problems (with transition models), and hybrid Langevin-diffusion samplers for enhanced MCMC mixing (Linhart et al., 2024, Stevens et al., 2024, Zhao, 1 Jun 2025).
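
For the signal-dependent noise case mentioned above, the likelihood gradient still has a simple closed form when the forward operator is linear. The sketch below assumes the model $y_i \sim \mathrm{Poisson}((Hx)_i)$ with $Hx > 0$, for which $\nabla_x \log p(y \mid x) = H^\top \left( y / (Hx) - 1 \right)$ with elementwise division. The function names and the test problem are illustrative assumptions. A nonlinear operator would replace $H^\top$ with the transposed Jacobian, typically obtained by backpropagation.

```python
import numpy as np

def poisson_log_likelihood(x, y, H):
    """log p(y | x) up to a constant in x, for y_i ~ Poisson((H x)_i) with H x > 0."""
    rate = H @ x
    return np.sum(y * np.log(rate) - rate)

def poisson_likelihood_grad(x, y, H):
    """Closed-form gradient of the Poisson log-likelihood with respect to x."""
    rate = H @ x
    return H.T @ (y / rate - 1.0)

# Positive operator and signal so that every rate is strictly positive
rng = np.random.default_rng(2)
H = rng.uniform(0.5, 1.5, size=(4, 5))
x = rng.uniform(0.5, 1.5, size=5)
y = rng.poisson(H @ x).astype(float)

grad = poisson_likelihood_grad(x, y, H)

# Central finite differences as an independent check
eps = 1e-6
fd_grad = np.array([
    (poisson_log_likelihood(x + eps * e, y, H)
     - poisson_log_likelihood(x - eps * e, y, H)) / (2 * eps)
    for e in np.eye(5)
])
```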

5. Open Problems, Limitations, and Practical Recommendations

Despite its generality, DPS is subject to several practical and theoretical considerations:

Operator and Noise Model Match: Careful tuning of $\sigma$ is required to avoid under- or overfitting to the measured data in mismatched scenarios (Syarubany, 25 Dec 2025).
Choice of Guidance Scale $\rho$: Empirical evidence recommends values just below one (e.g., $\rho = 0.95$) to maximize reconstruction fidelity while maintaining stability (Syarubany, 25 Dec 2025).
Limited Posterior Sample Diversity: Standard DPS may behave like a MAP estimator, consistently producing sharp, high-quality samples with limited diversity rather than true posterior draws. This effect has been observed and quantified empirically; algorithms that incorporate additional randomness or explicit posterior sampling steps are needed for proper uncertainty quantification (Xu et al., 31 Jan 2025).
No Retraining, but Potential Error Accumulation: Blending prior and measurement-gradient guidance avoids the need for specialized training, but may accumulate errors or lead to suboptimal sample diversity, particularly in highly ill-posed or nonidentifiable inverse problems (Syarubany, 25 Dec 2025, Chung et al., 2022).
Practical Guidance:
- Tune hyperparameters to maximize task-specific fidelity metrics.
- Begin with a guidance scale $\rho$ just below one and the noise level $\sigma$ set to sensor characteristics.
- Avoid retraining priors; leverage DPS as an inference-time plug-and-play method with clear diagnostic ablation (Syarubany, 25 Dec 2025).

6. Representative Quantitative Results

Guidance scale $\rho$	Noise level $\sigma$	Combined score (PSNR/40 + SSIM)
0.95	0.01	1.45231 (best)
0.90	0.01	1.44452
0.80	0.01	1.42857
0.50	0.05	1.32122
0.20	0.05	1.16456

Moderate guidance scales and low observation noise yield the best combined metric. Decreasing $\rho$ or increasing $\sigma$ degrades both distortion and perceptual quality (Syarubany, 25 Dec 2025).
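
The combined score in the table is PSNR divided by 40 plus SSIM. A minimal sketch of that arithmetic follows, assuming PSNR is measured in dB on images scaled to a known data range and that SSIM is computed separately by any standard implementation. The helper names are illustrative.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for arrays with the given data range."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical arrays
    return 10.0 * np.log10(data_range**2 / mse)

def combined_score(psnr_db, ssim):
    """Combined metric from the table: PSNR/40 + SSIM."""
    return psnr_db / 40.0 + ssim

# A constant error of 0.1 on a unit-range image gives an MSE of 0.01
example_psnr = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1))
example_score = combined_score(example_psnr, 0.9)
```

Dividing PSNR by 40 brings typical values to roughly the same order of magnitude as SSIM, so neither term dominates the sum.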

7. Impact and Outlook

Diffusion posterior sampling has established itself as a highly effective paradigm in computational imaging and Bayesian inverse problems, enabling high-quality reconstructions in challenging regimes and broadening the applicability of diffusion models as plug-and-play priors. By balancing denoising prior information and explicit likelihood constraints, DPS offers a principled yet flexible alternative to classic MAP optimization or MCMC approaches, without the need for retraining on each new measurement scenario.

Significant ongoing research aims at improving posterior sample diversity, establishing theoretical error bounds, and generalizing DPS frameworks to settings involving nonlinearities, non-Gaussian noise, and high-dimensional, multimodal posteriors. Recent empirical and theoretical analyses clarify both the strengths and subtle limitations of DPS, especially its tendency toward MAP-like solutions and its sensitivity to guidance scale selection, providing directions for next-generation diffusion-based Bayesian inference (Syarubany, 25 Dec 2025, Xu et al., 31 Jan 2025).