
SURE Guided Posterior Sampling (SGPS)

Updated 5 January 2026
  • SGPS is a trajectory-corrected inference algorithm that integrates diffusion denoising with SURE-based error correction and PCA noise estimation.
  • It interleaves conditional posterior guidance with local residual noise measurement to achieve high-quality image reconstructions under tight computational budgets.
  • The method leverages unbiased risk estimates and KL convergence theory to mitigate error accumulation, ensuring efficient correction during sampling.

SURE Guided Posterior Sampling (SGPS) is a trajectory-corrected inference algorithm for diffusion-based inverse problems that leverages Stein’s Unbiased Risk Estimate (SURE) and PCA-based noise estimation to mitigate error accumulation in the critical early and middle stages of sampling. SGPS consistently achieves high-quality reconstructions under tight computational budgets—requiring fewer than 100 Neural Function Evaluations (NFEs)—by interleaving diffusion denoising, conditional posterior guidance, local residual noise measurement, and data-guided correction steps (Kim et al., 29 Dec 2025).

1. Inverse Problem Formulation and Diffusion Priors

The core objective is to recover an unknown image $x \in \mathbb{R}^n$ from noisy linear measurements
$$y = A x + \eta, \qquad \eta \sim \mathcal{N}(0, \sigma_y^2 I_m),$$
where $A \in \mathbb{R}^{m \times n}$ is a known forward operator (e.g., for super-resolution or deblurring) and $\sigma_y$ is the measurement noise standard deviation.
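
As a concrete illustration, the following minimal sketch builds a toy instance of this measurement model; the 1-D signal and the $2\times$ average-pooling operator are illustrative assumptions, not the paper's operators.

```python
# Toy instance of y = A x + eta (illustrative; not the paper's operators).
import numpy as np

rng = np.random.default_rng(0)
n, sigma_y = 64, 0.05

x = rng.random(n)                  # unknown ground-truth signal
A = np.zeros((n // 2, n))          # 2x average pooling as an explicit linear map
for i in range(n // 2):
    A[i, 2 * i : 2 * i + 2] = 0.5

y = A @ x + sigma_y * rng.standard_normal(n // 2)  # noisy linear measurements
```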

A diffusion model serves as the learned prior, specified by a stochastic differential equation (SDE)
$$\mathrm{d}x_t = \sqrt{2\dot{\sigma}(t)\sigma(t)}\,\mathrm{d}w_t, \qquad x_0 \sim p_{\text{data}}, \quad x_T \approx \mathcal{N}(0, \sigma_T^2 I),$$
with the EDM reparameterization $\sigma(t) = t$, $t \in [\sigma_{\min} \approx 0,\ \sigma_{\max} = T]$.

The unconditional backward sampling (reverse-time ODE) is
$$\frac{\mathrm{d}x}{\mathrm{d}t} = -\dot{\sigma}(t)\,\sigma(t)\,\nabla_x \log p(x; \sigma(t)),$$
approximated in practice via a pre-trained denoiser $D_\theta$:
$$\nabla_x \log p(x; \sigma) \approx \frac{D_\theta(x; \sigma) - x}{\sigma^2}.$$
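
In code, the denoiser-to-score conversion and one Euler step of the reverse-time ODE look roughly as follows; `D_theta` is a stand-in for any pre-trained denoiser with the EDM interface, an assumption on our part.

```python
# Sketch: score from a denoiser, and one backward Euler step (EDM, sigma(t) = t).
def score(x, sigma, D_theta):
    # grad_x log p(x; sigma) ~ (D_theta(x, sigma) - x) / sigma^2
    return (D_theta(x, sigma) - x) / sigma**2

def reverse_step(x, t, dt, D_theta):
    # dx/dt = -sigma'(t) sigma(t) grad log p = -t * score, so stepping
    # from t down to t - dt gives x + dt * t * score(x, t).
    return x + dt * t * score(x, t, D_theta)
```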

Posterior sampling for the inverse problem uses Bayes' rule:
$$\nabla_{x_t}\log p(x_t \mid y) = \nabla_{x_t}\log p_t(x_t) + \nabla_{x_t}\log p(y \mid x_t).$$
The data-consistency term $\nabla_{x_t}\log p(y \mid x_t)$ is typically intractable and must be approximated.

2. SURE-Based Trajectory Correction

2.1 Stein's Unbiased Risk Estimate (SURE)

SURE provides an unbiased estimator of the mean squared error (MSE) of a denoiser under additive Gaussian noise. For $x_{\text{noisy}} = x_0 + z$ with $z \sim \mathcal{N}(0, \sigma^2 I)$ and denoiser $f$,
$$\text{SURE}(x_{\text{noisy}}) = -n\sigma^2 + \|x_{\text{noisy}} - f(x_{\text{noisy}})\|^2 + 2\sigma^2\operatorname{tr}\big(J_f(x_{\text{noisy}})\big),$$
where $J_f = \partial f / \partial x_{\text{noisy}}$. The trace is estimated by a Monte Carlo probe:
$$\operatorname{tr} J_f(x) \approx \frac{b^\top\big(f(x + \epsilon b) - f(x)\big)}{\epsilon}, \qquad b \sim \mathcal{N}(0, I).$$

The SURE gradient direction is obtained by differentiating SURE with respect to $x$:
$$\nabla_x\,\text{SURE}(x) = 2\big(x - f(x)\big) - 2\sigma^2 \nabla_x\big[\operatorname{tr} J_f(x)\big].$$
The correction is
$$x_{\text{corrected}} = x_{\text{noisy}} - \alpha\,\nabla_{x_{\text{noisy}}}\text{SURE}(x_{\text{noisy}}),$$
where $\alpha$ is a user-chosen step size; experiments use $\alpha = 0.5$.
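
A minimal NumPy sketch of this estimator with a single probe vector, assuming a generic denoiser `f(x, sigma)` acting on arrays (the function name and interface are ours):

```python
# Sketch: SURE value with a single Hutchinson-style trace probe.
import numpy as np

def sure_estimate(x_noisy, f, sigma, eps=1e-3, rng=None):
    rng = rng or np.random.default_rng()
    n = x_noisy.size
    fx = f(x_noisy, sigma)
    b = rng.standard_normal(x_noisy.shape)        # Gaussian probe vector
    # Monte Carlo divergence: tr J_f ~ b^T (f(x + eps*b) - f(x)) / eps
    trace = np.vdot(b, f(x_noisy + eps * b, sigma) - fx) / eps
    return -n * sigma**2 + np.sum((x_noisy - fx) ** 2) + 2 * sigma**2 * trace
```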

2.2 Local SURE Gradient Update

After applying conditional guidance, let $x_{\text{noisy}}$ denote the resulting state. Given a residual noise estimate $\hat{\sigma}_0$ (see Section 3), the denoiser is applied,
$$\hat{x} = f(x_{\text{noisy}}; \hat{\sigma}_0),$$
and SURE is evaluated as
$$\text{SURE} = -n\hat{\sigma}_0^2 + \|x_{\text{noisy}} - \hat{x}\|^2 + 2\hat{\sigma}_0^2\,\frac{b^\top\big(f(x_{\text{noisy}} + \epsilon b) - \hat{x}\big)}{\epsilon}.$$
A correction step via autodiff follows, reducing residual noise and pulling samples toward the data manifold.
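
A sketch of this correction step in PyTorch, assuming a differentiable denoiser `f`; the helper name and interface are ours, while $\alpha = 0.5$ follows the text.

```python
# Sketch: local SURE evaluation and gradient correction via autodiff.
import torch

def sure_correct(x_noisy, f, sigma0, alpha=0.5, eps=1e-3):
    x = x_noisy.detach().requires_grad_(True)
    n = x.numel()
    x_hat = f(x, sigma0)
    b = torch.randn_like(x)                        # single trace probe
    trace = torch.dot(b.flatten(), (f(x + eps * b, sigma0) - x_hat).flatten()) / eps
    sure = -n * sigma0**2 + ((x - x_hat) ** 2).sum() + 2 * sigma0**2 * trace
    (grad,) = torch.autograd.grad(sure, x)         # backprop through the denoiser
    return (x - alpha * grad).detach()             # x* = x - alpha * grad SURE
```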

3. PCA-Based Residual Noise Estimation

Accurate SURE application requires knowledge of the residual variance $\hat{\sigma}_0^2$ in $x_{\text{noisy}}$. SGPS employs a patch-PCA estimator:

  • Decompose $x_{\text{noisy}}$ into $s$ overlapping patches $\{p_i\}_{i=1}^s$; compute the mean $\mu$ and covariance
$$\Sigma = \frac{1}{s}\sum_{i=1}^{s} (p_i - \mu)(p_i - \mu)^\top$$
  • Eigen-decompose $\Sigma$ to obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$. For each $i$, define the tail mean
$$\tau_i = \frac{1}{r - i + 1}\sum_{j=i}^{r} \lambda_j$$
The smallest $i$ for which $\tau_i$ equals the median of $\{\lambda_i, \ldots, \lambda_r\}$ is chosen; the noise level is then
$$\hat{\sigma}_0 = \sqrt{\tau_i}$$

This estimator is efficient and requires no additional training.
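
A compact NumPy sketch for a single-channel image, using the paper's $8 \times 8$ patches with stride 4; the stopping rule below uses `tail.mean() <= median` as a practical proxy for the equality criterion above (our assumption):

```python
# Sketch: patch-PCA residual noise estimation.
import numpy as np

def pca_noise_estimate(img, patch=8, stride=4):
    H, W = img.shape
    patches = np.stack([
        img[i:i + patch, j:j + patch].ravel()
        for i in range(0, H - patch + 1, stride)
        for j in range(0, W - patch + 1, stride)
    ])                                              # shape (s, patch*patch)
    centered = patches - patches.mean(axis=0)
    cov = centered.T @ centered / len(patches)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]    # eigenvalues, descending
    for i in range(len(lam)):
        tail = lam[i:]
        # first tail whose mean matches its median is treated as pure noise
        if tail.mean() <= np.median(tail):
            return np.sqrt(tail.mean())             # sigma_hat = sqrt(tau_i)
    return np.sqrt(lam[-1])
```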

4. SURE Guided Posterior Sampling Algorithm

The SGPS algorithm proceeds as follows:

  1. Initialization: Sample $x_T \sim \mathcal{N}(0, \sigma_T^2 I)$.
  2. For $t = T, \dots, 1$:
    • a) Denoising: $\hat{x}_{0|t} = D_\theta(x_t, \sigma_t)$.
    • b) Conditional Guidance: Use Langevin iterations to obtain $x_{0|t,y}$, balancing the prior and the data likelihood.
    • c) PCA Noise Estimation: Estimate the residual noise $\hat{\sigma}_0$ from $x_{0|t,y}$.
    • d) SURE Gradient Correction: Apply the local SURE-gradient correction to $x_{0|t,y}$, yielding $x^*_{0|t,y}$.
    • e) Sample for Next Step: $x_{t-1} \sim \mathcal{N}(x^*_{0|t,y},\ \sigma_{t-1}^2 I)$.
  3. Return $x_0$.
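
Putting steps (a)-(e) together, a high-level sketch of the outer loop; `langevin_guidance`, `pca_noise_estimate`, and `sure_correct` are the hypothetical helpers sketched in Sections 2-3, and their interfaces are assumptions.

```python
# High-level SGPS outer loop (steps a-e), as a sketch.
import torch

def sgps(D_theta, y, A, sigmas, sigma_y, shape):
    # sigmas: decreasing noise levels from sigma_max down to sigma_min
    x = sigmas[0] * torch.randn(shape)                   # x_T ~ N(0, sigma_T^2 I)
    for sigma_t, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = D_theta(x, sigma_t)                     # a) denoise
        x0_y = langevin_guidance(x0_hat, y, A, sigma_y)  # b) conditional guidance
        sigma0 = pca_noise_estimate(x0_y)                # c) residual noise level
        x0_star = sure_correct(x0_y, D_theta, sigma0)    # d) SURE correction
        x = x0_star + sigma_next * torch.randn(shape)    # e) renoise to next level
    return x                                             # ~ x_0 at sigma_min
```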

Distinctive features:

  • Estimated, not assumed, noise levels at each step ($\hat{\sigma}_0$ via PCA).
  • Local SURE-based correction at every iteration directly addresses sampling trajectory deviations.

5. Theoretical Properties

  • Gaussian Preservation (Theorem 1): Small-step Langevin guidance ensures the denoiser output remains nearly Gaussian, within Wasserstein-2 distance $O(\eta^2 n \sigma_t^2)$, justifying the use of SURE at each iteration.
  • KL Convergence with SURE Correction (Theorem 2): Under local strong convexity of $-\log p(x \mid y)$ and bounded SURE bias/variance, each correction step reduces the KL divergence to the true posterior, up to $O(\beta_t^2)$ error, where $\beta_t = \alpha \hat{\sigma}_0^2$:
$$D_{\mathrm{KL}}(q_t^* \,\|\, p) \le (1 - \beta_t \mu)\, D_{\mathrm{KL}}(q_t \,\|\, p) + \beta_t^2 C + \Delta_t$$
  • Error-Cascade Mitigation: By removing residual noise at each iteration, SGPS avoids the error accumulation characteristic of early-stage high-noise samples, enabling accurate inference with $<100$ NFEs.
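
Unrolling the Theorem 2 recursion makes the error-cascade claim concrete; the following is a sketch assuming constant $\beta$, $\mu$, $C$ and neglecting $\Delta_t$:

```latex
% Unrolled KL recursion (constant beta, mu, C; Delta_t = 0):
\begin{align*}
D_{\mathrm{KL}}(q_0 \,\|\, p)
  &\le (1 - \beta\mu)^{T}\, D_{\mathrm{KL}}(q_T \,\|\, p)
     + \beta^2 C \sum_{k=0}^{T-1} (1 - \beta\mu)^k \\
  &\le (1 - \beta\mu)^{T}\, D_{\mathrm{KL}}(q_T \,\|\, p) + \frac{\beta C}{\mu},
\end{align*}
% the initial error contracts geometrically while the accumulated
% correction bias stays bounded by beta C / mu.
```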

6. Empirical Performance and Cost Analysis

6.1 Benchmark Domains

SGPS was evaluated on linear (FFHQ256 super-resolution $4\times$, box inpainting, random inpainting, Gaussian and motion deblurring) and nonlinear (phase retrieval, nonlinear deblurring, HDR recovery) inverse problems.

6.2 Quantitative Results

Performance with $T = 16$ ($\approx 48$ NFE) and $T = 33$ ($\approx 99$ NFE) is reported using PSNR (higher is better) and LPIPS (lower is better); each cell below shows PSNR / LPIPS:

| Method | NFE | SR $4\times$ | Inpaint (box) | Inpaint (random) | Gaussian deblur | Motion deblur |
|--------|-----|--------------|---------------|------------------|-----------------|---------------|
| SGPS   | 99  | 29.38 / 0.179 | 24.23 / 0.133 | 30.47 / 0.116 | 29.35 / 0.179 | 31.24 / 0.148 |
| DAPS   | 100 | 27.69 / 0.230 | 22.51 / 0.192 | 26.64 / 0.238 | 27.77 / 0.220 | 29.84 / 0.167 |

| Method | NFE | Phase retrieval | Nonlinear deblur | HDR |
|--------|-----|-----------------|------------------|-----|
| SGPS   | 99  | 24.08 / 0.268 | 27.33 / 0.197 | 24.87 / 0.179 |
| DAPS   | 100 | 20.83 / 0.402 | 25.56 / 0.255 | 24.09 / 0.199 |

6.3 Computational Cost

  • On an RTX 4090: 48 NFE $\approx$ 4.13 s/image; 99 NFE $\approx$ 8.46 s/image.
  • In competitive SR $4\times$ settings at comparable runtime ($\approx 4$ s), SGPS achieves PSNR $\approx 29.06$ dB versus DDNM's 29.09 dB.
  • Overhead breakdown (for 48 NFE): SURE update (denoiser $\times 2$ + autograd) 51.2%, Langevin guidance 35.5%, forward denoising 11.2%, PCA 1.8%.

7. Implementation Considerations and Limitations

  • Denoiser: U-Net in VP-DDPM/EDM configuration, trained on FFHQ256 images.
  • Noise schedule: Geometric, from $t_{\max} = T$ to $t_{\min} = 0.02$, with $\rho = 7$ (Karras et al.).
  • Sampling Steps: $T = 16$ (48 NFE) or $T = 33$ (99 NFE).
  • Langevin Conditional Guidance: 100 iterations per outer step, step size $\eta \approx 0.1$.
  • PCA: Patch size $8 \times 8$, stride 4, $s \approx 1000$ patches per image.
  • SURE Hyperparameters: $\epsilon = \max(x_{\text{noisy}})/1000 \approx 10^{-3}$, $\alpha = 0.5$.
  • Trace Vectors: One random vector per step; additional vectors confer no empirical benefit.
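
For reference, the reported settings gathered into a single configuration sketch; the key names are hypothetical, while the values follow the list above.

```python
# Reported SGPS hyperparameters (values from the list above; key names ours).
SGPS_CONFIG = {
    "schedule": {"kind": "geometric", "t_min": 0.02, "rho": 7},
    "steps": {"fast": 16, "slow": 33},                  # ~48 and ~99 NFE
    "langevin": {"iters_per_outer_step": 100, "step_size": 0.1},
    "pca": {"patch_size": 8, "stride": 4, "num_patches": 1000},
    "sure": {"alpha": 0.5, "eps_scale": 1e-3, "probe_vectors": 1},
}
```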

Principal limitations include the restriction to pixel-space diffusion samplers, the assumption of a known forward operator $A$, and the requirement of local strong convexity for the convergence theory. PCA noise estimation may fail for images with little self-similarity; alternative estimators (e.g., spectral) are a potential direction. The SURE update uses backpropagation; forward-mode JVP or SPSA could reduce its cost. Blind or partially unknown forward operators, non-Gaussian noise, and adaptation to latent-diffusion models remain open areas.


For detailed derivations and algorithmic implementations, see (Kim et al., 29 Dec 2025).
