
SURE Guided Posterior Sampling (SGPS)

Updated 5 January 2026
  • SGPS is a trajectory-corrected inference algorithm that integrates diffusion denoising with SURE-based error correction and PCA noise estimation.
  • It interleaves conditional posterior guidance with local residual noise measurement to achieve high-quality image reconstructions under tight computational budgets.
  • The method leverages unbiased risk estimates and KL convergence theory to mitigate error accumulation, ensuring efficient correction during sampling.

SURE Guided Posterior Sampling (SGPS) is a trajectory-corrected inference algorithm for diffusion-based inverse problems that leverages Stein’s Unbiased Risk Estimate (SURE) and PCA-based noise estimation to mitigate error accumulation in the critical early and middle stages of sampling. SGPS consistently achieves high-quality reconstructions under tight computational budgets—requiring fewer than 100 Neural Function Evaluations (NFEs)—by interleaving diffusion denoising, conditional posterior guidance, local residual noise measurement, and data-guided correction steps (Kim et al., 29 Dec 2025).

1. Inverse Problem Formulation and Diffusion Priors

The core objective is to recover an unknown image $x \in \mathbb{R}^n$ from noisy linear measurements
$$y = A x + \eta, \qquad \eta \sim \mathcal{N}(0, \sigma_y^2 I_m),$$
where $A \in \mathbb{R}^{m \times n}$ is a known forward operator (e.g., for super-resolution or deblurring) and $\sigma_y$ is the measurement noise standard deviation.
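
As a concrete illustration, the following minimal sketch builds a toy instance of this measurement model; the 1-D signal and the $2\times$ average-pooling operator are illustrative assumptions, not the paper's operators.

```python
# Toy instance of y = A x + eta (illustrative; not the paper's operators).
import numpy as np

rng = np.random.default_rng(0)
n, sigma_y = 64, 0.05

x = rng.random(n)                  # unknown ground-truth signal
A = np.zeros((n // 2, n))          # 2x average pooling as an explicit linear map
for i in range(n // 2):
    A[i, 2 * i : 2 * i + 2] = 0.5

y = A @ x + sigma_y * rng.standard_normal(n // 2)  # noisy linear measurements
```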

A diffusion model serves as the learned prior, specified by a stochastic differential equation (SDE)
$$\mathrm{d}x_t = \sqrt{2\dot{\sigma}(t)\sigma(t)}\,\mathrm{d}w_t, \qquad x_0 \sim p_{\text{data}}, \quad x_T \approx \mathcal{N}(0, \sigma_T^2 I),$$
with the EDM reparameterization $\sigma(t) = t$, $t \in [\sigma_{\min} \approx 0,\ \sigma_{\max} = T]$.

The unconditional backward sampling (reverse-time ODE) is
$$\frac{\mathrm{d}x}{\mathrm{d}t} = -\dot{\sigma}(t)\,\sigma(t)\,\nabla_x \log p(x; \sigma(t)),$$
approximated in practice via a pre-trained denoiser $D_\theta$:
$$\nabla_x \log p(x; \sigma) \approx \frac{D_\theta(x; \sigma) - x}{\sigma^2}.$$
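
In code, the denoiser-to-score conversion and one Euler step of the reverse-time ODE look roughly as follows; `D_theta` is a stand-in for any pre-trained denoiser with the EDM interface, an assumption on our part.

```python
# Sketch: score from a denoiser, and one backward Euler step (EDM, sigma(t) = t).
def score(x, sigma, D_theta):
    # grad_x log p(x; sigma) ~ (D_theta(x, sigma) - x) / sigma^2
    return (D_theta(x, sigma) - x) / sigma**2

def reverse_step(x, t, dt, D_theta):
    # dx/dt = -sigma'(t) sigma(t) grad log p = -t * score, so stepping
    # from t down to t - dt gives x + dt * t * score(x, t).
    return x + dt * t * score(x, t, D_theta)
```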

Posterior sampling for the inverse problem uses Bayes' rule:
$$\nabla_{x_t}\log p(x_t \mid y) = \nabla_{x_t}\log p_t(x_t) + \nabla_{x_t}\log p(y \mid x_t).$$
The data-consistency term $\nabla_{x_t}\log p(y \mid x_t)$ is typically intractable and must be approximated.

2. SURE-Based Trajectory Correction

2.1 Stein's Unbiased Risk Estimate (SURE)

SURE provides an unbiased estimator of the mean squared error (MSE) of a denoiser under additive Gaussian noise. For $x_{\text{noisy}} = x_0 + z$ with $z \sim \mathcal{N}(0, \sigma^2 I)$ and denoiser $f$,
$$\text{SURE}(x_{\text{noisy}}) = -n\sigma^2 + \|x_{\text{noisy}} - f(x_{\text{noisy}})\|^2 + 2\sigma^2\operatorname{tr}\big(J_f(x_{\text{noisy}})\big),$$
where $J_f = \partial f / \partial x_{\text{noisy}}$. The trace is estimated by a Monte Carlo probe:
$$\operatorname{tr} J_f(x) \approx \frac{b^\top\big(f(x + \epsilon b) - f(x)\big)}{\epsilon}, \qquad b \sim \mathcal{N}(0, I).$$

The SURE gradient direction is obtained by differentiating SURE with respect to $x$:
$$\nabla_x\,\text{SURE}(x) = 2\big(x - f(x)\big) - 2\sigma^2 \nabla_x\big[\operatorname{tr} J_f(x)\big].$$
The correction is
$$x_{\text{corrected}} = x_{\text{noisy}} - \alpha\,\nabla_{x_{\text{noisy}}}\text{SURE}(x_{\text{noisy}}),$$
where $\alpha$ is a user-chosen step size; experiments use $\alpha = 0.5$.
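
A minimal NumPy sketch of this estimator with a single probe vector, assuming a generic denoiser `f(x, sigma)` acting on arrays (the function name and interface are ours):

```python
# Sketch: SURE value with a single Hutchinson-style trace probe.
import numpy as np

def sure_estimate(x_noisy, f, sigma, eps=1e-3, rng=None):
    rng = rng or np.random.default_rng()
    n = x_noisy.size
    fx = f(x_noisy, sigma)
    b = rng.standard_normal(x_noisy.shape)        # Gaussian probe vector
    # Monte Carlo divergence: tr J_f ~ b^T (f(x + eps*b) - f(x)) / eps
    trace = np.vdot(b, f(x_noisy + eps * b, sigma) - fx) / eps
    return -n * sigma**2 + np.sum((x_noisy - fx) ** 2) + 2 * sigma**2 * trace
```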

2.2 Local SURE Gradient Update

After applying conditional guidance, let $x_{\text{noisy}}$ denote the resulting state. Given a residual noise estimate $\hat{\sigma}_0$ (see Section 3), the denoiser is applied,
$$\hat{x} = f(x_{\text{noisy}}; \hat{\sigma}_0),$$
and SURE is evaluated as
$$\text{SURE} = -n\hat{\sigma}_0^2 + \|x_{\text{noisy}} - \hat{x}\|^2 + 2\hat{\sigma}_0^2\,\frac{b^\top\big(f(x_{\text{noisy}} + \epsilon b) - \hat{x}\big)}{\epsilon}.$$
A correction step via autodiff follows, reducing residual noise and pulling samples toward the data manifold.
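
A sketch of this correction step in PyTorch, assuming a differentiable denoiser `f`; the helper name and interface are ours, while $\alpha = 0.5$ follows the text.

```python
# Sketch: local SURE evaluation and gradient correction via autodiff.
import torch

def sure_correct(x_noisy, f, sigma0, alpha=0.5, eps=1e-3):
    x = x_noisy.detach().requires_grad_(True)
    n = x.numel()
    x_hat = f(x, sigma0)
    b = torch.randn_like(x)                        # single trace probe
    trace = torch.dot(b.flatten(), (f(x + eps * b, sigma0) - x_hat).flatten()) / eps
    sure = -n * sigma0**2 + ((x - x_hat) ** 2).sum() + 2 * sigma0**2 * trace
    (grad,) = torch.autograd.grad(sure, x)         # backprop through the denoiser
    return (x - alpha * grad).detach()             # x* = x - alpha * grad SURE
```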

3. PCA-Based Residual Noise Estimation

Accurate SURE application requires knowledge of the residual variance $\hat{\sigma}_0^2$ in $x_{\text{noisy}}$. SGPS employs a patch-PCA estimator:

  • Decompose $x_{\text{noisy}}$ into $s$ overlapping patches $\{p_i\}_{i=1}^s$; compute the mean $\mu$ and covariance
$$\Sigma = \frac{1}{s}\sum_{i=1}^{s} (p_i - \mu)(p_i - \mu)^\top$$
  • Eigen-decompose $\Sigma$ to obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$. For each $i$, define the tail mean
$$\tau_i = \frac{1}{r - i + 1}\sum_{j=i}^{r} \lambda_j$$
The smallest $i$ for which $\tau_i$ equals the median of $\{\lambda_i, \ldots, \lambda_r\}$ is chosen; the noise level is then
$$\hat{\sigma}_0 = \sqrt{\tau_i}$$

This estimator is efficient and requires no additional training.
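
A compact NumPy sketch for a single-channel image, using the paper's $8 \times 8$ patches with stride 4; the stopping rule below uses `tail.mean() <= median` as a practical proxy for the equality criterion above (our assumption):

```python
# Sketch: patch-PCA residual noise estimation.
import numpy as np

def pca_noise_estimate(img, patch=8, stride=4):
    H, W = img.shape
    patches = np.stack([
        img[i:i + patch, j:j + patch].ravel()
        for i in range(0, H - patch + 1, stride)
        for j in range(0, W - patch + 1, stride)
    ])                                              # shape (s, patch*patch)
    centered = patches - patches.mean(axis=0)
    cov = centered.T @ centered / len(patches)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]    # eigenvalues, descending
    for i in range(len(lam)):
        tail = lam[i:]
        # first tail whose mean matches its median is treated as pure noise
        if tail.mean() <= np.median(tail):
            return np.sqrt(tail.mean())             # sigma_hat = sqrt(tau_i)
    return np.sqrt(lam[-1])
```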

4. SURE Guided Posterior Sampling Algorithm

The SGPS algorithm proceeds as follows:

  1. Initialization: Sample $x_T \sim \mathcal{N}(0, \sigma_T^2 I)$.
  2. For $t = T, \dots, 1$:
    • a) Denoising: $\hat{x}_{0|t} = D_\theta(x_t, \sigma_t)$.
    • b) Conditional Guidance: Use Langevin iterations to obtain $x_{0|t,y}$, balancing the prior and the data likelihood.
    • c) PCA Noise Estimation: Estimate the residual noise $\hat{\sigma}_0$ from $x_{0|t,y}$.
    • d) SURE Gradient Correction: Apply the local SURE-gradient correction to $x_{0|t,y}$, yielding $x^*_{0|t,y}$.
    • e) Sample for Next Step: $x_{t-1} \sim \mathcal{N}(x^*_{0|t,y},\ \sigma_{t-1}^2 I)$.
  3. Return $x_0$.
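
Putting steps (a)-(e) together, a high-level sketch of the outer loop; `langevin_guidance`, `pca_noise_estimate`, and `sure_correct` are the hypothetical helpers sketched in Sections 2-3, and their interfaces are assumptions.

```python
# High-level SGPS outer loop (steps a-e), as a sketch.
import torch

def sgps(D_theta, y, A, sigmas, sigma_y, shape):
    # sigmas: decreasing noise levels from sigma_max down to sigma_min
    x = sigmas[0] * torch.randn(shape)                   # x_T ~ N(0, sigma_T^2 I)
    for sigma_t, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = D_theta(x, sigma_t)                     # a) denoise
        x0_y = langevin_guidance(x0_hat, y, A, sigma_y)  # b) conditional guidance
        sigma0 = pca_noise_estimate(x0_y)                # c) residual noise level
        x0_star = sure_correct(x0_y, D_theta, sigma0)    # d) SURE correction
        x = x0_star + sigma_next * torch.randn(shape)    # e) renoise to next level
    return x                                             # ~ x_0 at sigma_min
```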

Distinctive features:

  • Estimated, not assumed, noise levels at each step ($\hat{\sigma}_0$ via PCA).
  • Local SURE-based correction at every iteration directly addresses sampling trajectory deviations.

5. Theoretical Properties

  • Gaussian Preservation (Theorem 1): Small-step Langevin guidance ensures the denoiser output remains nearly Gaussian, within Wasserstein-2 distance $O(\eta^2 n \sigma_t^2)$, justifying the use of SURE at each iteration.
  • KL Convergence with SURE Correction (Theorem 2): Under local strong convexity of $-\log p(x \mid y)$ and bounded SURE bias/variance, each correction step reduces the KL divergence to the true posterior, up to $O(\beta_t^2)$ error, where $\beta_t = \alpha \hat{\sigma}_0^2$:
$$D_{\mathrm{KL}}(q_t^* \,\|\, p) \le (1 - \beta_t \mu)\, D_{\mathrm{KL}}(q_t \,\|\, p) + \beta_t^2 C + \Delta_t$$
  • Error-Cascade Mitigation: By removing residual noise at each iteration, SGPS avoids the error accumulation characteristic of early-stage high-noise samples, enabling accurate inference with $<100$ NFEs.
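
Unrolling the Theorem 2 recursion makes the error-cascade claim concrete; the following is a sketch assuming constant $\beta$, $\mu$, $C$ and neglecting $\Delta_t$:

```latex
% Unrolled KL recursion (constant beta, mu, C; Delta_t = 0):
\begin{align*}
D_{\mathrm{KL}}(q_0 \,\|\, p)
  &\le (1 - \beta\mu)^{T}\, D_{\mathrm{KL}}(q_T \,\|\, p)
     + \beta^2 C \sum_{k=0}^{T-1} (1 - \beta\mu)^k \\
  &\le (1 - \beta\mu)^{T}\, D_{\mathrm{KL}}(q_T \,\|\, p) + \frac{\beta C}{\mu},
\end{align*}
% the initial error contracts geometrically while the accumulated
% correction bias stays bounded by beta C / mu.
```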

6. Empirical Performance and Cost Analysis

6.1 Benchmark Domains

SGPS was evaluated on linear (FFHQ256 super-resolution $4\times$, box inpainting, random inpainting, Gaussian and motion deblurring) and nonlinear (phase retrieval, nonlinear deblurring, HDR recovery) inverse problems.

6.2 Quantitative Results

Performance with $T = 16$ ($\approx 48$ NFE) and $T = 33$ ($\approx 99$ NFE) is reported using PSNR (higher is better) and LPIPS (lower is better); each cell below shows PSNR / LPIPS:

| Method | NFE | SR $4\times$ | Inpaint (box) | Inpaint (random) | Gaussian deblur | Motion deblur |
|--------|-----|--------------|---------------|------------------|-----------------|---------------|
| SGPS   | 99  | 29.38 / 0.179 | 24.23 / 0.133 | 30.47 / 0.116 | 29.35 / 0.179 | 31.24 / 0.148 |
| DAPS   | 100 | 27.69 / 0.230 | 22.51 / 0.192 | 26.64 / 0.238 | 27.77 / 0.220 | 29.84 / 0.167 |

| Method | NFE | Phase retrieval | Nonlinear deblur | HDR |
|--------|-----|-----------------|------------------|-----|
| SGPS   | 99  | 24.08 / 0.268 | 27.33 / 0.197 | 24.87 / 0.179 |
| DAPS   | 100 | 20.83 / 0.402 | 25.56 / 0.255 | 24.09 / 0.199 |

6.3 Computational Cost

  • On an RTX 4090: 48 NFE $\approx$ 4.13 s/image; 99 NFE $\approx$ 8.46 s/image.
  • In competitive SR $4\times$ settings at comparable runtime ($\approx 4$ s), SGPS achieves PSNR $\approx 29.06$ dB versus DDNM's 29.09 dB.
  • Overhead breakdown (for 48 NFE): SURE update (denoiser $\times 2$ + autograd) 51.2%, Langevin guidance 35.5%, forward denoising 11.2%, PCA 1.8%.

7. Implementation Considerations and Limitations

  • Denoiser: U-Net in VP-DDPM/EDM configuration, trained on FFHQ256 images.
  • Noise schedule: Geometric, from $t_{\max} = T$ to $t_{\min} = 0.02$, with $\rho = 7$ (Karras et al.).
  • Sampling Steps: $T = 16$ (48 NFE) or $T = 33$ (99 NFE).
  • Langevin Conditional Guidance: 100 iterations per outer step, step size $\eta \approx 0.1$.
  • PCA: Patch size $8 \times 8$, stride 4, $s \approx 1000$ patches per image.
  • SURE Hyperparameters: $\epsilon = \max(x_{\text{noisy}})/1000 \approx 10^{-3}$, $\alpha = 0.5$.
  • Trace Vectors: One random vector per step; additional vectors confer no empirical benefit.
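
For reference, the reported settings gathered into a single configuration sketch; the key names are hypothetical, while the values follow the list above.

```python
# Reported SGPS hyperparameters (values from the list above; key names ours).
SGPS_CONFIG = {
    "schedule": {"kind": "geometric", "t_min": 0.02, "rho": 7},
    "steps": {"fast": 16, "slow": 33},                  # ~48 and ~99 NFE
    "langevin": {"iters_per_outer_step": 100, "step_size": 0.1},
    "pca": {"patch_size": 8, "stride": 4, "num_patches": 1000},
    "sure": {"alpha": 0.5, "eps_scale": 1e-3, "probe_vectors": 1},
}
```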

Principal limitations include the restriction to pixel-space diffusion samplers, the assumption of a known forward operator $A$, and the requirement of local strong convexity for the convergence theory. PCA noise estimation may fail for images with little self-similarity; alternative estimators (e.g., spectral) are a potential direction. The SURE update uses backpropagation; forward-mode JVP or SPSA could reduce its cost. Blind or partially unknown forward operators, non-Gaussian noise, and adaptation to latent-diffusion models remain open areas.


For detailed derivations and algorithmic implementations, see (Kim et al., 29 Dec 2025).
