SPGD: Stabilized Progressive Gradient Diffusion

Updated 21 December 2025
  • SPGD is a gradient management technique that mitigates conflicts and fluctuations in diffusion-based image restoration.
  • It employs a progressive likelihood warm-up strategy to distribute corrections incrementally, ensuring smoother convergence.
  • Adaptive directional momentum smoothing refines gradient updates, leading to quantitatively and perceptually superior results.

Stabilized Progressive Gradient Diffusion (SPGD) is a gradient management technique developed to enhance stability and performance in diffusion-model-based image restoration tasks. SPGD specifically targets two prevalent sources of instability inherent in standard diffusion-based solvers: conflicting gradient directions between denoising priors and data-consistency (likelihood) terms, and high temporal fluctuation of the likelihood gradient throughout the reverse diffusion process. By integrating a progressive likelihood warm-up strategy with adaptive directional momentum smoothing, SPGD achieves higher quality reconstructions that are both quantitatively and perceptually superior to earlier approaches, while also promoting stable convergence under practical compute budgets (Wu et al., 9 Jul 2025).

1. Gradient Dynamics and Instability in Diffusion-Based Restoration

Diffusion models for image restoration typically operate within a Bayesian inference framework, alternating between denoising steps informed by learned priors and likelihood guidance steps enforcing measurement consistency. At each reverse diffusion step, the update from $x_t$ to $x_{t-1}$ in deterministic DDIM sampling with a single DPS likelihood correction can be decomposed as

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\, x_t - \Bigl(\frac{\sqrt{1-\bar\alpha_t}}{\sqrt{\alpha_t}} - \sqrt{1-\bar\alpha_{t-1}}\Bigr)\epsilon_\theta(x_t, t) - \zeta\, \nabla_{x_t}\bigl\|y - \mathcal{A}\bigl(\widehat{x}_0(x_t)\bigr)\bigr\|^2,$$

where the first term is a fixed scaling, the second is the denoising prior gradient $\mathbf{g}_d(x_t)$, and the third is the likelihood gradient $\mathbf{g}_l(x_t)$. Empirical analysis demonstrates two dominant failure modes:

  • Gradient conflict: The angle between $\mathbf{g}_d$ and $\mathbf{g}_l$ is often far from orthogonal, indicating non-trivial opposition of prior and likelihood forces, especially early in the reverse chain.
  • Gradient fluctuation: The likelihood gradient $\mathbf{g}_l(x_t)$ exhibits sharp direction changes between consecutive time steps, causing erratic updates.

These phenomena degrade the generative process, inhibit convergence, and introduce visual artifacts (Wu et al., 9 Jul 2025).
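
To make the decomposition concrete, below is a minimal PyTorch-style sketch of one deterministic DDIM step with a single DPS likelihood correction, exposing $\mathbf{g}_d$ and $\mathbf{g}_l$ separately. The names `eps_model` (the noise predictor $\epsilon_\theta$), `A` (the measurement operator $\mathcal{A}$), and the schedule tensors `alphas`/`alpha_bars` are illustrative assumptions, not identifiers from the paper.

```python
import torch

def ddim_dps_step(x_t, t, eps_model, A, y, alphas, alpha_bars, zeta):
    """One reverse step of deterministic DDIM with a single DPS likelihood correction."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)                              # epsilon_theta(x_t, t)

    # Tweedie-style estimate of the clean image, \hat{x}_0(x_t)
    x0_hat = (x_t - torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])

    # Likelihood gradient g_l: gradient of ||y - A(x0_hat)||^2 with respect to x_t
    data_loss = (y - A(x0_hat)).pow(2).sum()
    g_l = torch.autograd.grad(data_loss, x_t)[0]

    # Denoising prior gradient g_d: schedule-dependent coefficient times epsilon_theta
    coeff = torch.sqrt(1 - alpha_bars[t]) / torch.sqrt(alphas[t]) - torch.sqrt(1 - alpha_bars[t - 1])
    g_d = coeff * eps

    # x_{t-1} = x_t / sqrt(alpha_t) - g_d - zeta * g_l
    x_prev = x_t / torch.sqrt(alphas[t]) - g_d - zeta * g_l
    return x_prev.detach(), g_d.detach(), g_l.detach()
```

Returning $\mathbf{g}_d$ and $\mathbf{g}_l$ alongside $x_{t-1}$ makes it straightforward to monitor their angle and the step-to-step variation of $\mathbf{g}_l$, the two quantities behind the failure modes above.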

2. Progressive Likelihood Warm-Up Mechanism

SPGD addresses gradient conflict through a progressive likelihood warm-up strategy. Rather than applying a single, substantial likelihood update before each denoising step, SPGD performs $N$ incremental likelihood steps, each of reduced magnitude, before the denoising operation. With $x_t^{(0)} = x_t$ and $j = 0, \dots, N-1$,

$$x_t^{(j+1)} = x_t^{(j)} - \frac{\zeta}{N}\, \widetilde{\mathbf{g}}_l\bigl(x_t^{(j)}\bigr)$$

After $N$ such steps, $x_t^{(N)}$ is passed to the standard denoising update. Under smoothness assumptions on the likelihood objective $L_t(x)$, with $L$ the Lipschitz constant of its gradient, each inner step strictly decreases $L_t$ as long as $\zeta/N < 1/L$, ensuring stable descent. This warm-up distributes the total likelihood correction more evenly along the reverse trajectory, systematically reducing the antagonism between prior and likelihood terms that results from abruptly large guidance steps (Wu et al., 9 Jul 2025).
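
A minimal sketch of the warm-up loop, assuming a callable `likelihood_grad(x, t)` (an illustrative name) that returns the likelihood gradient, or its ADM-smoothed version, at the current iterate:

```python
def likelihood_warmup(x_t, t, N, zeta, likelihood_grad):
    """Replace one likelihood step of size zeta with N smaller steps of size zeta / N."""
    x = x_t
    for _ in range(N):
        g = likelihood_grad(x, t)        # \tilde{g}_l evaluated at x_t^{(j)}
        x = x - (zeta / N) * g           # x_t^{(j+1)} = x_t^{(j)} - (zeta / N) * g
    return x                             # x_t^{(N)}, handed to the denoising update
```

Because each substep re-evaluates the gradient at the updated iterate, later corrections respond to the effect of earlier ones rather than relying on a single large, potentially conflicting step.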

3. Adaptive Directional Momentum (ADM) Smoothing

To suppress fluctuations in the likelihood gradient, SPGD introduces adaptive directional momentum (ADM) smoothing. At each substep $j$ of the warm-up sequence, the raw gradient $g_l^{(j)}$ is combined with a momentum term $\widetilde g_l^{(j-1)}$ using a direction-dependent weight

$$\alpha_j = \frac{\mathrm{cosine}\bigl(\widetilde g_l^{(j-1)},\, g_l^{(j)}\bigr) + 1}{2},$$

$$\widetilde g_l^{(j)} = \alpha_j \beta\, \widetilde g_l^{(j-1)} + \bigl(1 - \alpha_j \beta\bigr)\, g_l^{(j)},$$

where $\beta$ is a base momentum coefficient (e.g., 0.95). When the current and previous gradients are aligned, ADM maintains strong smoothing; sharp misalignment weakens the legacy contribution, allowing fast reorientation. This adaptive scheme stabilizes the effective guidance direction across inner steps, further promoting smooth convergence (Wu et al., 9 Jul 2025).
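
A sketch of the ADM rule, assuming gradients are PyTorch tensors of identical shape; `adm_update` is an illustrative name rather than one from the reference implementation:

```python
import torch.nn.functional as F

def adm_update(g_smooth_prev, g_raw, beta=0.95):
    """Direction-aware momentum: alpha_j in [0, 1] rescales the base momentum beta."""
    if g_smooth_prev is None:            # first inner step: nothing to smooth against yet
        return g_raw
    cos = F.cosine_similarity(g_smooth_prev.flatten(), g_raw.flatten(), dim=0)
    alpha = (cos + 1.0) / 2.0            # 1 when aligned, 0 when exactly opposed
    return alpha * beta * g_smooth_prev + (1.0 - alpha * beta) * g_raw
```

When the gradients are aligned the effective momentum approaches $\beta$ (strong smoothing); when they point in opposite directions it collapses toward zero and the new raw gradient takes over.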

4. Integrated SPGD Algorithmic Framework

The complete SPGD workflow embeds the progressive likelihood warm-up (with ADM) within each reverse diffusion step. The process is as follows (a schematic implementation appears after the list):

  1. For each reverse step $t$, initialize $x_t^{(0)} = x_t$.
  2. Iteratively compute the ADM-smoothed likelihood gradient and apply $N$ warm-up updates.
  3. Apply standard DDIM-style denoising based on the refined $x_t^{(N)}$.
  4. Repeat until $x_0$ is obtained.
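
Putting the pieces together, a schematic outer loop could look as follows; `likelihood_gradient` and `ddim_denoise` are assumed helpers along the lines of the earlier sketches, not names from the paper:

```python
def spgd_sample(x_T, T, N, zeta, beta, likelihood_gradient, ddim_denoise):
    """Schematic SPGD sampler: N ADM-smoothed warm-up substeps per reverse step."""
    x = x_T
    for t in reversed(range(1, T + 1)):
        g_smooth = None
        for _ in range(N):                                # progressive likelihood warm-up
            g_raw = likelihood_gradient(x, t)             # raw g_l at the current iterate
            g_smooth = adm_update(g_smooth, g_raw, beta)  # ADM smoothing (Section 3)
            x = x - (zeta / N) * g_smooth
        x = ddim_denoise(x, t)                            # standard DDIM-style denoising
    return x                                              # final estimate x_0
```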

For typical experiments with $256 \times 256$ images (FFHQ/ImageNet), $T = 100$ outer diffusion steps and $N = 5$ inner warm-up steps are used, resulting in 500 total function evaluations. Key hyperparameters such as $\zeta$ (guidance strength) and $\beta$ (momentum) are selected per task via grid search, with optimal values $\beta \in [0.9, 0.95]$ and $N = 5$ providing consistently superior performance (Wu et al., 9 Jul 2025).

5. Empirical Results and Comparative Performance

SPGD achieves state-of-the-art results across a range of image restoration tasks, including inpainting, Gaussian/motion deblurring, and super-resolution, on the FFHQ and ImageNet datasets ($256 \times 256$). Representative metrics are shown below.

Task        Metric   Baseline (DPS)   SPGD
Inpainting  PSNR     26.11 dB         30.87 dB
            SSIM     0.802            0.889
            LPIPS    0.180            0.120

SPGD produces both higher quantitative scores and perceptually improved outputs. Ablation confirms that while progressive warm-up alone improves PSNR by roughly 3 dB over the baseline, the combination with ADM delivers the best overall performance, indicating synergy between the two components. Performance is robust for $N \approx 5$ to $10$; values of $N$ that are too large can lead to overcorrection without benefit (Wu et al., 9 Jul 2025).

6. Relation to History Gradient Update (HGU)

SPGD’s unified warm-up/momentum approach is distinct from strategies such as History Gradient Update (HGU) (He et al., 2023). HGU incorporates exponential moving averages of past gradients (in both vanilla-momentum and Adam variants) for data-fidelity guidance within diffusion sampling. HGU is effective at smoothing stochasticity and accelerating convergence, particularly in measurement-consistency terms, but it neither directly addresses gradient conflicts between prior and likelihood nor applies fine-grained inner warm-up steps per reverse pass. In contrast, SPGD explicitly mitigates conflict between denoising and likelihood gradients and attenuates erratic guidance by combining per-step progressive updates with adaptive, direction-aware smoothing (He et al., 2023, Wu et al., 9 Jul 2025).
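
For contrast, a plain exponential moving average of the kind underlying HGU-style history smoothing (sketched here in a vanilla-momentum form) keeps the momentum weight fixed regardless of how the incoming gradient is oriented:

```python
def ema_update(g_smooth_prev, g_raw, beta=0.9):
    """Fixed-weight EMA of likelihood gradients; compare with the direction-aware adm_update above."""
    return g_raw if g_smooth_prev is None else beta * g_smooth_prev + (1.0 - beta) * g_raw
```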

7. Limitations and Future Directions

SPGD’s efficacy depends on hyperparameter selection (notably $\zeta$, $\beta$, and $N$), which must be tuned per application. Excessive inner steps ($N > 10$) may lead to diminishing returns or overcorrection. The theoretical guarantees assume smoothness and local optimality, whereas real image restoration losses may exhibit nonconvexity. Future research aims to develop automatic tuning strategies for SPGD parameters, explore integration with second-order optimization techniques within the diffusion sampling loop, and extend the methodology to learned conditional samplers or non-Gaussian measurement noise models (Wu et al., 9 Jul 2025).
