
Gradient Tracking Diffusion Strategy

Updated 28 December 2025
  • Gradient tracking diffusion strategy is a method that integrates historical gradient information to stabilize and guide sampling in diffusion models.
  • The approach employs progressive likelihood warm-up and adaptive directional momentum smoothing to balance prior and likelihood gradients effectively.
  • Empirical results with SPGD show superior restoration performance in tasks like inpainting and deblurring, improving metrics such as PSNR, SSIM, and LPIPS.

Gradient Tracking Diffusion Strategy refers to a class of techniques (prominently in modern generative modeling and decentralized optimization) that incorporate, track, and manage gradient information within diffusion-based frameworks. These approaches aim to stabilize, accelerate, or bias the generative or optimization trajectory by exploiting historical and/or structured gradients, rather than relying solely on pointwise or myopic updates. They are critical both in high-dimensional inverse problems (e.g., image restoration, hyperspectral covariance estimation) and in distributed learning settings.

1. Mathematical Formulation: Gradient Tracking in Diffusion Schemes

In the context of image restoration via diffusion models, the canonical framework involves Bayesian inference with a pre-trained unconditional diffusion prior and an explicit data (likelihood) constraint. Mathematically, the conditional score is decomposed as

\nabla_{x_t} \log p(x_t|y) = \nabla_{x_t} \log p_\theta(x_t) + \nabla_{x_t} \log p(y|x_t)

where:

  • \nabla_{x_t} \log p_\theta(x_t) \approx -\epsilon_\theta(x_t,t)/\sqrt{1-\bar\alpha_t} is the learned prior score,
  • \nabla_{x_t} \log p(y|x_t) \approx -\zeta \nabla_{x_t} \| y - A(\hat x_0(x_t)) \|_2^2 is a DPS-style likelihood gradient, with \hat x_0(x_t) = (x_t - \sqrt{1-\bar\alpha_t}\, \epsilon_\theta(x_t,t))/\sqrt{\bar\alpha_t} (Wu et al., 9 Jul 2025).

A single-step DDIM-type update with explicit gradient guidance is

x_{t-1} = (1/\sqrt{\alpha_t})\, x_t - g_d(x_t) - \zeta\, g_l(x_t)

with g_d(x_t) and g_l(x_t) representing scaled prior and likelihood gradients, respectively. This structure enables precise tracking and management of the contributions of each gradient component throughout the reverse diffusion process.
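A minimal PyTorch-style sketch of one such guided reverse step is given below. The noise-prediction network eps_model, the forward operator A, and the schedules alpha / alpha_bar are placeholder names for components supplied by the surrounding pipeline, and zeta is the likelihood strength; this is a sketch under those assumptions, not a reference implementation.

import torch

def guided_ddim_step(x_t, y, t, eps_model, A, alpha, alpha_bar, zeta):
    """One DDIM-type reverse step with explicit DPS-style likelihood guidance.
    eps_model, A, alpha, and alpha_bar are assumed (hypothetical) components of
    the surrounding diffusion pipeline."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)                                   # learned noise prediction ε_θ(x_t, t)

    # Posterior-mean estimate x̂_0(x_t)
    x0_hat = (x_t - torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alpha_bar[t])

    # Likelihood (data-fidelity) gradient g_l = ∇_{x_t} ||y - A(x̂_0)||^2
    residual = (y - A(x0_hat)).pow(2).sum()
    g_l = torch.autograd.grad(residual, x_t)[0]

    # Scaled prior gradient g_d, using the DDIM coefficient from the sampling scheme in Section 4
    coeff = torch.sqrt(1 - alpha_bar[t]) / torch.sqrt(alpha[t]) - torch.sqrt(1 - alpha_bar[t - 1])
    g_d = coeff * eps

    # Combined update: x_{t-1} = (1/√α_t) x_t - g_d - ζ g_l
    return (x_t / torch.sqrt(alpha[t]) - g_d - zeta * g_l).detach()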

2. Instabilities in Naïve Gradient Guidance: Empirical Gradient Dynamics

Direct combination of prior and likelihood gradients in diffusion guidance leads to two key instabilities:

  • Direction conflict: Empirically, the angle between g_d(x_t) and g_l(x_t) can deviate significantly from orthogonality, especially in early steps, leading to update directions that oppose each other. This degrades the effectiveness of both the prior and the likelihood guidance (Wu et al., 9 Jul 2025).
  • Temporal fluctuation: The likelihood gradient direction g_l(x_t) can vary erratically between consecutive timesteps (large angle between g_l(x_t) and g_l(x_{t+1})), injecting high-frequency noise into the sampling trajectory. This non-smoothness often manifests as restoration artifacts or stalling.

Both phenomena are quantitatively observed in angular statistics along the reverse trajectory and are directly linked to suboptimal recovery quality.
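As an illustration of how such angular diagnostics can be computed, the sketch below measures the angle between the prior and likelihood gradients at one step and between likelihood gradients at consecutive steps; the gradient tensors here are random stand-ins for quantities that would come from the sampling loop.

import math
import torch

def angle_deg(u, v, eps=1e-12):
    """Angle in degrees between two gradient tensors (flattened)."""
    u, v = u.flatten(), v.flatten()
    cos = torch.dot(u, v) / (u.norm() * v.norm() + eps)
    return math.degrees(torch.acos(cos.clamp(-1.0, 1.0)).item())

# Random stand-ins for gradients collected along the reverse trajectory.
g_d, g_l_t, g_l_prev = (torch.randn(3, 64, 64) for _ in range(3))

# Angles well above 90° between g_d and g_l indicate direction conflict;
# large angles between consecutive g_l indicate temporal fluctuation.
print("prior/likelihood angle:", angle_deg(g_d, g_l_t))
print("consecutive likelihood angle:", angle_deg(g_l_t, g_l_prev))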

3. Stabilized Progressive Gradient Diffusion (SPGD): Algorithmic Strategy

To overcome these instabilities, Stabilized Progressive Gradient Diffusion (SPGD) introduces two intertwined mechanisms:

  • Progressive Likelihood Warm-Up: Instead of applying the full likelihood gradient in a single step, the update is split into N smaller sub-steps per diffusion iteration, x_t^{(j+1)} = x_t^{(j)} - (\zeta/N)\, \tilde{g}_l(x_t^{(j)}). This ensures that the likelihood term is adaptively introduced, mitigating abrupt conflicts with the prior and gradually aligning directions (Wu et al., 9 Jul 2025).
  • Adaptive Directional Momentum (ADM) Smoothing: Within the inner loop, the raw likelihood gradient g_l is smoothed by a momentum accumulator that is adaptively weighted based on the directional cosine similarity between successive gradients:

\tilde{g}_l^{(j)} = \alpha_j \beta\, \tilde{g}_l^{(j-1)} + (1 - \alpha_j \beta)\, g_l(x_t^{(j)})

where \alpha_j = (\text{sim}(\tilde{g}_l^{(j-1)}, g_l(x_t^{(j)})) + 1)/2 with \text{sim}(\cdot,\cdot) the cosine similarity, and \beta is the base momentum. Perfect alignment results in standard momentum, while disagreement damps the accumulation, reducing erratic propagation.

These operations together yield a robust, smooth, and directionally stable update path toward the data-consistent solution.
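A minimal PyTorch sketch of the ADM update follows; g_smooth is the running accumulator \tilde{g}_l^{(j-1)}, g_l is the fresh likelihood gradient, and beta is the base momentum (the function name and default values are illustrative, not taken from the paper).

import torch
import torch.nn.functional as F

def adm_update(g_smooth, g_l, beta=0.9):
    """Adaptive Directional Momentum: blend the running likelihood gradient with
    the new one, weighting the momentum by how well their directions agree."""
    sim = F.cosine_similarity(g_smooth.flatten(), g_l.flatten(), dim=0)
    alpha_j = (sim + 1.0) / 2.0          # cosine similarity mapped from [-1, 1] to [0, 1]
    return alpha_j * beta * g_smooth + (1.0 - alpha_j * beta) * g_l

When the two gradients agree perfectly (sim = 1), this reduces to plain momentum with coefficient β; when they oppose (sim = -1), α_j vanishes and the accumulator is effectively replaced by the fresh gradient.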

4. Implementation: SPGD Sampling Procedure

Below is an outline of the SPGD diffusion sampling scheme (Wu et al., 9 Jul 2025):

Input: y, A, ε_θ, T (outer steps), N (inner steps), ζ (likelihood strength), β (momentum)
Initialize x_T ~ N(0,I)
For t = T downto 1:
    Set x_t^(0) = x_t
    For j = 0 to N-1:
        Compute g_l = ∇_x || y - A( x̂_0(x_t^(j)) ) ||^2
        If j > 0:
            α_j = (sim( g̃_l, g_l ) + 1)/2
            g̃_l = α_j β g̃_l + (1 - α_j β) g_l
        Else:
            g̃_l = g_l
        x_t^(j+1) = x_t^(j) - (ζ/N) g̃_l
    End
    Compute g_d = ( √(1-ᾱ_t)/√α_t - √(1-ᾱ_{t-1}) ) ε_θ(x_t^(N), t)
    x_{t-1} = (1/√α_t) x_t^(N) - g_d
End
Return x_0

This procedure inserts an N-step likelihood warm-up with ADM smoothing before the standard prior-based update at each diffusion step, resulting in a stable reverse trajectory.
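For concreteness, a rough PyTorch realization of this procedure might look as follows; eps_model, A, y, and the schedules alphas / alpha_bars are assumed inputs with hypothetical names, so this is a sketch under those assumptions rather than the reference implementation.

import torch

@torch.no_grad()
def spgd_sample(y, A, eps_model, alphas, alpha_bars, T, N=4, zeta=1.0, beta=0.9,
                shape=(1, 3, 256, 256)):
    """Sketch of SPGD sampling: an N-step likelihood warm-up with ADM smoothing,
    followed by the standard DDIM-style prior update at each diffusion step."""
    x_t = torch.randn(shape)
    for t in range(T, 0, -1):
        g_smooth = None
        for j in range(N):
            # The likelihood gradient needs autograd, so re-enable it locally.
            with torch.enable_grad():
                x_in = x_t.detach().requires_grad_(True)
                eps = eps_model(x_in, t)
                x0_hat = (x_in - torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
                loss = (y - A(x0_hat)).pow(2).sum()
                g_l = torch.autograd.grad(loss, x_in)[0]
            if g_smooth is None:
                g_smooth = g_l                                 # first sub-step: no history yet
            else:
                sim = torch.nn.functional.cosine_similarity(g_smooth.flatten(), g_l.flatten(), dim=0)
                alpha_j = (sim + 1.0) / 2.0
                g_smooth = alpha_j * beta * g_smooth + (1.0 - alpha_j * beta) * g_l
            x_t = x_t - (zeta / N) * g_smooth                  # progressive warm-up sub-step
        # Prior-based DDIM-style update after the warm-up loop.
        eps = eps_model(x_t, t)
        g_d = (torch.sqrt(1 - alpha_bars[t]) / torch.sqrt(alphas[t])
               - torch.sqrt(1 - alpha_bars[t - 1])) * eps
        x_t = x_t / torch.sqrt(alphas[t]) - g_d
    return x_t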

5. Comparative Performance and Empirical Results

SPGD achieves consistently superior results across diverse image restoration settings, as shown below (each cell reports PSNR↑ / SSIM↑ / LPIPS↓) (Wu et al., 9 Jul 2025):

Task              | PnP-ADMM          | DPS               | RED-Diff          | SPGD (Ours)
Inpainting (FFHQ) | 27.99/0.729/0.306 | 26.11/0.802/0.180 | 27.17/0.799/0.159 | 30.87/0.889/0.120
Gauss. deblur     | 26.07/0.758/0.260 | 26.51/0.782/0.181 | 24.69/0.672/0.288 | 27.83/0.775/0.172
Motion deblur     | 25.86/0.772/0.278 | 25.58/0.752/0.212 | 26.24/0.706/0.255 | 29.41/0.834/0.158
SR ×4             | 27.75/0.835/0.246 | 27.06/0.803/0.187 | 29.06/0.800/0.243 | 29.35/0.831/0.137

On ImageNet, similar gains are maintained:

  • Inpainting: SPGD 26.28/0.798/0.165, best among compared methods.
  • Gaussian deblur: SPGD 24.80/0.651/0.229, best among compared methods.

SPGD also produces substantially more visually coherent and artifact-free restorations in inpainting, deblurring, and super-resolution tasks. For example, in inpainting, it reconstructs fine facial features with natural shading, and in deblurring, it recovers crisp edges and high-quality textures while suppressing ringing and ghosting.

6. Broader Context: Connections and Variants

Gradient-tracking diffusion strategies unify several sampling and optimization approaches:

  • SPGD is one realization, explicitly targeting prior–likelihood conflict and temporal instability in diffusion sampling (Wu et al., 9 Jul 2025).
  • Related approaches include History Gradient Update (HGU), which tracks and aggregates historical data-fidelity gradients in diffusion-based inverse solvers via momentum or Adam-style updates, yielding accelerated convergence and better robustness (He et al., 2023).
  • Outside generative restoration, similar gradient-tracking formulations underpin decentralized optimization, such as Flexible Gradient Tracking Algorithms and Local-Update Gradient Tracking, which coordinate stochastic and communication-minimized iterations while maintaining consensus and convergence guarantees (Berahas et al., 2023, Nguyen et al., 2022).
  • Multiple works in image translation (e.g., Asymmetric Gradient Guidance) also rely on gradient-tracking to balance fidelity and style/content constraints in the sampling loop (Kwon et al., 2023).

These methods share the core principle of leveraging rich local or recent gradient history—often via smoothing, momentum, or progressive scheduling—in the presence of stochasticity or competing objectives to enhance the efficacy, efficiency, and stability of diffusion-based updates.


In summary, gradient-tracking diffusion strategies, and SPGD in particular, represent a mathematically rigorous and empirically validated solution to core instabilities in guided diffusion, providing systematic control over gradient interactions and noise. They are broadly extensible across inverse imaging, multitask restoration, optimization, and distributed learning contexts (Wu et al., 9 Jul 2025, He et al., 2023, Berahas et al., 2023, Nguyen et al., 2022).
