
Adaptive Projected Guidance (APG)

Updated 5 February 2026
  • Adaptive Projected Guidance (APG) is a sampling modification for diffusion models that improves image fidelity by reducing oversaturation and visual artifacts at high guidance scales.
  • It employs a principled vector decomposition, selectively down-weights the parallel component, and introduces reverse momentum to refine the denoising update.
  • APG offers a plug-and-play, low-overhead enhancement to existing CFG pipelines, compatible with various diffusion architectures and samplers.

Adaptive Projected Guidance (APG) is a sampling modification for conditional diffusion models that addresses oversaturation and visual artifacts associated with high guidance scales in classifier-free guidance (CFG). APG introduces a principled vector decomposition, parallel-component down-weighting, and reverse momentum to the guidance update. This plug-and-play procedure is compatible with all standard samplers and conditional diffusion architectures, incurring negligible overhead and enabling improved generative fidelity, diversity, and alignment at large guidance scales while systematically reducing oversaturation and detail distortions (Sadat et al., 2024).

1. Mathematical Formulation and Derivation

APG arises as a principled extension of the standard classifier-free guidance (CFG) update used in conditional diffusion models. In the noise-prediction paradigm, the unconditional and conditional denoiser outputs at timestep $t$ are denoted $\varepsilon_{\text{uncond}}(x_t, t)$ and $\varepsilon_{\text{cond}}(x_t, t)$, respectively. The canonical CFG update, with guidance scale $w$, is the term added to the unconditional prediction: $\Delta x = w \left( \varepsilon_{\text{cond}}(x_t, t) - \varepsilon_{\text{uncond}}(x_t, t) \right)$, or, equivalently in the predicted-clean space,

$\Delta x = w \left( \hat{x}_{0,\mathrm{cond}} - \hat{x}_{0,\mathrm{uncond}} \right)$

APG modifies this update through the following steps:

  1. Decomposition: Define the guidance vector $g = \hat{x}_{0,\mathrm{cond}} - \hat{x}_{0,\mathrm{uncond}}$. Decompose $g$ into parallel and orthogonal components relative to $\hat{x}_{0,\mathrm{cond}}$:
    • $g_\parallel = \frac{ \langle g, \hat{x}_{0,\mathrm{cond}} \rangle }{ \| \hat{x}_{0,\mathrm{cond}} \|^2 } \hat{x}_{0,\mathrm{cond}}, \qquad g_\perp = g - g_\parallel$
  2. Down-weighting: Empirical evidence indicates that $g_\parallel$ induces oversaturation, whereas $g_\perp$ governs detail. Introduce a hyperparameter $\eta \in [0, 1]$:
    • $\tilde{g}(\eta) = g_\perp + \eta\, g_\parallel$
  3. Guided update: Written around the conditional prediction (so the CFG scale $w$ appears as $w - 1$), the APG update becomes:
    • $\hat{x}_{0,\mathrm{guided}} = \hat{x}_{0,\mathrm{cond}} + (w - 1)\, \tilde{g}(\eta)$
  4. Gradient-ascent interpretation, rescaling, and reverse momentum: Viewing the guidance update as a gradient-ascent step on $f(\hat{x}_{0,\mathrm{cond}}, \hat{x}_{0,\mathrm{uncond}}) = \tfrac12 \| \hat{x}_{0,\mathrm{cond}} - \hat{x}_{0,\mathrm{uncond}} \|^2$, two additional modifications are introduced:
    • Rescaling: Restrict the update norm to a radius $r$: $\tilde{g}_r = \tilde{g}(\eta) \cdot \min\left( 1, \frac{r}{\|\tilde{g}(\eta)\|} \right)$
    • Reverse momentum: Maintain a negative-momentum buffer $m_t = \tilde{g}_r + \beta\, m_{t-1}$, with $\beta < 0$ and $m_0 = 0$

The final APG-guided prediction uses $m_t$ in place of $\tilde{g}(\eta)$, yielding: $\hat{x}_{0,\mathrm{guided}} = \hat{x}_{0,\mathrm{cond}} + (w-1)\, m_t$
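As a sanity check on the decomposition above, the projection identities can be verified numerically. The following NumPy sketch is illustrative (random vectors, not the authors' code): it confirms that $g_\parallel$ and $g_\perp$ reconstruct $g$, and that $g_\perp$ is orthogonal to $\hat{x}_{0,\mathrm{cond}}$.

```python
import numpy as np

# Illustrative check of the APG vector decomposition on random vectors.
rng = np.random.default_rng(0)
x0_cond = rng.standard_normal(16)
x0_uncond = rng.standard_normal(16)

g = x0_cond - x0_uncond
# Project g onto the conditional clean prediction.
g_par = (g @ x0_cond) / (x0_cond @ x0_cond) * x0_cond
g_perp = g - g_par

# The two components reconstruct g and are mutually orthogonal by construction.
assert np.allclose(g_par + g_perp, g)
assert abs(g_perp @ x0_cond) < 1e-10

# With eta = 0 (the default), only the orthogonal component survives.
eta = 0.0
g_tilde = g_perp + eta * g_par
```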

2. Algorithmic Realization and Pseudocode

APG can be realized as a direct, low-overhead extension to the CFG sampling pipeline. The following pseudocode illustrates the method precisely in the context of a diffusion sampler:

initialize momentum m ← 0
for t = T … 1:
  // obtain model outputs
  ε_uncond ← denoiser(x_t, t, null)
  ε_cond   ← denoiser(x_t, t, y)
  // convert both ε_⋆ to clean predictions
  x0_uncond, x0_cond
  g ← x0_cond − x0_uncond
  // decompose relative to x0_cond
  α ← ⟨g, x0_cond⟩ / ∥x0_cond∥²
  g_parallel ← α · x0_cond
  g_perp     ← g − g_parallel
  // down-weight the parallel component
  g_tilde ← g_perp + η · g_parallel
  // rescale to radius r
  scale ← min(1, r / ∥g_tilde∥)
  g_tilde ← scale · g_tilde
  // reverse momentum (β < 0)
  m ← g_tilde + β · m
  // final guided clean prediction
  x0_guided ← x0_cond + (w − 1) · m
  // convert back to the sampler's noise or velocity form
  ε_guided ← convert_clean_to_noise(x0_guided, x_t)
  x_{t−1} ← apply_sampler_step(x_t, ε_guided, t)
end
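The loop body above can be condensed into a self-contained function. This is an illustrative NumPy sketch (function and argument names are placeholders, not a reference implementation); the clean-prediction conversions are assumed to happen outside, and the defaults mirror the hyperparameters listed below.

```python
import numpy as np

def apg_update(x0_cond, x0_uncond, m, w=4.0, eta=0.0, r=2.5, beta=-0.5):
    """One APG guidance step in clean-prediction space.

    Returns the guided clean prediction and the updated momentum buffer.
    """
    g = x0_cond - x0_uncond
    # Decompose g relative to the conditional prediction.
    g_par = (np.vdot(g, x0_cond) / np.vdot(x0_cond, x0_cond)) * x0_cond
    g_perp = g - g_par
    # Down-weight the parallel (saturation-inducing) component.
    g_tilde = g_perp + eta * g_par
    # Rescale: clamp the update norm to radius r (skip the zero vector).
    norm = np.linalg.norm(g_tilde)
    if norm > 0:
        g_tilde = g_tilde * min(1.0, r / norm)
    # Reverse momentum (beta < 0).
    m = g_tilde + beta * m
    return x0_cond + (w - 1.0) * m, m
```

The momentum buffer `m` is initialized to zeros once per generation and carried across all sampler steps.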

Hyperparameters:

  • Guidance scale $w$ (as in CFG)
  • Parallel weight $\eta \in [0, 1]$, default $0$
  • Rescaling radius $r > 0$ (typically matched to the average update norm)
  • Reverse-momentum coefficient $\beta \in [-1, 0]$, default $-0.5$

3. Computational Implementation and Compatibility

APG is intended as a drop-in replacement for the standard CFG step in diffusion model sampling. No additional denoiser or neural-network evaluations are necessary. The computational overhead is limited to a single inner product and projection, one norm computation and clamping operation (for rescaling), and a running vector addition (for momentum buffering). For a $512 \times 512$ image, this overhead is $\ll 1$ ms, compared to approximately $130$ ms for a single denoiser pass.

Conversions between noise-prediction and clean-prediction are standard and natively supported in modern diffusion toolkits. APG is directly compatible with any conditional diffusion model—including EDM2, DiT (e.g., DiT-XL/2), Stable Diffusion variants (2/3/XL), distilled samplers (SDXL-Lightning, PixArt-$\delta$), and rectified-flow models such as SD3. Sampler-agnosticity is maintained: DDIM, PNDM, DPM++, UniPC, and others are supported without modification (Sadat et al., 2024).
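For concreteness, under the standard variance-preserving (DDPM) parameterization $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \varepsilon$, these conversions are a simple rearrangement. The sketch below assumes that parameterization; other toolkits and prediction types (e.g., v-prediction) use different but analogous formulas.

```python
import numpy as np

# Conversions between noise-prediction and clean-prediction, assuming the
# variance-preserving forward process x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps.
def noise_to_clean(x_t, eps, abar_t):
    """Recover the clean prediction x0 from a noise prediction eps."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)

def clean_to_noise(x_t, x0, abar_t):
    """Recover the noise prediction eps from a clean prediction x0."""
    return (x_t - np.sqrt(abar_t) * x0) / np.sqrt(1.0 - abar_t)
```

The two functions are exact inverses given the same `x_t` and `abar_t`, which is what allows APG to operate in clean-prediction space and hand a noise prediction back to the sampler.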

4. Empirical Performance and Evaluation

APG demonstrates broad improvements in quantitative fidelity and qualitative artifact suppression across a variety of datasets and diffusion model architectures. The most relevant testbeds include:

  • Class-conditional ImageNet (EDM2-S, EDM2-XXL, DiT-XL/2)
  • Text-to-image generation on MS–COCO (Stable Diffusion 2.1, Stable Diffusion XL)
  • Fast and distilled models (SDXL-Lightning, PixArt-$\delta$)
  • Rectified-flow models (SD3) for robust text rendering

Key metrics include FID (Fréchet Inception Distance, ↓), precision (↑), recall (↑), mean saturation (HSV, ↓), and RMS grayscale contrast (↓).

Model (w)      Guidance   FID↓    Prec↑   Recl↑   Sat↓   Contra↓
EDM2-S (4)     CFG        10.42   0.85    0.48    0.46   0.27
               APG         6.49   0.85    0.62    0.33   0.21
DiT-XL/2 (4)   CFG        19.14   0.92    0.35    0.37   0.25
               APG         9.34   0.89    0.56    0.30   0.20
SDXL (15)      CFG        26.29   0.62    0.49    0.28   0.24
               APG        25.35   0.64    0.50    0.18   0.17

Qualitative analysis finds:

  • Significant reduction in oversaturation and "pasted-on" contrast with high ww
  • Elimination of local artifacts (e.g., "fried-egg" textures, checkerboarding)
  • Enhanced text spelling consistency in SD3 rectified-flow (cf. Fig. 8)
  • Mitigation of mode-drift in toy Gaussian-mixture settings (cf. Fig. 18)

5. Hyperparameter Selection and Key Insights

APG's parameterization provides explicit control over the tradeoff between saturation and detail:

  • Parallel weight $\eta$: Default $\eta = 0$ (parallel component removed entirely); values up to $0.25$ yield increased image "punch" and color saturation.
  • Rescale radius $r$: Set to match the typical $\|g\|$; e.g., $r \in [1.5, 10]$ for EDM2/DiT and $r \in [7.5, 25]$ for SD 2.1/XL. Small $r$ under-guides (blurry results), while large $r$ suppresses rescaling's effect.
  • Reverse momentum $\beta$: Negative values in $[-0.8, -0.25]$ consistently improve FID and recall. Default $\beta = -0.5$.
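The effect of the rescale radius can be illustrated with a toy NumPy example (hypothetical values, not from the paper): raw updates whose norm exceeds $r$ are clamped to norm $r$, while smaller updates pass through unchanged, which is why too small an $r$ under-guides and too large an $r$ makes rescaling inert.

```python
import numpy as np

# Toy illustration of the rescaling step: cap the guidance-update norm at r.
def rescale(g_tilde, r):
    n = np.linalg.norm(g_tilde)
    return g_tilde * min(1.0, r / n) if n > 0 else g_tilde

big = np.full(4, 10.0)              # raw update with norm 20
assert np.isclose(np.linalg.norm(rescale(big, 2.5)), 2.5)   # clamped to r
small = np.full(4, 0.1)             # norm 0.2 < r
assert np.allclose(rescale(small, 2.5), small)              # unchanged
```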

In summary, APG acts as a nearly zero-overhead, plug-and-play substitute for CFG, retaining its major benefits (sample alignment, fidelity) while robustly mitigating oversaturation and artifact formation at high guidance strengths (Sadat et al., 2024).

6. Significance and Applications

APG enables the practical use of high guidance scales in conditional diffusion sampling. This removes the primary empirical limitation of CFG (oversaturation, detail artifacts), broadening the effective parameter range for applications requiring strong condition alignment. APG's architecture-agnostic, sampler-agnostic construction and negligible runtime cost facilitate immediate adoption across research and production pipelines in image, text-image, and other generative domains. Empirical improvements in both established quantitative metrics (FID, recall) and qualitative robustness (artifact elimination, improved text rendering) position APG as a robust methodological advancement within guided diffusion model sampling.

References (1)
