Adaptive Projected Guidance (APG)
- Adaptive Projected Guidance (APG) is a sampling modification for diffusion models that improves image fidelity by reducing oversaturation and visual artifacts at high guidance scales.
- It employs a principled vector decomposition, selectively down-weights the parallel component, and introduces reverse momentum to refine the denoising update.
- APG offers a plug-and-play, low-overhead enhancement to existing CFG pipelines, compatible with various diffusion architectures and samplers.
Adaptive Projected Guidance (APG) is a sampling modification for conditional diffusion models that addresses oversaturation and visual artifacts associated with high guidance scales in classifier-free guidance (CFG). APG introduces a principled vector decomposition, parallel-component down-weighting, and reverse momentum to the guidance update. This plug-and-play procedure is compatible with all standard samplers and conditional diffusion architectures, incurring negligible overhead and enabling improved generative fidelity, diversity, and alignment at large guidance scales while systematically reducing oversaturation and detail distortions (Sadat et al., 2024).
1. Mathematical Formulation and Derivation
APG arises as a principled extension of the standard classifier-free guidance (CFG) update used in conditional diffusion models. In the noise-prediction paradigm, the unconditional and conditional denoiser outputs at timestep $t$ are denoted $\epsilon_\theta(x_t, t, \varnothing)$ and $\epsilon_\theta(x_t, t, y)$, respectively. The canonical CFG update, with guidance scale $w$, is
$$\hat{\epsilon} = \epsilon_\theta(x_t, t, \varnothing) + w \left[ \epsilon_\theta(x_t, t, y) - \epsilon_\theta(x_t, t, \varnothing) \right],$$
or, equivalently in the predicted-clean space,
$$\hat{x}_0 = x_0^{\mathrm{cond}} + (w - 1)\left( x_0^{\mathrm{cond}} - x_0^{\mathrm{uncond}} \right).$$
APG modifies this update through the following steps:
- Decomposition: Define the guidance vector $g = x_0^{\mathrm{cond}} - x_0^{\mathrm{uncond}}$. Decompose $g$ into parallel and orthogonal components relative to $x_0^{\mathrm{cond}}$:
$$g_\parallel = \frac{\langle g,\, x_0^{\mathrm{cond}} \rangle}{\| x_0^{\mathrm{cond}} \|^2}\, x_0^{\mathrm{cond}}, \qquad g_\perp = g - g_\parallel.$$
- Down-weighting: Empirical evidence indicates that $g_\parallel$ induces oversaturation, whereas $g_\perp$ governs detail. Introduce a hyperparameter $\eta \in [0, 1]$:
$$\tilde{g} = g_\perp + \eta\, g_\parallel.$$
- Guided update: The APG update becomes:
$$\hat{x}_0 = x_0^{\mathrm{cond}} + (w - 1)\, \tilde{g}.$$
- Gradient ascent interpretation, rescaling, and reverse momentum: Viewing the guidance update as a step of gradient ascent on an implicit conditioning objective, two additional modifications from gradient-ascent practice are introduced:
- Rescaling: Restrict the update norm to within a radius $r$:
$$\tilde{g} \leftarrow \min\!\left(1, \frac{r}{\| \tilde{g} \|}\right) \tilde{g}.$$
- Reverse momentum: Maintain a negative-momentum buffer $m$, with $\beta < 0$, initialized to $m = 0$ and updated as
$$m \leftarrow \tilde{g} + \beta\, m.$$
The final APG-guided prediction utilizes $m$ in place of the raw guidance vector, yielding:
$$\hat{x}_0 = x_0^{\mathrm{cond}} + (w - 1)\, m.$$
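As a quick sanity check on the decomposition above, the following NumPy snippet (my own illustration, not from the paper's code) verifies that $g_\parallel + g_\perp$ recombines exactly to $g$ and that $g_\perp$ is orthogonal to $x_0^{\mathrm{cond}}$:

```python
import numpy as np

# Toy check of the APG decomposition on random flattened predictions.
rng = np.random.default_rng(0)
x0_cond = rng.normal(size=1024)
x0_uncond = rng.normal(size=1024)

g = x0_cond - x0_uncond
alpha = np.dot(g, x0_cond) / np.dot(x0_cond, x0_cond)
g_parallel = alpha * x0_cond
g_perp = g - g_parallel

assert np.allclose(g_parallel + g_perp, g)        # exact recombination
assert abs(np.dot(g_perp, x0_cond)) < 1e-8        # orthogonality to x0_cond
```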
2. Algorithmic Realization and Pseudocode
APG can be realized as a direct, low-overhead extension to the CFG sampling pipeline. The following pseudocode illustrates the method precisely in the context of a diffusion sampler:
```
initialize momentum m ← 0
for t = T … 1:
    # obtain model outputs at x_t
    ε_uncond ← denoiser(x_t, t, ∅)
    ε_cond   ← denoiser(x_t, t, y)
    # convert both ε_⋆ to clean predictions x0_uncond, x0_cond
    g ← x0_cond − x0_uncond
    # decompose
    α ← ⟨g, x0_cond⟩ / ∥x0_cond∥²
    g_parallel ← α · x0_cond
    g_perp ← g − g_parallel
    # down-weight parallel component
    g_tilde ← g_perp + η · g_parallel
    # rescale
    scale ← min(1, r / ∥g_tilde∥)
    g_tilde ← scale · g_tilde
    # reverse momentum (β < 0)
    m ← g_tilde + β · m
    # final guided clean prediction
    x0_guided ← x0_cond + (w − 1) · m
    # convert back to the sampler's noise or velocity form
    ε_guided ← convert_clean_to_noise(x0_guided, x_t)
    x_{t−1} ← apply_sampler_step(x_t, ε_guided, t)
end
```
Hyperparameters:
- Guidance scale $w$ (as in CFG)
- Parallel weight $\eta$, default $0$
- Rescaling radius $r$ (typically matched to the average norm of the guidance updates)
- Reverse-momentum coefficient $\beta < 0$
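The pseudocode above can be condensed into a small, framework-agnostic helper operating in clean-prediction space. The NumPy sketch below is my own illustration: the function name, the small stabilizing epsilons, and the numeric defaults for `r` and `beta` are assumptions, not values from the paper.

```python
import numpy as np

def apg_update(x0_cond, x0_uncond, m, w, eta=0.0, r=2.5, beta=-0.5):
    """One APG guidance step on flattened clean predictions.

    Returns the guided clean prediction and the updated momentum buffer.
    The defaults for r and beta here are illustrative, not canonical.
    """
    g = x0_cond - x0_uncond
    # project g onto x0_cond and split into parallel/orthogonal parts
    alpha = np.dot(g, x0_cond) / (np.dot(x0_cond, x0_cond) + 1e-12)
    g_parallel = alpha * x0_cond
    g_perp = g - g_parallel
    # down-weight the (oversaturating) parallel component
    g_tilde = g_perp + eta * g_parallel
    # rescale: clamp the update norm to radius r
    norm = np.linalg.norm(g_tilde)
    g_tilde = g_tilde * min(1.0, r / (norm + 1e-12))
    # reverse momentum (beta < 0)
    m = g_tilde + beta * m
    x0_guided = x0_cond + (w - 1.0) * m
    return x0_guided, m
```

With $\eta = 1$, $\beta = 0$, and $r = \infty$, the step reduces exactly to the clean-space CFG update, which makes the relationship to CFG easy to verify.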
3. Computational Implementation and Compatibility
APG is intended as a drop-in replacement for the standard CFG step in diffusion model sampling. No additional denoiser or neural-network evaluations are necessary. The computational overhead is limited to a single inner product and projection, one norm calculation and clamping operation (for rescaling), and a running vector addition (for momentum buffering). Per generated image, this overhead is negligible relative to the approximately $130$ ms of a single denoiser pass.
Conversions between noise-prediction and clean-prediction are standard and natively supported in modern diffusion toolkits. APG is directly compatible with any conditional diffusion model—including EDM2, DiT (e.g., DiT-XL/2), Stable Diffusion variants (2/3/XL), distilled samplers (SDXL-Lightning, PixArt-), and rectified-flow models such as SD3. Sampler-agnosticity is maintained: DDIM, PNDM, DPM++, UniPC, and others are supported without modification (Sadat et al., 2024).
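The clean↔noise conversion referenced above is the standard reparameterization of a variance-preserving diffusion, $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. The helper functions below (names mine) sketch the round trip; velocity-prediction models would use the analogous formulas.

```python
import numpy as np

def noise_to_clean(x_t, eps, alpha_bar_t):
    """Recover the clean prediction x0 from a noise prediction eps."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def clean_to_noise(x0, x_t, alpha_bar_t):
    """Express a (guided) clean prediction back as a noise prediction."""
    return (x_t - np.sqrt(alpha_bar_t) * x0) / np.sqrt(1.0 - alpha_bar_t)
```

Applying `noise_to_clean` followed by `clean_to_noise` (or vice versa) recovers the input exactly, so APG's clean-space update can be spliced into a noise-prediction sampler without loss.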
4. Empirical Performance and Evaluation
APG demonstrates broad improvements in quantitative fidelity and qualitative artifact suppression across a variety of datasets and diffusion model architectures. The most relevant testbeds include:
- Class-conditional ImageNet (EDM2-S, EDM2-XXL, DiT-XL/2)
- Text-to-image generation on MS–COCO (Stable Diffusion 2.1, Stable Diffusion XL)
- Fast and distilled models (SDXL-Lightning, PixArt-)
- Rectified-flow models (SD3) for robust text rendering
Key metrics include FID (Fréchet Inception Distance, ↓), precision (↑), recall (↑), mean saturation (HSV S channel, ↓), and RMS grayscale contrast (↓).
| Model (w) | Guidance | FID↓ | Prec↑ | Recl↑ | Sat↓ | Contra↓ |
|---|---|---|---|---|---|---|
| EDM2-S (4) | CFG | 10.42 | 0.85 | 0.48 | 0.46 | 0.27 |
| EDM2-S (4) | APG | 6.49 | 0.85 | 0.62 | 0.33 | 0.21 |
| DiT-XL/2 (4) | CFG | 19.14 | 0.92 | 0.35 | 0.37 | 0.25 |
| DiT-XL/2 (4) | APG | 9.34 | 0.89 | 0.56 | 0.30 | 0.20 |
| SD XL (15) | CFG | 26.29 | 0.62 | 0.49 | 0.28 | 0.24 |
| SD XL (15) | APG | 25.35 | 0.64 | 0.50 | 0.18 | 0.17 |
Qualitative analysis finds:
- Significant reduction in oversaturation and "pasted-on" contrast at high $w$
- Elimination of local artifacts (e.g., "fried-egg" textures, checkerboarding)
- Enhanced text spelling consistency in SD3 rectified-flow (cf. Fig. 8)
- Mitigation of mode-drift in toy Gaussian-mixture settings (cf. Fig. 18)
5. Hyperparameter Selection and Key Insights
APG's parameterization provides explicit control over the tradeoff between saturation and detail:
- Parallel weight $\eta$: Default $0$ (parallel component removed entirely); values up to $0.25$ yield increased image "punch" and color saturation.
- Rescale radius $r$: Set to match the typical norm of the guidance update $\tilde{g}$; the appropriate value differs between model families such as EDM2/DiT and SD 2.1/XL. A small $r$ under-guides (blurry results), while a large $r$ suppresses rescaling's effect.
- Reverse momentum $\beta$: Moderately negative values consistently improve FID and recall.
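To build intuition for why negative $\beta$ helps, the toy loop below (my own illustration, not from the paper) iterates the momentum update with a constant $\tilde{g}$. With $\beta < 0$ the buffer converges to the damped fixed point $\tilde{g}/(1 - \beta)$ instead of passing the full update through, counteracting accumulated guidance over consecutive steps:

```python
import numpy as np

g_tilde = np.ones(8)  # stand-in for a constant guidance update

def run(beta, steps=50):
    """Iterate m <- g_tilde + beta * m from m = 0."""
    m = np.zeros(8)
    for _ in range(steps):
        m = g_tilde + beta * m
    return m

m_neg = run(beta=-0.5)
m_zero = run(beta=0.0)
assert np.linalg.norm(m_neg) < np.linalg.norm(m_zero)  # damped update
assert np.allclose(m_neg, g_tilde / 1.5)               # fixed point g/(1 - beta)
```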
A summary takeaway is that APG acts as a nearly zero-overhead, plug-and-play substitute for CFG, retaining all major benefits (sample alignment, fidelity) while robustly mitigating oversaturation and artifact formation at high guidance strengths (Sadat et al., 2024).
6. Significance and Applications
APG enables the practical use of high guidance scales in conditional diffusion sampling. This removes the primary empirical limitation of CFG (oversaturation, detail artifacts), broadening the effective parameter range for applications requiring strong condition alignment. APG's architecture-agnostic, sampler-agnostic construction and negligible runtime cost facilitate immediate adoption across research and production pipelines in image, text-image, and other generative domains. Empirical improvements in both established quantitative metrics (FID, recall) and qualitative robustness (artifact elimination, improved text rendering) position APG as a robust methodological advancement within guided diffusion model sampling.