Adaptive Projected Guidance (APG)
- Adaptive Projected Guidance (APG) is a sampling modification for diffusion models that improves image fidelity by reducing oversaturation and visual artifacts at high guidance scales.
- It employs a principled vector decomposition, selectively down-weights the parallel component, and introduces reverse momentum to refine the denoising update.
- APG offers a plug-and-play, low-overhead enhancement to existing CFG pipelines, compatible with various diffusion architectures and samplers.
Adaptive Projected Guidance (APG) is a sampling modification for conditional diffusion models that addresses oversaturation and visual artifacts associated with high guidance scales in classifier-free guidance (CFG). APG introduces a principled vector decomposition, parallel-component down-weighting, and reverse momentum to the guidance update. This plug-and-play procedure is compatible with all standard samplers and conditional diffusion architectures, incurring negligible overhead and enabling improved generative fidelity, diversity, and alignment at large guidance scales while systematically reducing oversaturation and detail distortions (Sadat et al., 2024).
1. Mathematical Formulation and Derivation
APG arises as a principled extension of the standard classifier-free guidance (CFG) update used in conditional diffusion models. In the noise-prediction paradigm, the unconditional and conditional denoiser outputs at timestep $t$ are denoted $\epsilon_\theta(x_t, t, \varnothing)$ and $\epsilon_\theta(x_t, t, y)$, respectively. The canonical CFG update, with guidance scale $w$, is
$$\hat{\epsilon} = \epsilon_\theta(x_t, t, \varnothing) + w \left[ \epsilon_\theta(x_t, t, y) - \epsilon_\theta(x_t, t, \varnothing) \right],$$
or, equivalently in the predicted-clean space,
$$\hat{x}_0 = x_0^{\mathrm{cond}} + (w - 1)\left( x_0^{\mathrm{cond}} - x_0^{\mathrm{uncond}} \right).$$
APG modifies this update through the following steps:
- Decomposition: Define the guidance vector $g = x_0^{\mathrm{cond}} - x_0^{\mathrm{uncond}}$. Decompose $g$ into parallel and orthogonal components relative to $x_0^{\mathrm{cond}}$:
$$g_\parallel = \frac{\langle g,\, x_0^{\mathrm{cond}} \rangle}{\| x_0^{\mathrm{cond}} \|^2}\, x_0^{\mathrm{cond}}, \qquad g_\perp = g - g_\parallel.$$
- Down-weighting: Empirical evidence indicates that $g_\parallel$ induces oversaturation, whereas $g_\perp$ governs detail. Introduce a hyperparameter $\eta \in [0, 1]$:
$$\tilde{g} = g_\perp + \eta\, g_\parallel.$$
- Guided update: The APG update becomes:
$$\hat{x}_0 = x_0^{\mathrm{cond}} + (w - 1)\, \tilde{g}.$$
- Gradient ascent interpretation, rescaling, and reverse momentum: Viewing the guidance update as a step of gradient ascent on an implicit conditioning objective, two additional modifications from gradient-ascent practice are introduced:
- Rescaling: Restrict the update norm to within a radius $r$:
$$\tilde{g} \leftarrow \min\!\left(1, \frac{r}{\| \tilde{g} \|}\right) \tilde{g}.$$
- Reverse momentum: Maintain a negative-momentum buffer $m$, with $\beta < 0$, initialized to $m = 0$ and updated as
$$m \leftarrow \tilde{g} + \beta\, m.$$
The final APG-guided prediction utilizes $m$ in place of the raw guidance vector, yielding:
$$\hat{x}_0 = x_0^{\mathrm{cond}} + (w - 1)\, m.$$
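As a quick sanity check on the decomposition above, the following NumPy snippet (my own illustration, not from the paper's code) verifies that $g_\parallel + g_\perp$ recombines exactly to $g$ and that $g_\perp$ is orthogonal to $x_0^{\mathrm{cond}}$:

```python
import numpy as np

# Toy check of the APG decomposition on random flattened predictions.
rng = np.random.default_rng(0)
x0_cond = rng.normal(size=1024)
x0_uncond = rng.normal(size=1024)

g = x0_cond - x0_uncond
alpha = np.dot(g, x0_cond) / np.dot(x0_cond, x0_cond)
g_parallel = alpha * x0_cond
g_perp = g - g_parallel

assert np.allclose(g_parallel + g_perp, g)        # exact recombination
assert abs(np.dot(g_perp, x0_cond)) < 1e-8        # orthogonality to x0_cond
```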
2. Algorithmic Realization and Pseudocode
APG can be realized as a direct, low-overhead extension to the CFG sampling pipeline. The following pseudocode illustrates the method precisely in the context of a diffusion sampler:
```
initialize momentum m ← 0
for t = T … 1:
    # obtain model outputs at x_t
    ε_uncond ← denoiser(x_t, t, ∅)
    ε_cond   ← denoiser(x_t, t, y)
    # convert both ε_⋆ to clean predictions x0_uncond, x0_cond
    g ← x0_cond − x0_uncond
    # decompose
    α ← ⟨g, x0_cond⟩ / ∥x0_cond∥²
    g_parallel ← α · x0_cond
    g_perp ← g − g_parallel
    # down-weight parallel component
    g_tilde ← g_perp + η · g_parallel
    # rescale
    scale ← min(1, r / ∥g_tilde∥)
    g_tilde ← scale · g_tilde
    # reverse momentum (β < 0)
    m ← g_tilde + β · m
    # final guided clean prediction
    x0_guided ← x0_cond + (w − 1) · m
    # convert back to the sampler's noise or velocity form
    ε_guided ← convert_clean_to_noise(x0_guided, x_t)
    x_{t−1} ← apply_sampler_step(x_t, ε_guided, t)
end
```
Hyperparameters:
- Guidance scale $w$ (as in CFG)
- Parallel weight $\eta$, default $0$
- Rescaling radius $r$ (typically matched to the average norm of the guidance updates)
- Reverse-momentum coefficient $\beta < 0$
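The pseudocode above can be condensed into a small, framework-agnostic helper operating in clean-prediction space. The NumPy sketch below is my own illustration: the function name, the small stabilizing epsilons, and the numeric defaults for `r` and `beta` are assumptions, not values from the paper.

```python
import numpy as np

def apg_update(x0_cond, x0_uncond, m, w, eta=0.0, r=2.5, beta=-0.5):
    """One APG guidance step on flattened clean predictions.

    Returns the guided clean prediction and the updated momentum buffer.
    The defaults for r and beta here are illustrative, not canonical.
    """
    g = x0_cond - x0_uncond
    # project g onto x0_cond and split into parallel/orthogonal parts
    alpha = np.dot(g, x0_cond) / (np.dot(x0_cond, x0_cond) + 1e-12)
    g_parallel = alpha * x0_cond
    g_perp = g - g_parallel
    # down-weight the (oversaturating) parallel component
    g_tilde = g_perp + eta * g_parallel
    # rescale: clamp the update norm to radius r
    norm = np.linalg.norm(g_tilde)
    g_tilde = g_tilde * min(1.0, r / (norm + 1e-12))
    # reverse momentum (beta < 0)
    m = g_tilde + beta * m
    x0_guided = x0_cond + (w - 1.0) * m
    return x0_guided, m
```

With $\eta = 1$, $\beta = 0$, and $r = \infty$, the step reduces exactly to the clean-space CFG update, which makes the relationship to CFG easy to verify.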
3. Computational Implementation and Compatibility
APG is intended as a drop-in replacement for the standard CFG step in diffusion model sampling. No additional denoiser or neural-network evaluations are necessary. The computational overhead is limited to a single inner product and projection, one norm calculation and clamping operation (for rescaling), and a running vector addition (for momentum buffering). Per generated image, this overhead is negligible relative to the approximately $130$ ms of a single denoiser pass.
Conversions between noise-prediction and clean-prediction are standard and natively supported in modern diffusion toolkits. APG is directly compatible with any conditional diffusion model—including EDM2, DiT (e.g., DiT-XL/2), Stable Diffusion variants (2/3/XL), distilled samplers (SDXL-Lightning, PixArt-), and rectified-flow models such as SD3. Sampler-agnosticity is maintained: DDIM, PNDM, DPM++, UniPC, and others are supported without modification (Sadat et al., 2024).
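The clean↔noise conversion referenced above is the standard reparameterization of a variance-preserving diffusion, $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. The helper functions below (names mine) sketch the round trip; velocity-prediction models would use the analogous formulas.

```python
import numpy as np

def noise_to_clean(x_t, eps, alpha_bar_t):
    """Recover the clean prediction x0 from a noise prediction eps."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def clean_to_noise(x0, x_t, alpha_bar_t):
    """Express a (guided) clean prediction back as a noise prediction."""
    return (x_t - np.sqrt(alpha_bar_t) * x0) / np.sqrt(1.0 - alpha_bar_t)
```

Applying `noise_to_clean` followed by `clean_to_noise` (or vice versa) recovers the input exactly, so APG's clean-space update can be spliced into a noise-prediction sampler without loss.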
4. Empirical Performance and Evaluation
APG demonstrates broad improvements in quantitative fidelity and qualitative artifact suppression across a variety of datasets and diffusion model architectures. The most relevant testbeds include:
- Class-conditional ImageNet (EDM2-S, EDM2-XXL, DiT-XL/2)
- Text-to-image generation on MS–COCO (Stable Diffusion 2.1, Stable Diffusion XL)
- Fast and distilled models (SDXL-Lightning, PixArt-)
- Rectified-flow models (SD3) for robust text rendering
Key metrics include FID (Fréchet Inception Distance, ↓), precision (↑), recall (↑), mean saturation (HSV S channel, ↓), and RMS grayscale contrast (↓).
| Model (w) | Guidance | FID↓ | Prec↑ | Recl↑ | Sat↓ | Contra↓ |
|---|---|---|---|---|---|---|
| EDM2-S (4) | CFG | 10.42 | 0.85 | 0.48 | 0.46 | 0.27 |
| EDM2-S (4) | APG | 6.49 | 0.85 | 0.62 | 0.33 | 0.21 |
| DiT-XL/2 (4) | CFG | 19.14 | 0.92 | 0.35 | 0.37 | 0.25 |
| DiT-XL/2 (4) | APG | 9.34 | 0.89 | 0.56 | 0.30 | 0.20 |
| SD XL (15) | CFG | 26.29 | 0.62 | 0.49 | 0.28 | 0.24 |
| SD XL (15) | APG | 25.35 | 0.64 | 0.50 | 0.18 | 0.17 |
Qualitative analysis finds:
- Significant reduction in oversaturation and "pasted-on" contrast at high $w$
- Elimination of local artifacts (e.g., "fried-egg" textures, checkerboarding)
- Enhanced text spelling consistency in SD3 rectified-flow (cf. Fig. 8)
- Mitigation of mode-drift in toy Gaussian-mixture settings (cf. Fig. 18)
5. Hyperparameter Selection and Key Insights
APG's parameterization provides explicit control over the tradeoff between saturation and detail:
- Parallel weight $\eta$: Default $0$ (parallel component removed entirely); values up to $0.25$ yield increased image "punch" and color saturation.
- Rescale radius $r$: Set to match the typical norm of the guidance update $\tilde{g}$; the appropriate value differs between model families such as EDM2/DiT and SD 2.1/XL. A small $r$ under-guides (blurry results), while a large $r$ suppresses rescaling's effect.
- Reverse momentum $\beta$: Moderately negative values consistently improve FID and recall.
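To build intuition for why negative $\beta$ helps, the toy loop below (my own illustration, not from the paper) iterates the momentum update with a constant $\tilde{g}$. With $\beta < 0$ the buffer converges to the damped fixed point $\tilde{g}/(1 - \beta)$ instead of passing the full update through, counteracting accumulated guidance over consecutive steps:

```python
import numpy as np

g_tilde = np.ones(8)  # stand-in for a constant guidance update

def run(beta, steps=50):
    """Iterate m <- g_tilde + beta * m from m = 0."""
    m = np.zeros(8)
    for _ in range(steps):
        m = g_tilde + beta * m
    return m

m_neg = run(beta=-0.5)
m_zero = run(beta=0.0)
assert np.linalg.norm(m_neg) < np.linalg.norm(m_zero)  # damped update
assert np.allclose(m_neg, g_tilde / 1.5)               # fixed point g/(1 - beta)
```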
A summary takeaway is that APG acts as a nearly zero-overhead, plug-and-play substitute for CFG, retaining all major benefits (sample alignment, fidelity) while robustly mitigating oversaturation and artifact formation at high guidance strengths (Sadat et al., 2024).
6. Significance and Applications
APG enables the practical use of high guidance scales in conditional diffusion sampling. This removes the primary empirical limitation of CFG (oversaturation, detail artifacts), broadening the effective parameter range for applications requiring strong condition alignment. APG's architecture-agnostic, sampler-agnostic construction and negligible runtime cost facilitate immediate adoption across research and production pipelines in image, text-image, and other generative domains. Empirical improvements in both established quantitative metrics (FID, recall) and qualitative robustness (artifact elimination, improved text rendering) position APG as a robust methodological advancement within guided diffusion model sampling.