Classifier-Free Guidance Rescaling
- Classifier-Free Guidance Rescaling is a suite of mathematically informed modifications that adapt and normalize the guidance signal in diffusion models.
- It employs time- and state-dependent schedules, feedback mechanisms, and norm-preserving techniques to balance fidelity and diversity during sampling.
- Empirical results show techniques like EP-CFG and LF-CFG reduce artifacts such as oversaturation while preserving semantic detail and improving sample quality.
Classifier-Free Guidance (CFG) Rescaling denotes a suite of theoretically-informed and empirically-driven modifications to the standard classifier-free guidance mechanism in both continuous and discrete diffusion models, aimed at adapting, normalizing, or reshaping the magnitude, schedule, or structure of guidance during the sampling process. Originally designed to balance fidelity to conditioning inputs (such as text or class labels) and sample quality, standard CFG applies a fixed scalar amplification to the conditional-minus-unconditional prediction. However, fixed-scale schemes are increasingly shown to reduce sample diversity, exacerbate artifacts (e.g., oversaturation, color drift), and introduce pronounced bias or instability, especially at high guidance strengths and across diverse data domains. Rescaling approaches, as surveyed below, mathematically and algorithmically correct these flaws by local or global normalization, frequency-aware reweighting, spatial and semantic adaptation, nonlinear correction, or by imposing time- and state-dependent schedules on the guidance signal.
1. Fundamentals of Classifier-Free Guidance and Its Limitations
Classifier-Free Guidance in conditional diffusion models computes, at each denoising step $t$, the noise or score estimate as
$$\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t) + w\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t)\big),$$
where $w$ is a fixed guidance scale, $\epsilon_\theta(x_t)$ is the unconditional prediction, and $\epsilon_\theta(x_t, c)$ the conditional one. This linear interpolation is widely adopted due to its simplicity and effectiveness in improving semantic alignment and reducing conditional entropy (Zheng et al., 2023, Zhang et al., 13 Dec 2024).
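In code, this interpolation is a one-line combination of the two model outputs. The sketch below uses NumPy arrays as stand-ins for the network's noise predictions; the function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: linear extrapolation from the
    unconditional prediction toward the conditional one with fixed scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy example with random arrays standing in for noise predictions.
rng = np.random.default_rng(0)
eps_u = rng.standard_normal((4, 4))
eps_c = rng.standard_normal((4, 4))

guided = cfg_combine(eps_u, eps_c, w=7.5)
# Sanity checks: w = 1 recovers the conditional branch, w = 0 the unconditional.
assert np.allclose(cfg_combine(eps_u, eps_c, 1.0), eps_c)
assert np.allclose(cfg_combine(eps_u, eps_c, 0.0), eps_u)
```

Note that $w > 1$ extrapolates past the conditional prediction, which is what inflates the norm of `guided` at large scales.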
However, several issues arise:
- Oversaturation and Artifact Accumulation: Large $w$ introduces norm inflation, leading to over-contrast, color artifacts, and loss of naturalness, especially in image-space diffusion models (Zhang et al., 13 Dec 2024, Song et al., 26 Jun 2025).
- Loss of Diversity: Increasing $w$ tends to collapse the distribution, suppressing weaker modes and diminishing fine-grained variation, particularly early and late in the sampling trajectory (Jin et al., 26 Sep 2025).
- Unbalanced or Non-Stationary Guidance: A fixed $w$ ignores the varying informativeness of the conditional signal at different steps, resulting in either under- or over-regularization at various noise levels (Malarz et al., 14 Feb 2025, Rojas et al., 11 Jul 2025).
- Spatial and Semantic Inconsistency: Applying a constant scalar across all spatial regions fails to adapt to varying semantic density, causing over-crisp vs. blurry regions within a single sample (Shen et al., 8 Apr 2024).
- Discrepancy with Underlying Stochastic Dynamics: Linear mixing does not respect the nonlinear structure of the underlying Fokker–Planck (FP) equation, leading to solution mismatch at large scalings (Zheng et al., 2023).
These limitations motivate a range of rescaling and adaptive techniques to ensure that the guidance signal remains well-behaved, semantically effective, and physically consistent throughout the reverse diffusion process.
2. Theoretical Analyses: Stagewise Dynamics and Error Sources
Recent theoretical work elucidates how CFG modifies the sampling dynamics in multimodal settings, identifying three stages as the diffusion trajectory proceeds from high to low noise (Jin et al., 26 Sep 2025):
- Direction Shift (early/high noise): Strong guidance pushes trajectories prematurely toward the weighted conditional mean, creating an initialization bias that grows with the guidance scale.
- Mode Separation (intermediate noise): Basins of attraction emerge; the initial bias from Direction Shift can suppress weaker modes, reducing diversity.
- Concentration (late/low noise): Multiplicative amplification of the score contracts sample trajectories within their respective basin, removing subtle variation.
Analysis shows that a fixed high $w$ is harmful during both early and late stages, eroding both global and local diversity. This connects to empirical findings in discrete diffusion domains where high guidance at early steps also produces degraded sample quality due to excessively rapid unmasking transitions (Rojas et al., 11 Jul 2025). These theoretical insights rationalize the empirical benefits of time-varying schedules and context-dependent rescaling.
3. Time- and State-Dependent Rescaling Schedules
A central class of rescaling strategies adapts the guidance scale $w$ as a function of the timestep $t$ and (optionally) the current state $x_t$.
Time-varying (Schedule-based) Rescaling
Several works formalize $w(t)$ as a time-dependent curve:
- Triangular or Beta-shaped Schedules: Guidance is suppressed at the endpoints and peaks at mid-sampling, e.g., by setting $w(t)$ proportional to a Beta-distribution probability density over normalized time (Malarz et al., 14 Feb 2025). For stepwise schedules, two-piece linear "triangular" profiles achieve similar objectives (Jin et al., 26 Sep 2025).
- Empirical Benefits: These schedules consistently yield better quality-diversity tradeoffs than any constant $w$. As demonstrated, a time-varying profile reduces early mean bias and late contraction, and preserves both global structure and micro-detail, particularly at low NFE (Jin et al., 26 Sep 2025, Malarz et al., 14 Feb 2025).
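A generic two-piece linear profile in the spirit of the triangular schedules above can be sketched as follows; the peak location and endpoint values are illustrative choices, not prescriptions from any specific paper.

```python
import numpy as np

def triangular_schedule(t, w_max, w_end=1.0, peak=0.5):
    """Two-piece linear guidance schedule over normalized time t in [0, 1]
    (t = 0: high noise, t = 1: clean). Guidance rises to w_max at `peak`
    and is suppressed toward w_end at both endpoints."""
    t = np.asarray(t, dtype=float)
    rising = w_end + (w_max - w_end) * (t / peak)
    falling = w_end + (w_max - w_end) * ((1.0 - t) / (1.0 - peak))
    return np.where(t <= peak, rising, falling)

ts = np.linspace(0.0, 1.0, 11)
ws = triangular_schedule(ts, w_max=7.5)
assert np.isclose(ws.max(), 7.5)                            # peak at mid-sampling
assert np.isclose(ws[0], 1.0) and np.isclose(ws[-1], 1.0)   # suppressed at endpoints
```

Replacing the Beta-density shape with this piecewise-linear one keeps the qualitative property that matters per the analysis above: weak guidance during Direction Shift and Concentration, strong guidance during Mode Separation.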
State-dependent (Feedback-based) Rescaling
State-adaptive guidance leverages the current informativeness of the conditional branch:
- Feedback Guidance: Feedback Guidance (Koulischer et al., 6 Jun 2025) derives a state- and step-dependent coefficient $w(x_t, t)$ from an additive error model and online estimates of the reliability of the conditional signal. The coefficient increases automatically where the conditional score is uncertain and contracts where it is reliable.
- Practical Impact: Feedback-based dynamic guidance matches or outperforms both vanilla CFG and limited-interval guidance, especially when prompt complexity varies or samples deviate from the data manifold (Koulischer et al., 6 Jun 2025).
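The qualitative shape of such a feedback rule can be sketched with a simple heuristic: boost the scale when the two branches barely disagree (a weak conditional signal) and reduce it when the gap is already large. This is an illustrative stand-in, not the estimator used in Feedback Guidance; all names and constants are assumptions.

```python
import numpy as np

def feedback_scale(eps_uncond, eps_cond, w_base=5.0, w_min=1.0, w_max=10.0):
    """Illustrative state-dependent guidance scale. Mimics only the *shape*
    of feedback-based rules: small relative conditional/unconditional gap
    -> uncertain signal -> boost guidance; large gap -> contract it."""
    gap = np.linalg.norm(eps_cond - eps_uncond)
    ref = np.linalg.norm(eps_uncond) + 1e-8
    w = w_base / (gap / ref + 1e-8)
    return float(np.clip(w, w_min, w_max))

eps_u = np.ones(16)
w_boost = feedback_scale(eps_u, eps_u + 0.01)   # tiny gap -> scale near w_max
w_cut = feedback_scale(eps_u, eps_u + 10.0)     # huge gap -> scale near w_min
assert w_boost > w_cut
```

The design point this illustrates is that the scale becomes a per-sample, per-step quantity computed from signals already available in the sampling loop, at essentially no extra cost.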
4. Local, Energy, and Frequency-normalized Rescaling
Multiple approaches directly reweight or normalize the magnitude of the guided prediction at each step:
Energy- and Norm-preserving Rescaling
- EP-CFG: The guided prediction is rescaled such that its squared $\ell_2$-norm (interpreted as "energy") matches that of the conditional branch, $\hat{\epsilon} = \tilde{\epsilon}\,\|\epsilon_\theta(x_t, c)\|_2 / \|\tilde{\epsilon}\|_2$ (Zhang et al., 13 Dec 2024). This method suppresses monotonic norm inflation and reduces high-contrast artifacts. A robust variant computes energies over restricted value percentiles to further suppress outlier-induced artifacts.
- ZeResFDG: The RescaleCFG module in ZeResFDG matches the per-sample standard deviation (or other moment) of the guided output to the conditional output, often after subtracting the mean, and blends the rescaled and raw predictions (Rychkovskiy et al., 14 Oct 2025).
- Characteristic Guidance: A fixed-point nonlinear correction term is computed to enforce consistency with the FP equation, especially effective at large $w$ (Zheng et al., 2023).
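An energy-matching rescale in the spirit of EP-CFG reduces to a single scalar multiplication per step. The sketch below is a minimal version without the percentile-robust variant; names are illustrative.

```python
import numpy as np

def ep_cfg_rescale(guided, eps_cond, stab=1e-8):
    """Energy-preserving rescale in the spirit of EP-CFG: scale the guided
    prediction so its squared L2 norm ("energy") matches the conditional
    branch, suppressing norm inflation from large guidance scales."""
    scale = np.sqrt((eps_cond ** 2).sum() / ((guided ** 2).sum() + stab))
    return guided * scale

rng = np.random.default_rng(0)
eps_u = rng.standard_normal((8, 8))
eps_c = rng.standard_normal((8, 8))
guided = eps_u + 12.0 * (eps_c - eps_u)   # strong CFG inflates the norm

rescaled = ep_cfg_rescale(guided, eps_c)
# Energy now matches the conditional branch (up to the stabilizing constant).
assert np.isclose((rescaled ** 2).sum(), (eps_c ** 2).sum(), rtol=1e-5)
```

Because only the magnitude is corrected, the *direction* of the guided update, and hence the semantic effect of guidance, is preserved.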
Frequency- and Spatially-aware Rescaling
- Low-frequency Downweighting (LF-CFG): LF-CFG decomposes outputs into low- and high-frequency components, downweights low-frequency regions (where redundant information accumulates) adaptively at each step, and injects the corrected signal only into the guided difference term. This substantially reduces oversaturation and spurious artifacts under high $w$ without harming semantic fidelity (Song et al., 26 Jun 2025).
- Frequency-Decoupled Guidance (ZeResFDG): ZeResFDG splits the guidance into low- and high-frequency bands, with distinct reweightings, and employs zero-projection to remove unconditional drift early in the chain (Rychkovskiy et al., 14 Oct 2025).
- Semantic/Spatial Rescaling (S-CFG): S-CFG constructs per-region guidance scales derived from cross- and self-attention maps so that semantic units receive uniform amplification; this both sharpens details and ensures regionally consistent prompt alignment (Shen et al., 8 Apr 2024).
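The common core of the frequency-aware methods, splitting the guidance difference into bands and weighting them separately, can be sketched with an FFT mask. This is a generic band-split illustration, not the specific filters of LF-CFG or ZeResFDG; the cutoff and weights are assumptions.

```python
import numpy as np

def frequency_decoupled_guidance(eps_uncond, eps_cond, w_low, w_high, cutoff=0.125):
    """Illustrative frequency-decoupled CFG: split the guidance difference
    into low- and high-frequency bands via an FFT mask (frequencies with
    |f| <= cutoff cycles/sample count as "low") and scale each band."""
    diff = eps_cond - eps_uncond
    f = np.fft.fft2(diff)
    h, w = diff.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = (np.abs(fy) <= cutoff) & (np.abs(fx) <= cutoff)
    low = np.fft.ifft2(f * low_mask).real
    high = diff - low           # bands sum back to the full difference
    return eps_uncond + w_low * low + w_high * high

rng = np.random.default_rng(1)
eps_u = rng.standard_normal((16, 16))
eps_c = rng.standard_normal((16, 16))
# Equal band weights recover standard CFG exactly, since low + high = diff.
same = frequency_decoupled_guidance(eps_u, eps_c, 5.0, 5.0)
assert np.allclose(same, eps_u + 5.0 * (eps_c - eps_u))
```

Setting `w_low < w_high` implements the downweighting idea: the low band, which drives global contrast and saturation, receives less amplification than the detail-carrying high band.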
5. Algorithmic Implementations and Workflow Integration
The table below summarizes key rescaling methods and their implementation characteristics:
| Method/Component | Core Principle | Inference Overhead | Integrates with |
|---|---|---|---|
| EP-CFG (Zhang et al., 13 Dec 2024) | Energy norm matching | — | Any CFG sampler |
| ZeResFDG (Rychkovskiy et al., 14 Oct 2025) | Std. dev. rescaling + frequency decoupling | — | SD/SDXL pipelines |
| LF-CFG (Song et al., 26 Jun 2025) | Low-frequency downweighting | — | Any DDIM/DDPM |
| S-CFG (Shen et al., 8 Apr 2024) | Per-region adaptive rescale | — | Latent/pixel DDPM |
| Characteristic (Zheng et al., 2023) | Nonlinear correction via FP PDE | — | DDPM/DDIM, DPM++ |
| β-CFG (Malarz et al., 14 Feb 2025) | Time-varying scale w/ gradient normalization | negligible | Any diffusion |
These methods are training-free, requiring only minor modification to the guidance calculation within the sampling loop, and maintain compatibility with common samplers (DDPM, DDIM, DPM-Solver, etc.).
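The "minor modification" amounts to one or two hooks inside the sampling loop. The skeleton below shows where a schedule and a rescaler slot in; the model callables, the schedule, and the placeholder update rule are all stand-ins, not any real sampler's API.

```python
import numpy as np

def sample_with_rescaled_cfg(model_uncond, model_cond, schedule, rescale, x, steps):
    """Generic sampler skeleton: time-varying scaling and post-hoc rescaling
    slot in as two hooks around the guidance combination, leaving the rest
    of the sampler unchanged. The update rule here is a toy placeholder."""
    for i in range(steps):
        t = i / max(steps - 1, 1)                    # normalized time in [0, 1]
        eps_u, eps_c = model_uncond(x, t), model_cond(x, t)
        guided = eps_u + schedule(t) * (eps_c - eps_u)   # hook 1: w(t)
        guided = rescale(guided, eps_c)                  # hook 2: norm/energy fix
        x = x - 0.1 * guided                             # placeholder denoise step
    return x

# Toy run with lambdas standing in for network calls and an identity rescaler.
out = sample_with_rescaled_cfg(
    model_uncond=lambda x, t: x,
    model_cond=lambda x, t: x + 1.0,
    schedule=lambda t: 3.0,
    rescale=lambda g, ec: g,
    x=np.zeros(4), steps=10)
assert out.shape == (4,)
```

Swapping `rescale` for an energy- or std-matching function, or `schedule` for a triangular profile, changes one argument rather than the sampler, which is why these methods compose with DDPM, DDIM, and DPM-Solver alike.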
6. Comparative Empirical Results and Impact
Empirical results across established benchmarks (MSCOCO, ImageNet, QM9, Stable Diffusion v1.5/2.1/3.0/XL) consistently demonstrate the following:
- EP-CFG reduces FID by $1$–$7$ points (depending on the guidance scale) while improving or maintaining CLIP similarity and PSNR relative to standard CFG. Oversaturation artifacts are suppressed and color fidelity is improved (Zhang et al., 13 Dec 2024).
- LF-CFG yields a notable reduction in measured oversaturation at high guidance scales on Stable Diffusion and improves FID/KID with negligible overhead. The relative improvement grows with the guidance scale (Song et al., 26 Jun 2025).
- S-CFG delivers lower FID for equal or higher CLIP scores, with users preferring its outputs over $70\%$ of the time. Per-region rescaling benefits both sharpness and prompt fidelity, propagating to downstream tasks (Shen et al., 8 Apr 2024).
- Time-varying (β-CFG) scheduling strictly improves the quality–diversity tradeoff, particularly in low-step regimes, and allows finer control over the semantic–diversity balance (Malarz et al., 14 Feb 2025, Jin et al., 26 Sep 2025).
- Zero-projection and FDG (ZeResFDG) demonstrate sharper micro-detail and prompt adherence, particularly on high-resolution samples in SD/SDXL pipelines (Rychkovskiy et al., 14 Oct 2025).
7. Open Directions and Theoretical Interpretations
Recent theoretical analysis indicates that ideal rescaling should depend not only on step but also on local sample dynamics and semantic content, motivating:
- Adaptive, content-aware strategies: Per-region semantic rescaling (Shen et al., 8 Apr 2024), frequency and spatial decompositions (Song et al., 26 Jun 2025, Rychkovskiy et al., 14 Oct 2025), and state-dependent scalar adjustments (Koulischer et al., 6 Jun 2025) all improve upon legacy fixed-scale logic.
- Nonlinear corrections and orthogonalization: Nonlinear guidance updates consistent with the FP operator (Zheng et al., 2023) and error correction by orthogonalizing unconditional/conditional errors (Yang et al., 18 Nov 2025) both further control the norm/pathology of the guidance vector.
- Generalization to discrete and masked diffusion: In the discrete context, rescaling must also account for imbalanced transition probabilities and the risk of premature unmasking (Rojas et al., 11 Jul 2025).
A plausible implication is that optimal guidance rescaling mechanisms require joint strategies: contextual schedule shaping, local norm or energy normalization, and semantic/frequency separation, possibly combined in a modular fashion, with little or no additional computational cost.
References
- Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models (Rojas et al., 11 Jul 2025)
- Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance (Shen et al., 8 Apr 2024)
- Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale (Zheng et al., 2023)
- EP-CFG: Energy-Preserving Classifier-Free Guidance (Zhang et al., 13 Dec 2024)
- Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency (Song et al., 26 Jun 2025)
- CADE 2.5 - ZeResFDG: Frequency-Decoupled, Rescaled and Zero-Projected Guidance for SD/SDXL Latent Diffusion Models (Rychkovskiy et al., 14 Oct 2025)
- Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing (Titov et al., 2 Sep 2024)
- CFG-EC: Error Correction Classifier-Free Guidance (Yang et al., 18 Nov 2025)
- Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models (Chen et al., 27 May 2025)
- Classifier-free Guidance with Adaptive Scaling (Malarz et al., 14 Feb 2025)
- Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models (Jin et al., 26 Sep 2025)
- Feedback Guidance of Diffusion Models (Koulischer et al., 6 Jun 2025)