
Classifier-Free Projection Guidance (CFPG)

Updated 13 December 2025
  • CFPG is a geometric refinement of classifier-free guidance that projects score vectors onto the principal (normal) direction to eliminate tangential drift.
  • It employs a rank-2 SVD to isolate the dominant right singular vector, aligning unconditional scores with conditional targets in diffusion models.
  • Empirical evaluations demonstrate improved FID metrics and reduced artifacts in text-to-image synthesis with negligible computational overhead.

Classifier-Free Projection Guidance (CFPG), realized in the Tangential Damping Classifier-Free Guidance (TCFG) framework, is a geometric refinement of the classifier-free guidance (CFG) process for diffusion models. CFPG systematically projects the unconditional score vector onto the principal direction (normal to the data manifold), thereby eliminating harmful tangential drift in the reverse diffusion process. This results in improved sample fidelity and context alignment during conditional synthesis tasks, such as text-to-image generation, and imposes negligible computational overhead (Kwon et al., 23 Mar 2025).

1. Definition of Conditional and Unconditional Scores in Diffusion Models

Diffusion models generate data by reversing a forward noising process defined as

$$x_t = x_0 + \sigma(t)\, z, \quad z \sim \mathcal{N}(0, I), \quad \sigma(0) = 0, \;\; \sigma(1) = \sigma_{\max},$$

where $x_0$ is the original data and $x_t$ is its noisy counterpart at time $t$. The model learns a score network $s_\theta(x_t, t) \approx \nabla_{x_t} \log p_t(x_t)$.
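As a concrete sketch, the forward process can be simulated directly. The linear schedule $\sigma(t) = t\,\sigma_{\max}$ below is an assumption for illustration; real models use carefully tuned schedules.

```python
import numpy as np

def sigma(t: float, sigma_max: float = 80.0) -> float:
    """Noise level at time t in [0, 1]: sigma(0) = 0, sigma(1) = sigma_max.
    A linear schedule is assumed here purely for illustration."""
    return t * sigma_max

def forward_noise(x0: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t = x_0 + sigma(t) * z with z ~ N(0, I)."""
    z = rng.standard_normal(x0.shape)
    return x0 + sigma(t) * z

rng = np.random.default_rng(0)
x0 = np.zeros(4)          # toy "clean data" point
xt = forward_noise(x0, t=0.5, rng=rng)
```

At $t = 0$ the sample is returned unchanged; as $t \to 1$ it is dominated by Gaussian noise of scale $\sigma_{\max}$.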

In conditional generation, the network approximates

$$s_\theta(x_t, t, y) \approx \nabla_{x_t} \log p_t(x_t \mid y),$$

while $s_\theta(x_t, t)$ remains the unconditional score. Classifier-free guidance combines these as

$$s_{\mathrm{CFG}}(x_t, t, y) = s_\theta(x_t, t) + \omega \left( s_\theta(x_t, t, y) - s_\theta(x_t, t) \right), \quad \omega = 1 + \gamma,$$

used in the discretized reverse diffusion step
$$x_{t_{k-1}} = x_{t_k} - \lambda_k\, s_{\mathrm{CFG}}(x_{t_k}, t_k, y) + \sqrt{2\, \beta(t_k)\, \sigma(t_k)^2}\, \xi_k,$$
where $\lambda_k$ is schedule-dependent and $\xi_k$ is standard Gaussian noise.
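In code, the CFG combination is a one-liner. The arrays below stand in for outputs of a trained score network; no real model is involved.

```python
import numpy as np

def cfg_score(s_uncond: np.ndarray, s_cond: np.ndarray, omega: float) -> np.ndarray:
    """Classifier-free guidance: interpolate (omega < 1) or extrapolate
    (omega > 1) from the unconditional toward the conditional score."""
    return s_uncond + omega * (s_cond - s_uncond)

# Toy stand-ins for network outputs (illustrative only).
s_uncond = np.array([0.0, 0.0])
s_cond = np.array([1.0, 2.0])
guided = cfg_score(s_uncond, s_cond, omega=7.5)
```

Setting $\omega = 1$ recovers the purely conditional score, while $\omega > 1$ pushes samples further toward the conditioning signal.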

2. Geometric Properties of Scores and Manifold Structure

Under the manifold hypothesis, clean data $x_0$ resides on a low-dimensional manifold $\mathcal{M}_0 \subset \mathbb{R}^d$. For small $t$, Stanczuk et al. (2024) have shown that $\nabla_{x_t}\log p_t(x_t)$ is predominantly normal to the intermediate manifold $\mathcal{M}_t$:
$$\frac{\|\mathbf{T}_p\, \nabla \log p_t\|}{\|\mathbf{N}_p\, \nabla \log p_t\|} \xrightarrow{t \to 0} 0,$$
where $\mathbf{T}_p$ and $\mathbf{N}_p$ are the tangent and normal projections at a point $p$. Within CFG, the conditional and unconditional scores at each step typically align in their normal components but have misaligned tangential components. Allowing the tangential part of $s_\theta(x_t, t)$ to persist can deflect samples off the data manifold, reducing alignment with the conditioning signal.
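The tangential/normal split can be made concrete with a toy projection, assuming the normal direction at a point is known (in practice it is not observed directly; this is purely illustrative):

```python
import numpy as np

def split_normal_tangential(score: np.ndarray, normal: np.ndarray):
    """Decompose a score into its component along a given (assumed known)
    normal direction and the orthogonal tangential remainder."""
    n = normal / np.linalg.norm(normal)
    s_normal = (score @ n) * n        # N_p projection of the score
    s_tangent = score - s_normal      # T_p projection of the score
    return s_normal, s_tangent

rng = np.random.default_rng(0)
score = rng.standard_normal(5)        # toy score vector
normal = rng.standard_normal(5)       # toy normal direction
s_n, s_t = split_normal_tangential(score, normal)
```

The two parts reconstruct the original score exactly, and the tangential part is orthogonal to the normal direction by construction.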

3. Singular Value Decomposition and Tangential Damping

At every step, TCFG stacks the two scores as the rows of a $2 \times d$ matrix:
$$S = \begin{bmatrix} s_\theta(x_t, t, y) \\ s_\theta(x_t, t) \end{bmatrix} = U \Sigma V^\top,$$
where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2)$, $U \in \mathbb{R}^{2 \times 2}$, and $V \in \mathbb{R}^{d \times 2}$ contains the right singular vectors $v_1, v_2 \in \mathbb{R}^d$. Empirically, $\sigma_1 \gg \sigma_2$; $v_1$ approximates the shared normal direction and $v_2$ the tangential subspace.
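The claim that $\sigma_1 \gg \sigma_2$ when the two scores share a dominant direction can be checked numerically. The sketch below uses synthetic scores (the shared-direction construction is an assumption for illustration, mimicking the empirical observation) and stacks them as rows of a $2 \times d$ matrix so the right singular vectors live in $\mathbb{R}^d$; stacking them as columns instead would merely swap the roles of $U$ and $V$.

```python
import numpy as np

# Synthetic stand-ins for the two scores: both dominated by a shared
# direction, plus small independent perturbations (assumed construction).
rng = np.random.default_rng(0)
d = 1000
shared = rng.standard_normal(d)
s_cond = 3.0 * shared + 0.1 * rng.standard_normal(d)
s_uncond = 3.0 * shared + 0.1 * rng.standard_normal(d)

# Stack as rows: S is 2 x d, so right singular vectors v1, v2 are in R^d.
S = np.stack([s_cond, s_uncond])
U, sig, Vt = np.linalg.svd(S, full_matrices=False)
v1, v2 = Vt[0], Vt[1]

ratio = sig[0] / sig[1]                              # expect sigma_1 >> sigma_2
align = abs(v1 @ shared) / np.linalg.norm(shared)    # |cos| of v1 vs shared dir
```

Under this construction the leading singular value dominates and $v_1$ is nearly parallel to the shared direction, consistent with the interpretation above.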

TCFG (CFPG) suppresses the tangential component of the unconditional score by projecting it onto $v_1$:
$$\hat{s}_\theta(x_t, t) = \left( s_\theta(x_t, t) \cdot v_1 \right) v_1 = s_\theta(x_t, t)\, v_1 v_1^\top,$$
effectively removing the $v_2$ component (treating $s_\theta$ as a row vector). Equivalently, with $V' = [v_1\ \ 0]$, this is $\hat{s}_\theta(x_t, t) = s_\theta(x_t, t)\, V' V'^\top$. The projection aligns the unconditional score with the conditional target along the normal direction while nullifying divergent tangential influences.
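A minimal NumPy sketch of this projection (the function name is mine, and the input arrays are stand-ins for network outputs):

```python
import numpy as np

def damp_tangential(s_cond: np.ndarray, s_uncond: np.ndarray) -> np.ndarray:
    """Project the unconditional score onto the principal right singular
    vector v1 of the stacked 2 x d score matrix, removing its v2 component."""
    S = np.stack([s_cond, s_uncond])              # 2 x d score matrix
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    v1 = Vt[0]                                     # principal right singular vector
    return (s_uncond @ v1) * v1

rng = np.random.default_rng(1)
a = rng.standard_normal(8)                         # toy conditional score
b = rng.standard_normal(8)                         # toy unconditional score
b_hat = damp_tangential(a, b)
```

Note that the sign ambiguity of $v_1$ is harmless here, since $v_1$ appears twice in the projection; and when the two scores already coincide, the projection leaves the unconditional score unchanged.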

4. Algorithmic Implementation of Classifier-Free Projection Guidance

The CFPG procedure, as instantiated in TCFG, is as follows:

```
for k = T, T−1, …, 1:
    s_uncond ← s_θ(x_{t_k}, t_k)
    s_cond   ← s_θ(x_{t_k}, t_k, y)
    S = [ s_cond ; s_uncond ]              # rows of S ∈ ℝ^{2×d}
    (U, Σ, Vᵀ) = SVD(S)                    # thin SVD (full_matrices=False)
    v1 = first row of Vᵀ                   # principal right singular vector
    ŝ_uncond = (s_uncond · v1) · v1        # Eq. (6) in the paper
    s̃ = ŝ_uncond + ω · (s_cond − ŝ_uncond) # projection-guided score
    x_{t_{k−1}} = x_{t_k} − λ_k · s̃ + √(2 β(t_k) σ(t_k)²) · ξ_k
output: x_0
```
This modification is computationally light, requiring only a rank-2 SVD per step. All original scheduling (PNDM, DDIM, Euler, etc.) and hyperparameters are preserved, with the only addition being the insertion of the projection after extracting the predicted noises.
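The loop body can be sketched in NumPy. The score network, step size, and noise scale below are placeholder inputs rather than a real sampler; only the projection step mirrors the algorithm.

```python
import numpy as np

def tcfg_step(x: np.ndarray, s_cond: np.ndarray, s_uncond: np.ndarray,
              omega: float, lam: float, beta_sigma2: float,
              xi: np.ndarray) -> np.ndarray:
    """One reverse update using the tangentially damped unconditional score.
    lam and beta_sigma2 stand in for lambda_k and beta(t_k) * sigma(t_k)^2."""
    S = np.stack([s_cond, s_uncond])               # 2 x d score matrix
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    v1 = Vt[0]                                      # principal right singular vector
    s_uncond_hat = (s_uncond @ v1) * v1            # tangential damping
    s_tilde = s_uncond_hat + omega * (s_cond - s_uncond_hat)
    return x - lam * s_tilde + np.sqrt(2.0 * beta_sigma2) * xi

# Deterministic toy call (no noise, identical scores) for illustration.
x = np.ones(4)
s = np.array([1.0, 0.0, 0.0, 0.0])
x_next = tcfg_step(x, s, s, omega=7.5, lam=0.1, beta_sigma2=0.0, xi=np.zeros(4))
```

When the conditional and unconditional scores coincide, the damping is a no-op and the update reduces to a plain score step, as expected.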

5. Practical Considerations and Hyperparameter Selection

Guidance-scale ($\omega$) recommendations are identical to those for standard CFG: 7.5 for Stable Diffusion v1.5, 5.0 for SDXL, and so on. No additional thresholding or tuning is required; only the top singular vector is retained. According to Table A.1 of the supplement, even at $1024 \times 1024$ latent resolution the overhead is below 0.01% in runtime and under 20 MB of memory on an RTX 3090. Standard sampling parameters (step count, CFG scale) remain optimal; the projection operation is the only modification. This lightweight nature facilitates broad practical deployment.

6. Empirical Evaluation and Impact

Across major CFG-based diffusion models, TCFG (CFPG) consistently improves Fréchet Inception Distance (FID) metrics without degrading alignment, as measured by CLIPScore. Empirical results include:

| Model | Vanilla FID | TCFG FID | CLIPScore (unchanged) |
|---|---|---|---|
| SD v1.5, 50 steps, ω = 7.5 | 13.26 | 13.12 | 0.31 |
| SDXL, 50 steps, ω = 5.0 | 13.36 | 12.65 | 0.32 |
| SD v3 (Flow) | 16.66 | 13.74 | — |
| DiT, ImageNet, 50k samples | 32.67 | 29.50 | — |

For DiT, recall increased from 0.13 to 0.19 and sFID dropped from 17.92 to 13.27. Qualitative analyses (Figs. 8–10) demonstrate reduction of artifacts and more natural color balance. TCFG thus enables more accurate and contextually coherent image synthesis, supporting existing guidance frameworks in both fidelity and semantic quality.

7. Summary and Context within the Diffusion Literature

Classifier-Free Projection Guidance (as instantiated in TCFG) advances the geometric understanding of guided diffusion sampling by explicitly correcting for tangential misalignment in the guidance phase. By enforcing the unconditional score's alignment to the principal manifold direction at each step, CFPG brings theoretical and empirical improvements to score-based generative modeling. This strategy integrates seamlessly with DDPM-family samplers and established CFG architectures, requiring negligible additional resources and no adaptation of existing hyperparameters. These properties position CFPG as a practically effective and theoretically motivated extension of classifier-free guidance across a range of state-of-the-art diffusion models (Kwon et al., 23 Mar 2025).
