Classifier-Free Projection Guidance (CFPG)
- CFPG is a geometric refinement of classifier-free guidance that projects the unconditional score onto the principal direction (approximately normal to the data manifold) to eliminate tangential drift.
- It applies an SVD to the stacked pair of conditional and unconditional scores and keeps only the dominant singular vector, aligning the unconditional score with the conditional target in diffusion models.
- Empirical evaluations demonstrate improved FID metrics and reduced artifacts in text-to-image synthesis with negligible computational overhead.
Classifier-Free Projection Guidance (CFPG), realized in the Tangential Damping Classifier-Free Guidance (TCFG) framework, is a geometric refinement of the classifier-free guidance (CFG) process for diffusion models. CFPG systematically projects the unconditional score vector onto the principal direction (normal to the data manifold), thereby eliminating harmful tangential drift in the reverse diffusion process. This results in improved sample fidelity and context alignment during conditional synthesis tasks, such as text-to-image generation, and imposes negligible computational overhead (Kwon et al., 23 Mar 2025).
1. Definition of Conditional and Unconditional Scores in Diffusion Models
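Diffusion models generate data by reversing a forward noising process that maps the original data $x_0$ to its noisy counterpart $x_t$ at time $t$. The model learns a score network $s_\theta(x_t, t) \approx \nabla_{x_t} \log p_t(x_t)$.
In conditional generation with condition $y$, the network approximates
$$s_\theta(x_t, t, y) \approx \nabla_{x_t} \log p_t(x_t \mid y),$$
while $s_\theta(x_t, t)$ remains the unconditional score. Classifier-free guidance with scale $\omega$ combines these as
$$\tilde{s} = s_\theta(x_t, t) + \omega\,\bigl(s_\theta(x_t, t, y) - s_\theta(x_t, t)\bigr),$$
used in the (discretized) reverse diffusion step
$$x_{t_{k-1}} = x_{t_k} - \lambda_k\,\tilde{s} + \sqrt{2\beta(t_k)\,\sigma(t_k)^2}\;\xi_k,$$
where $\lambda_k$ is schedule-dependent and $\xi_k$ is Brownian noise.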
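The guidance rule and the update step can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names `cfg_score` and `reverse_step`, the toy score vectors, and the values of `lam` and `noise_scale` are assumptions made for the example, and the sign convention of the update simply mirrors the step as written in this article.

```python
import numpy as np

def cfg_score(s_uncond, s_cond, omega):
    """Classifier-free guidance: move from the unconditional score
    toward (and past, if omega > 1) the conditional score."""
    return s_uncond + omega * (s_cond - s_uncond)

def reverse_step(x, s_tilde, lam, noise_scale, rng):
    """One discretized reverse step: x_prev = x - lam * s_tilde + noise_scale * xi.
    lam and noise_scale stand in for the schedule-dependent coefficients."""
    xi = rng.standard_normal(x.shape)
    return x - lam * s_tilde + noise_scale * xi

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Toy scores (hypothetical values): they agree on axis 0 and differ on axis 1.
s_uncond = np.array([1.0, 0.0, 0.0, 0.0])
s_cond = np.array([1.0, 1.0, 0.0, 0.0])

s_tilde = cfg_score(s_uncond, s_cond, omega=7.5)
x_next = reverse_step(x, s_tilde, lam=0.01, noise_scale=0.0, rng=rng)
```

Setting `omega=1.0` returns the conditional score unchanged, and `omega=0.0` returns the unconditional score, which is a quick sanity check on any guidance implementation.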
2. Geometric Properties of Scores and Manifold Structure
Under the manifold hypothesis, clean data resides on a low-dimensional manifold $\mathcal{M}$. For small $t$, Stanczuk et al. ('24) have shown that the score $s_\theta(x_t, t)$ is predominantly normal to the intermediate manifold $\mathcal{M}_t$:
$$s_\theta(x_t, t) = P_T\, s_\theta(x_t, t) + P_N\, s_\theta(x_t, t), \qquad \|P_N\, s_\theta(x_t, t)\| \gg \|P_T\, s_\theta(x_t, t)\|,$$
where $P_T$ and $P_N$ are the tangent and normal projections at the point $x_t$. Within CFG, the conditional and unconditional scores at each step typically align in their normal components but have misaligned tangential components. Allowing the tangential part of the unconditional score to persist can deflect samples off the data manifold, reducing alignment to the conditioning signal.
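To make the tangent/normal split concrete, the sketch below uses the unit sphere in three dimensions as a stand-in manifold, where the normal direction at a point is simply the point itself. The sphere, the helper names, and the score values are all assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def split_tangent_normal(x, s):
    """Split vector s into tangent and normal parts at point x on the unit sphere."""
    n = x / np.linalg.norm(x)        # unit normal of the sphere at x
    s_normal = np.dot(s, n) * n      # component along the normal
    s_tangent = s - s_normal         # remainder lies in the tangent plane
    return s_tangent, s_normal

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

x = np.array([0.0, 0.0, 1.0])            # point on the sphere
s_cond = np.array([0.3, 0.0, -2.0])      # hypothetical conditional score
s_uncond = np.array([-0.4, 0.2, -1.9])   # hypothetical unconditional score

t_c, n_c = split_tangent_normal(x, s_cond)
t_u, n_u = split_tangent_normal(x, s_uncond)

normal_agreement = cosine(n_c, n_u)      # normal parts point the same way
tangent_agreement = cosine(t_c, t_u)     # tangential parts disagree
```

In this toy setup the normal components are perfectly aligned while the tangential components point in nearly opposite directions, which is the situation the tangential damping step is designed to handle.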
3. Singular Value Decomposition and Tangential Damping
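At every step, TCFG stacks the flattened conditional score $s_{\text{cond}} = s_\theta(x_t, t, y)$ and unconditional score $s_{\text{uncond}} = s_\theta(x_t, t)$ as the rows of a score matrix and takes its SVD:
$$S = \begin{bmatrix} s_{\text{cond}}^\top \\ s_{\text{uncond}}^\top \end{bmatrix} = U \Sigma V^\top,$$
where $S \in \mathbb{R}^{2 \times d}$, $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2)$ with $\sigma_1 \ge \sigma_2$, and $V$ contains the right singular vectors $v_1, v_2 \in \mathbb{R}^d$. Empirically, $\sigma_1 \gg \sigma_2$; $v_1$ approximates the shared normal direction, $v_2$ the tangential subspace.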
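TCFG (CFPG) suppresses tangential components of the unconditional score by projecting it onto $v_1$:
$$\hat{s}_{\text{uncond}} = (s_{\text{uncond}} \cdot v_1)\, v_1,$$
effectively removing the $v_2$ component. Because both scores lie in the span of $v_1$ and $v_2$, this can equivalently be written using $v_2$:
$$\hat{s}_{\text{uncond}} = s_{\text{uncond}} - (s_{\text{uncond}} \cdot v_2)\, v_2.$$
This process aligns the unconditional score with the conditional target along the normal direction while nullifying divergent tangential influences.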
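The equivalence of the two projection forms can be checked numerically. The following sketch assumes the two scores are stacked as the rows of a 2 x d matrix, so that the right singular vectors returned by `np.linalg.svd` live in the same space as the scores; the synthetic scores (a shared direction plus small noise) are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Synthetic scores: a strong shared direction plus small independent perturbations.
shared = rng.standard_normal(d)
s_cond = 3.0 * shared + 0.1 * rng.standard_normal(d)
s_uncond = 2.5 * shared + 0.1 * rng.standard_normal(d)

S = np.stack([s_cond, s_uncond])                        # shape (2, d)
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)    # Vt has shape (2, d)
v1, v2 = Vt[0], Vt[1]                                   # right singular vectors in R^d

# Form 1: keep only the component along v1.
proj_keep = np.dot(s_uncond, v1) * v1
# Form 2: subtract the component along v2.
proj_drop = s_uncond - np.dot(s_uncond, v2) * v2

forms_agree = np.allclose(proj_keep, proj_drop)
```

The two forms agree because `s_uncond` is a row of `S` and therefore lies entirely in the span of `v1` and `v2`. The result is also unaffected by the sign ambiguity of singular vectors, since each vector appears twice in its projection.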
4. Algorithmic Implementation of Classifier-Free Projection Guidance
The CFPG procedure, as instantiated in TCFG, is as follows:
    for k = T, T−1, …, 1:
        s_uncond ← s_θ(x_{t_k}, t_k)
        s_cond ← s_θ(x_{t_k}, t_k, y)
        S = [ s_cond ; s_uncond ]                # scores stacked as rows, S ∈ ℝ^{2×d}
        (U, Σ, V^T) = SVD(S)                     # full_matrices=False, so V^T ∈ ℝ^{2×d}
        v1 = V^T[0,:]                            # principal right-singular vector, v1 ∈ ℝ^d
        ŝ_uncond = (s_uncond⋅v1)·v1              # Eq.(6) in paper
        s̃ = ŝ_uncond + ω·(s_cond − ŝ_uncond)     # projection-guided score
        x_{t_{k−1}} = x_{t_k} − λ_k·s̃ + √(2β(t_k)σ(t_k)²)·ξ_k
    output: x_0
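The per-step guidance computation in the listing can be written as a small runnable function. This is a minimal sketch under stated assumptions, not the authors' implementation: `tcfg_guided_score` is a hypothetical name, the scores are flattened and stacked as rows so that the right singular vectors have the same dimension as the scores, inputs are single (unbatched) samples, and the sampler update itself is left to the caller.

```python
import numpy as np

def tcfg_guided_score(s_cond, s_uncond, omega):
    """Projection-guided score for one sampling step (single sample, any shape)."""
    shape = s_cond.shape
    # Stack flattened scores as rows: S has shape (2, d).
    S = np.stack([s_cond.ravel(), s_uncond.ravel()])
    # Thin SVD; rows of Vt are right singular vectors in R^d.
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    v1 = Vt[0]
    # Keep only the part of the unconditional score along v1.
    s_uncond_hat = np.dot(S[1], v1) * v1
    # Standard CFG combination, using the damped unconditional score.
    s_tilde = s_uncond_hat + omega * (S[0] - s_uncond_hat)
    return s_tilde.reshape(shape)

# Hypothetical latent-shaped scores (4 channels, 8 x 8).
rng = np.random.default_rng(0)
s_cond = rng.standard_normal((4, 8, 8))
s_uncond = 0.8 * s_cond + 0.2 * rng.standard_normal((4, 8, 8))

s_tilde = tcfg_guided_score(s_cond, s_uncond, omega=7.5)
```

Two properties follow directly from the formula and are useful as checks: with `omega=1.0` the function returns the conditional score, and when the two scores are exactly parallel the projection leaves the unconditional score unchanged, so the output coincides with vanilla CFG. For batched inputs, the same computation would be applied to each sample separately.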
5. Practical Considerations and Hyperparameter Selection
Guidance scale ($\omega$) recommendations are identical to those for standard CFG: 7.5 for Stable Diffusion v1.5, 5.0 for SDXL, etc. No additional thresholding or tuning is required; only the top singular vector is retained. According to Table A.1 of the supplement, the overhead at the latent resolution reported there is below 0.01% in runtime and under 20 MB of memory on an RTX 3090. Standard sampling parameters such as the step count carry over unchanged, so the only modification to an existing pipeline is the inserted projection operation. This lightweight nature facilitates broad practical deployment.
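One way to see why the projection is cheap: for a matrix with only two rows, the principal right singular vector can be recovered from a 2 x 2 eigenproblem, so the cost is dominated by a few passes over the d score entries. The sketch below illustrates this identity and checks it against `np.linalg.svd`. It is an assumption-laden illustration, not a description of how the authors compute the SVD; the function name and the latent size of 4 x 64 x 64 are chosen only for the example.

```python
import numpy as np

def top_right_singular_vector(S):
    """Principal right singular vector of a (2, d) matrix via its 2 x 2 Gram matrix."""
    G = S @ S.T                          # (2, 2), costs O(d) to build
    evals, evecs = np.linalg.eigh(G)     # eigenvalues in ascending order
    u1 = evecs[:, -1]                    # top left singular vector
    sigma1 = np.sqrt(evals[-1])          # top singular value
    return (S.T @ u1) / sigma1           # unit vector in R^d

rng = np.random.default_rng(0)
d = 4 * 64 * 64                          # illustrative latent size
base = rng.standard_normal(d)
S = np.stack([base, 0.9 * base + 0.1 * rng.standard_normal(d)])

v1_fast = top_right_singular_vector(S)
v1_svd = np.linalg.svd(S, full_matrices=False)[2][0]

# Singular vectors are defined up to sign, so compare via the absolute dot product.
alignment = abs(float(np.dot(v1_fast, v1_svd)))
```

The identity used here is that if S = U Σ Vᵀ, then S Sᵀ = U Σ² Uᵀ and Sᵀ u₁ = σ₁ v₁, so no large decomposition is ever needed.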
6. Empirical Evaluation and Impact
Across major CFG-based diffusion models, TCFG (CFPG) consistently lowers Fréchet Inception Distance (FID), where lower is better, without degrading alignment as measured by CLIPScore. Empirical results include:
| Model (sampling setup) | Vanilla CFG FID | TCFG FID | CLIPScore (unchanged) |
|---|---|---|---|
| SD v1.5 (50 steps, ω = 7.5) | 13.26 | 13.12 | 0.31 |
| SDXL (50 steps, ω = 5.0) | 13.36 | 12.65 | 0.32 |
| SD v3 (Flow) | 16.66 | 13.74 | – |
| DiT, ImageNet, 50k | 32.67 | 29.50 | – |
For DiT, recall increased from 0.13 to 0.19 and sFID dropped from 17.92 to 13.27. Qualitative analyses (Figs. 8–10) demonstrate reduction of artifacts and more natural color balance. TCFG thus enables more accurate and contextually coherent image synthesis, supporting existing guidance frameworks in both fidelity and semantic quality.
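To put the FID numbers on a common footing, the relative reduction can be computed directly from the table. The short script below only restates the reported values as percentages; the dictionary keys and the helper name `relative_reduction` are illustrative.

```python
# (vanilla FID, TCFG FID) pairs copied from the table above.
results = {
    "SD v1.5": (13.26, 13.12),
    "SDXL": (13.36, 12.65),
    "SD v3 (Flow)": (16.66, 13.74),
    "DiT": (32.67, 29.50),
}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

reductions = {name: relative_reduction(b, t) for name, (b, t) in results.items()}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower FID")
```

The gain ranges from roughly 1% for SD v1.5 to about 17.5% for SD v3, so the benefit reported in the table varies considerably across models.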
7. Summary and Context within the Diffusion Literature
Classifier-Free Projection Guidance (as instantiated in TCFG) advances the geometric understanding of guided diffusion sampling by explicitly correcting for tangential misalignment in the guidance phase. By enforcing the unconditional score's alignment to the principal manifold direction at each step, CFPG brings theoretical and empirical improvements to score-based generative modeling. This strategy integrates seamlessly with DDPM-family samplers and established CFG architectures, requiring negligible additional resources and no adaptation of existing hyperparameters. These properties position CFPG as a practically effective and theoretically motivated extension of classifier-free guidance across a range of state-of-the-art diffusion models (Kwon et al., 23 Mar 2025).