
Classifier-Free Projection Guidance (CFPG)

Updated 13 December 2025
  • CFPG is a geometric refinement of classifier-free guidance that projects score vectors onto the principal (normal) direction to eliminate tangential drift.
  • It employs a rank-2 SVD to isolate the dominant right singular vector, aligning unconditional scores with conditional targets in diffusion models.
  • Empirical evaluations demonstrate improved FID metrics and reduced artifacts in text-to-image synthesis with negligible computational overhead.

Classifier-Free Projection Guidance (CFPG), realized in the Tangential Damping Classifier-Free Guidance (TCFG) framework, is a geometric refinement of the classifier-free guidance (CFG) process for diffusion models. CFPG systematically projects the unconditional score vector onto the principal direction (normal to the data manifold), thereby eliminating harmful tangential drift in the reverse diffusion process. This results in improved sample fidelity and context alignment during conditional synthesis tasks, such as text-to-image generation, and imposes negligible computational overhead (Kwon et al., 23 Mar 2025).

1. Definition of Conditional and Unconditional Scores in Diffusion Models

Diffusion models generate data by reversing a forward noising process defined as

$$x_t = x_0 + \sigma(t)\, z, \quad z \sim \mathcal{N}(0, I), \quad \sigma(0) = 0, \;\; \sigma(1) = \sigma_{\max},$$

where $x_0$ is the original data and $x_t$ is its noisy counterpart at time $t$. The model learns a score network $s_\theta(x_t, t) \approx \nabla_{x_t} \log p_t(x_t)$.
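As a concrete sketch, the forward process can be simulated directly. The linear schedule $\sigma(t) = t\,\sigma_{\max}$ below is an assumption for illustration; real models use carefully tuned schedules.

```python
import numpy as np

def sigma(t: float, sigma_max: float = 80.0) -> float:
    """Noise level at time t in [0, 1]: sigma(0) = 0, sigma(1) = sigma_max.
    A linear schedule is assumed here purely for illustration."""
    return t * sigma_max

def forward_noise(x0: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t = x_0 + sigma(t) * z with z ~ N(0, I)."""
    z = rng.standard_normal(x0.shape)
    return x0 + sigma(t) * z

rng = np.random.default_rng(0)
x0 = np.zeros(4)          # toy "clean data" point
xt = forward_noise(x0, t=0.5, rng=rng)
```

At $t = 0$ the sample is returned unchanged; as $t \to 1$ it is dominated by Gaussian noise of scale $\sigma_{\max}$.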

In conditional generation, the network approximates

$$s_\theta(x_t, t, y) \approx \nabla_{x_t} \log p_t(x_t \mid y),$$

while $s_\theta(x_t, t)$ remains the unconditional score. Classifier-free guidance combines these as

$$s_{\mathrm{CFG}}(x_t, t, y) = s_\theta(x_t, t) + \omega \left( s_\theta(x_t, t, y) - s_\theta(x_t, t) \right), \quad \omega = 1 + \gamma,$$

used in the discretized reverse diffusion step
$$x_{t_{k-1}} = x_{t_k} - \lambda_k\, s_{\mathrm{CFG}}(x_{t_k}, t_k, y) + \sqrt{2\, \beta(t_k)\, \sigma(t_k)^2}\, \xi_k,$$
where $\lambda_k$ is schedule-dependent and $\xi_k$ is standard Gaussian noise.
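In code, the CFG combination is a one-liner. The arrays below stand in for outputs of a trained score network; no real model is involved.

```python
import numpy as np

def cfg_score(s_uncond: np.ndarray, s_cond: np.ndarray, omega: float) -> np.ndarray:
    """Classifier-free guidance: interpolate (omega < 1) or extrapolate
    (omega > 1) from the unconditional toward the conditional score."""
    return s_uncond + omega * (s_cond - s_uncond)

# Toy stand-ins for network outputs (illustrative only).
s_uncond = np.array([0.0, 0.0])
s_cond = np.array([1.0, 2.0])
guided = cfg_score(s_uncond, s_cond, omega=7.5)
```

Setting $\omega = 1$ recovers the purely conditional score, while $\omega > 1$ pushes samples further toward the conditioning signal.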

2. Geometric Properties of Scores and Manifold Structure

Under the manifold hypothesis, clean data $x_0$ resides on a low-dimensional manifold $\mathcal{M}_0 \subset \mathbb{R}^d$. For small $t$, Stanczuk et al. (2024) have shown that $\nabla_{x_t}\log p_t(x_t)$ is predominantly normal to the intermediate manifold $\mathcal{M}_t$:
$$\frac{\|\mathbf{T}_p\, \nabla \log p_t\|}{\|\mathbf{N}_p\, \nabla \log p_t\|} \xrightarrow{t \to 0} 0,$$
where $\mathbf{T}_p$ and $\mathbf{N}_p$ are the tangent and normal projections at a point $p$. Within CFG, the conditional and unconditional scores at each step typically align in their normal components but have misaligned tangential components. Allowing the tangential part of $s_\theta(x_t, t)$ to persist can deflect samples off the data manifold, reducing alignment with the conditioning signal.
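The tangential/normal split can be made concrete with a toy projection, assuming the normal direction at a point is known (in practice it is not observed directly; this is purely illustrative):

```python
import numpy as np

def split_normal_tangential(score: np.ndarray, normal: np.ndarray):
    """Decompose a score into its component along a given (assumed known)
    normal direction and the orthogonal tangential remainder."""
    n = normal / np.linalg.norm(normal)
    s_normal = (score @ n) * n        # N_p projection of the score
    s_tangent = score - s_normal      # T_p projection of the score
    return s_normal, s_tangent

rng = np.random.default_rng(0)
score = rng.standard_normal(5)        # toy score vector
normal = rng.standard_normal(5)       # toy normal direction
s_n, s_t = split_normal_tangential(score, normal)
```

The two parts reconstruct the original score exactly, and the tangential part is orthogonal to the normal direction by construction.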

3. Singular Value Decomposition and Tangential Damping

At every step, TCFG stacks the two scores as the rows of a $2 \times d$ matrix:
$$S = \begin{bmatrix} s_\theta(x_t, t, y) \\ s_\theta(x_t, t) \end{bmatrix} = U \Sigma V^\top,$$
where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2)$, $U \in \mathbb{R}^{2 \times 2}$, and $V \in \mathbb{R}^{d \times 2}$ contains the right singular vectors $v_1, v_2 \in \mathbb{R}^d$. Empirically, $\sigma_1 \gg \sigma_2$; $v_1$ approximates the shared normal direction and $v_2$ the tangential subspace.
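The claim that $\sigma_1 \gg \sigma_2$ when the two scores share a dominant direction can be checked numerically. The sketch below uses synthetic scores (the shared-direction construction is an assumption for illustration, mimicking the empirical observation) and stacks them as rows of a $2 \times d$ matrix so the right singular vectors live in $\mathbb{R}^d$; stacking them as columns instead would merely swap the roles of $U$ and $V$.

```python
import numpy as np

# Synthetic stand-ins for the two scores: both dominated by a shared
# direction, plus small independent perturbations (assumed construction).
rng = np.random.default_rng(0)
d = 1000
shared = rng.standard_normal(d)
s_cond = 3.0 * shared + 0.1 * rng.standard_normal(d)
s_uncond = 3.0 * shared + 0.1 * rng.standard_normal(d)

# Stack as rows: S is 2 x d, so right singular vectors v1, v2 are in R^d.
S = np.stack([s_cond, s_uncond])
U, sig, Vt = np.linalg.svd(S, full_matrices=False)
v1, v2 = Vt[0], Vt[1]

ratio = sig[0] / sig[1]                              # expect sigma_1 >> sigma_2
align = abs(v1 @ shared) / np.linalg.norm(shared)    # |cos| of v1 vs shared dir
```

Under this construction the leading singular value dominates and $v_1$ is nearly parallel to the shared direction, consistent with the interpretation above.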

TCFG (CFPG) suppresses the tangential component of the unconditional score by projecting it onto $v_1$:
$$\hat{s}_\theta(x_t, t) = \left( s_\theta(x_t, t) \cdot v_1 \right) v_1 = s_\theta(x_t, t)\, v_1 v_1^\top,$$
effectively removing the $v_2$ component (treating $s_\theta$ as a row vector). Equivalently, with $V' = [v_1\ \ 0]$, this is $\hat{s}_\theta(x_t, t) = s_\theta(x_t, t)\, V' V'^\top$. The projection aligns the unconditional score with the conditional target along the normal direction while nullifying divergent tangential influences.
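A minimal NumPy sketch of this projection (the function name is mine, and the input arrays are stand-ins for network outputs):

```python
import numpy as np

def damp_tangential(s_cond: np.ndarray, s_uncond: np.ndarray) -> np.ndarray:
    """Project the unconditional score onto the principal right singular
    vector v1 of the stacked 2 x d score matrix, removing its v2 component."""
    S = np.stack([s_cond, s_uncond])              # 2 x d score matrix
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    v1 = Vt[0]                                     # principal right singular vector
    return (s_uncond @ v1) * v1

rng = np.random.default_rng(1)
a = rng.standard_normal(8)                         # toy conditional score
b = rng.standard_normal(8)                         # toy unconditional score
b_hat = damp_tangential(a, b)
```

Note that the sign ambiguity of $v_1$ is harmless here, since $v_1$ appears twice in the projection; and when the two scores already coincide, the projection leaves the unconditional score unchanged.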

4. Algorithmic Implementation of Classifier-Free Projection Guidance

The CFPG procedure, as instantiated in TCFG, is as follows:

```
for k = T, T−1, …, 1:
    s_uncond ← s_θ(x_{t_k}, t_k)
    s_cond   ← s_θ(x_{t_k}, t_k, y)
    S = [ s_cond ; s_uncond ]              # rows of S ∈ ℝ^{2×d}
    (U, Σ, Vᵀ) = SVD(S)                    # thin SVD (full_matrices=False)
    v1 = first row of Vᵀ                   # principal right singular vector
    ŝ_uncond = (s_uncond · v1) · v1        # Eq. (6) in the paper
    s̃ = ŝ_uncond + ω · (s_cond − ŝ_uncond) # projection-guided score
    x_{t_{k−1}} = x_{t_k} − λ_k · s̃ + √(2 β(t_k) σ(t_k)²) · ξ_k
output: x_0
```
This modification is computationally light, requiring only a rank-2 SVD per step. All original scheduling (PNDM, DDIM, Euler, etc.) and hyperparameters are preserved, with the only addition being the insertion of the projection after extracting the predicted noises.
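The loop body can be sketched in NumPy. The score network, step size, and noise scale below are placeholder inputs rather than a real sampler; only the projection step mirrors the algorithm.

```python
import numpy as np

def tcfg_step(x: np.ndarray, s_cond: np.ndarray, s_uncond: np.ndarray,
              omega: float, lam: float, beta_sigma2: float,
              xi: np.ndarray) -> np.ndarray:
    """One reverse update using the tangentially damped unconditional score.
    lam and beta_sigma2 stand in for lambda_k and beta(t_k) * sigma(t_k)^2."""
    S = np.stack([s_cond, s_uncond])               # 2 x d score matrix
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    v1 = Vt[0]                                      # principal right singular vector
    s_uncond_hat = (s_uncond @ v1) * v1            # tangential damping
    s_tilde = s_uncond_hat + omega * (s_cond - s_uncond_hat)
    return x - lam * s_tilde + np.sqrt(2.0 * beta_sigma2) * xi

# Deterministic toy call (no noise, identical scores) for illustration.
x = np.ones(4)
s = np.array([1.0, 0.0, 0.0, 0.0])
x_next = tcfg_step(x, s, s, omega=7.5, lam=0.1, beta_sigma2=0.0, xi=np.zeros(4))
```

When the conditional and unconditional scores coincide, the damping is a no-op and the update reduces to a plain score step, as expected.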

5. Practical Considerations and Hyperparameter Selection

Guidance-scale ($\omega$) recommendations are identical to those for standard CFG: 7.5 for Stable Diffusion v1.5, 5.0 for SDXL, and so on. No additional thresholding or tuning is required; only the top singular vector is retained. According to Table A.1 of the supplement, even at $1024 \times 1024$ latent resolution the overhead is below 0.01% in runtime and under 20 MB of memory on an RTX 3090. Standard sampling parameters (step count, CFG scale) remain optimal; the projection operation is the only modification. This lightweight nature facilitates broad practical deployment.

6. Empirical Evaluation and Impact

Across major CFG-based diffusion models, TCFG (CFPG) consistently improves Fréchet Inception Distance (FID) metrics without degrading alignment, as measured by CLIPScore. Empirical results include:

| Model | Vanilla FID | TCFG FID | CLIPScore (unchanged) |
|---|---|---|---|
| SD v1.5, 50 steps, ω = 7.5 | 13.26 | 13.12 | 0.31 |
| SDXL, 50 steps, ω = 5.0 | 13.36 | 12.65 | 0.32 |
| SD v3 (Flow) | 16.66 | 13.74 | — |
| DiT, ImageNet, 50k samples | 32.67 | 29.50 | — |

For DiT, recall increased from 0.13 to 0.19 and sFID dropped from 17.92 to 13.27. Qualitative analyses (Figs. 8–10) demonstrate reduction of artifacts and more natural color balance. TCFG thus enables more accurate and contextually coherent image synthesis, supporting existing guidance frameworks in both fidelity and semantic quality.

7. Summary and Context within the Diffusion Literature

Classifier-Free Projection Guidance (as instantiated in TCFG) advances the geometric understanding of guided diffusion sampling by explicitly correcting for tangential misalignment in the guidance phase. By enforcing the unconditional score's alignment to the principal manifold direction at each step, CFPG brings theoretical and empirical improvements to score-based generative modeling. This strategy integrates seamlessly with DDPM-family samplers and established CFG architectures, requiring negligible additional resources and no adaptation of existing hyperparameters. These properties position CFPG as a practically effective and theoretically motivated extension of classifier-free guidance across a range of state-of-the-art diffusion models (Kwon et al., 23 Mar 2025).
