CFG++: Manifold-Constrained Guidance
- The paper demonstrates that CFG++ overcomes standard classifier-free guidance limitations by constraining samples to the learned noisy data manifold.
- It employs a geometric, inverse-problem formulation with in-manifold interpolation to restore one-step invertibility and reduce mode collapse.
- Empirical results show improved FID, PSNR, and diversity across text-to-image, inversion, and editing tasks, with seamless integration into existing frameworks.
CFG++: Manifold-Constrained Classifier-Free Guidance
Classifier-Free Guidance Plus Plus (CFG++), or "manifold-constrained classifier-free guidance," is a methodology developed to address geometric and sample quality limitations observed in standard classifier-free guidance (CFG) for diffusion models. By ensuring that guidance steps remain within the learned noisy data manifold, CFG++ enables faithful image-to-noise inversion, reduces mode collapse, and delivers high alignment to prompts at lower guidance scales. Its geometry-aware framework is readily integrated into a variety of diffusion sampling algorithms and has been subsequently generalized (e.g., Rectified-CFG++ for flow models) and analyzed from probabilistic and inverse problem standpoints.
1. Standard Classifier-Free Guidance and Its Limitations
In conditional diffusion models, standard CFG achieves text-guided generation by combining conditional and unconditional noise (score) predictions. The standard CFG noise estimate at time $t$ is

$$\hat\epsilon^{\omega}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + \omega\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\big),$$

where $\epsilon_\theta(x_t, c)$ and $\epsilon_\theta(x_t, \varnothing)$ approximate (up to scaling) the conditional and unconditional scores $\nabla_{x_t}\log p_t(x_t \mid c)$ and $\nabla_{x_t}\log p_t(x_t)$, respectively, and $\omega$ is the guidance scale.
Empirical deployment of CFG in deterministic samplers such as DDIM can produce high-fidelity, prompt-adherent images. However, with $\omega > 1$, the guided sample is extrapolated outside the noisy data manifold, leading to non-invertibility, failures in image editing, and strong mode collapse with loss of sample diversity (Chung et al., 12 Jun 2024).
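For concreteness, a minimal sketch of the standard CFG combination, assuming an epsilon-prediction denoiser with the illustrative signature `eps_model(x_t, t, emb)` (names and interface are assumptions, not part of the cited work):

```python
import torch

def cfg_noise(eps_model, x_t, t, cond_emb, uncond_emb, omega: float) -> torch.Tensor:
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one with guidance scale omega."""
    eps_uncond = eps_model(x_t, t, uncond_emb)   # approximates the unconditional score
    eps_cond = eps_model(x_t, t, cond_emb)       # approximates the conditional score
    # For omega > 1 this extrapolates beyond the conditional prediction,
    # which is the off-manifold behaviour CFG++ is designed to avoid.
    return eps_uncond + omega * (eps_cond - eps_uncond)
```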
2. Off-Manifold Drift and the Inverse Problem View
CFG++ is motivated by a geometric analysis of the off-manifold drift problem. The "manifold" designates the set of plausible noisy states under the marginal $p_t(x_t)$ at noise level $t$. Geometrically, standard CFG moves samples beyond the locally linear data manifold, exacerbating pathologies such as loss of invertibility, visual artifacts, and an acute reduction in diversity; these drawbacks are not inherent to diffusion models but to the extrapolative score combination in standard CFG (Chung et al., 12 Jun 2024).
Inspired by score-based inverse problem solvers, CFG++ reformulates prompt guidance as an inverse problem with a text-conditioned score-matching loss

$$\ell(x) = \tfrac{1}{2}\,\big\| x - \hat{x}_{0|t}(c) \big\|_2^2, \qquad \hat{x}_{0|t}(c) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, c)}{\sqrt{\bar\alpha_t}},$$

where $\hat{x}_{0|t}(c)$ is the conventional conditional denoised estimate (Tweedie's formula).

Minimizing $\ell$ pulls samples toward the region of the manifold compatible with the textual condition.
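To make the connection to the update rule in the next section explicit, a short illustrative calculation (a sketch under the quadratic loss written above, not the paper's full derivation): a single gradient step of size $\lambda$ on $\ell$, started from the unconditional denoised estimate $\hat{x}_{0|t}(\varnothing)$, gives

$$\hat{x}_{0|t}(\varnothing) - \lambda\,\nabla_x \ell\big(\hat{x}_{0|t}(\varnothing)\big) = (1-\lambda)\,\hat{x}_{0|t}(\varnothing) + \lambda\,\hat{x}_{0|t}(c),$$

i.e., an interpolation that stays on the segment between the unconditional and conditional estimates for $\lambda \in [0, 1]$, rather than an extrapolation beyond them.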
3. CFG++ Update Rule and Algorithmic Structure
CFG++ applies a small, in-manifold interpolation during each reverse step, unlike the extrapolation performed in CFG. The update rule is built on the interpolated noise estimate

$$\hat\epsilon^{\lambda}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + \lambda\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\big),$$

with $\lambda \in [0, 1]$.
The key distinguishing feature is the use of the unconditional noise estimate for the re-noising operation within each sampling step, rather than applying guidance in both denoising and re-noising as in CFG. In DDIM, the step takes the form

$$x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_{0|t}\big(\hat\epsilon^{\lambda}_\theta\big) + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(x_t, \varnothing),$$

where $\hat{x}_{0|t}(\hat\epsilon^{\lambda}_\theta)$ is the denoised estimate computed from the interpolated noise. The coefficient $\lambda$ provides a linear interpolation between unconditional ($\lambda = 0$) and conditional ($\lambda = 1$) sampling and never exceeds 1 (never extrapolatory).
CFG++ is a drop-in replacement for CFG, requiring no architectural modifications and no additional evaluation cost (Chung et al., 12 Jun 2024, Saini et al., 9 Oct 2025).
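As a concrete illustration, here is a minimal sketch of one CFG++ DDIM step under the update above; the epsilon-prediction interface `eps_model(x_t, t, emb)` and the cumulative schedule tensor `alpha_bar` are assumptions for the example, not the paper's reference code:

```python
import torch

def cfg_pp_ddim_step(eps_model, x_t, t, t_prev, alpha_bar,
                     cond_emb, uncond_emb, lam: float) -> torch.Tensor:
    """One deterministic DDIM step with CFG++ (lam in [0, 1]).

    Denoising uses the interpolated noise estimate; re-noising uses the
    *unconditional* estimate, which is the key difference from standard CFG."""
    eps_u = eps_model(x_t, t, uncond_emb)
    eps_c = eps_model(x_t, t, cond_emb)
    eps_lam = eps_u + lam * (eps_c - eps_u)                    # in-manifold interpolation

    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    x0_hat = (x_t - (1 - a_t).sqrt() * eps_lam) / a_t.sqrt()   # Tweedie denoised estimate
    # Re-noise with the unconditional prediction (standard CFG would reuse the guided estimate here).
    return a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps_u
```

Setting `lam = 0` recovers unconditional DDIM sampling, and the call pattern (two network evaluations per step) matches standard CFG, consistent with the no-extra-cost claim above.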
4. Manifold Fidelity, Invertibility, and Integration
By construction, CFG++ keeps the reverse sampling trajectory on the learned noisy data manifold. This in-manifold interpolation restores approximate one-step invertibility for DDIM. The DDIM inversion formulas used for real-image inversion and editing therefore remain faithful, yielding high reconstruction fidelity (PSNR > 25 dB on COCO images), whereas standard CFG with $\omega > 1$ yields non-invertible or highly distorted inversions (PSNR < 10 dB) (Chung et al., 12 Jun 2024).
The geometry-aware approach ensures that samples do not drift off the manifold, preserving sample diversity and preventing artifacts. Integration of CFG++ into higher-order solvers (e.g., Karras–Euler, DPM-Solver++) or into distilled diffusion models is immediate: the scheme only replaces the initial denoising call with the interpolated estimate $\hat\epsilon^{\lambda}_\theta$, leaving all further solver-internal score calls unconditional.
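A matching sketch of the approximate one-step inversion (noising direction), under the same assumed interface as the sampling step above; this simply mirrors the forward update so that the denoising pass approximately retraces it:

```python
def cfg_pp_ddim_invert_step(eps_model, x_t, t, t_next, alpha_bar,
                            cond_emb, uncond_emb, lam: float):
    """One CFG++ DDIM inversion step toward a higher noise level (t_next > t)."""
    eps_u = eps_model(x_t, t, uncond_emb)
    eps_c = eps_model(x_t, t, cond_emb)
    eps_lam = eps_u + lam * (eps_c - eps_u)

    a_t, a_next = alpha_bar[t], alpha_bar[t_next]
    x0_hat = (x_t - (1 - a_t).sqrt() * eps_lam) / a_t.sqrt()
    # Re-noise toward t_next with the unconditional prediction, as in the sampling step.
    return a_next.sqrt() * x0_hat + (1 - a_next).sqrt() * eps_u
```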
5. Guidance Scale, Diversity, and Empirical Performance
A defining feature of CFG++ is that the interpolation coefficient $\lambda$ remains in $[0, 1]$, offering guidance that is neither over-aggressive nor prone to off-manifold collapse. Empirical results demonstrate that even the maximal setting $\lambda = 1$ is less aggressive than conventional CFG scales (for 50-step DDIM) and avoids mode collapse at all tested settings (Chung et al., 12 Jun 2024).
On COCO 10k (Stable Diffusion v1.5, SDXL, Lightning, Turbo) with DDIM or DPM++2M solvers, CFG++ consistently improves FID by 0.9–1.0 points compared to matched-scale CFG and matches or increases CLIP alignment (Table 1, (Chung et al., 12 Jun 2024)). For inversion and editing, CFG++ delivers high-fidelity reconstruction and enables accurate edits that are nonviable under excessive CFG.
Across a wide range of text-to-image and inverse problem tasks (super-resolution, deblurring, inpainting), CFG++ yields consistently lower FID and LPIPS and higher PSNR compared to vanilla or standard CFG (Table 3, (Chung et al., 12 Jun 2024)).
6. Extensions: Flow Models and Other Guidance Paradigms
CFG++ has been generalized to non-diffusion architectures, notably flow-matching models. In Rectified-CFG++ (Saini et al., 9 Oct 2025), the predictor-corrector scheme anchors the sample to the conditional flow and then interpolates at each time step between conditional and unconditional velocities at an intermediate latent, maintaining trajectories within a bounded tubular neighborhood of the transport manifold.
The algorithmic overhead is a single extra velocity evaluation per step, with robust stability across a broad range of models and sampling schedules. Empirical evaluations (MS-COCO, LAION-Aesthetic, T2I-CompBench) show that Rectified-CFG++ outperforms vanilla CFG on FID, compositional understanding, and text/text+image reward metrics, delivering improvements even at very low numbers of network function evaluations.
Algorithm 1. Rectified-CFG++ sampling loop (abridged):
```
for n = 1 ... N:
    v_cond  = v_theta(x_n, t, y)
    x_pred  = x_n + (Δt/2) * v_cond
    v_c     = v_theta(x_pred, t - Δt/2, y)
    v_u     = v_theta(x_pred, t - Δt/2, ∅)
    v_hat   = v_cond + α(t) * (v_c - v_u)
    x_{n+1} = x_n + Δt * v_hat
```
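For readers who prefer a runnable form, a hedged Python transcription of the abridged loop above, assuming a velocity-prediction model `v_theta(x, t, emb)`, a decreasing time grid, and a bounded guidance schedule `alpha(t)`; all names are illustrative rather than taken from the paper's code:

```python
import torch

def rectified_cfg_pp_sample(v_theta, x, times, cond_emb, uncond_emb, alpha):
    """Predictor-corrector Rectified-CFG++ loop: one extra velocity
    evaluation per step compared with vanilla CFG."""
    for n in range(len(times) - 1):
        t, dt = times[n], times[n] - times[n + 1]        # decreasing time grid
        v_cond = v_theta(x, t, cond_emb)                 # anchor on the conditional flow
        x_pred = x + 0.5 * dt * v_cond                   # half-step predictor
        v_c = v_theta(x_pred, t - 0.5 * dt, cond_emb)    # corrector: conditional velocity
        v_u = v_theta(x_pred, t - 0.5 * dt, uncond_emb)  # corrector: unconditional velocity
        v_hat = v_cond + alpha(t) * (v_c - v_u)          # bounded guided velocity
        x = x + dt * v_hat
    return x
```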
7. Practical Implications and Future Directions
CFG++ is directly applicable to text-to-image and video diffusion pipelines, real-image inversion and editing, robust 3D generation (via Score Distillation Sampling with reduced guidance noise), and text-conditioned scientific and medical imaging. Its manifold-constrained construction also allows generalization to cross-modal and non-image tasks (e.g., text-to-audio models).
As the geometric view of diffusion guidance matures, hybrid guidance frameworks combining CFG++ with other manifold- or curvature-aware methods (e.g., Saddle-Free Guidance) are anticipated. These would optimize the fidelity/diversity trade-off by leveraging both gradient-based and geometric corrections.
A synopsis of experimental outcomes:
| Model | Task | Metric | CFG | CFG++ |
|---|---|---|---|---|
| SDXL / Turbo | T2I, COCO 10k, DDIM | FID | reference | 0.9–1.0 lower |
| SDXL | Inversion + editing | PSNR | < 10 dB | > 25 dB |
| SD v1.5 / PSLD | Inverse problems (SR ×8) | FID | 41.2 | 36.6 |
Editor’s term: “CFG++-style guidance” refers to manifold-constrained, in-manifold interpolation between conditional and unconditional branches, with unconditional-based re-noising.
In summary, CFG++ addresses longstanding weaknesses of standard classifier-free guidance by enforcing on-manifold geometry, recovering invertibility and editability, maintaining diversity, and achieving consistent improvement in generation quality at no additional computational cost (Chung et al., 12 Jun 2024, Saini et al., 9 Oct 2025).