Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models (2410.02416v1)

Published 3 Oct 2024 in cs.LG and cs.CV

Abstract: Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

The paper addresses a known problem with classifier-free guidance (CFG) in diffusion models: high guidance scales cause oversaturation and unrealistic artifacts. To counteract this, the authors introduce adaptive projected guidance (APG), a method designed to retain the quality-boosting advantages of CFG while avoiding oversaturation.

Key Contributions

The core of the paper is a decomposition of the CFG update term into components parallel and orthogonal to the model's conditional prediction. The analysis shows that oversaturation stems primarily from the parallel component, while quality enhancement is largely due to the orthogonal component. APG down-weights the parallel component using a hyperparameter η, which reduces oversaturation without compromising output quality.
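The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `project` and `projected_guidance` are hypothetical, the CFG rule is assumed to take the common form pred_uncond + w * (pred_cond - pred_uncond), each prediction is treated as a single flattened vector, and the conditional prediction is assumed to be nonzero.

```python
import numpy as np


def project(update, reference):
    """Split `update` into parts parallel and orthogonal to `reference`.

    Both arrays are treated as flattened vectors. `reference` is assumed
    to have a nonzero norm.
    """
    unit = reference / np.linalg.norm(reference)
    parallel = np.sum(update * unit) * unit
    orthogonal = update - parallel
    return parallel, orthogonal


def projected_guidance(pred_cond, pred_uncond, guidance_scale, eta):
    """Guided prediction with the parallel component scaled by `eta`.

    Assumed convention: eta = 1.0 gives back standard CFG, and eta = 0.0
    removes the parallel component from the update entirely.
    """
    diff = pred_cond - pred_uncond  # the CFG update direction
    parallel, orthogonal = project(diff, pred_cond)
    update = orthogonal + eta * parallel
    return pred_cond + (guidance_scale - 1.0) * update


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = rng.normal(size=(3, 8, 8))    # stand-in for a conditional prediction
    uncond = rng.normal(size=(3, 8, 8))  # stand-in for an unconditional prediction
    guided = projected_guidance(cond, uncond, guidance_scale=7.5, eta=0.0)
    print(guided.shape)
```

Writing the update as pred_cond + (w - 1) * (pred_cond - pred_uncond) is algebraically the same as the assumed CFG form, which is why setting eta to 1.0 in this sketch gives the same result as standard CFG.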

In addition to projection, the authors establish a link between CFG and gradient ascent, motivating the introduction of rescaling and reverse momentum techniques. These adjustments control the influence of each CFG update, inhibiting significant drifts during the sampling process. Thus, APG can employ higher guidance scales than CFG without encountering oversaturation artifacts.
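The summary does not give the exact form of the rescaling and momentum steps, so the following is only one plausible reading, sketched under loud assumptions: rescaling is taken to mean capping the norm of the update at a threshold, and reverse momentum is taken to mean a running sum with a negative coefficient. The names `rescale_update`, `MomentumBuffer`, `max_norm`, and `beta` are hypothetical.

```python
import numpy as np


def rescale_update(update, max_norm):
    """Shrink `update` so its L2 norm does not exceed `max_norm`.

    Assumption: "rescaling" is interpreted here as norm clipping.
    Updates that are already within the threshold are left unchanged.
    """
    norm = np.linalg.norm(update)
    if norm == 0.0:
        return update
    return update * min(1.0, max_norm / norm)


class MomentumBuffer:
    """Running combination of the current update and past updates.

    Assumption: a negative `beta` subtracts part of the previous running
    value, so consecutive updates in the same direction are damped.
    """

    def __init__(self, beta):
        self.beta = beta
        self.running = 0.0

    def update(self, value):
        self.running = value + self.beta * self.running
        return self.running


if __name__ == "__main__":
    buffer = MomentumBuffer(beta=-0.5)
    rng = np.random.default_rng(0)
    for step in range(3):
        diff = rng.normal(size=(3, 8, 8))  # stand-in for pred_cond - pred_uncond
        smoothed = buffer.update(diff)
        clipped = rescale_update(smoothed, max_norm=5.0)
        print(step, float(np.linalg.norm(clipped)))
```

In a sampler, steps like these would be applied to the difference between the conditional and unconditional predictions at each step, before that difference is used in the guided prediction. The order shown here (momentum first, then rescaling) is itself an assumption.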

Numerical Results

The paper reports that, across multiple conditional diffusion models, APG improves Fréchet Inception Distance (FID), recall, and saturation scores while keeping precision comparable to CFG. The improvements hold across different sampling algorithms and for distilled models.

Theoretical and Practical Implications

Theoretically, this work extends the understanding of CFG's update dynamics, offering insights into disentangling different effects within generative models. This allows researchers to better adjust model parameters to cater to specific quality or aesthetic preferences in image generation.

Practically, the introduction of APG greatly expands the usability of diffusion models by enhancing image quality while sidestepping common artifacts of high guidance scales. This could have significant implications in fields relying heavily on photorealistic image generation, such as content creation, advertising, and digital art.

Future Research Directions

Speeding up sampling in diffusion models remains a promising area for future work. APG itself adds practically no overhead to the sampling process, so reducing the cost of model guidance more broadly could make guided sampling practical without extensive computational resources. Exploring APG's adaptability to other domains, such as audio or video generation, is another open direction.

Conclusion

This paper contributes an effective approach to the problem of oversaturation at high guidance scales, combining an analysis of the CFG update rule with a practical method. APG retains the quality gains of CFG while reducing oversaturation, which makes it a viable plug-and-play alternative to CFG. The work also provides a foundation for further study of guidance techniques in diffusion models.

Authors
  1. Seyedmorteza Sadat
  2. Otmar Hilliges
  3. Romann M. Weber