Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
The paper addresses a known problem with Classifier-Free Guidance (CFG) in diffusion models: oversaturation and unrealistic artifacts at high guidance scales. To counteract these issues, the authors introduce Adaptive Projected Guidance (APG), a method designed to retain the quality-boosting advantages of CFG while minimizing its negative effects.
Key Contributions
The core of the paper is a decomposition of the CFG update term into two components, one parallel and one orthogonal to the model's conditional prediction. This separation reveals that oversaturation predominantly stems from the parallel component, while quality enhancement is largely due to the orthogonal component. By down-weighting the parallel component with a scaling hyperparameter, APG reduces oversaturation without compromising the quality of the outputs.
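To make the decomposition concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes the CFG update term is the difference between the conditional and unconditional predictions, and that the projection is taken over a single flattened sample. The function names and the `parallel_weight` parameter name are hypothetical and stand in for the paper's hyperparameter.

```python
import numpy as np


def project(delta, reference):
    """Split delta into parts parallel and orthogonal to reference.

    Both arrays are treated as single flattened vectors (one sample).
    """
    ref = reference.ravel()
    d = delta.ravel()
    # Scalar projection coefficient; the floor guards against a zero reference.
    coeff = np.dot(d, ref) / max(np.dot(ref, ref), 1e-12)
    parallel = (coeff * ref).reshape(delta.shape)
    orthogonal = delta - parallel
    return parallel, orthogonal


def projected_guidance(pred_cond, pred_uncond, guidance_scale, parallel_weight=0.0):
    """Guided prediction with the parallel part of the update down-weighted.

    parallel_weight is a hypothetical name: 1.0 recovers ordinary CFG,
    0.0 keeps only the orthogonal component.
    """
    delta = pred_cond - pred_uncond  # assumed form of the CFG update term
    parallel, orthogonal = project(delta, pred_cond)
    update = orthogonal + parallel_weight * parallel
    return pred_cond + (guidance_scale - 1.0) * update
```

With `parallel_weight=1.0` the function returns `pred_uncond + guidance_scale * (pred_cond - pred_uncond)`, the standard CFG combination, so the weight interpolates between plain CFG and a purely orthogonal update. For batched inputs, the projection would need to be applied per sample.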
In addition to projection, the authors establish a link between CFG and gradient ascent, motivating the introduction of rescaling and reverse momentum techniques. These adjustments control the influence of each CFG update, inhibiting significant drifts during the sampling process. Thus, APG can employ higher guidance scales than CFG without encountering oversaturation artifacts.
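The two adjustments can be sketched as follows. This is one plausible reading of the description above, not the paper's reference code: rescaling is assumed to cap the norm of the update at a threshold, and reverse momentum is assumed to be a running sum with a negative coefficient. The names `MomentumBuffer`, `rescale`, `momentum`, and `max_norm` are hypothetical.

```python
import numpy as np


class MomentumBuffer:
    """Running sum of updates; a negative coefficient gives reverse momentum."""

    def __init__(self, momentum):
        self.momentum = momentum  # e.g. a negative value such as -0.5 (assumed)
        self.running = 0.0

    def update(self, delta):
        # The new running value is the current update plus the scaled history.
        self.running = delta + self.momentum * self.running
        return self.running


def rescale(delta, max_norm):
    """Shrink delta so that its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(delta)
    factor = min(1.0, max_norm / max(norm, 1e-12))
    return delta * factor
```

In a sampling loop, each step's update term would pass through the buffer and the norm cap before being combined with the conditional prediction. The exact order of these operations relative to the projection is an assumption here and should be checked against the paper.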
Numerical Results
The paper reports quantitative results comparing APG with CFG. Across multiple diffusion models, APG consistently improves Fréchet Inception Distance (FID), recall, and saturation scores while maintaining precision. These improvements hold across different sampling algorithms and for distilled models.
Theoretical and Practical Implications
Theoretically, this work extends the understanding of CFG's update dynamics, offering insights into disentangling different effects within generative models. This allows researchers to better adjust model parameters to cater to specific quality or aesthetic preferences in image generation.
Practically, the introduction of APG greatly expands the usability of diffusion models by enhancing image quality while sidestepping common artifacts of high guidance scales. This could have significant implications in fields relying heavily on photorealistic image generation, such as content creation, advertising, and digital art.
Future Research Directions
The potential for enhancing speed in diffusion models remains a promising area for future work. Specifically, reducing the computational overhead of model guidance could facilitate the broader applicability of APG without the need for extensive computational resources. Moreover, exploring APG's adaptability to other domains such as audio or video generation offers intriguing avenues for multidisciplinary impact.
Conclusion
This paper contributes an effective approach to the pervasive problem of oversaturation at high CFG scales, combining theoretical analysis with practical utility. It gives the diffusion model community a tool, APG, that preserves and in many cases improves the quality of generated outputs, making it a viable alternative to CFG. The work also serves as a foundation for further exploration and optimization of guidance techniques for diffusion models.