
Score Distillation Sampling with Learned Manifold Corrective (2401.05293v2)

Published 10 Jan 2024 in cs.CV

Abstract: Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail. Instead, we train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and the effectiveness of our novel loss formulation through qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.


Summary

  • The paper introduces LMC-SDS, a correction to SDS that trains a shallow network to model the diffusion model's timestep-dependent frequency bias so it can be factored out of the gradients, mitigating artifacts in image synthesis.
  • It demonstrates that lowering text guidance is feasible while maintaining stability and improving the overall visual quality of generated images.
  • Empirical results validate LMC-SDS across diverse applications, including text-to-image, image editing, and text-to-3D synthesis, with enhanced detail and clarity.

Introduction

The authors present an in-depth analysis of Score Distillation Sampling (SDS), a method that uses a pretrained image diffusion model to steer optimization problems with text prompts. SDS underpins creative applications such as text-to-image and text-to-3D synthesis and has shown remarkable capabilities, but it is not free of drawbacks: noisy gradients, the need for high text guidance, and the resulting side effects on image quality. To address these issues, the authors introduce an improved loss formulation, SDS with Learned Manifold Corrective (LMC-SDS).

Understanding the Original SDS

SDS uses a pretrained text-to-image diffusion model to measure how well an image matches a textual description. While powerful, the loss can degrade faithfulness to image observations, match the text prompt too aggressively, or produce largely uninformative gradients that inject noise into the optimization objective. The paper decomposes the SDS loss into its constituent factors and isolates the component responsible for these weaknesses, observing that the high text guidance used in the original formulation to compensate for the noisy component is itself what causes oversaturation and repeated detail.
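To make the discussion concrete, the following sketch writes out the SDS gradient in the standard DreamFusion-style notation and splits its residual into a text-conditioning factor and a denoising factor, mirroring the decomposition described above. The labels delta_cond and delta_proj are illustrative, not necessarily the paper's exact notation.

```latex
% SDS gradient: w(t) is a timestep weighting, \epsilon the sampled noise, and
% \hat{\epsilon}_\phi the classifier-free-guided prediction with weight \omega.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\hat{\epsilon}_\phi(\mathbf{x}_t; y, t) - \epsilon\bigr)\,
      \frac{\partial \mathbf{x}}{\partial \theta} \right],
\quad
\hat{\epsilon}_\phi(\mathbf{x}_t; y, t)
  = \epsilon_\phi(\mathbf{x}_t; t)
  + \omega \bigl(\epsilon_\phi(\mathbf{x}_t; y, t) - \epsilon_\phi(\mathbf{x}_t; t)\bigr).

% The residual then splits into a text-alignment direction and a noisy
% denoising direction:
\hat{\epsilon}_\phi(\mathbf{x}_t; y, t) - \epsilon
  = \underbrace{\omega \bigl(\epsilon_\phi(\mathbf{x}_t; y, t)
      - \epsilon_\phi(\mathbf{x}_t; t)\bigr)}_{\delta_{\mathrm{cond}}}
  \; + \;
  \underbrace{\epsilon_\phi(\mathbf{x}_t; t) - \epsilon}_{\delta_{\mathrm{proj}}}.
```

For any single sample of epsilon, the delta_proj term is far from its expectation, which is why the original formulation needs a large guidance weight omega to let delta_cond dominate, and why that workaround produces the artifacts noted above.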

The LMC-SDS Solution

The paper proposes a simple yet effective fix: train a shallow network to mimic the timestep-dependent denoising deficiencies of the image diffusion model so that this systematic error can be factored out of the gradients. LMC-SDS thereby yields more informative gradients, permits lower text guidance, and improves the visual quality of the results. The researchers support these claims with a variety of experiments that showcase the robustness and flexibility of LMC-SDS across multiple applications.
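A minimal sketch of one plausible realization of this idea, assuming a PyTorch-style setup: a shallow network h is trained to reproduce the frozen diffusion model's one-step denoised estimate of a clean image at each timestep, thereby capturing the model's timestep-dependent frequency bias. All names here (CorrectiveNet, frozen_eps, scheduler, to_x0) are hypothetical placeholders, not the paper's actual architecture or training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrectiveNet(nn.Module):
    """Hypothetical shallow conv net h(x, t) meant to mimic the frozen
    diffusion model's timestep-dependent frequency bias (blurring at high t)."""
    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        self.t_embed = nn.Linear(1, hidden)          # simple timestep embedding
        self.conv_in = nn.Conv2d(channels, hidden, 3, padding=1)
        self.conv_mid = nn.Conv2d(hidden, hidden, 3, padding=1)
        self.conv_out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x, t):
        te = self.t_embed(t.float().view(-1, 1))[:, :, None, None]
        h = F.relu(self.conv_in(x) + te)             # inject t per feature map
        h = F.relu(self.conv_mid(h))
        return self.conv_out(h)

def corrective_train_step(h, frozen_eps, scheduler, x0, opt):
    """One training step: h learns to map a clean image x0 to the frozen
    model's one-step denoised estimate x0_hat, i.e. the degradation the
    diffusion model systematically applies at timestep t. `frozen_eps(x_t, t)`
    and `scheduler` stand in for a real diffusion model and its noise
    schedule (assumed interfaces, not a specific library)."""
    t = torch.randint(0, scheduler.num_train_timesteps, (x0.shape[0],))
    eps = torch.randn_like(x0)
    x_t = scheduler.add_noise(x0, eps, t)            # forward diffusion
    with torch.no_grad():
        eps_hat = frozen_eps(x_t, t)                 # unconditional prediction
        x0_hat = scheduler.to_x0(x_t, eps_hat, t)    # one-step denoised estimate
    loss = F.mse_loss(h(x0, t), x0_hat)              # h reproduces the bias
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Once trained, h can be applied to the current image before it is compared against the model's denoised estimate, so the systematic blur appears on both sides of the comparison and largely cancels, leaving a cleaner gradient without resorting to high text guidance.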

Empirical Evidence and Applications

Extensive testing shows the benefits of LMC-SDS in tasks such as optimization-based image synthesis and editing, zero-shot image-to-image translation network training, and text-to-3D synthesis. In 3D asset generation, for instance, LMC-SDS produces more detailed and sharper results than the original SDS formulation. In image editing, fixing selected parameters during optimization yields diverse outcomes from the same prompt, demonstrating versatility in creative contexts; a sketch of such a loop follows.
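As an illustration of the editing use case, here is a hypothetical optimization loop in the same PyTorch style. The name lmc_sds_loss stands in for the corrected loss, and masking the gradient is one straightforward way to fix selected pixels during optimization; neither is taken from the paper's code.

```python
import torch

def edit_image(x_init, lmc_sds_loss, freeze_mask, steps=500, lr=0.01):
    """Hypothetical optimization-based editing loop. The image is optimized
    directly in pixel (or latent) space; `freeze_mask` (1 = optimizable,
    0 = fixed) keeps selected regions unchanged, which is one way to obtain
    diverse edits from the same prompt. `lmc_sds_loss(x)` is a stand-in for
    the paper's corrected loss and is assumed to return a scalar tensor."""
    x = x_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        lmc_sds_loss(x).backward()     # gradients flow into x
        x.grad.mul_(freeze_mask)       # zero out gradients on frozen regions
        opt.step()
    return x.detach()
```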

Conclusion and Future Directions

LMC-SDS resolves a critical issue that arises when diffusion models are embedded in optimization problems. The careful decomposition of the SDS loss and the proposed learned corrective yield markedly cleaner gradients for image manipulation tasks. The new formulation is a substantial first step toward stable and meaningful applications, and the authors anticipate further work on strengthening the corrective component and on applying these insights in practical creative scenarios.