One-step Diffusion with Distribution Matching Distillation: An Overview
The paper "One-step Diffusion with Distribution Matching Distillation" presents an innovative approach to accelerate the sampling process of diffusion models, which have gained widespread attention for their capability to generate highly realistic images. Diffusion models typically require extensive computation, as they involve iterative processes to transform Gaussian noise into coherent images. This extensive computational demand poses a limitation for real-time applications.
Contribution and Methodology
The authors introduce Distribution Matching Distillation (DMD), a method to distill a diffusion model into a single-step image generator without significantly compromising image quality. The essence of DMD lies in matching the output distribution of a one-step generator to that of the teacher diffusion model, by minimizing an approximate KL divergence whose gradient can be expressed as the difference of two score functions.
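The score-difference gradient can be illustrated in one dimension, where Gaussian scores are known in closed form. The following is a minimal numpy sketch (not the paper's implementation): samples from a "fake" Gaussian are pushed toward the "real" target by descending the gradient s_fake(x) − s_real(x), which is exactly the per-sample gradient of KL(fake ‖ real).

```python
import numpy as np

# Toy 1-D illustration of the DMD gradient. For a Gaussian N(mu, sigma^2),
# the score s(x) = d/dx log p(x) = -(x - mu) / sigma^2 is known in closed form.
def score(x, mu, sigma):
    return -(x - mu) / sigma**2

mu_real, sigma_real = 0.0, 1.0   # "real" (target) distribution
mu_fake, sigma_fake = 3.0, 1.0   # "fake" distribution (generator outputs)

rng = np.random.default_rng(0)
x = rng.normal(mu_fake, sigma_fake, size=1000)  # generator samples

# The per-sample gradient of KL(fake || real) is s_fake(x) - s_real(x);
# descending it pulls the samples toward the target distribution.
lr = 0.1
for _ in range(200):
    grad = score(x, mu_fake, sigma_fake) - score(x, mu_real, sigma_real)
    x = x - lr * grad
    # Re-fit the fake distribution to the updated samples (stand-in for
    # the dynamically trained fake score model in DMD).
    mu_fake, sigma_fake = x.mean(), max(x.std(), 1e-3)

print(x.mean(), x.std())  # should drift toward the target N(0, 1)
```

In the actual method this gradient is not applied to pixels directly; it is backpropagated through the generator's parameters, and both scores are parameterized by diffusion models evaluated on noised samples.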
Key elements of the methodology include:
- Score Functions: The distribution matching is accomplished by parameterizing the two score functions with diffusion models: the real score comes from the frozen pretrained teacher, while the fake score model is trained dynamically on the generator's own outputs to track the synthetic distribution.
- Regression Loss: Alongside the distribution matching loss, a simple regression loss aligns the large-scale structure of the one-step generator's outputs with those of the multi-step teacher on paired noise-image examples.
- Performance: The method achieves competitive results on benchmarks such as ImageNet and zero-shot MS COCO, matching Stable Diffusion in quality while being significantly faster, generating images at up to 20 FPS with FP16 inference.
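Putting the two losses together, a single training step can be sketched on a toy linear generator G(z) = w·z + b. This is a hypothetical illustration, not the paper's code: the names (w, b, lambda_reg) and the closed-form Gaussian scores are stand-ins for the neural generator and the two diffusion-parameterized score models.

```python
import numpy as np

rng = np.random.default_rng(1)
w, b = 0.5, 0.0      # one-step generator parameters G(z) = w*z + b
lambda_reg = 0.25    # illustrative weight on the regression loss
lr = 0.05

def real_score(x):                 # score of the target N(2, 1)
    return -(x - 2.0)

def fake_score(x, mu, sigma):      # score of the generator's current output dist.
    return -(x - mu) / sigma**2

for step in range(500):
    z = rng.normal(size=256)
    x = w * z + b                  # one-step generation
    y_ref = 1.0 * z + 2.0          # paired multi-step reference outputs for the
                                   # regression loss (here: a fixed reference map)

    mu, sigma = x.mean(), max(x.std(), 1e-3)
    # Distribution-matching gradient w.r.t. the samples: s_fake - s_real.
    g_dm = fake_score(x, mu, sigma) - real_score(x)
    # Regression-loss gradient w.r.t. the samples: d/dx 0.5*(x - y_ref)^2.
    g_reg = x - y_ref
    g_x = g_dm + lambda_reg * g_reg

    # Chain rule through G(z) = w*z + b.
    w -= lr * np.mean(g_x * z)
    b -= lr * np.mean(g_x)

print(w, b)  # should approach the reference map (w=1, b=2)
```

The regression term anchors each noise input to a specific reference output, while the distribution-matching term only cares that the overall output distribution is right; the paper combines both so the one-step generator inherits the teacher's structure as well as its statistics.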
Numerical Results
The paper reports strong numerical performance, notably:
- FID Scores: DMD achieved an FID of 2.62 on ImageNet 64×64 and 11.49 on zero-shot COCO-30k, demonstrating results comparable to costly diffusion models at a fraction of the computational expense.
- Efficiency: DMD reduces the number of neural network evaluations by roughly 100x relative to multi-step sampling.
Implications and Future Work
The implications of this research are twofold:
- Practical Impact: DMD can transform existing diffusion models into highly efficient one-step generators, enabling their deployment in interactive applications where speed and responsiveness are crucial.
- Theoretical Insights: The approach provides insights into distribution matching at the generative model level, potentially influencing future work in GANs, VAEs, and beyond.
In considering future directions, advancements may include extending the DMD methodology to more complex datasets and exploring the inclusion of adaptive guidance scales to further enhance flexibility and image quality. Additionally, refinement of score approximation techniques could yield further improvements in sample diversity and fidelity.
Overall, the paper presents a significant step forward in the field of efficient image generation, offering both practical solutions and theoretical contributions to the field of AI.