- The paper improves DDPM performance by introducing a hybrid objective and cosine noise schedule that boost log-likelihoods and reduce gradient noise.
- The methodology accelerates sampling by learning reverse process variances, achieving high-quality image generation with as few as 50 forward passes.
- The study demonstrates superior mode coverage compared to GANs and highlights the scalable potential of DDPMs for practical applications.
Improved Denoising Diffusion Probabilistic Models
In this paper, the authors present a series of advancements to Denoising Diffusion Probabilistic Models (DDPMs), a class of generative model that produces high-quality samples by learning to reverse a multi-step noising process. The improvements target three fronts: better log-likelihoods, shorter sampling time, and broader applicability of DDPMs relative to other generative models such as GANs and autoregressive models.
Key Contributions
- Enhancements in Log-Likelihood:
  - The authors show that learning the reverse-process variances through a new parameterization, trained with a hybrid objective, lets DDPMs achieve competitive log-likelihoods. This closes a notable gap: earlier DDPMs could not match other likelihood-based models such as VAEs and PixelCNN on this metric.
  - The hybrid objective combines the simplified denoising objective with a down-weighted variational lower bound (VLB). It yields better log-likelihoods than the simplified objective alone while exhibiting less gradient noise during training than the VLB alone; a minimal sketch of both ideas follows this list.
- Improved Noise Schedule:
  - The traditional linear noise schedule is shown to be suboptimal, particularly for lower-resolution images: it destroys information too quickly, leaving the final portion of the forward process nearly uninformative. The authors propose a cosine noise schedule that adds noise more gradually, so intermediate steps of the diffusion retain more of the image. A sketch of the schedule appears after this list.
- Reduced Sampling Steps:
  - By learning the reverse-process variances, the authors can produce high-quality samples with far fewer forward passes. Quality comparable to full sampling is reached with as few as 50 steps, versus the hundreds or thousands of steps used during training. This acceleration is crucial for making DDPMs practical in real-world applications; the strided sampling schedule is sketched after this list.
- Comparison with GANs:
  - The paper uses improved precision and recall metrics to compare distribution coverage between DDPMs and GANs. At similar Fréchet Inception Distance (FID), DDPMs attain markedly higher recall, suggesting better mode coverage; a simplified version of the metric is sketched after this list.
- Scalability:
  - The paper also examines how DDPMs scale with model capacity and training compute. Sample quality improves predictably as compute grows, reinforcing the potential of DDPMs to benefit from further scaling.
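The learned-variance parameterization and the hybrid objective can be summarized in a few lines. The sketch below assumes PyTorch and two hypothetical helpers not in the paper's code: `model(x_t, t)` returns a noise prediction and a per-dimension interpolation vector `v`, and `vlb_term_fn` evaluates the per-step KL term of the variational bound. It is a minimal illustration under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def learned_sigma(v, log_beta_t, log_beta_tilde_t):
    # Paper's parameterization: interpolate between beta_t (an upper bound on
    # the reverse variance) and the posterior variance beta_tilde_t (a lower
    # bound) in log space:
    #   Sigma_theta = exp(v * log(beta_t) + (1 - v) * log(beta_tilde_t))
    return torch.exp(v * log_beta_t + (1.0 - v) * log_beta_tilde_t)

def hybrid_loss(model, x_t, t, eps, vlb_term_fn, lam=0.001):
    # L_hybrid = L_simple + lam * L_vlb, with lam = 0.001 in the paper.
    eps_pred, v = model(x_t, t)            # hypothetical two-headed model
    l_simple = F.mse_loss(eps_pred, eps)   # simplified denoising objective
    # Stop-gradient on the noise prediction so the VLB term only trains the
    # variance head, as described in the paper.
    l_vlb = vlb_term_fn(eps_pred.detach(), v, x_t, t)
    return l_simple + lam * l_vlb
```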
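The cosine schedule itself is straightforward to reproduce; the following NumPy sketch follows the formulas given in the paper.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    # alpha_bar(t) = f(t) / f(0), with f(t) = cos(((t/T + s) / (1 + s)) * pi/2)^2.
    # The small offset s = 0.008 keeps beta_t from being vanishingly small near t = 0.
    steps = np.arange(T + 1, dtype=np.float64)
    f = np.cos(((steps / T + s) / (1.0 + s)) * np.pi / 2) ** 2
    return f / f[0]

def cosine_betas(T, s=0.008, max_beta=0.999):
    # Per-step betas: beta_t = 1 - alpha_bar_t / alpha_bar_{t-1},
    # clipped at 0.999 to avoid singularities near t = T, as in the paper.
    alpha_bar = cosine_alpha_bar(T, s)
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)
```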
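Reducing the number of sampling steps amounts to sampling along an evenly spaced subsequence of the training timesteps and recomputing the betas from the retained cumulative products. A minimal sketch, assuming `alpha_bar` is the array of cumulative products used during training:

```python
import numpy as np

def strided_schedule(alpha_bar, num_steps):
    # Pick an evenly spaced subsequence of timesteps (e.g. 50 out of 4000)
    # and rebuild the schedule from the retained alpha_bar values:
    #   beta_{S_t} = 1 - alpha_bar_{S_t} / alpha_bar_{S_{t-1}}
    T = len(alpha_bar)
    use_t = np.linspace(0, T - 1, num_steps).round().astype(int)
    ab = alpha_bar[use_t]
    ab_prev = np.concatenate(([1.0], ab[:-1]))
    betas = 1.0 - ab / ab_prev
    beta_tilde = (1.0 - ab_prev) / (1.0 - ab) * betas  # posterior variances
    return use_t, betas, beta_tilde
```

Because the learned variance is parameterized as an interpolation between the beta and beta-tilde of whichever schedule is in use, the same trained model transfers to the shortened schedule without retraining, which is what makes 50-step sampling viable.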
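For the GAN comparison, the improved precision and recall metrics (Kynkäänniemi et al., 2019) estimate how well one sample set covers the manifold of another using k-nearest-neighbor balls in a feature space. A simplified NumPy sketch operating on precomputed feature vectors (`real_feats` and `fake_feats` are assumed inputs; the actual metric uses Inception-style features and much larger sample sets):

```python
import numpy as np

def knn_radii(feats, k=3):
    # Distance from each point to its k-th nearest neighbor in the same set
    # (index 0 of each sorted row is the point itself, at distance 0).
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    return np.sort(d, axis=1)[:, k]

def coverage(ref, query, k=3):
    # Fraction of `query` points falling inside at least one k-NN ball
    # centered on a `ref` point, i.e. lying on the estimated ref manifold.
    radii = knn_radii(ref, k)
    d = np.linalg.norm(query[:, None] - ref[None, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

# precision = coverage(real_feats, fake_feats)  # realism of samples
# recall    = coverage(fake_feats, real_feats)  # mode coverage
```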
Numerical Results and Claims
- On the ImageNet 64×64 dataset, the enhanced models markedly reduce bits per dimension, reaching a log-likelihood of 3.53 bits/dim and surpassing flow-based models such as Glow and Flow++ on that benchmark.
- In terms of FID, which measures sample quality, the improved diffusion models reach 2.92 on ImageNet 64×64, competitive with the strongest GANs on that benchmark.
Implications and Future Developments
The improvements introduced in this work have several important implications for the field of generative modeling and AI:
- Practical Deployment: Speeding up the sampling process makes DDPMs much more viable for deployment in time-sensitive applications such as real-time image and audio generation.
- High-Quality and Diverse Samples: The better mode coverage observed in DDPMs relative to GANs makes them a strong choice for applications that require high diversity, such as data augmentation or the creative arts.
- Framework for Future Research: The hybrid objective and improved noise schedule provide a solid foundation for further exploration into noise modeling and training objectives in generative models.
Conclusion
The paper presents substantial advancements in Denoising Diffusion Probabilistic Models by addressing both theoretical and practical challenges. The proposed modifications not only enhance log-likelihoods but also enable faster and more efficient sampling without compromising sample quality. These findings bolster the potential for DDPMs to become a cornerstone in the landscape of generative modeling, offering a robust alternative to existing models like GANs and VAEs. Future research may build upon this foundation, exploring even more efficient noise schedules, diverse training objectives, and scalable architectures.