- The paper introduces Variational Diffusion Models that achieve state-of-the-art log-likelihoods on benchmarks like CIFAR-10 and ImageNet.
- The paper provides a simplified variational lower bound expressed in terms of signal-to-noise ratio, unifying various diffusion models.
- The paper demonstrates efficient optimization through joint noise schedule learning and Fourier feature enhancements, enabling near-optimal lossless compression.
An Overview of Variational Diffusion Models
The paper "Variational Diffusion Models" introduces a new family of diffusion-based generative models, termed Variational Diffusion Models (VDMs), which demonstrate considerable advancements in the field of likelihood-based generative modeling. Authored by Diederik P. Kingma, Tim Salimans, Ben Poole, and Jonathan Ho from Google Research, the paper showcases improvements in image density estimation benchmarks and addresses both theoretical underpinnings and practical implementations.
Key Contributions
- State-of-the-Art Log-Likelihoods:
- VDMs obtain new state-of-the-art results on standard image density estimation benchmarks such as CIFAR-10 and ImageNet. The models outperform traditional autoregressive models, which have long been dominant in this space.
- Enhanced Theoretical Understanding:
- The paper provides a simplified expression for the variational lower bound (VLB) in terms of the signal-to-noise ratio (SNR) of the diffusion process. This insight allows the authors to establish an equivalence between several previously proposed models, thereby solidifying the theoretical foundation of diffusion-based models.
- Optimization of the Noise Schedule:
- One innovative aspect of VDMs is that the noise schedule is optimized jointly with the rest of the model so as to minimize the variance of the VLB estimator; the reduced gradient variance in turn accelerates optimization.
- Efficient Implementation and Architectural Improvements:
- The introduction of Fourier features and other architectural changes enables VDMs to achieve superior likelihoods with significantly faster optimization.
- Lossless Compression:
- In addition to generative modeling, the models are adapted for use in bits-back compression schemes, demonstrating lossless compression rates close to the theoretical optimum.
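The variance-preserving forward process these contributions build on can be sketched in a few lines. Below is a minimal NumPy illustration; the linear log-SNR schedule and the endpoints `gamma_min`/`gamma_max` are illustrative placeholders, since the paper learns the schedule rather than fixing it:

```python
import numpy as np

def diffuse(x, t, gamma_min=-6.0, gamma_max=6.0):
    """Diffuse data x to time t in [0, 1] under a variance-preserving process.

    gamma(t) linearly interpolates the log-SNR between its endpoints; the
    paper instead learns this schedule, so the linear form is a stand-in.
    """
    gamma = gamma_max + t * (gamma_min - gamma_max)  # log-SNR at time t
    alpha = np.sqrt(1.0 / (1.0 + np.exp(-gamma)))    # alpha^2 = sigmoid(gamma)
    sigma = np.sqrt(1.0 - alpha**2)                  # variance preserving
    eps = np.random.randn(*x.shape)
    z_t = alpha * x + sigma * eps                    # noisy latent at time t
    snr = (alpha / sigma) ** 2                       # equals exp(gamma)
    return z_t, eps, snr

x = np.random.randn(4, 32, 32, 3)                    # toy image batch
z, eps, snr = diffuse(x, t=0.5)                      # at t=0.5, gamma=0, SNR=1
```

The key quantity is the signal-to-noise ratio alpha²/sigma², which is what the paper's simplified VLB is written in terms of.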
Empirical and Theoretical Findings
Empirical Results:
The empirical results are striking. On CIFAR-10 without data augmentation, VDMs achieve a test-set likelihood of 2.65 bits per dimension (BPD), surpassing the previous best of 2.80 BPD held by Sparse Transformers, and they do so with roughly an order of magnitude less training compute. VDMs likewise achieve superior results on the ImageNet datasets.
Theoretical Insights:
One of the major theoretical contributions is the proof that, in continuous time, the VLB is invariant to the noise schedule. Different diffusion processes therefore shape the generative model equivalently: any noise schedule, however complex, yields the same continuous-time VLB as long as the SNR values at its endpoints agree.
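Concretely, with $v = \mathrm{SNR}(t)$ as the integration variable and $\tilde{\mathbf{x}}_\theta$ the denoising model's reconstruction of the data, the continuous-time diffusion loss takes (roughly, setting aside the prior and reconstruction terms) the form

```latex
\mathcal{L}_\infty(\mathbf{x})
  = \tfrac{1}{2}\,
    \mathbb{E}_{\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})}
    \int_{\mathrm{SNR}_{\min}}^{\mathrm{SNR}_{\max}}
    \bigl\lVert \mathbf{x} - \tilde{\mathbf{x}}_\theta(\mathbf{z}_v, v) \bigr\rVert_2^2 \, dv ,
```

which depends on the noise schedule only through its SNR endpoints, making the invariance explicit.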
Additionally, the paper rewrites the continuous-time diffusion loss as a numerically better-behaved integral over SNR. By parameterizing the denoising model to predict the noise and incorporating Fourier features to capture fine detail, the authors effectively close the likelihood gap between diffusion models and autoregressive models.
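A one-sample Monte Carlo estimate of that loss under the noise-prediction parameterization might look like the following sketch; the fixed linear log-SNR schedule and the placeholder network `eps_hat_fn` are assumptions for illustration, not the paper's learned components:

```python
import numpy as np

def diffusion_loss(x, eps_hat_fn, rng, gamma_min=-6.0, gamma_max=6.0):
    """One-sample Monte Carlo estimate of the continuous-time diffusion loss
    under the epsilon-prediction parameterization.

    Uses a fixed linear log-SNR schedule gamma(t); the paper instead learns
    gamma jointly with the model to reduce estimator variance. eps_hat_fn is
    a hypothetical callable standing in for the denoising network.
    """
    t = rng.uniform()                                 # t ~ Uniform(0, 1)
    gamma = gamma_max + t * (gamma_min - gamma_max)   # log-SNR at t
    dgamma_dt = gamma_min - gamma_max                 # schedule derivative
    alpha2 = 1.0 / (1.0 + np.exp(-gamma))             # sigmoid(gamma)
    alpha, sigma = np.sqrt(alpha2), np.sqrt(1.0 - alpha2)
    eps = rng.standard_normal(x.shape)
    z_t = alpha * x + sigma * eps
    eps_hat = eps_hat_fn(z_t, t)
    # The continuous-time loss weights the eps-MSE by -dgamma/dt, which is
    # positive here because gamma decreases as t increases.
    return -0.5 * dgamma_dt * np.mean((eps - eps_hat) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
loss = diffusion_loss(x, eps_hat_fn=lambda z, t: np.zeros_like(z), rng=rng)
```

Averaging this estimator over many samples of `t` approximates the full integral; learning the schedule shapes how that variance is distributed over `t`.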
Implications and Future Directions
Practical Implications:
The advances put forward by VDMs suggest that diffusion models can be applied efficiently across a wider range of likelihood-based generative modeling tasks. The capability to optimize the noise schedule dynamically and the incorporation of Fourier features highlight potential pathways for integrating these models into more complex tasks requiring high fidelity and fine-grained detail.
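As a rough illustration of the Fourier-feature idea, one can append high-frequency sinusoid channels to the network input so the model can resolve fine detail; the frequency set chosen here is an assumption, not the paper's exact configuration:

```python
import numpy as np

def fourier_features(z, freqs=(7, 8)):
    """Append high-frequency Fourier channels to a network input z.

    For each frequency n, adds sin(2^n * pi * z) and cos(2^n * pi * z)
    channels alongside the original input. The frequency set is a choice
    made here for illustration.
    """
    channels = [z]
    for n in freqs:
        channels.append(np.sin(2.0 ** n * np.pi * z))
        channels.append(np.cos(2.0 ** n * np.pi * z))
    return np.concatenate(channels, axis=-1)

z = np.random.randn(4, 32, 32, 3)
feat = fourier_features(z)   # 3 original + 2 freqs * 2 functions * 3 channels
```

The augmented input is then fed to the denoising network in place of the raw latent.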
Theoretical Implications:
The invariance of the VLB to noise schedules offers a new lens through which to consider the design and development of generative models. The continuous-time approach and its inherent robustness against different diffusion processes underscore a significant theoretical development, offering a more flexible framework.
Future Directions:
Future research could focus on refining the practical implementations of bits-back coding to handle very deep models efficiently. This would bridge the gap between theoretical codelength and practical codelength, enhancing the usability of VDMs in real-world applications such as data compression and storage. Additionally, exploring the application of VDMs to other data modalities, such as audio and text, might yield fruitful avenues for expansion.
In summary, the work on Variational Diffusion Models represents a substantive step forward in generative modeling, offering both robust theoretical contributions and practical advancements. The improved understanding of the noise scheduling, combined with innovative model architectures, sets a new benchmark for future research and applications in the field.