Essay on "Maximum Likelihood Training of Score-Based Diffusion Models"
The paper "Maximum Likelihood Training of Score-Based Diffusion Models" addresses the optimization of score-based diffusion models, which are prominent in the domain of deep generative modeling. These models, notable for their efficacy in tasks such as image, audio, and shape generation, utilize a stochastic process to gradually transform data into noise and reverse it to synthesize samples.
Core Concepts and Theoretical Contributions
Score-based diffusion models operate by perturbing data through a diffusion process described by a stochastic differential equation (SDE) and generating samples by simulating the corresponding reverse-time SDE. A key insight is that these models can be converted into continuous normalizing flows (CNFs) via an associated probability flow ODE, facilitating tractable likelihood computation. However, the standard training objective, a weighted combination of score matching losses, does not directly optimize the likelihood. The authors propose a weighting scheme under which the score matching objective upper bounds the negative log-likelihood, effectively allowing approximate maximum likelihood training. A minimal sketch of the forward perturbation and the score matching loss follows.
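To make the setup concrete, here is a sketch of denoising score matching under the paper's variance-preserving (VP) SDE with its default linear β schedule. It is illustrative rather than the authors' exact code: score_net is a hypothetical network mapping (x_t, t) to an estimated score, and data is assumed flattened to shape (batch, dim).

```python
import torch

def vp_marginal(x0, t, beta_min=0.1, beta_max=20.0):
    """Mean and std of the Gaussian perturbation kernel p_t(x_t | x_0)
    for the VP-SDE dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dw with the
    linear schedule beta(t) = beta_min + t*(beta_max - beta_min)."""
    log_coef = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    mean = torch.exp(log_coef)[:, None] * x0
    std = torch.sqrt(1.0 - torch.exp(2.0 * log_coef))
    return mean, std

def dsm_loss(score_net, x0):
    """Denoising score matching: the score of the perturbation kernel
    is -eps / std, so the per-example residual is std*score + eps."""
    t = torch.rand(x0.shape[0])          # t ~ Uniform(0, 1)
    mean, std = vp_marginal(x0, t)
    eps = torch.randn_like(x0)
    xt = mean + std[:, None] * eps
    score = score_net(xt, t)             # hypothetical score network
    return ((std[:, None] * score + eps) ** 2).sum(dim=1).mean()
```

The weighting question the paper addresses is precisely which time-dependent coefficient to place on this per-time loss.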
Theoretical Insights:
- Likelihood Weighting: The paper identifies a specific weighting for the score matching losses, the squared diffusion coefficient λ(t) = g(t)², under which the weighted objective upper bounds the negative log-likelihood; training with it improves likelihoods across datasets and architectures.
- Bound Tightness: The authors show the bound becomes an equality when the model score exactly matches the score of the true reverse-time SDE, i.e., the time-dependent data score ∇ log p_t.
- Numerical Efficiency: The likelihood weighting increases the variance of the training objective; the authors counter this with importance sampling over the diffusion time (see the sketch after this list).
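The sketch below combines the λ(t) = g(t)² weighting with importance sampling over t, in the spirit of the paper's variance reduction rather than as its verbatim recipe. For the VP-SDE, sampling t with density proportional to g(t)²/σ(t)² admits a closed-form inverse CDF, and that proposal cancels the weighting inside the expectation; score_net and the truncation at t_min are assumptions.

```python
import torch

def vp_B(t, beta_min=0.1, beta_max=20.0):
    """Integral of beta(s) from 0 to t for a linear beta schedule."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t**2

def sample_t_importance(n, t_min=1e-5, t_max=1.0, beta_min=0.1, beta_max=20.0):
    """Sample t with density proportional to g(t)^2 / sigma(t)^2 on
    [t_min, t_max]. For the VP-SDE the CDF is proportional to
    log(exp(B(t)) - 1), which inverts in closed form."""
    lo = torch.log(torch.expm1(vp_B(torch.tensor(t_min), beta_min, beta_max)))
    hi = torch.log(torch.expm1(vp_B(torch.tensor(t_max), beta_min, beta_max)))
    u = torch.rand(n)
    c = torch.log1p(torch.exp(lo + u * (hi - lo)))   # B(t) at the sampled t
    a = 0.5 * (beta_max - beta_min)                  # solve a*t^2 + beta_min*t = c
    t = (-beta_min + torch.sqrt(beta_min**2 + 4.0 * a * c)) / (2.0 * a)
    return t, (hi - lo)                              # samples and normalizer Z

def likelihood_weighted_loss(score_net, x0):
    """Score matching with lambda(t) = g(t)^2: under the proposal above,
    the g^2/sigma^2 factor cancels, leaving
    0.5 * Z * E_q ||sigma(t)*s_theta(x_t, t) + eps||^2."""
    t, Z = sample_t_importance(x0.shape[0])
    std = torch.sqrt(-torch.expm1(-vp_B(t)))
    mean = torch.exp(-0.5 * vp_B(t))[:, None] * x0
    eps = torch.randn_like(x0)
    xt = mean + std[:, None] * eps
    score = score_net(xt, t)
    return 0.5 * Z * ((std[:, None] * score + eps) ** 2).sum(dim=1).mean()
```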
Empirical Results
Empirically, the modified training consistently improves likelihoods across datasets, achieving negative log-likelihoods of 2.83 bits/dim on CIFAR-10 and 3.76 bits/dim on ImageNet 32×32. The models achieve this without data augmentation, matching state-of-the-art autoregressive models.
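As a brief note on units: bits/dim is the negative log-likelihood in nats divided by the data dimensionality and by ln 2. A small helper, assuming flattened image dimensions:

```python
import math

def nats_to_bits_per_dim(nll_nats, num_dims):
    """Convert a negative log-likelihood in nats to bits per dimension,
    the unit used for the CIFAR-10 and ImageNet numbers above."""
    return nll_nats / (num_dims * math.log(2))

# e.g. a 32x32x3 image has 3072 dimensions:
# nats_to_bits_per_dim(nll, 32 * 32 * 3)
```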
Practical and Theoretical Implications
The findings present significant implications for both theory and application:
- Theoretical Advancement: By connecting score-based diffusion models to CNFs through the probability flow ODE, the paper enriches the generative modeling landscape, merging tractable likelihood computation with efficient score matching training (see the sketch after this list).
- Practical Applications: Improved likelihoods matter for tasks such as lossless compression, semi-supervised learning, and adversarial purification, expanding the utility of these models across domains.
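To illustrate the CNF connection, the sketch below gives the probability flow ODE drift for the VP-SDE together with an unbiased Hutchinson estimate of its divergence, the two quantities the instantaneous change-of-variables formula needs. In practice both would be fed to a black-box ODE solver and combined with the prior log-density; score_net and the scalar-time convention are assumptions, not the authors' API.

```python
import torch

def probability_flow_drift(score_net, x, t, beta_min=0.1, beta_max=20.0):
    """Drift of the probability flow ODE for the VP-SDE:
    dx/dt = f(x, t) - 0.5 * g(t)^2 * s_theta(x, t)
          = -0.5 * beta(t) * (x + s_theta(x, t))."""
    beta = beta_min + t * (beta_max - beta_min)      # t: scalar time
    t_batch = torch.full((x.shape[0],), t)
    return -0.5 * beta * (x + score_net(x, t_batch))

def divergence_estimate(score_net, x, t):
    """Hutchinson trace estimator for div(drift): E[v^T (d drift/dx) v],
    computed with a single vector-Jacobian product."""
    x = x.detach().requires_grad_(True)
    v = torch.randn_like(x)                          # Rademacher also works
    drift = probability_flow_drift(score_net, x, t)
    (vjp,) = torch.autograd.grad(drift, x, grad_outputs=v)
    return drift.detach(), (vjp * v).sum(dim=1)
```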
Future Directions in AI
Extending the likelihood weighting analysis to other classes of probabilistic models is a promising research trajectory. Jointly optimizing the SDE's parameters alongside the score model could further refine performance. Additionally, combining these results with data augmentation techniques might yield even more substantial gains in generative tasks.
Conclusion
By rethinking the training objective of score-based diffusion models, this paper offers a principled approach to likelihood optimization. The research supports competitive alternatives to existing likelihood-based generative models while maintaining computational efficiency, and its implications open promising avenues for future research and practical applications in generative modeling.