- The paper introduces the MuLAN framework with a learned adaptive noise schedule that tailors noise levels per pixel to enhance density estimation.
- It leverages a multivariate noise schedule, instance-conditional diffusion, and auxiliary variables to refine variational inference.
- Empirical results show state-of-the-art likelihood estimation on CIFAR-10 and ImageNet, achieved with less than half the training time of comparable diffusion models.
Diffusion Models With Learned Adaptive Noise: An Overview
In the field of machine learning, diffusion models have recently emerged as a robust framework for generating high-quality images. Their strength lies in the diffusion process itself, in which a clean image is gradually corrupted with noise until it is mapped to a simple noise distribution. The design of this process, originally inspired by nonequilibrium thermodynamics, can strongly influence a model's performance. The prevailing belief in the field has been that the evidence lower bound (ELBO) objective is invariant to the choice of noise process, a result established for scalar (univariate) schedules with fixed endpoints. The paper "Diffusion Models With Learned Adaptive Noise," authored by Sahoo et al., challenges the generality of this assumption by introducing the multivariate learned adaptive noise (MuLAN) framework: a learned, input-adaptive diffusion process that applies different noise rates to different parts of an image and, the authors argue, yields better density estimation than traditional diffusion methods.
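To make the baseline concrete: in a standard diffusion model, the forward process maps a clean image to a noised one with a single scalar schedule shared by all pixels. The sketch below illustrates this under a variance-preserving cosine schedule; the schedule choice and function name are illustrative, not taken from the paper.

```python
import math
import torch

def forward_diffuse(x0: torch.Tensor, t: float) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x0) = N(alpha_t * x0, sigma_t^2 * I) for t in [0, 1].
    Every pixel shares the same scalar alpha_t and sigma_t."""
    alpha_t = math.cos(t * math.pi / 2)  # fraction of signal retained at time t
    sigma_t = math.sin(t * math.pi / 2)  # noise level at time t (alpha^2 + sigma^2 = 1)
    eps = torch.randn_like(x0)           # standard Gaussian noise
    return alpha_t * x0 + sigma_t * eps
```

MuLAN's departure is to replace the scalar pair (alpha_t, sigma_t) with per-pixel quantities that are themselves learned and input-dependent.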
Key Innovations of MuLAN
MuLAN's innovation rests on three core components: a multivariate noise schedule, an instance-conditional diffusion process, and auxiliary latent variables. Together, these make the learning objective sensitive to the noise schedule, overturning the traditional invariance assumption for this richer family of schedules. A minimal code sketch combining the three ideas follows the list below.
- Multivariate Noise Schedule: Unlike traditional scalar noise schedules, MuLAN adopts a per-pixel polynomial noise schedule that allows noise levels to vary across different parts of an image. This accounts for spatial variation within the image and adapts to the specific characteristics of each instance, a significant departure from one-size-fits-all noise schedules.
- Conditional Noising Process: MuLAN conditions its noising process on the input itself, adapting the diffusion schedule to each image's characteristics (in practice, through a latent variable inferred from the input, which keeps the generative model tractable). This diverges from prior work, which typically applies a single noising process uniformly across the dataset.
- Auxiliary Variables: Incorporating auxiliary latent variables aligns the structure of the variational posterior with that of the generative model, enabling the noise schedule and the denoising network to be optimized jointly under a single ELBO objective.
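The sketch below combines the three components in simplified form: a small network maps a latent code z (inferred from the input) to positive polynomial coefficients, and integrating that positive polynomial in t yields a per-pixel noise schedule that increases monotonically in t, as a valid schedule must. The class name, shapes, and parameterization are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerPixelNoiseSchedule(nn.Module):
    """Illustrative per-pixel polynomial noise schedule in the spirit of MuLAN.

    gamma(t) is produced per pixel by integrating a polynomial with positive
    coefficients, so gamma(0) = 0 and gamma is strictly increasing in t."""

    def __init__(self, z_dim: int, n_pixels: int, degree: int = 3):
        super().__init__()
        self.n_pixels = n_pixels
        self.degree = degree
        # One coefficient per pixel per polynomial term, predicted from z.
        self.coeff_net = nn.Linear(z_dim, n_pixels * (degree + 1))

    def forward(self, z: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # z: (batch, z_dim); t: (batch, 1) with entries in [0, 1].
        coeffs = self.coeff_net(z).view(-1, self.n_pixels, self.degree + 1)
        coeffs = F.softplus(coeffs)  # positivity => monotone schedule
        # Integral of sum_k c_k t^k is sum_k c_k t^(k+1) / (k+1).
        powers = torch.stack(
            [t ** (k + 1) / (k + 1) for k in range(self.degree + 1)], dim=-1
        )  # shape (batch, 1, degree + 1)
        return (coeffs * powers).sum(-1)  # (batch, n_pixels), increasing in t
```

For a 32x32 image one might instantiate `PerPixelNoiseSchedule(z_dim=64, n_pixels=32 * 32)` and read off a different noise level for every pixel at every timestep, exactly the degree of freedom a scalar schedule lacks.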
Empirical Results and Claims
The authors report that MuLAN achieves a new state of the art in likelihood estimation on the CIFAR-10 and ImageNet datasets, benchmarked against established diffusion models. Notably, it does so in less than half the training time and without substantial changes to the underlying UNet architecture, which keeps it compatible with existing diffusion pipelines. If these results generalize, they suggest that learned noise schedules could become a standard ingredient of diffusion models seeking better efficiency and likelihoods.
Theoretical Underpinnings and Implications
Grounded in Bayesian inference, MuLAN treats the learned noise schedule as part of an approximate variational posterior and aims for a tighter lower bound on the marginal likelihood. In principle, this lets MuLAN achieve better likelihoods by shrinking the gap between the true posterior and its variational approximation.
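Schematically, the auxiliary-variable construction yields a nested variational bound of the following form (the notation is illustrative, following standard latent-variable ELBO derivations rather than the paper's exact symbols):

```latex
\log p_\theta(x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \mathcal{L}_{\text{diff}}(x; z) \right]
  \;-\;
  D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\middle\|\, p(z) \right)
```

Here the first term is the usual diffusion ELBO evaluated under the z-conditioned multivariate noise schedule, and the KL term prices the inference of the schedule latent z. Learning the schedule tightens the first term, which is why the bound is no longer invariant to the choice of noise process.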
- Theoretical Implications: MuLAN indicates that the familiar invariance of the ELBO to the noise schedule holds only in the scalar case; once the schedule is multivariate and input-conditioned, it becomes a learnable component of inference rather than a fixed design choice, a significant point of discussion in the field.
- Practical Implications: With its improved density estimation and faster convergence, MuLAN offers practical advantages in compute-constrained settings, and it sets a precedent for future models that adapt the noising process to data-specific characteristics through learned noise scheduling.
Conclusion and Future Directions
In conclusion, "Diffusion Models With Learned Adaptive Noise" makes a compelling case for revisiting conventional approaches to noise scheduling in diffusion models. Its learned adaptive process not only improves likelihood estimation on standard benchmarks but also provides a methodologically rigorous framework that challenges previous norms. Looking ahead, MuLAN's approach opens the door to noise schedules tailored to different data domains, potentially broadening the application scope of diffusion models across machine learning tasks. The paper sets a foundational precedent for integrating instance-aware processes into broader machine learning models, encouraging further inquiry and innovation in this space.