Non Gaussian Denoising Diffusion Models (2106.07582v1)

Published 14 Jun 2021 in cs.LG, cs.CV, cs.SD, and eess.AS

Abstract: Generative diffusion processes are an emerging and effective tool for image and speech generation. In existing methods, the underlying noise distribution of the diffusion process is Gaussian. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other noise distributions for the diffusion process. Specifically, we show that noise drawn from a Gamma distribution provides improved results for image and speech generation. Moreover, we show that using a mixture of Gaussian noise variables in the diffusion process improves performance over a diffusion process based on a single distribution. Our approach preserves the ability to efficiently sample an arbitrary state of the diffusion process during training while using Gamma noise or a mixture of noise distributions.

Citations (44)

Summary

  • The paper proposes replacing the traditional Gaussian noise with Gamma noise or a mixture of Gaussians to better capture complex data distributions.
  • It introduces closed-form formulations that enable efficient sampling without calculating all previous diffusion steps, achieving lower FID in image tasks and improved PESQ, STOI, and MCD in audio.
  • The findings provide a promising framework that invites further research on optimizing noise distributions to enhance generative model performance.

Introduction

Denoising Diffusion Probabilistic Models (DDPMs) are a class of generative models that have recently shown impressive results in domains such as image and speech synthesis. These models employ a forward diffusion process that gradually adds noise to the data and a learned reverse denoising process that generates new samples. Traditionally, DDPMs inject Gaussian noise at each step of the diffusion process, with each step governed by a single variance parameter from the noise schedule. While effective, this single-parameter Gaussian noise may not always capture the complexity of the data distribution.
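The key computational convenience of the Gaussian setting is that the noisy state at any step t can be sampled in closed form directly from the clean data. A minimal sketch of this standard forward process, assuming a linear beta schedule (the schedule values and variable names here are illustrative, not the paper's):

```python
import numpy as np

# Hedged sketch of the standard Gaussian DDPM forward process.
# The linear beta schedule below is a common choice, assumed for illustration.
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise variances
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # cumulative product over steps

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot, without iterating t steps."""
    eps = rng.standard_normal(x0.shape)                 # Gaussian noise
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))        # toy stand-in for an image
xt = q_sample(x0, 500, rng)             # jump straight to step 500
```

Because the sum of Gaussians is Gaussian, the t-step composition collapses to a single draw; preserving an analogue of this one-shot sampling under non-Gaussian noise is the technical crux of the paper.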

Expanding Beyond Gaussian Noise

The authors of this paper explore the utilization of non-Gaussian noise distributions to enhance the performance of DDPMs. Specifically, they examine the effects of two alternative noise distributions: Gamma distribution and a mixture of Gaussians. These distributions provide additional degrees of freedom and could potentially allow for a better representation of the underlying data distribution, leading to performance gains in generative tasks.

In their exploration, the authors derive diffusion models that accommodate these non-Gaussian distributions while maintaining the valuable property of sampling arbitrary states without requiring the computation of all previous diffusion steps. The closed-form formulations they provide for both Gamma and mixture of Gaussian distributions ensure the efficiency of the training and inference processes.
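As one illustration of how such a closed form can look, the Gaussian noise term can be replaced with zero-mean Gamma noise whose variance is matched to the Gaussian case. The sketch below is an assumption-laden simplification, not the paper's exact parameterization; the shape parameter k is a free knob introduced here for illustration:

```python
import numpy as np

def q_sample_gamma(x0, alpha_bar_t, k, rng):
    """Illustrative one-shot forward sample with centered Gamma noise.

    The Gamma scale is chosen so k * theta**2 equals the target variance
    1 - alpha_bar_t, and the mean k * theta is subtracted so the noise
    has zero mean (a sketch, not the paper's exact formulation).
    """
    var = 1.0 - alpha_bar_t
    theta = np.sqrt(var / k)                            # k * theta^2 = var
    g = rng.gamma(shape=k, scale=theta, size=x0.shape)  # raw Gamma draw
    eps = g - k * theta                                 # center: E[eps] = 0
    return np.sqrt(alpha_bar_t) * x0 + eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
xt = q_sample_gamma(x0, alpha_bar_t=0.5, k=2.0, rng=rng)
```

The centered Gamma noise keeps the first two moments of the Gaussian case while introducing skewness controlled by k, which is the extra degree of freedom the authors exploit.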

Empirical Validation

The researchers' approach is empirically validated across two domains: vision and audio. For image generation tasks, they demonstrate that their proposed method achieves lower FID (Fréchet Inception Distance) scores than traditional Gaussian-based DDPMs. For speech data, metrics such as PESQ (Perceptual Evaluation of Speech Quality), STOI (Short-Time Objective Intelligibility), and MCD (Mel-Cepstral Distortion) indicate that the models using non-Gaussian distributions surpass standard Gaussian-based methods in various aspects.
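For reference, FID treats two sets of feature activations as Gaussians and measures the Fréchet distance between them. A minimal numpy-only sketch (in practice the features come from an Inception network; here random vectors stand in for illustration):

```python
import numpy as np

def fid(feats_a, feats_b):
    """Frechet distance between Gaussian fits of two feature sets:
    ||mu_a - mu_b||^2 + tr(cov_a + cov_b - 2 (cov_a cov_b)^(1/2))."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # tr((cov_a cov_b)^(1/2)) via the symmetric PSD form
    # cov_a^(1/2) cov_b cov_a^(1/2), using eigendecompositions.
    wa, va = np.linalg.eigh(cov_a)
    sqrt_a = (va * np.sqrt(np.clip(wa, 0, None))) @ va.T
    wm = np.linalg.eigvalsh(sqrt_a @ cov_b @ sqrt_a)
    tr_sqrt = np.sqrt(np.clip(wm, 0, None)).sum()

    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 16))
b = rng.standard_normal((500, 16)) + 0.5    # shifted distribution
```

Identical feature sets give a distance near zero, while the shifted set b yields a clearly positive score; lower FID indicates generated samples whose feature statistics sit closer to the real data's.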

Discussion and Future Work

The introduction of non-Gaussian noise distributions in DDPMs provides another tool for practitioners looking to improve both the quality of generative models and their efficiency in training and inference. While the incorporation of Gamma and mixture-of-Gaussian noise offers promising results, the paper acknowledges that a comprehensive framework for determining which distributions are most advantageous in various scenarios is still needed. The findings invite future research to expand on this work, providing a foundation for further exploration of noise distributions in the context of diffusion processes.
