Overview of Generative Diffusion Models
The paper provides a comprehensive survey of generative diffusion models, covering their fundamental formulations, algorithmic improvements, and applications across diverse domains. Diffusion models have emerged as a significant class of deep generative models, contributing to areas such as vision, text, speech, biology, and healthcare.
Fundamental Formulations
Diffusion models, as discussed, revolve around a stochastic process that gradually transforms the data distribution into a simpler prior, typically a standard Gaussian, and then reverses this corruption during sampling. Three foundational formulations underlie these processes:
- Denoising Diffusion Probabilistic Models (DDPM): DDPM employs a discrete-time forward process governed by a fixed variance schedule that gradually corrupts data into approximately standard Gaussian noise. The reverse process denoises samples step by step using a learned neural network (see the training sketch after this list).
- Score SDE Formulation: Generalizes the discrete-time methods to a continuous-time stochastic differential equation (SDE) framework. Sampling can then proceed by solving either a reverse-time SDE or the corresponding probability-flow ODE, which adds flexibility in solver choice and enables exact likelihood computation (see the reverse-SDE sketch below).
- Conditional Diffusion Probabilistic Models: These models condition generation on signals such as text or class labels, employing classifier guidance or classifier-free guidance to steer outputs toward the condition (see the guidance sketch below).
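To make the DDPM formulation concrete, here is a minimal sketch of the closed-form forward noising and the simplified epsilon-prediction training objective. The linear variance schedule and the `model` callable are illustrative assumptions, not details taken from the survey:

```python
# Minimal DDPM sketch (NumPy only). The linear variance schedule and the
# `model` callable are illustrative assumptions.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # variance schedule beta_1 ... beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod of alpha_s for s <= t

def q_sample(x0, t, rng):
    """Closed-form forward noising:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def ddpm_loss(model, x0, rng):
    """Simplified DDPM objective: the network predicts the noise added
    at a uniformly sampled timestep t."""
    t = rng.integers(T)
    xt, eps = q_sample(x0, t, rng)
    eps_pred = model(xt, t)          # hypothetical eps-prediction network
    return np.mean((eps_pred - eps) ** 2)
```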
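The score SDE view can likewise be sketched as a reverse-time Euler-Maruyama sampler for the variance-preserving (VP) SDE. The `score` callable stands in for a learned approximation of the score function, and the linear beta(t) schedule is an assumption for illustration:

```python
# Reverse-time Euler-Maruyama sampler for the variance-preserving (VP) SDE,
# dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dw. The `score` callable stands
# in for a learned approximation of grad_x log p_t(x); the linear beta(t)
# schedule is an illustrative assumption.
import numpy as np

def beta(t):
    return 0.1 + (20.0 - 0.1) * t    # linear schedule on t in [0, 1]

def reverse_sde_sample(score, shape, n_steps=1000, rng=None):
    rng = rng or np.random.default_rng()
    dt = 1.0 / n_steps
    x = rng.standard_normal(shape)   # start from the Gaussian prior at t = 1
    for i in range(n_steps, 0, -1):  # integrate the reverse SDE from t = 1 to 0
        t = i / n_steps
        drift = -0.5 * beta(t) * x - beta(t) * score(x, t)
        x = x - drift * dt + np.sqrt(beta(t) * dt) * rng.standard_normal(shape)
    return x
```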
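Classifier-free guidance reduces to a simple combination rule at sampling time: the same network is queried with and without the condition, and the two noise predictions are extrapolated. The `model` signature and null-condition convention here are hypothetical:

```python
def guided_eps(model, xt, t, cond, w):
    """Classifier-free guidance: extrapolate from the unconditional toward the
    conditional noise prediction. w = 1 recovers the plain conditional model;
    w > 1 strengthens adherence to the condition."""
    eps_uncond = model(xt, t, cond=None)  # None stands in for the null condition
    eps_cond = model(xt, t, cond=cond)
    return eps_uncond + w * (eps_cond - eps_uncond)
```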
Algorithm Improvements
The paper delineates four primary areas of advancement aimed at improving diffusion models:
- Sampling Acceleration: Sampling from diffusion models inherently requires many iterative denoising steps. Techniques such as knowledge distillation, training-free samplers, and hybrids that combine diffusion models with GANs or VAEs have been pursued to speed up sampling (see the DDIM-style sketch after this list).
- Diffusion Process Design: Innovations to the forward diffusion process, including diffusing in learned latent spaces and on non-Euclidean spaces, make the reverse process easier to learn and broaden the range of applicable domains.
- Likelihood Optimization: These strategies focus on optimizing the model's log-likelihood, typically through improved variational bounds, strengthening density estimation alongside generative quality and learning efficiency.
- Bridging Distributions: Techniques have been developed to bridge two arbitrary distributions directly, which is particularly useful for tasks such as image-to-image translation, where both endpoints are data distributions rather than Gaussian noise.
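As an example of training-free acceleration, the sketch below implements a deterministic DDIM-style update that visits only a strided subset of the original timesteps, reusing the `alpha_bars` schedule and eps-prediction `model` conventions from the DDPM sketch above. The 50-step budget is an illustrative choice:

```python
# Deterministic DDIM-style sampler that visits only a strided subset of the
# original timesteps; `alpha_bars` and the eps-prediction `model` follow the
# conventions of the DDPM sketch above. The 50-step budget is illustrative.
import numpy as np

def ddim_sample(model, shape, alpha_bars, n_steps=50, rng=None):
    rng = rng or np.random.default_rng()
    T = len(alpha_bars)
    ts = np.linspace(T - 1, 0, n_steps, dtype=int)  # descending timestep subset
    x = rng.standard_normal(shape)                  # start from pure noise
    for t, t_prev in zip(ts[:-1], ts[1:]):
        eps = model(x, t)
        # Predict x_0 from the current noisy sample, then jump straight to t_prev.
        x0_pred = (x - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
        x = np.sqrt(alpha_bars[t_prev]) * x0_pred + np.sqrt(1.0 - alpha_bars[t_prev]) * eps
    return x
```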
Applications
Generative diffusion models find applications across multiple domains:
- Image Generation: Models excel in generating high-fidelity images both conditionally (e.g., text-to-image synthesis) and unconditionally.
- 3D and Video Generation: Extends diffusion techniques to the synthesis of 3D objects and temporally coherent video frames.
- Medical Imaging: Used for super-resolution, denoising, and reconstruction, aiding diagnosis and treatment planning.
- Text Generation: Generates text conditioned on a prompt or attribute, refining all tokens in parallel rather than strictly left to right.
- Time Series and Audio Generation: Facilitates the synthesis of coherent sequential data, supporting tasks such as forecasting, imputation, and waveform synthesis.
- Molecule and Graph Generation: Applied in the sciences to model and predict molecular structures and interactions, which is significant for drug development.
Implications and Future Directions
The survey positions diffusion models as pivotal in generative modeling, offering robust frameworks for capturing complex data distributions. Future work is likely to focus on faster sampling methods, new diffusion process designs, and integration with other machine learning paradigms to overcome the challenges posed by large-scale, high-dimensional data. Furthermore, more efficient methods for bridging distribution gaps could broaden their applicability to fields such as AI-driven scientific research and biomedicine.
This comprehensive survey underscores the versatility and transformative potential of diffusion models, establishing them as prominent contributors to the generative modeling landscape, with ample room for future exploration and development.