Diffusion Models: A Comprehensive Survey of Methods and Applications
Diffusion models have established themselves as a significant advancement in the landscape of deep generative models, rivaling the previously dominant Generative Adversarial Networks (GANs) in tasks like image synthesis, video generation, and molecule design. The paper "Diffusion Models: A Comprehensive Survey of Methods and Applications" offers an extensive survey of current research, aiming to categorize the rapidly expanding body of work into key areas and review the extensive range of diffusion model applications.
Foundational Framework
The paper begins by providing a structured introduction to the foundations of diffusion models. It details three principal formulations: Denoising Diffusion Probabilistic Models (DDPMs), Score-Based Generative Models (SGMs), and Score Stochastic Differential Equations (Score SDEs). Each formulation specifies a mechanism to progressively corrupt data into noise, together with a learned process that reverses this corruption to produce new data samples.
- DDPMs: These models utilize a Markov chain where data is progressively perturbed by Gaussian noise, and a learnable reverse process then denoises the data back to its original form.
- SGMs: Central to these models is the score function, defined as the gradient of the log probability density with respect to the data. SGMs perturb data with Gaussian noise at multiple scales and learn to estimate the score function at each noise level.
- Score SDEs: Generalizing DDPMs and SGMs to the continuous-time limit of infinitely many noise steps, Score SDEs use stochastic differential equations to define the forward diffusion process and its time-reversal, from which samples are drawn.
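The DDPM forward process described above can be sketched in a few lines. The linear beta schedule and its endpoint values below are common illustrative choices, not the survey's prescription:

```python
import numpy as np

# Illustrative linear noise schedule over T steps (values are assumptions).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal-retention coefficients

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
xt, eps = q_sample(x0, t=T - 1, rng=rng)
# Near t = T, alpha_bar_t is tiny, so x_t is close to pure Gaussian noise.
```

A learned reverse model would then predict `eps` from `(xt, t)` and iteratively denoise; here only the forward (noising) direction is shown.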
Efficient Sampling
One of the significant challenges in leveraging diffusion models is the computational intensity involved in the iterative sampling process. Recent advancements aim to enhance sampling efficiency without compromising quality.
- Learning-Free Sampling: This includes improved discretization schemes for SDEs and ODEs, such as Heun's method and predictor-corrector strategies, which balance the trade-off between sampling speed and accuracy.
- Learning-Based Sampling: Techniques such as optimized discretization of time steps, truncated diffusion processes, and knowledge distillation are designed to reduce the number of sampling steps while maintaining or enhancing sample quality.
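As a concrete illustration of a learning-free, higher-order solver, the sketch below applies Heun's method (an Euler predictor plus trapezoidal corrector) to the probability-flow ODE of a toy variance-exploding process with an analytic score. The noise scale sigma(t) = t, the Gaussian toy target, and all constants are assumptions for the example, standing in for a trained score network:

```python
import numpy as np

def heun_sampler(x, score_fn, ts):
    """Integrate a probability-flow ODE backward in time with Heun's method."""
    def drift(x, t):
        # VE probability-flow ODE drift: dx/dt = -0.5 * g(t)^2 * score(x, t),
        # with g(t)^2 = 2t for sigma(t) = t (an assumption of this sketch).
        return -t * score_fn(x, t)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        h = t_next - t_cur
        d_cur = drift(x, t_cur)
        x_euler = x + h * d_cur               # Euler predictor step
        d_next = drift(x_euler, t_next)
        x = x + 0.5 * h * (d_cur + d_next)    # Heun (trapezoidal) corrector
    return x

# Analytic score for data ~ N(0, 1) noised with sigma(t) = t:
# x_t ~ N(0, 1 + t^2), so score(x, t) = -x / (1 + t^2).
score = lambda x, t: -x / (1.0 + t * t)
rng = np.random.default_rng(0)
sigma_max = 10.0
x_T = rng.standard_normal(1000) * np.sqrt(1.0 + sigma_max**2)
ts = np.linspace(sigma_max, 0.0, 40)
samples = heun_sampler(x_T, score, ts)  # should approach N(0, 1)
```

Because each Heun step evaluates the drift twice, it trades one extra score evaluation per step for second-order accuracy, which is the speed/accuracy balance the bullet above refers to.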
Improved Likelihood Estimation
Diffusion models traditionally depend on a variational lower bound (VLB) for likelihood estimation. Enhancing this estimation is crucial for better performance.
- Noise Schedule Optimization: Optimizing the noise schedule of the forward process tightens the VLB, leading to higher log-likelihood values.
- Reverse Variance Learning: Learning the variance parameters in the reverse process rather than using fixed values can yield more accurate data probabilities.
- Exact Likelihood Computation: Methods such as integrating Score SDEs with advanced numerical solvers enable more precise calculation and maximization of the data likelihood.
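To make the reverse-variance point concrete, the snippet below compares per-dimension Gaussian KL terms of the kind summed in the VLB, once with a mismatched fixed reverse variance and once with a matched (learned) one. All numerical values are illustrative assumptions:

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for scalars, in nats."""
    return 0.5 * (np.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# One VLB term compares the forward posterior q(x_{t-1} | x_t, x_0), whose
# variance is beta_tilde_t, against the reverse model p_theta(x_{t-1} | x_t).
beta_t, beta_tilde_t = 0.02, 0.012  # illustrative, with beta_tilde_t < beta_t
kl_fixed = gaussian_kl(0.0, beta_tilde_t, 0.0, beta_t)          # fixed variance
kl_matched = gaussian_kl(0.0, beta_tilde_t, 0.0, beta_tilde_t)  # learned/matched
# With equal means, the matched-variance KL vanishes while the fixed one does not,
# illustrating why learning the reverse variance can tighten the bound.
```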
Handling Special Structures
Given the varied nature of data, diffusion models have been adapted to address data with specific structures, including discrete data, invariant properties, and manifold structures.
- Discrete Data: Techniques such as random walk transition kernels for discrete spaces and generalizations of score functions extend diffusion models to handle discrete datasets efficiently.
- Invariant Structures: Models like GDSS leverage permutation invariance for graph data, while others guarantee translation and rotation invariance for molecular data.
- Manifold Structures: Extending diffusion models to Riemannian manifolds and employing autoencoders to learn latent manifolds are key to making diffusion models applicable to a broader range of data modalities.
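The discrete-data case above can be sketched with a uniform transition kernel over K categories (in the spirit of uniform-kernel discrete diffusion; K and beta_t are made-up values): with probability beta_t a token is resampled uniformly, otherwise it stays put.

```python
import numpy as np

# Uniform transition kernel for discrete diffusion over K categories.
K = 5
beta_t = 0.1
Q_t = (1.0 - beta_t) * np.eye(K) + beta_t * np.full((K, K), 1.0 / K)

# Each row of Q_t is a distribution over the next state, so rows sum to 1,
# and repeated application converges to the uniform distribution: the
# discrete analogue of data diffusing into pure noise.
x0 = np.eye(K)[2]                                # one-hot token, category 2
probs = x0 @ np.linalg.matrix_power(Q_t, 50)     # distribution after 50 steps
```

The reverse model would then learn per-step categorical denoising distributions; only the forward kernel is shown here.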
Connections with Other Generative Models
Diffusion models have shown potential for integration with other generative models, enhancing their application scope and performance.
- VAEs: Integrating diffusion models with VAEs allows for better representation learning and sampling efficiency.
- GANs: Injecting diffusion-style noise schedules into GAN training can stabilize the training dynamics and improve sample quality.
- Normalizing Flows: Combining these models with diffusion processes enables the generation of complex data distributions with fewer steps.
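As a toy illustration of the VAE combination (running diffusion in a learned latent space), the sketch below uses a fixed orthonormal linear map as a stand-in for a trained encoder/decoder pair; every component here is an assumption for illustration, not a concrete model from the survey:

```python
import numpy as np

# Toy latent-diffusion pipeline: project data to a low-dimensional latent
# space, apply the diffusion forward process there, and decode back.
rng = np.random.default_rng(0)
D, d = 8, 2
W, _ = np.linalg.qr(rng.standard_normal((D, d)))  # orthonormal columns

encode = lambda x: x @ W        # stand-in for a trained VAE encoder
decode = lambda z: z @ W.T      # stand-in for the matching decoder

x = rng.standard_normal((16, D))
z = encode(x)
# One forward-diffusion noising step applied in latent space (coefficients
# chosen so signal and noise variances sum to 1, as in a DDPM step).
z_noisy = np.sqrt(0.5) * z + np.sqrt(0.5) * rng.standard_normal(z.shape)
x_rec = decode(z_noisy)
```

Working in the smaller latent space is what buys the sampling-efficiency gains the bullet above mentions, since each denoising step operates on `d`-dimensional latents rather than `D`-dimensional data.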
Applications Across Domains
The versatility of diffusion models is highlighted through their applications in various domains:
- Computer Vision: Tasks such as image super-resolution, inpainting, and translation benefit from diffusion models' ability to generate high-quality images.
- Natural Language Processing: Text generation and conditional text synthesis are areas where diffusion models have shown significant promise.
- Temporal Data Modeling: Imputation and forecasting of time series data have seen enhanced accuracy with diffusion-based approaches.
- Multi-Modal Learning: Applications such as text-to-image and text-to-video generation leverage the flexibility of diffusion models for creating complex, conditionally generated content.
- Robust Learning: Diffusion models contribute to the development of robust learning algorithms, capable of handling adversarial noise.
- Interdisciplinary Applications: In fields such as computational chemistry and medical imaging, diffusion models facilitate tasks like molecule design and image reconstruction with high fidelity.
Future Directions
The paper concludes by outlining potential research directions, including revisiting and analyzing typical diffusion model assumptions, deepening theoretical understanding, and exploring latent representations more effectively. Additionally, diffusion foundation models and their applications in Artificial Intelligence Generated Content (AIGC) are highlighted as promising areas for future exploration.
In summary, diffusion models are a dynamic and rapidly evolving area in deep generative modeling, promising high-quality, diverse, and controllable data generation across various domains. The surveyed methodologies and applications provide a comprehensive understanding of current advancements and future research potentials in this exciting field.