
Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis (2207.11192v2)

Published 16 Jul 2022 in cs.CV and cs.LG

Abstract: Recently, diffusion models have shown remarkable results in image synthesis by gradually removing noise and amplifying signals. Although the simple generative process surprisingly works well, is this the best way to generate image data? For instance, despite the fact that human perception is more sensitive to the low frequencies of an image, diffusion models themselves do not consider any relative importance of each frequency component. Therefore, to incorporate the inductive bias for image data, we propose a novel generative process that synthesizes images in a coarse-to-fine manner. First, we generalize the standard diffusion models by enabling diffusion in a rotated coordinate system with different velocities for each component of the vector. We further propose a blur diffusion as a special case, where each frequency component of an image is diffused at different speeds. Specifically, the proposed blur diffusion consists of a forward process that blurs an image and adds noise gradually, after which a corresponding reverse process deblurs an image and removes noise progressively. Experiments show that the proposed model outperforms the previous method in FID on LSUN bedroom and church datasets. Code is available at https://github.com/sangyun884/blur-diffusion.

Authors (4)
  1. Sangyun Lee (41 papers)
  2. Hyungjin Chung (38 papers)
  3. Jaehyeon Kim (16 papers)
  4. Jong Chul Ye (210 papers)
Citations (42)

Summary

  • The paper introduces a novel blur diffusion process that diffuses image frequency components at variable speeds to align generation with human visual perception.
  • It employs a rotated coordinate system and denoising score matching to progressively refine images from a coarse outline to detailed outputs.
  • Empirical results demonstrate significant improvements with FID scores of 7.86 on LSUN bedrooms and 5.89 on LSUN churches compared to traditional models.

Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis

The paper "Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis" introduces a coarse-to-fine generative strategy built on the standard diffusion-model framework. The authors examine a limitation of conventional diffusion models: they treat every frequency component of an image identically, even though human perception is markedly more sensitive to low frequencies. The proposed model incorporates this perceptual inductive bias into the generative process and is validated empirically.

The core contribution of this paper is a blur diffusion process that diffuses distinct frequency components of an image at different speeds. This ordering mirrors the coarse-to-fine character of human visual perception and lets the model synthesize images from a rough outline toward a refined, detailed result. Methodologically, the diffusion is carried out in a rotated coordinate system, with a different velocity assigned to each component of the vector. The resulting forward process blurs an image while gradually adding noise; the paired reverse process progressively deblurs the image while removing noise.
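The rotated-coordinate idea above can be sketched in a few lines: transform into an orthonormal basis, attenuate each component at its own rate, then rotate back and add noise. This is an illustrative sketch only; the function name, the exponential attenuation schedule, and the toy identity basis are assumptions, not the paper's exact parameterization (the paper uses a frequency basis so that high frequencies decay fastest).

```python
import numpy as np

def forward_blur_diffusion(x0, U, rates, sigma, t, rng=None):
    """One generalized forward step at time t in [0, 1].

    In the rotated basis U, component i decays at its own rate
    rates[i] (slow for coarse/low-frequency components, fast for
    fine/high-frequency ones), so detail is destroyed before
    coarse structure. Illustrative sketch, not the paper's exact form.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = U.T @ x0                        # rotate into the diffusion basis
    z_t = np.exp(-rates * t) * z        # per-component attenuation ("blur")
    eps = rng.standard_normal(x0.shape)
    return U @ z_t + sigma * t * eps    # rotate back and add noise

# Toy example: identity rotation, faster decay on "high-frequency" dims.
d = 4
U = np.eye(d)
rates = np.array([0.1, 1.0, 4.0, 16.0])
x0 = np.ones(d)
xt = forward_blur_diffusion(x0, U, rates, sigma=2.0, t=1.0)
```

With `sigma = 0` the step reduces to pure per-component blurring, which makes the coarse-to-fine ordering easy to inspect: components with larger rates shrink toward zero first.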

In testing the proposed blur diffusion model, the authors report superior performance compared to traditional diffusion models, as measured by the Fréchet Inception Distance (FID) on the LSUN bedroom and church datasets. With a quartic blur schedule, the model achieves an FID of 7.86 on the bedroom dataset and 5.89 on the church dataset, outperforming the benchmark models.
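A quartic schedule concentrates most of the blurring late in the forward process. The snippet below is a generic illustration of that shape; the function name, the `max_blur` endpoint, and the exact `t**4` form are placeholders, since the summary only states that a quartic schedule was used, not its precise parameterization.

```python
import numpy as np

def quartic_blur_schedule(t, max_blur=20.0):
    """Quartic schedule: blur grows slowly at first, sharply near t = 1.

    Hypothetical parameterization for illustration; `max_blur` and the
    endpoint values are not taken from the paper.
    """
    t = np.asarray(t, dtype=float)
    return max_blur * t ** 4

ts = np.linspace(0.0, 1.0, 5)
print(quartic_blur_schedule(ts))  # blur is concentrated late in the forward process
```

Compared to a linear ramp, a quartic curve keeps early timesteps nearly blur-free, so the model spends more of its capacity on the lightly corrupted regime.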

The paper also provides a comprehensive theoretical framework for its blur diffusion process, introducing a generalized diffusion model that extends traditional frameworks by allowing for transformations in a rotated coordinate space. The authors illustrate how their approach maintains tractability and facilitates model training via denoising score matching, while further enhancing perceptual quality during image generation.
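The tractability claim can be made concrete with a schematic form of the generalized forward kernel and the denoising score matching objective. The notation below (basis $U$, per-component attenuation $A_t$, noise scale $\sigma_t$) is assumed for illustration and is not copied from the paper:

```latex
% Forward perturbation kernel in a rotated basis U with per-component
% attenuation A_t = \mathrm{diag}(a_t^{(1)}, \ldots, a_t^{(d)}):
q(\mathbf{x}_t \mid \mathbf{x}_0)
  = \mathcal{N}\!\left(\mathbf{x}_t;\; U A_t U^{\top} \mathbf{x}_0,\; \sigma_t^{2} I\right)

% Because this kernel is Gaussian, its conditional score is available in
% closed form, and denoising score matching trains s_\theta against it:
\mathcal{L}(\theta) = \mathbb{E}_{t,\,\mathbf{x}_0,\,\mathbf{x}_t}
  \left\| s_\theta(\mathbf{x}_t, t)
  - \nabla_{\mathbf{x}_t} \log q(\mathbf{x}_t \mid \mathbf{x}_0) \right\|_2^{2}
```

Standard diffusion is recovered as the special case where $A_t$ is a shared scalar times the identity, i.e. every component is attenuated at the same speed.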

From a practical standpoint, this research offers a promising avenue for image synthesis, improving perceptual fidelity without significantly increasing computational load. The methodological innovation of treating different image frequency components distinctly during the generative process could be pivotal for applications requiring high-resolution image generation, such as digital art or complex visual content design.

For future explorations, there remains substantial scope in refining the blur schedules further, exploring alternative orthonormal bases and matrices for diffusion processes, and extending the model to different modalities beyond image data. Additionally, assessing the applicability of such methods in real-time systems or interpreting the underlying representations these models learn during synthesis could open new paths for research.

Overall, this paper enriches the current body of knowledge on generative models by integrating a perceptual inductive bias into the diffusion framework, aligning computational processes closer to human visual salience, and enhancing output quality without introducing additional procedural complexity. Such advancements hold the potential to not only improve current implementations in machine-generated visual content but also inspire further academic inquiry into more robust, perception-oriented generative systems.