
SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models (2104.14951v2)

Published 30 Apr 2021 in cs.CV

Abstract: Single image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from the given low-resolution (LR) ones, which is an ill-posed problem because one LR image corresponds to multiple HR images. Recently, learning-based SISR methods have greatly outperformed traditional ones, while suffering from over-smoothing, mode collapse or large model footprint issues for PSNR-oriented, GAN-driven and flow-based methods respectively. To solve these problems, we propose a novel single image super-resolution diffusion probabilistic model (SRDiff), which is the first diffusion-based model for SISR. SRDiff is optimized with a variant of the variational bound on the data likelihood and can provide diverse and realistic SR predictions by gradually transforming the Gaussian noise into a super-resolution (SR) image conditioned on an LR input through a Markov chain. In addition, we introduce residual prediction to the whole framework to speed up convergence. Our extensive experiments on facial and general benchmarks (CelebA and DIV2K datasets) show that 1) SRDiff can generate diverse SR results in rich details with state-of-the-art performance, given only one LR input; 2) SRDiff is easy to train with a small footprint; and 3) SRDiff can perform flexible image manipulation including latent space interpolation and content fusion.

Single Image Super-Resolution with Diffusion Probabilistic Models

The paper "SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models" introduces an innovative approach to the ill-posed problem of single image super-resolution (SISR). The authors propose a novel framework, SRDiff, which uses diffusion probabilistic models to generate high-resolution images from low-resolution inputs. This approach addresses the prevalent weaknesses of existing SISR methods: over-smoothing in PSNR-oriented models, mode collapse in GAN-driven models, and large model footprints in flow-based models.

Methodology

The SRDiff model is the first to apply diffusion models, which have shown efficacy in generative tasks, to SISR. The model operates on a Markov chain basis to gradually transform Gaussian noise into a super-resolution image conditioned on an input LR image. It is trained by optimizing a variant of the variational lower bound on data likelihood, ensuring efficient training without mode collapse.
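The reverse Markov chain described above can be illustrated with a minimal DDPM-style sampling sketch. This is not the authors' implementation: `noise_predictor`, `lr_encoder`, the noise schedule, and the update rule are all assumptions standing in for the paper's components.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def srdiff_sample(noise_predictor, lr_encoder, lr, betas, hr_size):
    """Hedged sketch of SRDiff-style conditional reverse diffusion.

    `noise_predictor` and `lr_encoder` are hypothetical modules standing in
    for the paper's conditional U-Net and RRDB-based LR encoder.
    """
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    cond = lr_encoder(lr)  # condition every step on the encoded LR image

    # Start from pure Gaussian noise at the HR resolution.
    x = torch.randn(lr.shape[0], lr.shape[1], *hr_size, device=lr.device)
    for t in reversed(range(len(betas))):
        t_batch = torch.full((lr.shape[0],), t, device=lr.device, dtype=torch.long)
        eps = noise_predictor(x, t_batch, cond)
        a, a_bar = alphas[t], alphas_cumprod[t]
        # Mean of the learned reverse transition p(x_{t-1} | x_t, LR).
        x = (x - (1.0 - a) / (1.0 - a_bar).sqrt() * eps) / a.sqrt()
        if t > 0:  # add noise at every step except the last
            x = x + betas[t].sqrt() * torch.randn_like(x)

    # With residual prediction, the chain produces a residual that is added
    # back to the bicubically upsampled LR image.
    lr_up = F.interpolate(lr, size=hr_size, mode="bicubic", align_corners=False)
    return lr_up + x
```

Because only the initial noise and the per-step noise vary, a single LR input can yield many distinct SR outputs from the same trained model.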

Key components of SRDiff include:

  • Residual Prediction: The model predicts the high-frequency residual absent in upsampled LR images, enhancing convergence speeds.
  • Conditional Noise Predictor: This employs a U-Net structure and RRDB-based encoder to predict noise in diffusion steps, conditioned on LR encoded information.
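Combining the two components above, one training step can be sketched as follows. The module names are hypothetical, and the simple epsilon-prediction MSE loss shown here is the common surrogate for the variant of the variational bound the paper optimizes.

```python
import torch
import torch.nn.functional as F

def srdiff_training_step(noise_predictor, lr_encoder, lr, hr, alphas_cumprod):
    """Hedged sketch of a training step with residual prediction.

    `noise_predictor` and `lr_encoder` are hypothetical stand-ins for the
    paper's U-Net noise predictor and RRDB-based LR encoder.
    """
    # Residual prediction: the diffusion target is the high-frequency
    # residual between the HR image and the upsampled LR image.
    lr_up = F.interpolate(lr, size=hr.shape[-2:], mode="bicubic", align_corners=False)
    x0 = hr - lr_up

    # Pick a random diffusion step and corrupt the residual with noise.
    t = torch.randint(0, len(alphas_cumprod), (hr.shape[0],), device=hr.device)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

    # The noise predictor is conditioned on the encoded LR image.
    cond = lr_encoder(lr)
    pred_noise = noise_predictor(x_t, t, cond)

    # Simplified epsilon-prediction loss.
    return F.mse_loss(pred_noise, noise)
```

Predicting the residual rather than the full image gives the diffusion model a near-zero-mean target, which is what the paper credits for faster convergence.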

Results

The authors perform extensive experiments using the CelebA and DIV2K datasets, highlighting SRDiff’s efficacy:

  • Diverse Outputs: Compared to existing methods, SRDiff demonstrates the ability to generate multiple distinct high-quality images from a single LR input.
  • Training Efficiency: The model converges rapidly (approximately 30 hours on a single GPU for CelebA) and has a footprint significantly smaller than comparable models.
  • Image Manipulation: SRDiff supports advanced applications like latent space interpolation and content fusion, showcasing flexibility.
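The latent space interpolation mentioned above can be sketched as mixing two initial noise latents before running the same conditional reverse chain. `sample_fn` is a hypothetical callable wrapping the full reverse diffusion, not an interface from the paper.

```python
import torch

@torch.no_grad()
def interpolate_latents(sample_fn, lr, z1, z2, num_steps=5):
    """Hedged sketch of latent space interpolation for SR outputs.

    `sample_fn(z, lr)` is assumed to run the full reverse diffusion from an
    initial noise latent `z`, conditioned on the LR image. Interpolating
    between two latents morphs between two plausible reconstructions of
    the same LR input.
    """
    outputs = []
    for w in torch.linspace(0.0, 1.0, num_steps):
        # Linear mixing is shown for brevity; spherical interpolation would
        # keep the mixed latent's norm closer to that of a Gaussian sample.
        z = (1 - w) * z1 + w * z2
        outputs.append(sample_fn(z, lr))
    return outputs
```

Content fusion can be framed the same way, with spatial regions of the latent (or of intermediate states) swapped between two inputs instead of globally blended.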

Performance metrics indicate that SRDiff achieves state-of-the-art results, with high LR-PSNR affirming consistency with the LR input and low LPIPS scores affirming perceptual quality.

Implications and Future Work

SRDiff’s success marks a significant step towards more versatile, efficient super-resolution models. Its ability to maintain diversity without mode collapse, together with its modest training requirements, positions it as a promising tool for practical applications like video enhancement and surveillance.

Moving forward, further work is anticipated on improving model performance and inference speed. This work also opens avenues for applying diffusion models to a broader range of image restoration tasks, such as denoising and deblurring, broadening the use of diffusion processes within computer vision.

In summary, SRDiff’s introduction of diffusion probabilistic models to SISR demonstrates a substantial advance in overcoming traditional limitations, illustrating the potential these methods hold for future advancements in AI-driven image processing.

Authors (7)
  1. Haoying Li
  2. Yifan Yang
  3. Meng Chang
  4. Huajun Feng
  5. Zhihai Xu
  6. Qi Li
  7. Yueting Chen
Citations (517)