- The paper presents a high-order degradation modeling technique that simulates diverse real-world image degradations using pure synthetic data.
- The paper introduces an enhanced U-Net discriminator with spectral normalization for per-pixel feedback, stabilizing the adversarial training process.
- The approach achieves improved visual quality and lower NIQE scores, demonstrating its effectiveness in real-world blind super-resolution applications.
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
The paper "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data" by Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan extends the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) to practical image restoration. The core objective is to enable super-resolution models to handle real-world images, which often suffer from complex and diverse degradations that are poorly represented by typical synthetic training data.
Methodology Overview
The Real-ESRGAN approach is primarily built upon two technical innovations:
- High-Order Degradation Modeling:
- Traditional super-resolution methods assume a fixed degradation model (e.g., bicubic downsampling), leading to a performance gap in real-world scenarios where degradations are far more varied and compounded.
- Real-ESRGAN introduces a high-order degradation model: a sequence of classical degradation operations (blur, resizing, noise, JPEG compression) is applied repeatedly, with parameters randomly sampled at each pass. This repeated application simulates the diverse, compounded degradations encountered in real-world images.
- Additionally, the synthesis mechanism incorporates sinc filters, which emulate common artifacts like ringing and overshoot, often resulting from image sharpening and compression.
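The repeated-pipeline idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the actual pipeline uses anisotropic Gaussian and generalized kernels, random resize modes, Gaussian/Poisson noise, real JPEG encoding, and the sinc filter stage mentioned above. Here, a separable Gaussian blur, nearest-neighbour downsampling, additive noise, and value quantization serve as crude stand-ins to show how one degradation chain is composed and then applied a second time.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(sigma, radius=4):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # separable 1-D Gaussian convolution along each axis
    k = gaussian_kernel(sigma)
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return img

def add_noise(img, sigma):
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def downsample(img, scale):
    # nearest-neighbour stand-in for the paper's random resize modes
    return img[::scale, ::scale]

def quantize(img, levels):
    # crude stand-in for JPEG compression loss
    return np.round(img * (levels - 1)) / (levels - 1)

def degrade_once(img):
    # one pass of the classical chain with randomly sampled parameters
    img = blur(img, sigma=rng.uniform(0.2, 2.0))
    img = downsample(img, scale=2)
    img = add_noise(img, sigma=rng.uniform(0.0, 0.05))
    return quantize(img, levels=int(rng.integers(16, 64)))

hr = rng.random((64, 64))
lr = degrade_once(degrade_once(hr))  # second-order: the whole chain runs twice
print(lr.shape)  # → (16, 16)
```

The key point is the last line: "high-order" simply means the entire chain is re-applied with freshly sampled parameters, so the final image carries blur-on-noise-on-compression interactions that no single-pass model produces.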
- Enhanced Discriminator:
- To address the more complex degradation space, the authors design a U-Net discriminator with spectral normalization. This discriminator model produces per-pixel realness estimates, rather than a single scalar output, providing finer and more localized feedback during training.
- Spectral normalization stabilizes the training process and mitigates the over-sharpened, visually unpleasant artifacts that GAN training can otherwise introduce.
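A toy PyTorch sketch of the two ideas in the discriminator bullet above, assuming a much smaller network than the paper's: spectrally normalized convolutions throughout, and a U-Net shape whose output is a realness map the same size as the input rather than a single scalar. Layer counts, channel widths, and the skip-connection arithmetic here are illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

def sn_conv(cin, cout, stride=1):
    # every conv is spectrally normalized to bound its Lipschitz constant
    return spectral_norm(nn.Conv2d(cin, cout, 3, stride, 1))

class UNetDiscriminator(nn.Module):
    """Toy U-Net discriminator: outputs a per-pixel realness map."""
    def __init__(self, ch=32):
        super().__init__()
        self.down1 = sn_conv(3, ch, stride=2)        # H -> H/2
        self.down2 = sn_conv(ch, ch * 2, stride=2)   # H/2 -> H/4
        self.up1 = sn_conv(ch * 2, ch)
        self.up2 = sn_conv(ch, ch)
        self.out = nn.Conv2d(ch, 1, 3, 1, 1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        d1 = self.act(self.down1(x))
        d2 = self.act(self.down2(d1))
        u1 = self.act(self.up1(F.interpolate(d2, scale_factor=2)))
        u1 = u1 + d1                                  # U-Net skip connection
        u2 = self.act(self.up2(F.interpolate(u1, scale_factor=2)))
        return self.out(u2)                           # (N, 1, H, W) realness logits

d = UNetDiscriminator()
logits = d(torch.randn(2, 3, 64, 64))
print(logits.shape)  # → torch.Size([2, 1, 64, 64])
```

Because the output has one logit per pixel, the adversarial loss can penalize artifacts exactly where they occur, which is the "finer and more localized feedback" the summary refers to.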
Results and Implications
Qualitative Performance:
- The proposed Real-ESRGAN is rigorously tested against other state-of-the-art methodologies including ESRGAN, DAN, CDC, RealSR, and BSRGAN on diverse real-world datasets (e.g., RealSR, DRealSR, OST300, DPED, ImageNet validation, and ADE20K validation).
- Visual comparisons demonstrate that Real-ESRGAN effectively balances the enhancement of local details while suppressing artifacts that other methods might amplify or fail to remove.
- Specifically, Real-ESRGAN shows notable improvements in texture rendering, edge sharpness, and structural integrity across heterogeneous image scenes.
Quantitative Metrics:
- The paper reports NIQE (Natural Image Quality Evaluator) scores, a no-reference metric where lower values indicate better perceptual naturalness, across multiple datasets. Real-ESRGAN achieves lower NIQE scores on most test datasets, supporting the claimed gains in perceived image quality.
Applicability and Future Work:
- From a practical perspective, the adoption of high-order degradation modeling using entirely synthetic data makes Real-ESRGAN both versatile and scalable. It avoids the labor-intensive collection and alignment of paired training data from real-world sources.
- The methodology points towards broader implications in enhancing downstream vision tasks (e.g., object recognition, semantic segmentation) where image quality plays a crucial role.
- Future research may focus on further refining degradation models to cover even more diverse real-world scenarios and improving the stability and fidelity of GAN-based models. Additionally, integrating domain-adaptive strategies could further enhance the model's robustness across varied image domains.
Conclusion
Real-ESRGAN introduces a practical and efficient approach to blind super-resolution by leveraging purely synthetic training data and a robust high-order degradation model. The U-Net discriminator with spectral normalization boosts the model's capability to handle complex real-world degradations, achieving superior visual quality across multiple datasets. This work underscores the potential of synthetic data in training high-performance image processing models and paves the way for more generalizable and adaptive super-resolution technologies.