
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks (1809.00219v2)

Published 1 Sep 2018 in cs.CV

Abstract: The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at https://github.com/xinntao/ESRGAN .

Authors (9)
  1. Xintao Wang (132 papers)
  2. Ke Yu (44 papers)
  3. Shixiang Wu (2 papers)
  4. Jinjin Gu (56 papers)
  5. Yihao Liu (85 papers)
  6. Chao Dong (168 papers)
  7. Chen Change Loy (288 papers)
  8. Yu Qiao (563 papers)
  9. Xiaoou Tang (73 papers)
Citations (3,356)

Summary

The paper "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks" by Xintao Wang et al. presents an enhancement of the Super-Resolution Generative Adversarial Network (SRGAN). Focused on single image super-resolution (SISR), the paper addresses SRGAN's main limitation, the unpleasant artifacts that often accompany its hallucinated textures, and proposes an improved model termed Enhanced SRGAN (ESRGAN).

Core Improvements

The authors target three key components of SRGAN—network architecture, adversarial loss, and perceptual loss—and present enhancements for each:

  1. Network Architecture:
    • Introduces the Residual-in-Residual Dense Block (RRDB), which removes batch normalization (BN) layers. RRDB combines multi-level residual learning with dense connections, increasing model capacity while remaining easy to train.
    • The architecture adopts residual scaling and smaller weight initialization to stabilize training of the deeper network, preventing the magnitudes of intermediate signals from being amplified.
  2. Adversarial Loss:
    • The paper replaces SRGAN's standard discriminator with a relativistic average GAN (RaGAN) discriminator, which predicts relative realness rather than absolute realness, i.e., whether a real image is, on average, more realistic than a generated one. This formulation leads to more detailed textures in the super-resolved outputs.
  3. Perceptual Loss:
    • The perceptual loss is improved by constraining features before activation rather than after, as in the original formulation. This change provides stronger supervision for brightness consistency and better texture recovery.
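The RRDB building block described in point 1 can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' exact implementation; the channel widths (`nf=64`, `gc=32`) and the 0.2 residual-scaling factor follow the released ESRGAN code but are shown here only as plausible defaults.

```python
# Sketch of a Residual Dense Block (RDB) and the Residual-in-Residual
# Dense Block (RRDB). Note the absence of batch normalization layers.
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Five convolutions with dense connections; no BN layers."""
    def __init__(self, nf=64, gc=32):
        super().__init__()
        self.conv1 = nn.Conv2d(nf, gc, 3, 1, 1)
        self.conv2 = nn.Conv2d(nf + gc, gc, 3, 1, 1)
        self.conv3 = nn.Conv2d(nf + 2 * gc, gc, 3, 1, 1)
        self.conv4 = nn.Conv2d(nf + 3 * gc, gc, 3, 1, 1)
        self.conv5 = nn.Conv2d(nf + 4 * gc, nf, 3, 1, 1)
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        x1 = self.lrelu(self.conv1(x))
        x2 = self.lrelu(self.conv2(torch.cat((x, x1), 1)))
        x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))
        x4 = self.lrelu(self.conv4(torch.cat((x, x1, x2, x3), 1)))
        x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
        return x + 0.2 * x5  # residual scaling stabilizes training

class RRDB(nn.Module):
    """Residual-in-Residual: three RDBs wrapped in an outer residual."""
    def __init__(self, nf=64, gc=32):
        super().__init__()
        self.rdb1 = ResidualDenseBlock(nf, gc)
        self.rdb2 = ResidualDenseBlock(nf, gc)
        self.rdb3 = ResidualDenseBlock(nf, gc)

    def forward(self, x):
        out = self.rdb3(self.rdb2(self.rdb1(x)))
        return x + 0.2 * out
```

Because every path through the block is residual, gradients reach the early layers easily even when many RRDBs are stacked in the generator.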

Experimental Validation

The authors substantiate their claims through extensive experiments, showing that ESRGAN consistently outperforms state-of-the-art methods in both sharpness and texture detail. The experiments include comparisons with PSNR-oriented approaches (e.g., EDSR, RCAN) and perceptually driven methods (e.g., SRGAN, EnhanceNet). Qualitative results in the paper demonstrate ESRGAN's ability to produce more visually pleasing and realistic outputs, with sharper edges and finer details than its predecessors.

Results and Implications

The ESRGAN model was evaluated on multiple benchmarks, including Set5, Set14, BSD100, and Urban100. The authors underline the practical success of ESRGAN by winning the PIRM2018-SR Challenge in Region 3, highlighting the model's superior perceptual index.

Strong Numerical Results:

  • The PSNR-oriented variant of the RRDB network achieves improved PSNR/SSIM on standard benchmarks, benefiting from training on larger, more diverse datasets (DIV2K and Flickr2K) and a robust training methodology.
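For reference, the PSNR metric cited above is a straightforward function of mean squared error. A standalone sketch, assuming 8-bit images with pixel values in [0, 255]:

```python
# PSNR (peak signal-to-noise ratio) between a reference image and a
# reconstruction: PSNR = 10 * log10(MAX^2 / MSE).
import numpy as np

def psnr(ref, test, max_val=255.0):
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR rewards pixel-wise fidelity, which is precisely why PSNR-oriented models tend to produce over-smoothed textures; ESRGAN's perceptual variant deliberately trades some PSNR for visual realism.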

Implications:

  • The structural improvements, particularly the elimination of BN layers and the adoption of RRDB blocks, have a significant impact on the stability and performance of deep learning models for image restoration tasks.
  • The incorporation of RaGAN enriches the adversarial training framework, enabling generators to produce more detailed and realistic textures, hinting at broader applications in areas requiring detail-preservation under high magnification.
  • Enhanced perceptual loss fine-tuning paves the way for future improvements in training objectives, aligning generated outputs with human perceptual habits more effectively.

Future Developments

The paper opens several avenues for further advancements:

  • Advanced Loss Functions: Future research might explore more refined perceptual loss functions focused on specific texture elements or variations.
  • Application Extensions: The ESRGAN framework can be applied to other low-level computer vision tasks, extending the benefits of enhanced texture generation and stability.
  • Larger and Diverse Training Sets: Further exploration of diverse and extensive datasets could continue to improve model performance, especially in generalized applications across different image domains.

In conclusion, the paper delivers substantive improvements to super-resolution through architectural innovation, a relativistic adversarial formulation, and a refined perceptual loss. The results mark a robust step forward in generating high-quality, realistic images from low-resolution inputs and establish a new benchmark in image super-resolution.
