ESRGAN+ : Further Improving Enhanced Super-Resolution Generative Adversarial Network (2001.08073v2)

Published 21 Jan 2020 in eess.IV and cs.LG

Abstract: Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is a perceptual-driven approach for single image super resolution that is able to produce photorealistic images. Despite the visual quality of these generated images, there is still room for improvement. In this fashion, the model is extended to further improve the perceptual quality of the images. We have designed a novel block to replace the one used by the original ESRGAN. Moreover, we introduce noise inputs to the generator network in order to exploit stochastic variation. The resulting images present more realistic textures. The code is available at https://github.com/ncarraz/ESRGANplus .

Summary

The paper presents a novel Residual-in-Residual Dense Residual Block (RRDRB) that enhances feature exploitation and improves natural texture generation.
It integrates Gaussian noise after each dense block to introduce stochastic variations, refining local textures while preserving global structure.
Evaluations on datasets such as PIRM and Urban100 show that ESRGAN+ and nESRGAN+ achieve better perceptual quality than earlier perceptual-driven models such as EnhanceNet and the original ESRGAN.

Enhancements in Image Super-Resolution with ESRGAN+

The paper "ESRGAN+ : Further Improving Enhanced Super-Resolution Generative Adversarial Network" describes improvements to Single Image Super-Resolution (SISR), the task of upscaling a low-resolution image to a high-resolution one while staying faithful to the original fine details. It builds on ESRGAN, which was itself a significant refinement of the original SRGAN approach.

The authors improve perceptual image quality in two ways: a new basic block for the generator network, and noise inputs that add stochastic variation to texture generation.

Novel Network Architecture

The authors propose the Residual-in-Residual Dense Residual Block (RRDRB), which extends the Residual-in-Residual Dense Block (RRDB) used in ESRGAN. The RRDRB increases network capacity by adding a further level of residual learning inside the dense blocks. Combining elements of the ResNet and DenseNet architectures in this way supports both feature exploitation and feature exploration. As a result, ESRGAN+ produces more natural textures and better perceptual quality than its predecessor without unduly increasing network complexity.
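To make the block structure concrete, here is a minimal NumPy sketch of a dense block with extra internal residuals, stacked residual-in-residual. Several details are assumptions rather than statements from this summary: the extra residuals are placed after the second and fourth layers, a 1x1 projection matches channel counts for the first of them, the residual scaling factor `beta=0.2` follows the common ESRGAN convention, and 1x1 channel-mixing stands in for the 3x3 convolutions of a real network to keep the example short. The names `dense_residual_block`, `rrdrb`, and `make_block_params` are hypothetical.

```python
import numpy as np


def conv1x1(weight, x):
    """Mix channels at every pixel; weight is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def lrelu(x, slope=0.2):
    """Leaky ReLU activation."""
    return np.where(x > 0, x, slope * x)


def dense_residual_block(x, weights, skip_weight, beta=0.2):
    """Five-layer dense block with an extra residual every two layers (assumed placement)."""
    x1 = lrelu(conv1x1(weights[0], x))
    x2 = lrelu(conv1x1(weights[1], np.concatenate([x, x1])))
    x2 = x2 + conv1x1(skip_weight, x)  # extra residual 1: projected block input
    x3 = lrelu(conv1x1(weights[2], np.concatenate([x, x1, x2])))
    x4 = lrelu(conv1x1(weights[3], np.concatenate([x, x1, x2, x3])))
    x4 = x4 + x2  # extra residual 2: output of the second layer
    x5 = conv1x1(weights[4], np.concatenate([x, x1, x2, x3, x4]))
    return x + beta * x5  # scaled residual around the whole dense block


def rrdrb(x, blocks, beta=0.2):
    """Chain several dense residual blocks inside one outer scaled residual."""
    out = x
    for weights, skip_weight in blocks:
        out = dense_residual_block(out, weights, skip_weight, beta)
    return x + beta * out


def make_block_params(channels, growth, rng, std=0.1):
    """Random illustrative weights; layer i sees channels + i * growth inputs."""
    weights = [
        rng.standard_normal((growth if i < 4 else channels, channels + i * growth)) * std
        for i in range(5)
    ]
    skip_weight = rng.standard_normal((growth, channels)) * std
    return weights, skip_weight
```

Calling `rrdrb(x, blocks)` with a `(C, H, W)` array and a list of three parameter tuples from `make_block_params` returns an array of the same shape, so such blocks can be chained freely inside a generator.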

Incorporation of Noise Inputs

Inspired by the success of noise injection in GAN-based models for tasks such as human face generation, the paper introduces Gaussian noise inputs to the super-resolution generator, an approach that is new in this setting. Noise is added after each residual dense block and scaled per feature, mimicking the random variation found in natural textures. This stochastic variation lets the network vary fine local textures while preserving the broader image structure and global consistency. The model that combines both improvements is called nESRGAN+.
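The noise injection step can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `inject_noise` is hypothetical, the scaling factors are taken to be one value per channel, and a single noise map is shared across channels (as in StyleGAN-style injection), which the summary above does not specify.

```python
import numpy as np


def inject_noise(features, scale, rng):
    """Add Gaussian noise to a (C, H, W) feature map, scaled per channel.

    features: output of one residual dense block, shape (C, H, W)
    scale:    per-channel scaling factors, shape (C,); learned in a real model
    rng:      a numpy.random.Generator, so results are reproducible
    """
    _, height, width = features.shape
    # Assumption: one noise map shared by all channels, broadcast over C.
    noise = rng.standard_normal((1, height, width))
    return features + scale[:, None, None] * noise
```

With all scaling factors at zero the function returns the features unchanged, so a network using it can learn how much stochastic variation each feature should receive.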

Performance and Evaluation

The paper evaluates ESRGAN+ and nESRGAN+ against perceptual-driven models such as EnhanceNet and the original ESRGAN and reports better results, particularly on datasets such as PIRM and Urban100. The models are compared using the perceptual index, which combines Ma's score and NIQE. Notably, nESRGAN+ achieved improved perceptual quality scores on the PIRM datasets, which highlights the contribution of noise injection to texture realism.
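For reference, the perceptual index as defined for the PIRM 2018 challenge is PI = ((10 - Ma) + NIQE) / 2, where a lower value indicates better perceptual quality. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and computing Ma's score and NIQE themselves requires separate no-reference quality models that are not shown here.

```python
def perceptual_index(ma_score, niqe):
    """Combine Ma's score (higher is better) and NIQE (lower is better).

    Returns the perceptual index, where lower means better perceptual quality.
    """
    return 0.5 * ((10.0 - ma_score) + niqe)
```

For example, an image with a Ma score of 8.0 and a NIQE of 4.0 has a perceptual index of 3.0.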

Implications and Future Potential

These advances matter for practical applications of image super-resolution in fields such as medical imaging, satellite imagery, and digital media, where visual fidelity is critical. The RRDRB may also prompt further research into deep networks in which the placement of residual connections is tuned for a specific task.

The use of noise inputs also invites investigation in other domains that require high-fidelity image synthesis. However, the paper notes that stochastic variation generalizes less well in some cases, especially for structured content such as architectural imagery, which suggests exploring context-sensitive ways of applying noise.

In conclusion, ESRGAN+ and nESRGAN+ are focused advances in SISR that combine an improved generator block with stochastic noise inputs, and they contribute to the ongoing improvement of generative adversarial networks for image processing. Adaptive architectures and further forms of stochastic input are natural directions for extending these results.

GitHub

GitHub - ncarraz/ESRGANplus: ICASSP 2020 - ESRGAN+ : Further Improving Enhanced Super-Resolution Generative Adversarial Network - ICPR 2020 - Tarsier: Evolving Noise Injection in Super-Resolution GANs (130 stars)