GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution (2012.00739v1)

Published 1 Dec 2020 in cs.CV

Abstract: We show that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). While most existing SR approaches attempt to generate realistic textures through learning with adversarial loss, our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging rich and diverse priors encapsulated in a pre-trained GAN. But unlike prevalent GAN inversion methods that require expensive image-specific optimization at runtime, our approach only needs a single forward pass to generate the upscaled image. GLEAN can be easily incorporated in a simple encoder-bank-decoder architecture with multi-resolution skip connections. Switching the bank allows the method to deal with images from diverse categories, e.g., cat, building, human face, and car. Images upscaled by GLEAN show clear improvements in terms of fidelity and texture faithfulness in comparison to existing methods.

Authors (5)

Kelvin C. K. Chan
Xintao Wang
Xiangyu Xu
Jinwei Gu
Chen Change Loy

Citations (239)


Summary

The paper introduces a novel encoder-bank-decoder architecture that leverages a pre-trained GAN latent bank for efficient, high-fidelity large-factor image super-resolution.
It achieves superior texture quality and minimal artifacts by integrating convolutional features with latent vectors, eliminating the need for image-specific optimization.
Comparative evaluations on datasets such as CelebA-HQ show improvements over existing methods such as ESRGAN$^+$ and PULSE, and the approach may extend to other image restoration tasks.

Overview of GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution

The paper introduces GLEAN (Generative LatEnt bANk), an approach to large-factor image super-resolution (SR) that uses a pre-trained Generative Adversarial Network (GAN), such as StyleGAN, as a latent bank. Conventional methods struggle to maintain textural fidelity and often produce artifacts at high magnification factors. GLEAN addresses this by drawing on the rich priors stored in the pre-trained GAN, without requiring image-specific optimization at runtime.

Methodology and Contributions

The authors present a novel encoder-bank-decoder architecture where GLEAN exploits the generative capabilities of a pre-trained GAN to serve as a latent bank. This architecture facilitates efficient conditioning and retrieval of prior information through a single forward pass. Specifically, the method involves:

Encoder: Extracts convolutional features and latent vectors from a low-resolution (LR) input image. These features encapsulate local structures and high-level cues essential for guiding the generator.
Generative Latent Bank: Uses pre-trained generator blocks that already encode rich priors, so the network does not have to learn both fidelity and texture generation from scratch. The GAN latent bank is modified to accept both latent vectors and multi-resolution convolutional features, which improves the quality of the generated outputs.
Decoder: Processes features from both the encoder and latent bank using a progressive fusion strategy, enhancing the output image quality and fidelity.
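The three stages above can be sketched as a resolution plan. This is an illustrative sketch only, not the authors' implementation: the function name `glean_plan`, the 4×4 minimum resolution, the halving/doubling schedule, and the convention of two latent vectors per generator block (as in StyleGAN-style generators) are assumptions introduced here for illustration.

```python
def _is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def glean_plan(lr_size: int, scale: int, min_size: int = 4) -> dict:
    """List the resolutions touched by each stage of an encoder-bank-decoder
    SR network. Hypothetical helper; all schedule choices are assumptions."""
    if not (_is_power_of_two(lr_size) and _is_power_of_two(scale)):
        raise ValueError("lr_size and scale must be powers of two")
    if lr_size < min_size:
        raise ValueError("lr_size must be at least min_size")

    hr_size = lr_size * scale

    # Encoder: halve the resolution from the LR input down to min_size,
    # keeping one convolutional feature map per level.
    encoder = []
    size = lr_size
    while size >= min_size:
        encoder.append(size)
        size //= 2

    # Latent bank: a pre-trained generator that doubles the resolution
    # from min_size up to the output size.
    bank = []
    size = min_size
    while size <= hr_size:
        bank.append(size)
        size *= 2

    # Multi-resolution skips: encoder features condition the bank blocks
    # that share their resolution.
    encoder_to_bank = [s for s in bank if s in encoder]

    # Bank features at or above the LR resolution are handed to the
    # decoder for progressive fusion (assumed split point).
    bank_to_decoder = [s for s in bank if s >= lr_size]

    return {
        "hr_size": hr_size,
        "encoder": encoder,
        "bank": bank,
        "encoder_to_bank": encoder_to_bank,
        "bank_to_decoder": bank_to_decoder,
        # Assumption: two latent (style) inputs per generator block.
        "num_latents": 2 * len(bank),
    }


if __name__ == "__main__":
    plan = glean_plan(lr_size=16, scale=64)
    for key, value in plan.items():
        print(f"{key}: {value}")
```

Under these assumptions, a 16×16 input at a 64× factor gives a 1024×1024 output, nine bank blocks (4×4 through 1024×1024), and 18 latent vectors, with encoder features feeding the bank at the 4×4, 8×8, and 16×16 levels.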

A significant advantage of GLEAN over optimization-based GAN inversion is that it produces high-quality upscaled images in a single forward pass, which suits applications that demand fast execution or are computationally constrained.

Comparative Evaluation

The performance of GLEAN was demonstrated across multiple datasets and categories, including human faces, cats, cars, bedrooms, and towers, at magnification factors up to 64$\times$. For human faces, particularly on CelebA-HQ, GLEAN outperformed methods such as ESRGAN$^+$ and PULSE, which produced either unrealistic textures or low-fidelity outputs. Quantitative metrics such as PSNR and LPIPS further corroborate its ability to maintain high fidelity and texture realism.
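For reference, PSNR is defined as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MSE is the mean squared error between the reference and the restored image and MAX is the largest possible pixel value. The sketch below is a minimal, assumed implementation over flat pixel sequences. It is not the paper's evaluation code, and the function name `psnr` and the default `max_value=255.0` are choices made here. LPIPS is a learned perceptual metric that requires a pre-trained network, so it is not reproduced.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel
    sequences. Higher is better; identical inputs give infinity."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [100.0, 100.0, 100.0, 100.0]
    restored = [110.0, 90.0, 110.0, 90.0]
    print(f"PSNR: {psnr(ground_truth, restored):.2f} dB")
```

PSNR rewards pixel-wise agreement, so it measures fidelity rather than perceptual realism, which is why it is usually reported alongside a perceptual metric such as LPIPS.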

Implications and Future Directions

Using a pre-trained GAN as a latent bank departs from SR methods that learn texture generation through adversarial training alone, and the idea of GAN-based dictionaries could extend to other restoration tasks such as image denoising, inpainting, and colorization. Because GLEAN needs a single forward pass rather than iterative image-specific optimization, its computational overhead is lower than that of GAN inversion methods, which is an important step toward practical applications.

Future research may explore adapting GAN priors to other image modalities and extending GLEAN to a broader range of image transformations. Improving the mechanisms that condition the latent bank on the input could also yield further gains in SR and other restoration tasks.

In summary, GLEAN offers a practical approach to large-factor image super-resolution: a pre-trained GAN serves as a latent bank inside a straightforward encoder-bank-decoder framework, and the upscaled image is produced in a single forward pass.
