PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models (2003.03808v3)

Published 8 Mar 2020 in cs.CV, cs.LG, and eess.IV

Abstract: The primary aim of single-image super-resolution is to construct high-resolution (HR) images from corresponding low-resolution (LR) inputs. In previous approaches, which have generally been supervised, the training objective typically measures a pixel-wise average distance between the super-resolved (SR) and HR images. Optimizing such metrics often leads to blurring, especially in high variance (detailed) regions. We propose an alternative formulation of the super-resolution problem based on creating realistic SR images that downscale correctly. We present an algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature. It accomplishes this in an entirely self-supervised fashion and is not confined to a specific degradation operator used during training, unlike previous methods (which require supervised training on databases of LR-HR image pairs). Instead of starting with the LR image and slowly adding detail, PULSE traverses the high-resolution natural image manifold, searching for images that downscale to the original LR image. This is formalized through the "downscaling loss," which guides exploration through the latent space of a generative model. By leveraging properties of high-dimensional Gaussians, we restrict the search space to guarantee realistic outputs. PULSE thereby generates super-resolved images that both are realistic and downscale correctly. We show proof of concept of our approach in the domain of face super-resolution (i.e., face hallucination). We also present a discussion of the limitations and biases of the method as currently implemented with an accompanying model card with relevant metrics. Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.

Summary

The paper presents a novel super-resolution method that upscales low-resolution images by exploring GAN latent spaces with a downscaling loss.
It uses StyleGAN as the generator and restricts the latent search to regions that yield realistic images, so the outputs are both realistic and consistent with their low-resolution inputs.
Experimental results using MOS tests and NIQE scores demonstrate that PULSE outperforms traditional supervised techniques in perceptual image quality.

Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

The paper presents PULSE (Photo Upsampling via Latent Space Exploration), a super-resolution method built on a generative model. Unlike traditional supervised methods, PULSE is self-supervised: it searches the latent space of a generative model, in particular a GAN, for realistic high-resolution images that downscale to the given low-resolution image.

Methodology

PULSE reframes the objective of super-resolution: rather than reconstructing the one true high-resolution image behind a low-resolution input, it generates a realistic super-resolved image that downscales to match the input. The search is guided by a "downscaling loss," which measures how far the downscaled candidate is from the low-resolution input. Minimizing this loss over the generative model's latent space yields a high-resolution image that is both plausible and consistent with the low-resolution version.
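A minimal sketch of what such a downscaling loss could look like, assuming box-filter (block-average) downscaling and a mean squared error. The function names `downscale` and `downscaling_loss` are illustrative and not taken from the paper, whose degradation operator and norm may differ.

```python
import numpy as np

def downscale(img, factor):
    """Box-filter downscaling: average each factor x factor block (assumed operator)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def downscaling_loss(sr, lr_img, factor):
    """Mean squared error between the downscaled SR candidate and the LR input."""
    return float(np.mean((downscale(sr, factor) - lr_img) ** 2))

# A candidate built by repeating each LR pixel downscales back to the LR image exactly.
lr_img = np.arange(16, dtype=float).reshape(4, 4)
consistent = np.kron(lr_img, np.ones((8, 8)))   # 32 x 32 candidate
inconsistent = consistent + 1.0                 # shifted by one intensity level
print(downscaling_loss(consistent, lr_img, 8))
print(downscaling_loss(inconsistent, lr_img, 8))
```

Many different high-resolution images can reach zero loss for the same input, which is why the loss alone is not enough and the candidates must also be constrained to look realistic.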

Key to PULSE's approach is searching the latent space of a pretrained GAN, specifically StyleGAN, for high-resolution candidates. Instead of minimizing a pixel-wise error against a ground-truth image, which tends to blur fine detail, the method looks for latent points whose generated images satisfy the downscaling condition. The search is restricted, using properties of high-dimensional Gaussians, to the region of latent space where the generator produces realistic images, so the outputs stay realistic while also meeting the downscaling constraint.
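The latent search can be illustrated with a toy sketch. It assumes the search is restricted to the sphere of radius sqrt(d), where a d-dimensional standard Gaussian concentrates, and it replaces "generate with StyleGAN, then downscale" with a hypothetical linear map `A` so that the gradient can be written by hand. The names `project_to_sphere` and `latent_search` are illustrative; a real generator is nonlinear and would need automatic differentiation.

```python
import numpy as np

def project_to_sphere(z):
    """Rescale z onto the sphere of radius sqrt(d)."""
    return z * np.sqrt(z.size) / np.linalg.norm(z)

def latent_search(A, lr_target, steps=500, step_size=0.1, seed=0):
    """Projected gradient descent on ||A z - lr_target||^2 over the sphere.

    A is a toy linear stand-in for "generate, then downscale" (an assumption).
    """
    rng = np.random.default_rng(seed)
    z = project_to_sphere(rng.standard_normal(A.shape[1]))
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ z - lr_target)       # gradient of the squared error
        z = project_to_sphere(z - step_size * grad)  # step, then return to the sphere
    return z

rng = np.random.default_rng(1)
d, m = 64, 4                                   # latent size, number of LR "pixels"
A = rng.standard_normal((m, d)) / np.sqrt(d)   # toy generate-and-downscale map
lr_target = A @ project_to_sphere(rng.standard_normal(d))  # reachable LR target

z0 = project_to_sphere(np.random.default_rng(0).standard_normal(d))  # same start as seed=0
initial = float(np.sum((A @ z0 - lr_target) ** 2))
z = latent_search(A, lr_target)
final = float(np.sum((A @ z - lr_target) ** 2))
print(initial, final)
```

Projecting after every step keeps the iterate on the sphere throughout, so the search never leaves the region assumed to produce realistic outputs.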

Results

The experiments indicate that PULSE produces perceptually better images than prior state-of-the-art methods. In mean opinion score (MOS) tests on face images, PULSE was rated above supervised methods such as FSRNet and FSRGAN, at higher resolutions and larger scale factors than previously achievable. NIQE scores corroborate this, with PULSE's outputs even scoring better than the ground-truth high-resolution images at the evaluated resolutions.

Implications and Future Directions

The research has both practical and theoretical implications. Practically, PULSE offers a flexible alternative to supervised models, which typically require paired LR-HR datasets and generalize poorly to degradation operators not seen during training. Because PULSE is self-supervised and relies on a generative model, it can be adapted to other domains without task-specific neural architectures, provided a suitable pretrained generator is available.

Theoretically, the paper challenges the prevailing view of super-resolution as pixel-wise reconstruction, advocating instead a search over the image manifold defined by a generative model.

Looking ahead, future work could refine how the latent search is constrained, so that generative models cover diverse datasets more broadly and faithfully. Investigating the biases inherent in generative models, such as the racial skew observed in StyleGAN's outputs, will also be crucial to making these technologies more equitable and versatile.

In conclusion, PULSE offers a significant step forward for super-resolution technologies, charting a path for further exploration of generative models in computer vision tasks. Its self-supervised nature and focus on realistic upsampling set a foundation for future advancements in the generation of high-quality imagery across varied applications.
