FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors (1711.10703v1)

Published 29 Nov 2017 in cs.CV

Abstract: Face Super-Resolution (SR) is a domain-specific super-resolution problem. The specific facial prior knowledge could be leveraged for better super-resolving face images. We present a novel deep end-to-end trainable Face Super-Resolution Network (FSRNet), which makes full use of the geometry prior, i.e., facial landmark heatmaps and parsing maps, to super-resolve very low-resolution (LR) face images without well-aligned requirement. Specifically, we first construct a coarse SR network to recover a coarse high-resolution (HR) image. Then, the coarse HR image is sent to two branches: a fine SR encoder and a prior information estimation network, which extracts the image features, and estimates landmark heatmaps/parsing maps respectively. Both image features and prior information are sent to a fine SR decoder to recover the HR image. To further generate realistic faces, we propose the Face Super-Resolution Generative Adversarial Network (FSRGAN) to incorporate the adversarial loss into FSRNet. Moreover, we introduce two related tasks, face alignment and parsing, as the new evaluation metrics for face SR, which address the inconsistency of classic metrics w.r.t. visual perception. Extensive benchmark experiments show that FSRNet and FSRGAN significantly outperforms state of the arts for very LR face SR, both quantitatively and qualitatively. Code will be made available upon publication.

Authors (5)

Yu Chen
Ying Tai
Xiaoming Liu
Chunhua Shen
Jian Yang

Citations (462)


Summary

The paper presents an end-to-end FSRNet that integrates facial priors to enhance super-resolution of extremely low-resolution face images.
It employs a coarse-to-fine network with a prior estimation module for generating accurate landmarks and parsing maps, boosting alignment performance.
FSRNet outperforms methods like SRResNet and VDSR, demonstrating significant gains in PSNR, SSIM, and facial alignment metrics.

Face Super-Resolution with FSRNet

The paper presents a novel approach to Face Super-Resolution (SR) by introducing the Face Super-Resolution Network (FSRNet). This network leverages facial priors in an end-to-end manner to enhance the super-resolution of very low-resolution face images, addressing limitations in traditional multi-stage processing methods.

Methodology

FSRNet is designed around a deep learning architecture consisting of:

- Coarse SR Network: Recovers a coarse high-resolution image first, which makes facial landmark and parsing map estimation easier than working directly on the low-resolution input.
- Fine SR Network: This network includes:
  - A fine SR encoder for extracting image features.
  - A prior estimation network for generating facial landmark heatmaps and parsing maps.
  - A fine SR decoder that uses both image features and prior information to generate the final high-resolution image.
- FSRGAN: An extension of FSRNet that adds an adversarial loss to make the super-resolved faces more realistic.
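
The data flow described above can be sketched in plain Python. This is only a structural sketch, not the authors' implementation: `coarse_sr`, `fine_encoder`, `estimate_priors`, and `fine_decoder` are hypothetical stand-ins for what are learned convolutional networks in the paper, and the 8x factor (16×16 to 128×128) and the channel counts are assumptions chosen for illustration. The point is the wiring: the coarse output feeds two branches, and their outputs are concatenated channel-wise before decoding.

```python
# Structural sketch of the FSRNet data flow. Every function is a hypothetical
# stand-in; the real stages are learned convolutional networks.

def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    return [[img[r // factor][c // factor]
             for c in range(len(img[0]) * factor)]
            for r in range(len(img) * factor)]

def coarse_sr(lr_img, factor):
    # Stand-in for the coarse SR network: simply enlarge the input.
    return upsample_nearest(lr_img, factor)

def fine_encoder(coarse_img, n_features=2):
    # Stand-in for the fine SR encoder: emit n_features "feature maps"
    # (here, scaled copies of the coarse image).
    return [[[v * (k + 1) for v in row] for row in coarse_img]
            for k in range(n_features)]

def estimate_priors(coarse_img, n_priors=3):
    # Stand-in for the prior estimation network: emit n_priors maps
    # (landmark heatmaps / parsing maps in the paper; all zeros here).
    h, w = len(coarse_img), len(coarse_img[0])
    return [[[0.0] * w for _ in range(h)] for _ in range(n_priors)]

def fine_decoder(channels):
    # Stand-in for the fine SR decoder: average all input channels per pixel.
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(ch[r][c] for ch in channels) / len(channels)
             for c in range(w)] for r in range(h)]

def fsrnet_forward(lr_img, factor=8):  # factor=8 is an assumption
    coarse = coarse_sr(lr_img, factor)
    features = fine_encoder(coarse)    # branch 1: image features
    priors = estimate_priors(coarse)   # branch 2: prior maps
    fused = features + priors          # channel-wise concatenation
    return coarse, priors, fine_decoder(fused)

lr = [[float(r + c) for c in range(16)] for r in range(16)]
coarse, priors, hr = fsrnet_forward(lr)
print(len(hr), len(hr[0]))  # 128 128
```

Returning the coarse image and the priors alongside the final output mirrors the end-to-end setup, in which intermediate outputs can also be supervised during training.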

The prior estimation network is trained with a multi-task learning strategy, so that a single network predicts both landmark heatmaps and parsing maps. The paper also evaluates the results with face alignment and parsing metrics, in addition to the usual photometric measures such as PSNR and SSIM.
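
To make the multi-task idea concrete, the sketch below renders a landmark as a Gaussian heatmap and combines a landmark term and a parsing term into one loss. It rests on several assumptions that do not come from the summary above: the Gaussian rendering with `sigma`, the use of mean squared error for both tasks, and the weights `w_landmark` and `w_parsing` are all illustrative choices, and the function names are hypothetical.

```python
import math

def gaussian_heatmap(cx, cy, size, sigma=1.0):
    """Render one landmark at (cx, cy) as a size x size Gaussian heatmap."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def mse(pred, target):
    """Mean squared error between two equally sized 2D maps."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n

def prior_loss(pred_heatmaps, gt_heatmaps, pred_parsing, gt_parsing,
               w_landmark=1.0, w_parsing=1.0):
    """Weighted sum of the two task errors (weights are hypothetical)."""
    l_landmark = sum(mse(p, t) for p, t in zip(pred_heatmaps, gt_heatmaps))
    l_parsing = sum(mse(p, t) for p, t in zip(pred_parsing, gt_parsing))
    return w_landmark * l_landmark + w_parsing * l_parsing

size = 8
gt_heatmaps = [gaussian_heatmap(3, 4, size)]           # one landmark
gt_parsing = [[[1.0 if x < 4 else 0.0 for x in range(size)]
               for _ in range(size)]]                  # one binary region mask
zeros = [[[0.0] * size for _ in range(size)]]          # an untrained "prediction"

perfect = prior_loss(gt_heatmaps, gt_heatmaps, gt_parsing, gt_parsing)
untrained = prior_loss(zeros, gt_heatmaps, zeros, gt_parsing)
print(perfect, untrained > perfect)  # 0.0 True
```

Because both terms are computed from one network's outputs, gradients from landmark and parsing supervision both shape the same underlying features, which is the usual motivation for a multi-task setup.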

Results

Quantitative and qualitative benchmarks show FSRNet outperforming state-of-the-art models such as SRResNet, VDSR, and CBN in PSNR and SSIM, particularly on unaligned, very low-resolution face images (16×16 pixels). The complementary metrics based on face alignment and parsing further support the model's ability to recover accurate facial geometry and structure: FSRNet achieves better alignment performance than SRResNet.
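
For reference, PSNR is a log-scaled function of the mean squared error between a reconstruction and its ground truth. The sketch below uses the standard definition on grayscale images stored as 2D lists; the 8-bit peak value of 255 is an assumption, and the paper's exact evaluation protocol (colour space, cropping) is not specified here.

```python
import math

def psnr(reference, estimate, max_val=255.0):
    """PSNR in dB between two equally sized grayscale images (2D lists)."""
    n = len(reference) * len(reference[0])
    mse = sum((r - e) ** 2
              for ref_row, est_row in zip(reference, estimate)
              for r, e in zip(ref_row, est_row)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

hr = [[10, 20], [30, 40]]
sr = [[11, 21], [31, 41]]  # every pixel off by one intensity level
print(round(psnr(hr, sr), 2))  # 48.13
```

Because PSNR depends only on per-pixel differences, a blurry but "safe" output can score well, which is one reason the paper supplements it with alignment and parsing metrics.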

Implications and Future Directions

FSRNet folds facial geometric priors into a single end-to-end trainable framework rather than treating prior estimation as a separate stage. Because the priors are estimated from the coarse SR output, the network does not require well-aligned inputs, which should make it more robust to variations such as misalignment, pose, and occlusion.

Looking forward, potential developments could include optimizing the prior estimation network for more accurate landmark and parsing map predictions, as well as exploring other forms of prior knowledge such as facial textures. Additionally, implementing variations of FSRGAN may enhance perceptual realism further, aligning with increasing demands for high-quality image processing in domains like security, entertainment, and forensic analysis.

The introduction of new evaluation metrics also paves the way for more comprehensive assessment methods beyond traditional pixel-based measures, influencing broader SR applications in computer vision.
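
As an illustration of how face alignment can serve as an SR metric, the sketch below scores landmarks detected on a super-resolved face against ground-truth landmarks using a normalized mean point-to-point error. This is a common formulation rather than necessarily the paper's exact protocol: the function name is hypothetical, the landmark detector itself is out of scope (points are assumed given), and the normalizer `norm` (for example, inter-ocular distance or image width) is an assumption.

```python
import math

def normalized_landmark_error(pred, gt, norm):
    """Mean Euclidean distance between matched landmarks, divided by norm."""
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / (len(dists) * norm)

gt_points = [(0.0, 0.0), (0.0, 0.0)]
sr_points = [(0.0, 0.0), (3.0, 4.0)]  # second landmark is 5 pixels off
print(normalized_landmark_error(sr_points, gt_points, norm=10.0))  # 0.25
```

A lower value means that the facial geometry in the super-resolved image is closer to the ground truth, independent of how the pixels score on PSNR or SSIM.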