- The paper introduces VSRResNet, a novel GAN-based model that bypasses motion compensation to achieve high-quality video super-resolution.
- It employs deep residual blocks and perceptual losses using pre-trained VGG features to enhance image sharpness and detail.
- The study demonstrates that the proposed approach outperforms current state-of-the-art methods that forgo motion compensation in PSNR and SSIM, and yields better perceptual quality under the proposed PercepDist metric.
Generative Adversarial Networks and Perceptual Losses for Video Super-Resolution
The paper "Generative Adversarial Networks and Perceptual Losses for Video Super-Resolution" by Alice Lucas, Santiago Lopez-Tapia, Rafael Molina, and Aggelos K. Katsaggelos presents a significant advancement in the field of video super-resolution (VSR) through the novel application of Generative Adversarial Networks (GANs) enhanced by perceptual losses. The core objective of VSR is to enhance low-resolution video sequences to high-resolution ones, a critical need driven by the prevalence of high-definition display technologies.
Methodological Innovations
This paper introduces VSRResNet, a generator architecture specifically tailored for VSR. It employs a deep residual learning paradigm comprising 34 convolution operations organized into 15 residual blocks. This depth allows the network to dispense with the motion-compensation step traditionally required in VSR, instead learning to extract the motion information inherent in a sequence of low-resolution frames on its own.
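The frame-stacking idea can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: the channel width, input frame count, and pixel-shuffle upsampling layer are assumptions made here; only the residual-block structure reflects the paper's description.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-ReLU-Conv with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class VSRGenerator(nn.Module):
    """Maps several consecutive low-resolution frames (stacked on the
    channel axis) directly to one high-resolution frame, with no
    explicit motion compensation."""
    def __init__(self, num_frames=5, channels=64, num_blocks=15, scale=4):
        super().__init__()
        self.head = nn.Conv2d(num_frames, channels, 3, padding=1)
        self.blocks = nn.Sequential(
            *[ResidualBlock(channels) for _ in range(num_blocks)]
        )
        # Pixel-shuffle upsampling to the target scale (an assumption;
        # the paper's exact upsampling strategy may differ).
        self.tail = nn.Sequential(
            nn.Conv2d(channels, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr_frames):
        # lr_frames: (batch, num_frames, H, W) grayscale frames
        return self.tail(self.blocks(self.head(lr_frames)))
```

Feeding a `(1, 5, 16, 16)` batch through `VSRGenerator(scale=4)` yields a `(1, 1, 64, 64)` output: the temporal context is consumed on the channel axis, so no optical-flow or warping module is needed.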
The generator is embedded in a GAN framework in which a discriminator network aids training by distinguishing real high-resolution frames from those produced by VSRResNet. Notably, the adversarial objective is supplemented by perceptual losses, computed as Charbonnier distances in both pixel space and the feature space of a pre-trained VGG network. These perceptual losses are pivotal in elevating the visual quality of super-resolved frames beyond what a plain mean-squared-error (MSE) objective can achieve.
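The loss composition described above can be sketched in NumPy as follows. The toy gradient-based feature extractor merely stands in for the pre-trained VGG network, and the loss weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def charbonnier(x, y, eps=1e-3):
    # Differentiable L1-like penalty: sqrt((x - y)^2 + eps^2), averaged.
    return np.mean(np.sqrt((x - y) ** 2 + eps ** 2))

def total_generator_loss(sr, hr, disc_score,
                         w_pix=1.0, w_feat=1.0, w_adv=1e-3, feat=None):
    """Pixel-space Charbonnier + feature-space Charbonnier + adversarial term.

    feat: feature extractor; here a toy image-gradient stand-in replaces
    the pre-trained VGG features used in the paper.
    disc_score: discriminator's probability that `sr` is a real HR frame.
    """
    if feat is None:
        feat = lambda img: np.stack(np.gradient(img.astype(float)))
    pixel_term = charbonnier(sr, hr)
    feature_term = charbonnier(feat(sr), feat(hr))
    adv_term = -np.log(disc_score + 1e-12)  # generator's adversarial objective
    return w_pix * pixel_term + w_feat * feature_term + w_adv * adv_term
```

The Charbonnier distance behaves like an L1 penalty for large residuals but is smooth near zero, which makes it a robust alternative to MSE for both the pixel and feature terms.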
Numerical Results
The paper provides comprehensive quantitative assessments, demonstrating the superiority of the VSRResFeatGAN model over existing VSR approaches without motion compensation, across scale factors of 2, 3, and 4. The results are robust, with the proposed model achieving higher PSNR and SSIM values compared to competitive state-of-the-art methods such as VDSR and SRGAN, among others, while also showing qualitative improvements in image sharpness and detail reproduction.
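For reference, the two distortion metrics used in these comparisons can be computed as below. The SSIM here is a simplified single-window version; the standard metric averages the same statistic over local Gaussian windows rather than one global window.

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB, for images scaled to [0, peak]."""
    mse = np.mean((x - y) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def global_ssim(x, y, peak=1.0):
    """Single-window SSIM over the whole image (simplified)."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Both metrics reward pixel-wise fidelity, which is precisely why, as the next paragraph notes, they can under-rate the perceptually sharper outputs that adversarial training produces.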
Furthermore, the paper highlights the limitations of traditional metrics like PSNR and SSIM in capturing perceptual quality. The authors introduce the PercepDist metric, which aligns more closely with human visual perception and demonstrates that their model yields superior results in terms of perceptual quality.
Implications and Future Directions
The integration of GANs with perceptual losses for VSR marks a significant step forward in video processing technology. This approach addresses the conventional problem of blurriness in super-resolved frames by enhancing the fidelity of high-frequency details. Practically, these advancements facilitate improved display quality on UHD televisions and other high-resolution platforms.
Theoretically, the paper opens avenues for exploring even deeper architectures and more sophisticated loss functions in VSR. Future research may focus on optimizing GAN training dynamics further, especially to mitigate the subtle artifact patterns introduced during adversarial learning. Moreover, leveraging state-of-the-art image processing techniques and blending them with conventional motion analysis frameworks could yield VSR solutions with unprecedented quality and efficiency.
This paper serves as a catalyst for ongoing research into employing GANs for diverse image restoration tasks, encouraging exploration beyond video super-resolution into general image enhancement, compression, and restoration challenges in artificial intelligence.