Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution (1704.03915v2)

Published 12 Apr 2017 in cs.CV

Abstract: Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require bicubic interpolation as a pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.

Citations (2,323)


Summary

The paper introduces LapSRN, a network that progressively reconstructs high-frequency details in a coarse-to-fine Laplacian pyramid framework.
It replaces bicubic pre-upsampling with transposed convolutions applied to low-resolution features, reducing computational cost and interpolation artifacts.
Experiments report state-of-the-art PSNR and SSIM on benchmarks such as BSDS100 and Urban100, with faster runtimes than SRCNN, VDSR, and DRCN.

Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution

This paper presents the Laplacian Pyramid Super-Resolution Network (LapSRN), a Convolutional Neural Network (CNN) for single-image super-resolution (SR). The model follows a coarse-to-fine scheme: at each level of a Laplacian pyramid it reconstructs the high-frequency residual needed to move to the next finer resolution. Because the network operates on the low-resolution input directly, it avoids the extra computation and the artifacts that come with bicubic pre-upsampling in conventional SR pipelines.
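The coarse-to-fine idea can be illustrated with a classical Laplacian pyramid. The following is a minimal sketch, not the paper's network: nearest-neighbour upsampling stands in for the learned transposed convolution, the residuals are computed from a known high-resolution image instead of being predicted by a CNN, and all function names are hypothetical.

```python
# Sketch under stated assumptions: fixed nearest-neighbour upsampling and
# ground-truth residuals replace the learned layers of LapSRN.

def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a 2D list of floats."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample2x(img):
    """2x downsampling by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def combine(a, b, sign):
    """Element-wise a + sign * b for two equally sized 2D lists."""
    return [[va + sign * vb for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]

def build_pyramid(hr, levels):
    """Return (coarsest image, residuals ordered from coarse to fine)."""
    residuals = []
    current = hr
    for _ in range(levels):
        coarse = downsample2x(current)
        # The residual holds the detail lost by going down one level.
        residuals.append(combine(current, upsample2x(coarse), -1))
        current = coarse
    residuals.reverse()
    return current, residuals

def reconstruct(lr, residuals):
    """Upsample and add one residual per level; keep every intermediate output."""
    outputs = []
    current = lr
    for residual in residuals:
        current = combine(upsample2x(current), residual, +1)
        outputs.append(current)
    return outputs

hr = [[float((x * y) % 7) for x in range(8)] for y in range(8)]
lr, residuals = build_pyramid(hr, levels=3)
outputs = reconstruct(lr, residuals)
sizes = [(len(o), len(o[0])) for o in outputs]
print(sizes)
```

Each entry of `outputs` is a usable image at 2x, 4x, and 8x the coarsest resolution, which mirrors how a single pass yields multi-scale predictions. In LapSRN the residuals would come from the feature extraction branch instead of from a ground-truth image.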

Model Architecture and Training

The LapSRN architecture is designed to process low-resolution (LR) images by extracting feature maps and predicting residuals at multiple scales through a cascade of CNNs. The network comprises two primary branches: feature extraction and image reconstruction. At each level, the model employs a series of convolutional layers followed by transposed convolutional layers for efficient upsampling. The sub-band residuals are incrementally predicted and combined to form high-resolution (HR) images.
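To make the upsampling step concrete, here is a minimal 1D sketch of a stride-2 transposed convolution. The fixed kernel `[0.5, 1.0, 0.5]` is an assumption chosen because it reproduces linear interpolation; in a trained network the kernel weights would be learned, and the function name is hypothetical.

```python
def transposed_conv1d(signal, kernel, stride):
    """Scatter each input sample, scaled by the kernel, into the output.

    Output length is (len(signal) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out

# Assumed illustrative kernel: with stride 2 it interpolates linearly
# between neighbouring input samples.
upsampled = transposed_conv1d([1.0, 2.0, 3.0], [0.5, 1.0, 0.5], stride=2)
print(upsampled)
```

Away from the borders the output contains the original samples with their midpoints in between. The first and last values are attenuated because only one input sample contributes there, which is why practical layers crop or pad the result.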

One distinct advantage of LapSRN is that it drops the bicubic upsampling pre-processing step, which reduces the computational burden and avoids introducing interpolation artifacts. Instead, transposed convolutions are applied directly to features extracted from the LR image. The network is trained with deep supervision using a Charbonnier loss, which is more robust to outliers than the $\ell_2$ loss and reduces the blurriness typical of models trained with $\ell_2$.
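The robustness argument can be checked numerically. The sketch below uses the common form of the Charbonnier penalty, $\rho(x) = \sqrt{x^2 + \varepsilon^2}$, and compares it with a mean squared error on two toy error patterns. The value `eps=1e-3` and the toy data are assumptions for illustration.

```python
import math

def charbonnier(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(diff^2 + eps^2); eps is an assumed value."""
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

def l2(pred, target):
    """Mean squared error, shown for comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

target = [0.0, 0.0, 0.0, 0.0]
small_errors = [0.1, 0.1, 0.1, 0.1]  # same total absolute error, spread evenly
one_outlier = [0.0, 0.0, 0.0, 0.4]   # same total absolute error, in one pixel

print(l2(small_errors, target), l2(one_outlier, target))
print(charbonnier(small_errors, target), charbonnier(one_outlier, target))
```

The squared loss penalises the single outlier several times more heavily than the evenly spread error, while the Charbonnier penalty, which behaves like an absolute error for differences much larger than `eps`, rates the two cases almost equally.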

Key Contributions and Advantages

High Accuracy: LapSRN learns the mapping from LR input to HR output directly, using a deep network to predict pixel-wise high-frequency residuals instead of refining a bicubic-upsampled image.
Fast Processing: Benchmark evaluations show that LapSRN runs faster than many contemporary SR models, including SRCNN, VDSR, and DRCN.
Progressive Reconstruction: The network's design allows it to generate multiple intermediate SR results within a single network pass, providing flexibility for resource-conscious applications.
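A short piece of arithmetic shows where the speed and multi-scale claims come from. This sketch assumes each pyramid level doubles the resolution, so a scale factor of $S$ needs $\log_2 S$ levels. The helper names and the 64x64 input size are hypothetical.

```python
import math

def pyramid_levels(scale):
    """Number of 2x levels needed for a power-of-two scale factor."""
    levels = int(round(math.log2(scale)))
    if 2 ** levels != scale:
        raise ValueError("scale must be a power of two")
    return levels

def level_sizes(lr_h, lr_w, scale):
    """Output resolution produced at each pyramid level."""
    return [(lr_h * 2 ** k, lr_w * 2 ** k)
            for k in range(1, pyramid_levels(scale) + 1)]

def pixel_ratio(scale):
    """How many times more pixels a pre-upsampled input has than the LR input."""
    return scale * scale

print(pyramid_levels(8))       # levels for 8x SR
print(level_sizes(64, 64, 8))  # intermediate and final output sizes
print(pixel_ratio(8))          # cost factor of convolving after bicubic upsampling
```

For an 8x model, a convolution applied to a bicubic-upsampled input visits 64 times as many positions as the same convolution applied to the LR input. The intermediate sizes are the 2x and 4x predictions that come for free from the same forward pass.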

Performance and Comparisons

Across several benchmark datasets, including Set5, Set14, BSDS100, Urban100, and Manga109, LapSRN exhibits superior performance both quantitatively and qualitatively. It outperforms state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and information fidelity criterion (IFC), with the gap most noticeable at large scaling factors such as $4\times$ and $8\times$.

For instance, in $4\times$ SR on the BSDS100 dataset, LapSRN achieved a PSNR of 27.32 dB and an SSIM of 0.728, surpassing competitive methods like VDSR and DRCN. The improvement is also evident in visual comparisons, where LapSRN reconstructs fine details and textures with minimal artifacts.
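For readers unfamiliar with the metric, PSNR is defined as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ and is reported in decibels. The following minimal sketch computes it for two small grayscale images. The 8-bit range (`max_value=255`) and the toy images are assumptions, and the exact evaluation protocol behind the reported numbers is not reproduced here.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D lists."""
    diffs = [(r - e) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for r, e in zip(ref_row, est_row)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [[100.0, 100.0], [100.0, 100.0]]
estimate = [[110.0, 90.0], [110.0, 90.0]]  # every pixel is off by 10 grey levels
value = psnr(reference, estimate)
print(round(value, 2))
```

Because the scale is logarithmic, differences of a few hundredths of a dB, such as those separating the methods compared above, correspond to small changes in mean squared error.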

Practical Implications and Future Directions

LapSRN is relevant to real-world applications that need high-fidelity image reconstruction at high speed. Examples include medical imaging, satellite image analysis, and video resolution enhancement, where computational resources may be limited and high accuracy is paramount.

In terms of future developments, enhancing the model’s capability to hallucinate fine details where the LR image lacks sufficient structure is a plausible direction. Additionally, exploring lighter and more efficient network architectures could further optimize the trade-off between model complexity and performance, making the approach suitable for deployment in edge devices and mobile applications.

Conclusion

The research presented in the paper marks a substantial step forward in the domain of super-resolution. By innovating with the Laplacian pyramid framework and optimizing key aspects of CNN-based SR methods, the authors have achieved a balance of high performance, efficiency, and versatility. The LapSRN sets a new benchmark for SR techniques, offering a robust solution for a wide range of practical and research applications.
