Learning Parallax Attention for Stereo Image Super-Resolution (1903.05784v3)

Published 14 Mar 2019 in cs.CV

Abstract: Stereo image pairs can be used to improve the performance of super-resolution (SR) since additional information is provided from a second viewpoint. However, it is challenging to incorporate this information for SR since disparities between stereo images vary significantly. In this paper, we propose a parallax-attention stereo super-resolution network (PASSRnet) to integrate the information from a stereo image pair for SR. Specifically, we introduce a parallax-attention mechanism with a global receptive field along the epipolar line to handle different stereo images with large disparity variations. We also propose a new and the largest dataset for stereo image SR (namely, Flickr1024). Extensive experiments demonstrate that the parallax-attention mechanism can capture correspondence between stereo images to improve SR performance with a small computational and memory cost. Comparative results show that our PASSRnet achieves the state-of-the-art performance on the Middlebury, KITTI 2012 and KITTI 2015 datasets.



Summary

The paper introduces PASSRnet, a network that employs a unique parallax-attention mechanism to effectively address disparity variations in stereo images.
It integrates a residual ASPP module to boost multi-scale feature extraction, achieving significant improvements in PSNR and SSIM on benchmark datasets.
The study also offers the Flickr1024 dataset, the largest stereo image SR dataset, paving the way for advanced 3D imaging and autonomous applications.

An Examination of Parallax Attention for Stereo Image Super-Resolution

The paper "Learning Parallax Attention for Stereo Image Super-Resolution" introduces an approach to super-resolving stereo image pairs with a parallax-attention mechanism. The central difficulty it addresses is that disparities between the two views vary widely from scene to scene, so the pixels in one view that could help super-resolution (SR) of the other sit at unpredictable offsets along the epipolar line. The authors propose the Parallax-Attention Stereo Super-Resolution Network (PASSRnet), which learns to find and integrate stereo correspondences even when disparities are large.

Core Contributions and Methodology

The PASSRnet model is distinguished by several key contributions:

Parallax-Attention Mechanism: This mechanism is the core innovation of PASSRnet. It gives each position a global receptive field along the epipolar line of the other view, so corresponding features can be aligned even when disparities vary widely. This attention-based approach captures non-local dependencies that convolutional layers with limited receptive fields often miss.
Resilient Feature Extraction through Residual ASPP: The authors introduce a residual atrous spatial pyramid pooling (ASPP) module, which enhances multi-scale feature extraction by incorporating different dilation rates. This module aids the network in achieving feature representations with expanded receptive fields, crucial for capturing contextual information necessary for stereo correspondence.
Flickr1024 Dataset: Recognizing the need for robust training data, the authors created the Flickr1024 dataset, consisting of 1024 high-quality stereo image pairs. This dataset, touted as the largest for stereo image SR, provides diverse scenes necessary for training generalized models like PASSRnet.
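The parallax-attention idea in the first contribution can be illustrated with a minimal sketch. It assumes rectified stereo pairs, so the epipolar line of a pixel is simply the same image row in the other view, and it uses raw dot products between feature vectors as matching scores. The function names are hypothetical, and the learned projections that a trained network would apply before matching are left out; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parallax_attention_row(left_row, right_row):
    """Attend from each left-view position to every right-view position on one row.

    left_row:  list of W_left feature vectors (each of length C)
    right_row: list of W_right feature vectors (each of length C)
    Returns (attention, warped): attention is W_left x W_right with rows
    summing to 1; warped is W_left x C, the right-view features gathered
    into the left view.
    """
    channels = len(right_row[0])
    attention, warped = [], []
    for query in left_row:
        # Matching score against every position on the same row of the right view.
        scores = [sum(q * k for q, k in zip(query, key)) for key in right_row]
        weights = softmax(scores)
        attention.append(weights)
        # Attention-weighted sum of right-view features.
        warped.append([
            sum(w * feat[c] for w, feat in zip(weights, right_row))
            for c in range(channels)
        ])
    return attention, warped


def parallax_attention(left_feats, right_feats):
    """Apply the row-wise attention to H x W x C feature maps (nested lists)."""
    maps, warped = [], []
    for left_row, right_row in zip(left_feats, right_feats):
        row_map, row_warped = parallax_attention_row(left_row, right_row)
        maps.append(row_map)
        warped.append(row_warped)
    return maps, warped


# Toy example: the left row is the right row shifted by one pixel (wrapping around).
right = [[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]]
left = [[[0.0, 10.0, 0.0], [0.0, 0.0, 10.0], [10.0, 0.0, 0.0]]]
maps, warped = parallax_attention(left, right)
best_match = [max(range(3), key=row.__getitem__) for row in maps[0]]
print(best_match)  # [1, 2, 0]
```

Because every left-view position is compared with the whole row of the right view, no maximum disparity has to be fixed in advance; the cost is one W x W attention map per image row.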

The model utilizes several losses designed explicitly for stereo SR, including photometric, smoothness, and cycle losses, in addition to the traditional SR loss, to enforce accurate stereo correspondence through the network and achieve superior super-resolution results.
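The summary above names the auxiliary losses without giving formulas, so the following is only a sketch of one plausible formulation for a single image row. It assumes that the photometric loss compares one view with the other view warped by the attention map, that the cycle loss asks the left-to-right-to-left attention product to be close to the identity, that the smoothness loss penalizes differences between diagonally adjacent attention entries, and that the terms are added to the SR loss with a single weight. All function names and the `weight` parameter are hypothetical, and occlusion handling is omitted.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mean_abs_diff(a, b):
    """Mean absolute difference (L1) between two equally shaped matrices."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def photometric_loss(att_right_to_left, left_row, right_row):
    """L1 distance between the left row and the attention-warped right row."""
    return mean_abs_diff(matmul(att_right_to_left, right_row), left_row)


def cycle_loss(att_right_to_left, att_left_to_right):
    """Warping left -> right -> left should map every position back to itself."""
    width = len(att_right_to_left)
    identity = [[1.0 if i == j else 0.0 for j in range(width)] for i in range(width)]
    return mean_abs_diff(matmul(att_right_to_left, att_left_to_right), identity)


def smoothness_loss(attention):
    """Neighbouring positions should attend to neighbouring positions."""
    diffs = [
        abs(attention[j][k] - attention[j + 1][k + 1])
        for j in range(len(attention) - 1)
        for k in range(len(attention[0]) - 1)
    ]
    return sum(diffs) / len(diffs)


def total_loss(sr_loss, photometric, smoothness, cycle, weight):
    """Weighted sum of the SR loss and the stereo-specific terms (weight is illustrative)."""
    return sr_loss + weight * (photometric + smoothness + cycle)


# Toy row: the left view is the right view shifted by one pixel (wrapping around).
att_r2l = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
att_l2r = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
right_row = [[1.0], [2.0], [3.0]]
left_row = [[2.0], [3.0], [1.0]]
print(photometric_loss(att_r2l, left_row, right_row))  # 0.0
print(cycle_loss(att_r2l, att_l2r))  # 0.0
```

With perfectly consistent attention maps, as in the toy row, the photometric and cycle terms vanish; any mismatch between the two directions or between the warped and observed pixels makes them positive.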

Experimental Validation

The efficacy of PASSRnet is demonstrated through experiments on the Middlebury, KITTI 2012, and KITTI 2015 datasets. The results indicate a significant improvement over state-of-the-art single-image and stereo image SR methods. Specifically, PASSRnet consistently achieves higher PSNR and SSIM scores, indicating better reconstruction accuracy and better preservation of image structure. Notably, on scenes with large disparity variations, PASSRnet is more flexible and efficient than methods that assume a fixed maximum disparity.
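For readers unfamiliar with the PSNR metric mentioned above, a minimal sketch of its standard definition follows. It assumes single-channel images stored as nested lists with a peak value of 255; the evaluation protocol used in the paper (color space, border cropping) is not described here, so those details are not modeled.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equally sized single-channel images."""
    diffs = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    return sum(diffs) / len(diffs)


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in decibels; higher means a closer reconstruction."""
    error = mse(reference, estimate)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)


ground_truth = [[10.0, 20.0], [30.0, 40.0]]
reconstruction = [[11.0, 21.0], [31.0, 41.0]]
print(round(psnr(ground_truth, reconstruction), 2))  # 48.13
```

SSIM, the second metric, compares local means, variances, and covariances rather than per-pixel errors, and is usually taken from an image-processing library rather than written by hand.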

Implications and Future Directions

The implications of this work are multifaceted. Practically, the integration of a parallax-attention mechanism could lead to more advanced stereo imaging applications in areas such as autonomous vehicles and 3D displays, where image detail and depth perception are critical. Theoretically, the paper opens new directions in utilizing attention mechanisms for image processing tasks, particularly where global context and non-local correspondence are necessary.

As future research, continued exploration into adaptive and more efficient attention mechanisms might provide further improvements. Additionally, extending PASSRnet to integrate other forms of depth estimation or multi-image fusion tasks presents an exciting opportunity. Such extensions could exploit the rich correspondence information extracted by parallax-attention mechanisms, potentially enhancing image recovery across a broader range of applications.

In summary, the paper provides a comprehensive framework for understanding and applying parallax attention in stereo image super-resolution, offering promising directions for both practical application and further theoretical development within the field of computer vision.
