- The paper introduces the Detail-Preserving Transformer (DPT), a novel Transformer-based network for light field image super-resolution (LFSR) that captures both local details and global spatial-angular dependencies.
- The proposed DPT features a Spatial-Angular Locally-Enhanced Self-Attention (SA-LSA) mechanism and a dual-branch structure that processes content and gradient information and fuses them via cross-attention.
- Experimental results show that the DPT surpasses existing state-of-the-art LFSR methods across multiple datasets in PSNR and SSIM, demonstrating the effectiveness of Transformers for this task.
The paper "Detail-Preserving Transformer for Light Field Image Super-Resolution" introduces a novel approach to enhance the resolution of light field images, leveraging the capabilities of Transformer architectures. Light field imaging, which captures rich multi-directional light information to enable comprehensive 3D scene representations, faces inherent resolution trade-offs between angular and spatial dimensions. The authors address this challenge through their proposed Detail-Preserving Transformer (DPT), focusing on the spatial dimension of super-resolution tasks.
Core Contributions
This work distinguishes itself by reframing light field super-resolution (LFSR) as a sequence-to-sequence reconstruction task using Transformers. The DPT model introduces a mechanism that captures both local geometric details and long-range spatial-angular dependencies across the multiple sub-aperture images (SAIs) of a light field. Specifically, the model comprises two primary branches (a code sketch of this layout follows the list):
- Spatial-Angular Locally-Enhanced Self-Attention (SA-LSA): This component enhances the Transformer model's ability to maintain locality in image details while simultaneously modeling non-local dependencies critical for effective LFSR.
- Dual-Branch Structure with Content and Gradient Transformers: The content branch learns from raw light field sequences, while the gradient branch processes edge and detail-preserving gradient information. These modalities are later fused through cross-attention mechanisms to generate robust feature representations.
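To make the dual-branch layout concrete, here is a minimal PyTorch sketch. It is not the authors' code: the module names, the placeholder convolutions standing in for the two Transformers, and the finite-difference gradient operator are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def gradient_maps(x: torch.Tensor) -> torch.Tensor:
    """Approximate image gradients by finite differences; an illustrative
    stand-in for the paper's edge/detail-preserving gradient input."""
    dx = x[..., :, 1:] - x[..., :, :-1]           # horizontal differences
    dy = x[..., 1:, :] - x[..., :-1, :]           # vertical differences
    dx = nn.functional.pad(dx, (0, 1, 0, 0))      # pad back to input width
    dy = nn.functional.pad(dy, (0, 0, 0, 1))      # pad back to input height
    return dx.abs() + dy.abs()                    # simple gradient magnitude

class DualBranchLFSR(nn.Module):
    """Hypothetical skeleton of a content/gradient dual-branch network."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.content_branch = nn.Conv2d(1, channels, 3, padding=1)   # placeholder for the content Transformer
        self.gradient_branch = nn.Conv2d(1, channels, 3, padding=1)  # placeholder for the gradient Transformer
        self.fuse = nn.Conv2d(2 * channels, channels, 1)             # placeholder for cross-attention fusion

    def forward(self, sai: torch.Tensor) -> torch.Tensor:
        # sai: a batch of sub-aperture images, shape [B, 1, H, W]
        c = self.content_branch(sai)
        g = self.gradient_branch(gradient_maps(sai))
        return self.fuse(torch.cat([c, g], dim=1))
```

The key design point the sketch tries to convey is that the gradient branch sees an explicitly edge-focused version of the same input, so fine structures survive even when the content branch smooths them.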
Methodological Approach
The paper introduces the SA-LSA layer, which tokenizes inputs through convolutional processing, preserving crucial local image features within sub-aperture sequences. The Transformer architecture is then applied to capture spatial-angular relationships among SAI sequences arranged along both horizontal and vertical directions. This contrasts with traditional convolution-based networks, which struggle to model the global dependencies inherent in light fields.
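As a rough illustration of how convolutional tokenization can be combined with attention over a spatial-angular sequence, consider the sketch below. The module name, the per-view pooling, and the depthwise-convolution locality term are assumptions made for exposition, not the published SA-LSA design.

```python
import torch
import torch.nn as nn

class LocallyEnhancedSelfAttention(nn.Module):
    """Illustrative self-attention layer with convolutional token embedding
    and a depthwise convolution that re-injects local detail (assumed design)."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.tokenize = nn.Conv2d(1, dim, kernel_size=3, padding=1)  # conv tokenization keeps local structure
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depthwise locality term

    def forward(self, sai_seq: torch.Tensor) -> torch.Tensor:
        # sai_seq: a sequence of N sub-aperture images, shape [N, 1, H, W]
        tokens = self.tokenize(sai_seq)                  # [N, dim, H, W]
        tokens = tokens.flatten(2).mean(-1)              # crude per-view pooling to one token each, [N, dim]
        seq = tokens.unsqueeze(0)                        # one spatial-angular sequence, [1, N, dim]
        out, _ = self.attn(seq, seq, seq)                # global attention across all views
        out = out + self.local(out.transpose(1, 2)).transpose(1, 2)  # add local mixing along the sequence
        return out.squeeze(0)                            # [N, dim]
```

In the paper's setting, such a layer would be applied to both horizontal and vertical SAI sequences, so that dependencies along both angular directions are captured.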
Furthermore, the DPT employs a fusion strategy in which features derived from the content and gradient branches are aggregated via cross-attention fusion Transformers. This enables comprehensive feature learning that improves reconstruction fidelity, which is especially beneficial at high upscaling factors (e.g., ×4 super-resolution).
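A minimal sketch of such a cross-attention fusion step follows, assuming content tokens act as queries over gradient tokens; the query/key assignment and the residual-plus-norm layout are illustrative assumptions rather than the paper's exact fusion Transformer.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Assumed-form fusion block: content features query gradient features."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, content: torch.Tensor, gradient: torch.Tensor) -> torch.Tensor:
        # content, gradient: token sequences of shape [B, N, dim]
        fused, _ = self.cross(query=content, key=gradient, value=gradient)
        return self.norm(content + fused)  # residual keeps the content signal intact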
Results and Implications
The experimental evaluations, conducted across several diverse light field datasets (including EPFL, HCInew, HCIold, INRIA, and STFgantry), demonstrate that the proposed DPT surpasses existing state-of-the-art models in both PSNR and SSIM metrics. Particularly noteworthy are the improvements in the preservation of fine details and structural consistency, validating the advantage of integrated spatial-angular modeling.
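For reference, PSNR, one of the two reported metrics, measures reconstruction error on a logarithmic decibel scale. A standard implementation (the textbook definition, not anything specific to this paper) is:

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```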
The results also underscore the potential of Transformer-based architectures in the domain of image super-resolution, suggesting avenues for their broader application in related tasks. Future developments may explore scaling capabilities, real-time processing optimizations, and integration into multi-view and computational photography systems.
Conclusion
The introduction of the Detail-Preserving Transformer marks a significant advancement in light field image processing, offering a robust framework for the unique challenges posed by high-dimensional light field data. The approach lays groundwork for further exploration and adaptation of Transformer models in complex imaging systems, enabling high-quality, detail-rich reconstructions in 3D imaging applications.