- The paper introduces a cross-scale non-local (CS-NL) attention module that captures pixel-to-patch correlations across scales within a single image, substantially improving super-resolution quality.
- It integrates cross-scale non-local priors with exhaustive self-exemplar mining in a recurrent framework, achieving superior PSNR and SSIM on multiple benchmarks.
- The methodology improves reconstruction fidelity and parameter efficiency, with potential applications in remote sensing, medical imaging, and surveillance.
Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining
The article "Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining" presents a novel approach to single image super-resolution (SISR) using cross-scale non-local (CS-NL) attention within a recurrent neural network framework. The authors identify and address a limitation of existing super-resolution methods, which predominantly rely on local or in-scale feature priors and thereby neglect the cross-scale feature correlations inherent in natural images. The proposed CS-NL attention mechanism exploits these cross-scale correlations to consistently improve reconstruction quality.
Contributions and Methodology
The core contribution of the paper is the introduction of the CS-NL attention module. Unlike prior attention mechanisms that only consider feature correlations within the same scale, the CS-NL module identifies pixel-to-patch and patch-to-patch relationships across different scales within an image. This module is integrated into a self-exemplars mining (SEM) cell, which concurrently exploits local, in-scale non-local, and cross-scale non-local priors. The SEM cell operates within a recurrent neural network, enabling exhaustive feature fusion through multi-branch mutual projection.
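To make the pixel-to-patch idea concrete, the following is a minimal, purely illustrative sketch in plain Python on a scalar-valued 2D feature map. Each query pixel is matched (via softmax-normalized similarity) against the pixels of an s-times downscaled copy of the map, and the attention weights then blend the s x s full-resolution patches that each downscaled pixel summarizes, so the output is s times larger than the input. The paper's actual module operates on deep feature tensors with learned embeddings and deconvolution-based patch aggregation, all of which is omitted here.

```python
import math

def downscale(F, s):
    """Average-pool a 2D feature map by factor s (assumes H, W divisible by s)."""
    H, W = len(F), len(F[0])
    return [[sum(F[i * s + di][j * s + dj] for di in range(s) for dj in range(s)) / (s * s)
             for j in range(W // s)] for i in range(H // s)]

def cross_scale_attention(F, s):
    """Sketch of pixel-to-patch cross-scale attention on a scalar feature map.

    Queries: pixels of F. Keys: pixels of the downscaled map F_down.
    Values: the s x s full-resolution patches behind each key pixel.
    Output: a map s times larger than F, as in cross-scale upsampling.
    """
    H, W = len(F), len(F[0])
    Fd = downscale(F, s)
    Hd, Wd = len(Fd), len(Fd[0])
    out = [[0.0] * (W * s) for _ in range(H * s)]
    keys = [(i, j) for i in range(Hd) for j in range(Wd)]
    for y in range(H):
        for x in range(W):
            # Similarity of this query pixel to every downscaled key pixel,
            # normalized with a numerically stable softmax.
            scores = [F[y][x] * Fd[i][j] for (i, j) in keys]
            m = max(scores)
            exps = [math.exp(v - m) for v in scores]
            z = sum(exps)
            weights = [e / z for e in exps]
            # Blend the s x s full-resolution patches behind each key.
            for w, (i, j) in zip(weights, keys):
                for di in range(s):
                    for dj in range(s):
                        out[y * s + di][x * s + dj] += w * F[i * s + di][j * s + dj]
    return out
```

Because the softmax weights sum to one, every output pixel is a convex combination of full-resolution feature values, so the aggregation cannot leave the input's dynamic range.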
The proposed architecture embeds multiple SEM cells in a recurrent framework, each iteration refining the super-resolved features through the feedback mechanism. The network is trained end-to-end with a standard reconstruction loss on a large-scale image dataset, learning the mapping from low-resolution inputs to high-resolution outputs.
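The recurrent refinement can be sketched abstractly: each step passes the current estimate through a cell and feeds the result back as the next input, with a reconstruction loss measuring progress. This is a toy stand-in, not the paper's SEM cell; `toy_cell` below simply halves the distance to a fixed target so that the loss visibly shrinks per iteration.

```python
def l1_loss(pred, target):
    """Mean absolute error between two flat signals (a standard reconstruction loss)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def recurrent_refine(x0, cell, steps):
    """Unroll a refinement cell: each iteration's output feeds the next iteration."""
    state = list(x0)
    outputs = []
    for _ in range(steps):
        state = cell(state)
        outputs.append(list(state))
    return outputs

# Hypothetical stand-in for a learned SEM cell: halve the gap to the target.
target = [1.0, 2.0, 3.0]
toy_cell = lambda s: [(a + t) / 2.0 for a, t in zip(s, target)]

outputs = recurrent_refine([0.0, 0.0, 0.0], toy_cell, 3)
losses = [l1_loss(o, target) for o in outputs]  # strictly decreasing across steps
```

In the actual network the cell is learned end-to-end, and all intermediate outputs can be supervised by the reconstruction loss rather than only the final one.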
Experiments conducted across multiple benchmark datasets, including Set5, Set14, B100, Urban100, and Manga109, demonstrate that the proposed CS-NL attention mechanism yields substantial performance improvements over existing state-of-the-art methods. The quantitative results show superior PSNR and SSIM values, with particularly large gains on datasets such as Urban100, where repetitive structures make cross-scale self-similarities abundant.
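For reference, PSNR, the primary metric reported above, is defined as 10 log10(peak^2 / MSE). A minimal implementation is below; SSIM involves local luminance, contrast, and structure statistics and is omitted here, and the paper's exact evaluation protocol (e.g. Y-channel conversion and border cropping) is not reproduced.

```python
import math

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two same-size images given as flat lists."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)

# One unit of error per pixel gives MSE = 1, i.e. PSNR = 20*log10(255).
print(round(psnr([100.0, 100.0], [101.0, 99.0]), 2))  # 48.13
```

Higher is better; a 0.3 dB gain on a benchmark average is typically considered a meaningful improvement in SISR.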
Numerical Results
The implementation of the CS-NL attention in SISR achieves noteworthy improvements in various scenarios. For instance, in urban imagery with repetitive architectural patterns, the method delivers enhanced reconstruction fidelity, outperforming existing techniques by a significant margin in PSNR. Furthermore, the paper reports a favorable tradeoff between accuracy and model complexity, achieving these gains with fewer parameters than competing frameworks.
Theoretical and Practical Implications
The research establishes that cross-scale analysis, previously less explored in convolutional super-resolution networks, can effectively amplify the capacity to restore high-frequency details lost in low-resolution images. These results point toward applications in domains requiring high-precision image processing such as remote sensing, medical imaging, and high-definition surveillance.
Practically, the integration of CS-NL attention modules implies a path forward for designing networks that fully exploit intrinsic hierarchical information within single images without extensive reliance on external datasets. The methodology thus opens exploratory avenues in adaptive filtering and transformation-based enhancement strategies in computer vision tasks.
Conclusion and Future Directions
In conclusion, this paper makes significant contributions to image super-resolution by leveraging cross-scale information, yielding improved reconstruction quality. Future research may extend these ideas to other image restoration tasks, integrating cross-scale correlation mining with generative models or applying it in unsupervised settings. More broadly, exploiting intrinsic self-similarity priors may prove useful in other areas of machine learning that operate on structured data.