- The paper presents a novel CSFM network that integrates channel-wise and spatial attention to recalibrate features and improve super-resolution performance.
- The approach uses a densely connected structure with gated fusion nodes to preserve long-term information and enhance multi-level feature modulation.
- The network outperforms state-of-the-art methods, achieving higher PSNR and SSIM scores on standard benchmark datasets.
Channel-wise and Spatial Feature Modulation Network for Single Image Super-Resolution
The paper "Channel-wise and Spatial Feature Modulation Network for Single Image Super-Resolution" explores enhancing single image super-resolution (SISR) through a Channel-wise and Spatial Feature Modulation (CSFM) network built on deep convolutional neural networks (CNNs). The method addresses two pivotal challenges: the limited discriminative ability of CNN-based models, which tend to treat all features equally, and the loss of long-term information as networks grow deeper.
The CSFM network combines channel-wise and spatial attention, which dynamically modulate multi-level features, with a densely connected structure whose gated fusion (GF) nodes maintain a persistent memory. The architecture transforms low-resolution (LR) features into high-resolution (HR) outputs by capturing and enhancing significant information while suppressing redundancy.
Key Components and Architecture
- Channel-wise and Spatial Attention Residual (CSAR) Blocks: Central to the CSFM architecture, these blocks integrate channel-wise and spatial attention within residual blocks to recalibrate feature responses. This dual attention mechanism exploits both global inter-channel dependencies and local spatial context, allowing the network to concentrate on the information most useful for super-resolution.
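To make the dual attention concrete, here is a minimal NumPy sketch of a CSAR-style block. It is an illustration under simplifying assumptions, not the paper's implementation: the real network uses learned convolutions and its own fusion of the two attention branches, whereas this sketch applies a squeeze-and-excitation-style channel gate and a 1x1-projection spatial gate in sequence, with the weights (`w1`, `w2`, `ws`) and the `conv` stand-in supplied by the caller.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel gate: global average pooling, then a two-layer
    bottleneck producing one scaling factor per channel."""
    # feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)
    pooled = feat.mean(axis=(1, 2))                      # squeeze: (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))    # excite: (C,)
    return feat * gate[:, None, None]                    # recalibrate channels

def spatial_attention(feat, w):
    """Spatial gate: a 1x1 projection across channels yields one
    scaling factor per spatial position."""
    # feat: (C, H, W); w: (C,)
    gate = sigmoid(np.einsum('c,chw->hw', w, feat))      # (H, W)
    return feat * gate[None, :, :]                       # recalibrate positions

def csar_block(feat, conv, w1, w2, ws):
    """Residual block whose residual branch is modulated by both gates."""
    res = np.maximum(conv(feat), 0.0)                    # conv + ReLU stand-in
    res = channel_attention(res, w1, w2)
    res = spatial_attention(res, ws)
    return feat + res                                    # identity shortcut
```

The residual shortcut keeps the unmodulated features flowing forward, so the gates only decide how much extra information the branch contributes.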
- Feature-Modulation Memory (FMM) Module: Built from a stack of CSAR blocks, this module modulates multi-level features to capture information more effectively. A gated fusion node at its end adaptively blends the output of the current module with states from previous modules, maintaining long-term information flow across the network.
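The blending performed by a gated fusion node can be sketched as a sigmoid gate computed over the concatenated states. The weight matrix `w` stands in for a learned 1x1 convolution; this parameterization is an assumption of the sketch, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(prev_state, current, w):
    """Blend the current module's output with the long-term state
    using a per-channel, per-position sigmoid gate."""
    # prev_state, current: (C, H, W); w: (C, 2C), a 1x1-conv stand-in
    cat = np.concatenate([prev_state, current], axis=0)  # (2C, H, W)
    gate = sigmoid(np.einsum('oc,chw->ohw', w, cat))     # (C, H, W)
    return gate * current + (1.0 - gate) * prev_state
```

With the gate near 1 the node passes the fresh features through; near 0 it preserves the carried state, which is how long-term information survives depth.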
- Densely Connected Structure: The overall architecture adopts a densely connected structure that ensures effective flow of information between modules, enhancing both the depth and learning ability of the network without incurring redundant computation.
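The dense connectivity can be illustrated as each module consuming a channel-reducing fusion of all earlier states, DenseNet-style. The `modules` and `reduces` callables below are placeholders for learned layers, so this is a structural sketch under those assumptions rather than the paper's implementation.

```python
import numpy as np

def dense_forward(x0, modules, reduces):
    """Each module sees a reduction of the concatenation of every
    earlier state, so long-term information reaches deep modules
    while the per-module channel count stays fixed."""
    states = [x0]                                        # history of states
    for module, reduce in zip(modules, reduces):
        fused = reduce(np.concatenate(states, axis=0))   # fuse all history
        states.append(module(fused))                     # new state joins it
    return states[-1]
```

Because the fusion step brings the channel count back down before each module runs, depth grows without the concatenations inflating every module's computation.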
Performance and Results
The evaluation of the CSFM network on benchmark datasets demonstrates its superior performance relative to state-of-the-art methods. For scale factors of 2x, 3x, and 4x, the network consistently achieves higher PSNR and SSIM values, notably outperforming existing methods such as MemNet, RDN, and EDSR, while maintaining a balance between model complexity and effectiveness. The mechanisms introduced by CSFM, specifically the attention-enhanced CSAR blocks and the memory-preserving gated fusion nodes, contribute substantially to its ability to reconstruct high-frequency details.
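Of the two reported metrics, PSNR is computed directly from mean squared error; a standard full-image implementation is shown below. (Published SISR results typically evaluate on the luminance channel with image borders shaved, which this sketch omits.)

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the test image
    is closer to the reference."""
    err = ref.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(err ** 2)                  # mean squared error
    if mse == 0.0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)  # 20*log10(peak) - 10*log10(mse)
```

SSIM, the companion metric, additionally compares local luminance, contrast, and structure statistics rather than raw pixel error.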
Implications and Future Work
The results validate CSFM's capacity for improved SISR through discriminative feature modulation, both channel-wise and spatial, coupled with effective long-term information preservation via dense connectivity. They also suggest robustness in handling complex scenes with varying frequency content, which benefits applications requiring high-fidelity image reconstruction.
Future developments could focus on extending the CSFM framework to accommodate real-time processing constraints and exploring its adaptability across different domains of computer vision, including video super-resolution and multi-frame enhancement scenarios. As the field of AI continues to evolve, such networks may eventually integrate with more comprehensive learning paradigms, possibly involving unsupervised or weakly supervised approaches to further alleviate data annotation bottlenecks.