- The paper introduces the Two-stream Fusion Network (TFNet), which fuses features extracted by separate CNN streams from the PAN and MS inputs to achieve effective pan-sharpening.
- It employs an encoder-decoder architecture with an ℓ1 loss and residual learning to reduce image blurring and enhance detail.
- Empirical tests on QuickBird and GaoFen-1 datasets demonstrate superior spectral and spatial preservation over existing methods.
Overview of "Remote Sensing Image Fusion Based on Two-stream Fusion Network"
This paper presents a novel approach to remote sensing image fusion, or pan-sharpening, using a deep learning framework called the Two-stream Fusion Network (TFNet). The objective of pan-sharpening is to generate a high-resolution multi-spectral (MS) image by combining the spatial detail of a panchromatic (PAN) image with the spectral information of a low-resolution MS image. The authors depart from conventional pixel-level fusion techniques by using convolutional neural networks (CNNs) to perform fusion at the feature level, reconstructing the desired high-resolution MS image from the fused features.
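To make the task concrete, the sketch below shows typical input and output shapes for pan-sharpening at a 4x resolution ratio (the ratio used by sensors such as QuickBird). The tensor sizes and the bicubic upsampling step are illustrative assumptions, not values taken from the paper.

```python
import torch

# Illustrative shapes for a 4x pan-sharpening problem (batch of 1).
# A 4-band low-resolution MS image and a single-band PAN image at
# 4x the MS resolution; the target is an MS image at PAN resolution.
lr_ms = torch.randn(1, 4, 64, 64)      # low-res multi-spectral input
pan = torch.randn(1, 1, 256, 256)      # high-res panchromatic input
hr_ms_shape = (1, 4, 256, 256)         # desired high-res MS output

# Feature-level fusion requires the two streams to meet at a common
# spatial size, so the MS input is typically upsampled to PAN
# resolution before (or inside) the network.
ms_up = torch.nn.functional.interpolate(lr_ms, scale_factor=4, mode="bicubic")
print(ms_up.shape)  # torch.Size([1, 4, 256, 256])
```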
Methodological Innovation
The TFNet is structured as an encoder-decoder architecture comprising three core components: feature extraction, feature fusion, and image reconstruction. Feature extraction uses two separate CNN streams, one for the PAN image and one for the MS image. The two feature maps are then concatenated and compacted by a fusion sub-network that integrates spatial and spectral information. Finally, the decoder reconstructs the high-resolution MS image from the fused features, as illustrated in the sketch below.
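The following is a minimal PyTorch sketch of this three-stage design. The layer counts, channel widths, and kernel sizes are placeholder assumptions for illustration; the paper's exact configuration (strides, number of layers, and any skip connections) is not reproduced here.

```python
import torch
import torch.nn as nn

class TwoStreamFusionSketch(nn.Module):
    """Illustrative two-stream fusion network: separate encoders for
    the PAN and (upsampled) MS inputs, concatenation-based feature
    fusion, and a decoder reconstructing the high-resolution MS image."""

    def __init__(self, ms_bands: int = 4, width: int = 32):
        super().__init__()
        # Stream 1: features from the single-band PAN image.
        self.pan_encoder = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Stream 2: features from the upsampled MS image.
        self.ms_encoder = nn.Sequential(
            nn.Conv2d(ms_bands, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Fusion: concatenate the two streams and compact them.
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: reconstruct the high-resolution MS image.
        self.decoder = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, ms_bands, 3, padding=1),
        )

    def forward(self, pan: torch.Tensor, ms_up: torch.Tensor) -> torch.Tensor:
        fused = self.fusion(torch.cat(
            [self.pan_encoder(pan), self.ms_encoder(ms_up)], dim=1))
        return self.decoder(fused)
```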
The two-stream design allows the network to learn feature representations suited to each input domain. Notably, the authors adopt an ℓ1 loss function in place of the traditionally used ℓ2 loss, which reduces the blurring commonly associated with ℓ2-trained reconstruction. Residual learning, motivated by its success in other low-level vision tasks, is also incorporated to further improve performance; a sketch of the resulting training objective follows.
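The sketch below illustrates one way these two ideas combine, assuming (as is common when residual learning is used for pan-sharpening) that the network predicts a correction that is added to the upsampled MS input. The function and variable names are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def l1_residual_loss(model, pan, ms_up, hr_ms_target):
    """ℓ1 loss with residual learning: the network output is treated as
    a correction to the upsampled MS image rather than the full image,
    so the model only has to learn the missing high-frequency detail."""
    residual = model(pan, ms_up)       # predicted high-frequency detail
    prediction = ms_up + residual      # residual learning: identity skip path
    return F.l1_loss(prediction, hr_ms_target)  # ℓ1 reduces blurring vs ℓ2
```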
Empirical Results
The efficacy of the TFNet was evaluated on datasets from the QuickBird and GaoFen-1 satellites. The proposed model outperformed several existing methods, including classical techniques such as IHS and modern CNN-based methods such as PNN. Quantitative assessments showed substantial improvements across multiple metrics, including the spectral angle mapper (SAM), correlation coefficient (CC), and universal image quality index (UIQI). The results highlight the network's ability to preserve spectral information while enhancing spatial detail, which is also evident in visual comparisons of the fused outputs; an illustrative SAM computation is sketched below.
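As one example of these metrics, SAM measures the angle between corresponding spectral vectors of the fused and reference images, with lower values indicating better spectral fidelity. The following is a minimal NumPy sketch, assuming images are arrays of shape (bands, height, width); it is a generic formulation, not the paper's evaluation code.

```python
import numpy as np

def sam_degrees(fused: np.ndarray, reference: np.ndarray,
                eps: float = 1e-8) -> float:
    """Mean spectral angle mapper (SAM) in degrees between two images
    of shape (bands, height, width); 0 means identical spectral
    directions at every pixel."""
    # Flatten spatial dimensions: each column is one pixel's spectrum.
    f = fused.reshape(fused.shape[0], -1)
    r = reference.reshape(reference.shape[0], -1)
    # Cosine of the angle between corresponding spectral vectors.
    cos = (f * r).sum(axis=0) / (
        np.linalg.norm(f, axis=0) * np.linalg.norm(r, axis=0) + eps)
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return float(angles.mean())
```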
Implications and Future Work
Practically, the TFNet offers a robust solution for remote sensing tasks, such as land cover classification and change detection, by providing high-quality pan-sharpened images. Theoretically, it advances the understanding of feature-level fusion for remote sensing imagery and sets a foundation for applying deep learning architectures in similar domains.
For future work, the authors suggest refining the loss function to better suit the pan-sharpening task and exploring unsupervised approaches to reduce the dependency on large training datasets. These directions could pave the way for more generalized and adaptive image fusion models in remote sensing.
In conclusion, the Two-stream Fusion Network advances state-of-the-art pan-sharpening practice by leveraging CNNs for feature-level fusion, showing strong performance in both the spectral and spatial domains. Through this paper, the authors contribute significantly to the ongoing dialogue on deep learning's role in enhancing remote sensing image processing.