- The paper presents MSDCNN, which uses multi-scale feature extraction and residual learning to enhance pan-sharpening performance.
- It outperforms traditional methods, achieving superior spatial and spectral fidelity as evidenced by metrics like PSNR and SAM.
- The architecture leverages varied convolutional kernel sizes and skip connections to model the complex, nonlinear mapping underlying image fusion.
Overview of a Multi-Scale and Multi-Depth Convolutional Neural Network for Pan-Sharpening Remote Sensing Imagery
This paper presents the Multi-Scale and Multi-Depth Convolutional Neural Network (MSDCNN), a convolutional neural network (CNN) architecture for enhancing the spatial resolution of multi-spectral (MS) images through pan-sharpening. Fusing high-resolution panchromatic (PAN) imagery with lower-resolution MS images compensates for a common sensor limitation: no single instrument delivers images with both high spatial and high spectral resolution.
Current Methods and Limitations
The paper begins by contextualizing pan-sharpening and summarizes the three major families of techniques historically used for the task: Component Substitution (CS)-based methods, Multiresolution Analysis (MRA)-based techniques, and Model-based Optimization (MBO) approaches. Each family has intrinsic limitations, such as spatial distortion, spectral distortion, or heavy computational cost, particularly when handling large datasets and the complex nonlinear transformations involved in pan-sharpening.
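To make the component-substitution idea concrete, below is a minimal NumPy sketch of a generic intensity-substitution scheme; the function name, the uniform band weights, and the moment-matching step are illustrative assumptions rather than any specific published algorithm. The spatial detail of a PAN image, matched to a synthesized intensity component, is injected into every MS band.

```python
import numpy as np

def cs_pansharpen(ms_up, pan, weights=None):
    """Generic component-substitution (CS) pan-sharpening sketch.

    ms_up:   (H, W, B) multispectral image upsampled to PAN resolution.
    pan:     (H, W) panchromatic image.
    weights: per-band weights used to synthesize the intensity component
             (uniform by default; real CS methods estimate these).
    """
    bands = ms_up.shape[-1]
    if weights is None:
        weights = np.full(bands, 1.0 / bands)
    # Synthesize an intensity component as a weighted sum of MS bands.
    intensity = np.tensordot(ms_up, weights, axes=([-1], [0]))
    # Match PAN's mean and variance to the intensity component.
    pan_matched = (pan - pan.mean()) * (intensity.std() / (pan.std() + 1e-12)) \
        + intensity.mean()
    # Inject the high-frequency PAN detail into every band.
    return ms_up + (pan_matched - intensity)[..., None]
```

The spectral distortion typical of CS methods arises exactly at the injection step: whenever the synthesized intensity is a poor proxy for what the PAN sensor actually measured, the added detail shifts the band ratios of the fused product.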
Proposed MSDCNN Architecture
In light of these challenges, the authors propose MSDCNN, leveraging deep learning's capacity to model complex, nonlinear relationships more faithfully than the existing methods. MSDCNN integrates multi-scale feature extraction and residual learning into the CNN framework. The network comprises two branches: a shallow three-layer CNN for basic feature extraction and a deeper sub-network built from multi-scale convolutional layers, which improves the extraction of informative features across spatial scales.
The multi-scale feature extraction blocks apply convolutional kernels of varying sizes (3x3, 5x5, and 7x7) in parallel to capture the diverse spatial structures present in different regions of remote sensing imagery. This design improves robustness to the variability of spatial scales across data acquired by different sensors. Skip connections are equally important: by mitigating vanishing gradients, they make deep networks easier to train and let the architecture exploit greater depth effectively.
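A minimal PyTorch sketch of this two-branch, multi-scale design appears below. The channel widths, layer counts, and the 4-band-MS-plus-PAN input stacking are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel 3x3, 5x5, and 7x7 convolutions whose outputs are
    concatenated, followed by a skip (residual) connection."""
    def __init__(self, channels):
        super().__init__()
        branch = channels // 3  # split output channels across kernel sizes
        self.conv3 = nn.Conv2d(channels, branch, 3, padding=1)
        self.conv5 = nn.Conv2d(channels, branch, 5, padding=2)
        self.conv7 = nn.Conv2d(channels, channels - 2 * branch, 7, padding=3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.conv3(x), self.conv5(x), self.conv7(x)], dim=1)
        return self.relu(out + x)  # skip connection eases gradient flow

class MSDCNN(nn.Module):
    """Two-branch sketch: a shallow three-layer CNN plus a deeper
    multi-scale branch; their outputs are summed."""
    def __init__(self, in_ch=5, out_ch=4, feat=60):
        super().__init__()
        # Shallow branch: a basic three-layer CNN.
        self.shallow = nn.Sequential(
            nn.Conv2d(in_ch, 64, 9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_ch, 5, padding=2),
        )
        # Deep branch: multi-scale blocks interleaved with plain convs.
        self.deep = nn.Sequential(
            nn.Conv2d(in_ch, feat, 7, padding=3), nn.ReLU(inplace=True),
            MultiScaleBlock(feat),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            MultiScaleBlock(feat),
            nn.Conv2d(feat, out_ch, 5, padding=2),
        )

    def forward(self, x):
        # x: upsampled MS bands stacked with the PAN band, (N, in_ch, H, W)
        return self.shallow(x) + self.deep(x)

# Usage: a 4-band MS image upsampled and stacked with PAN gives 5 channels.
net = MSDCNN()
out = net(torch.randn(1, 5, 64, 64))  # -> (1, 4, 64, 64)
```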
Evaluation and Results
Quantitative analysis was performed on both simulated and real datasets from the QuickBird, WorldView-2, and IKONOS sensors. MSDCNN outperformed both traditional methods and recent state-of-the-art fusion techniques across metrics including PSNR, Q, ERGAS, SAM, and Q4/Q8, indicating superior spatial and spectral fidelity. Figures in the paper show that MSDCNN recovers more accurate spatial detail without sacrificing spectral quality compared to methods such as GS, PRACS, MTF-GLP, and PNN.
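For reference, two of the cited metrics are straightforward to compute. SAM is the mean angle between reference and fused spectral vectors (0 is ideal), and ERGAS is a normalized global error (lower is better); the following NumPy sketches follow the standard definitions.

```python
import numpy as np

def sam_degrees(ref, fused, eps=1e-12):
    """Spectral Angle Mapper: mean angle (degrees) between the spectral
    vectors of ref and fused, both of shape (H, W, B)."""
    dot = np.sum(ref * fused, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(fused, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.degrees(np.arccos(cos).mean())

def ergas(ref, fused, ratio=4):
    """ERGAS global error; ratio is the PAN-to-MS resolution ratio
    (e.g., 4 for QuickBird and IKONOS)."""
    rmse = np.sqrt(np.mean((ref - fused) ** 2, axis=(0, 1)))  # per-band RMSE
    means = ref.mean(axis=(0, 1))                             # per-band mean
    return 100.0 / ratio * np.sqrt(np.mean((rmse / means) ** 2))
```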
Implications and Future Directions
The findings suggest that incorporating multi-scale and multi-depth learning mechanisms into CNN-based models yields a robust solution capable of producing high-quality pan-sharpened images. This improvement holds considerable promise for remote sensing applications that require high-resolution data.
However, the authors acknowledge that the current network design is largely empirical and that further analytical study is needed to guide CNN architecture optimization. Future work may focus on more principled network design and on extending MSDCNN to related problems such as hyperspectral denoising, unified spatial-temporal fusion, and network compression, facilitating broader adoption in resource-intensive remote sensing applications.