Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery using Deep Learning (1703.06452v3)

Published 19 Mar 2017 in cs.CV and cs.AI

Abstract: Deep convolutional neural networks (DCNNs) have been used to achieve state-of-the-art performance on many computer vision tasks (e.g., object recognition, object detection, semantic segmentation) thanks to a large repository of annotated image data. Large labeled datasets for other sensor modalities, e.g., multispectral imagery (MSI), are not available due to the large cost and manpower required. In this paper, we adapt state-of-the-art DCNN frameworks in computer vision for semantic segmentation for MSI imagery. To overcome label scarcity for MSI data, we substitute real MSI for generated synthetic MSI in order to initialize a DCNN framework. We evaluate our network initialization scheme on the new RIT-18 dataset that we present in this paper. This dataset contains very-high resolution MSI collected by an unmanned aircraft system. The models initialized with synthetic imagery were less prone to over-fitting and provide a state-of-the-art baseline for future work.

Summary

The paper pioneers synthetic MSI pre-training to overcome label scarcity, yielding improved mean-class accuracy.
The paper adapts state-of-the-art DCNNs, including Sharpmask and RefineNet, to effectively segment non-RGB multispectral data.
It introduces the RIT-18 dataset as a benchmark for high-resolution semantic segmentation of imagery collected by an unmanned aircraft system.

Overview of "Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery using Deep Learning"

This paper studies semantic segmentation of multispectral remote sensing imagery with deep learning. The authors, Kemker, Salvaggio, and Kanan, address the scarcity of labeled multispectral imagery by using synthetic data to initialize their models and improve training and performance.

Key Contributions

Adaptation of DCNNs: The paper adapts state-of-the-art Deep Convolutional Neural Networks (DCNNs) for semantic segmentation to multispectral imagery (MSI). The researchers modify existing DCNN frameworks to handle the non-RGB domain, which carries information beyond what standard RGB datasets provide.
Synthetic Data for Pre-training: A novel contribution is the use of synthetic MSI, generated with the Digital Imaging and Remote Sensing Image Generation (DIRSIG) software, as an alternative to real labeled data. The synthetic dataset is used to pre-train models, leading to better generalization when they are applied to real-world datasets like the newly introduced RIT-18.
RIT-18 Dataset Introduction: RIT-18 is a new dataset of high-resolution MSI collected by an unmanned aircraft system (UAS). It provides a benchmark for comparing state-of-the-art semantic segmentation algorithms designed for non-RGB remote sensing imagery.

Methodologies

The authors detail a two-stream convolutional neural network architecture, integrating both RGB and non-RGB spectral bands, thus capturing comprehensive spatial-spectral features from high-resolution MSI. The pipeline involves pre-training the DCNNs on synthetic datasets generated using DIRSIG, followed by fine-tuning on the real-world RIT-18 dataset.
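The hand-off from pre-training to fine-tuning described above can be illustrated with a small weight-transfer routine. This is a minimal sketch in plain Python, not the authors' code: the layer names, the layer shapes, the six-band input, the synthetic class count, and the 18-class target are all assumptions made for illustration, and re-initializing only the final classifier is a common transfer-learning convention rather than something the summary confirms.

```python
import random


def init_params(shapes, seed):
    """Randomly initialize one flat weight list per named layer."""
    rng = random.Random(seed)
    params = {}
    for name, shape in shapes.items():
        size = 1
        for dim in shape:
            size *= dim
        values = [rng.gauss(0.0, 0.01) for _ in range(size)]
        params[name] = {"shape": shape, "values": values}
    return params


def transfer_weights(pretrained, target):
    """Copy layers whose name and shape match; leave the others untouched."""
    copied, kept = [], []
    for name, layer in target.items():
        source = pretrained.get(name)
        if source is not None and source["shape"] == layer["shape"]:
            layer["values"] = list(source["values"])
            copied.append(name)
        else:
            kept.append(name)
    return copied, kept


# Hypothetical settings, chosen only for this illustration.
BANDS = 6               # assumed number of spectral bands
SYNTHETIC_CLASSES = 9   # assumed label count in the synthetic scene
REAL_CLASSES = 18       # assumed label count in the real dataset


def layer_shapes(num_classes):
    """Toy network: shapes are (out_channels, in_channels, kernel_h, kernel_w)."""
    return {
        "conv1": (16, BANDS, 3, 3),
        "block1": (32, 16, 3, 3),
        "classifier": (num_classes, 32, 1, 1),
    }


synthetic_model = init_params(layer_shapes(SYNTHETIC_CLASSES), seed=0)
# ... pre-training on synthetic imagery would update synthetic_model here ...

real_model = init_params(layer_shapes(REAL_CLASSES), seed=1)
copied, kept = transfer_weights(synthetic_model, real_model)
print("initialized from synthetic pre-training:", copied)
print("kept random initialization:", kept)
```

Matching on both name and shape means the feature-extraction layers inherit the pre-trained weights, while the classifier, whose output size depends on the label set, starts fresh before fine-tuning on the real data.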

The models examined are Sharpmask and RefineNet, both built on the ResNet-50 architecture and designed to sharpen segmentation boundaries and recover spatial resolution. Both networks refine semantic feature maps iteratively, so they learn precise semantic boundaries without relying heavily on post-processing steps such as conditional random fields (CRFs).

Numerical Results

Empirical evaluations show that networks pre-trained on synthetic data outperform models initialized without it, with less overfitting and higher mean-class accuracy. The paper reports accuracy across the class labels in the RIT-18 dataset, with RefineNet achieving the highest mean-class accuracy. These results support synthetic data pre-training as a practical approach to multispectral semantic segmentation.
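Mean-class accuracy, the metric highlighted above, averages the per-class recall so that rare classes count as much as common ones. The sketch below contrasts it with overall pixel accuracy on a toy example; the function names and the label data are invented for illustration and do not come from the paper.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all pixels labeled correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def mean_class_accuracy(matrix):
    """Average of per-class recall, skipping classes with no true pixels."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        if support > 0:
            recalls.append(row[i] / support)
    return sum(recalls) / len(recalls)


# Toy labels: class 0 dominates, class 1 is rare and half misclassified.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]

cm = confusion_matrix(y_true, y_pred, num_classes=2)
overall = overall_accuracy(cm)
mean_class = mean_class_accuracy(cm)
print(f"overall accuracy:    {overall:.2f}")     # 0.90
print(f"mean-class accuracy: {mean_class:.2f}")  # 0.75
```

In this toy case the dominant class hides the errors on the rare class in the overall figure, which is why a class-averaged metric is informative for imbalanced segmentation datasets.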

Implications and Future Work

The proposed methods have clear implications for the remote sensing community. By demonstrating the utility of synthetic data pre-training, this research supports the use of DCNNs for other sensor modalities with limited labeled data. The synthetic-to-real transfer learning pipeline could streamline applications across remote sensing domains, from precision agriculture to urban land use planning.

The paper suggests potential refinements through deeper models or enhanced spectral channels, and points toward multi-modal data integration and architectures such as ResNeXt for finer segmentation granularity. Given the encouraging results, further work along these lines could help deploy these methods in more challenging and data-scarce remote sensing contexts.
