Overview of "Beyond RGB: Very High Resolution Urban Remote Sensing With Multimodal Deep Networks"
This paper explores semantic labeling of very high resolution multimodal remote sensing data using deep learning. Specifically, it adapts deep fully convolutional networks to handle multi-scale and multi-modal remote sensing inputs, a challenge inherent to urban semantic labeling tasks.
Key Contributions
The authors underscore three main contributions:
- Multi-Scale Processing: An efficient multi-scale methodology is introduced, leveraging large spatial context together with high-resolution detail to improve feature extraction and classification accuracy. The deep networks are branched to produce outputs at several resolutions, which are then averaged to improve the quality of the final semantic map (a minimal sketch of this branch-averaging appears after the list).
- Fusion Strategies: The paper investigates both early and late fusion techniques for integrating Lidar and multispectral data. Early fusion uses FuseNet-like architectures in which the modalities are processed jointly and merged inside the network, whereas late fusion applies a residual correction to the predictions of independently trained models, combining them to improve accuracy (both variants are sketched after the list).
- Validation and Performance: The methods are validated on the ISPRS Semantic Labeling Challenge datasets for Vaihingen and Potsdam, exhibiting state-of-the-art results. Late fusion demonstrated the ability to recover from errors stemming from ambiguous data, whereas early fusion improved joint feature learning, albeit with increased sensitivity to missing information.
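The multi-scale branching can be illustrated with a short sketch. The code below is a minimal PyTorch illustration, assuming an encoder-decoder backbone whose decoder exposes feature maps at several resolutions; the module name MultiScaleHead, the channel sizes, and the class count are hypothetical and not taken from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleHead(nn.Module):
    """Sketch of multi-scale prediction: one classifier per decoder
    resolution, with the upsampled outputs averaged into a single map."""

    def __init__(self, channels, num_classes):
        super().__init__()
        # One 1x1 classifier per intermediate decoder resolution.
        self.classifiers = nn.ModuleList(
            [nn.Conv2d(c, num_classes, kernel_size=1) for c in channels]
        )

    def forward(self, feature_maps, out_size):
        # feature_maps: decoder features, coarse to fine, one per classifier.
        logits = [
            F.interpolate(clf(f), size=out_size, mode="bilinear",
                          align_corners=False)
            for clf, f in zip(self.classifiers, feature_maps)
        ]
        # Averaging the per-scale predictions regularizes the final map.
        return torch.stack(logits, dim=0).mean(dim=0)

# Toy usage with dummy decoder features at 1/4, 1/2 and full resolution.
feats = [torch.randn(1, 64, 64, 64),
         torch.randn(1, 32, 128, 128),
         torch.randn(1, 16, 256, 256)]
head = MultiScaleHead(channels=[64, 32, 16], num_classes=6)
out = head(feats, out_size=(256, 256))  # shape: (1, 6, 256, 256)
```

Averaging the upsampled per-scale predictions acts as a small ensemble over spatial contexts, which is consistent with the qualitative regularization effect reported below.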
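The two fusion strategies can likewise be sketched in a few lines. Below, EarlyFusionBlock mimics the FuseNet idea of summing the auxiliary (e.g. Lidar/DSM) encoder activations into the optical branch, and ResidualCorrection learns a correction on top of the averaged predictions of two independently trained models. Class names, layer widths, and the tiny correction network are illustrative assumptions; the paper's actual networks are much deeper encoder-decoders.

```python
import torch
import torch.nn as nn

class EarlyFusionBlock(nn.Module):
    """FuseNet-style early fusion sketch: the auxiliary encoder's
    activations are summed into the main (optical) branch at each stage."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        def block():
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True))
        self.main = block()  # optical (e.g. IRRG) stream
        self.aux = block()   # auxiliary (e.g. Lidar/DSM) stream

    def forward(self, x_main, x_aux):
        f_main, f_aux = self.main(x_main), self.aux(x_aux)
        # Element-wise summation fuses the modalities inside the encoder.
        return f_main + f_aux, f_aux

class ResidualCorrection(nn.Module):
    """Late fusion sketch: a small network predicts a residual that
    corrects the averaged outputs of two independently trained models."""

    def __init__(self, num_classes, hidden=32):
        super().__init__()
        self.correct = nn.Sequential(
            nn.Conv2d(2 * num_classes, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, 3, padding=1))

    def forward(self, logits_a, logits_b):
        avg = (logits_a + logits_b) / 2
        # The learned residual can recover errors either model makes alone.
        return avg + self.correct(torch.cat([logits_a, logits_b], dim=1))
```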
Numerical Results and Observations
The multi-scale strategy yields only a modest numerical gain, roughly a 0.3% increase in overall accuracy on the Vaihingen dataset, but brings a clear qualitative benefit: the averaged predictions are smoother and more spatially regular, which makes the resulting semantic maps easier to interpret.
Comparison of the fusion techniques reveals contrasting strengths: early fusion provides balanced semantic maps with high overall accuracy, while late fusion excels at correcting errors on ambiguous data and refining difficult class boundaries. Quantitatively, both approaches place among the top performers in the ISPRS challenge.
Implications and Future Directions
The paper highlights critical implications for both theoretical advancements in computer vision and practical applications in remote sensing. By extending traditional deep learning techniques beyond RGB imagery, the research opens pathways for more robust, generalized models capable of integrating auxiliary data sources.
Future work may explore robustness against data imperfections and devise strategies for handling missing multimodal data. The potential integration of generative models and adversarial learning frameworks presents avenues to further refine multimodal data processing.
In summary, this work offers an insightful contribution to the domain of urban remote sensing, proposing methodologies that effectively integrate diverse data sources for enhanced semantic mapping of urban environments. The nuanced approach to fusion and multi-scale processing lays the groundwork for sophisticated models that can adapt to increasingly complex data landscapes.