Building Extraction at Scale using Convolutional Neural Network: Mapping of the United States (1805.08946v1)

Published 23 May 2018 in cs.CV

Abstract: Establishing up-to-date large scale building maps is essential to understanding urban dynamics, such as estimating population, urban planning, and many other applications. Although many computer vision tasks have been successfully carried out with deep convolutional neural networks, there is a growing need to understand their large-scale impact on building mapping with remote sensing imagery. Taking advantage of the scalability of CNNs and using only a few areas with an abundance of building footprints, for the first time we conduct a comparative analysis of four state-of-the-art CNNs for extracting building footprints across the entire continental United States. The four CNN architectures, namely branch-out CNN, fully convolutional neural network (FCN), conditional random field as recurrent neural network (CRFasRNN), and SegNet, support semantic pixel-wise labeling and focus on capturing textural information at multiple scales. We use 1-meter resolution aerial images from the National Agriculture Imagery Program (NAIP) as the test bed and compare the extraction results across the four methods. In addition, we propose to combine signed-distance labels with SegNet, the preferred CNN architecture identified by our extensive evaluations, to advance building extraction results to the instance level. We further demonstrate the usefulness of fusing additional near-IR information into the building extraction framework. Large scale experimental evaluations are conducted and reported using metrics that include precision, recall rate, intersection over union, and the number of buildings extracted. With the improved CNN model and no requirement for further post-processing, we have generated building maps for the United States. The quality of the extracted buildings and the processing time demonstrate that the proposed CNN-based framework fits the need for building extraction at scale.

Authors (6)
  1. Hsiuhan Lexie Yang (2 papers)
  2. Jiangye Yuan (3 papers)
  3. Dalton Lunga (12 papers)
  4. Melanie Laverdiere (1 paper)
  5. Amy Rose (2 papers)
  6. Budhendra Bhaduri (3 papers)
Citations (168)

Summary

Building Extraction at Scale using Convolutional Neural Networks: An Analytical Approach

The paper presents an extensive analysis of deep convolutional neural networks (CNNs) applied to the task of extracting building footprints from remote sensing imagery, specifically targeting large-scale mapping of the continental United States. It evaluates the suitability of several state-of-the-art CNN architectures for creating reliable building maps and proposes methods to improve the accuracy and efficiency of building extraction.

Methodology Overview

The research investigates four prominent CNN models: Branch-out CNN, Fully Convolutional Network (FCN), Conditional Random Field as Recurrent Neural Network (CRFasRNN), and SegNet. Each of these models offers semantic pixel-wise labeling capabilities and focuses on capturing textural information at multiple scales. The paper utilizes aerial imagery from the National Agriculture Imagery Program (NAIP) at a 1-meter resolution, evaluating the performance of these models using metrics such as precision, recall, intersection over union (IoU), and processing efficiency.
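As a concrete illustration of the pixel-wise evaluation metrics, the sketch below computes precision, recall, and IoU from a predicted binary building mask and a ground-truth mask. This is a minimal NumPy reference implementation, not code from the paper; the function and variable names are illustrative.

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Precision, recall, and IoU for binary building masks (1 = building)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # building pixels correctly labeled
    fp = np.logical_and(pred, ~truth).sum()   # background pixels labeled as building
    fn = np.logical_and(~pred, truth).sum()   # building pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou

# Tiny example: a 4x4 tile where the prediction misses one column of a building.
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
truth = np.array([[0, 1, 1, 1],
                  [0, 1, 1, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
print(pixel_metrics(pred, truth))  # (1.0, 0.666..., 0.666...)
```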

One notable aspect of the research is the introduction of signed-distance labeling, designed to elevate building extraction results to instance level. This technique extends the binary classification framework by mapping pixels based on their distance to building boundaries, facilitating more precise delineation of building outlines. Additionally, the paper explores the integration of near-infrared (NIR) spectral data, combined with the RGB input using a simple model fusion strategy to enhance extraction accuracy and suppress false positives associated with vegetation.
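One way such signed-distance labels could be derived from a binary footprint mask is via a Euclidean distance transform, as in the hedged sketch below; the `truncate` threshold and the use of `scipy.ndimage` are illustrative choices, not details taken from the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_labels(mask, truncate=20):
    """Signed distance (in pixels) to the nearest building boundary.

    mask: binary array, 1 inside building footprints, 0 elsewhere.
    Returns positive values inside buildings and negative values outside,
    with magnitudes clipped at `truncate` pixels.
    """
    mask = np.asarray(mask, dtype=bool)
    dist_inside = distance_transform_edt(mask)    # distance from building pixels to background
    dist_outside = distance_transform_edt(~mask)  # distance from background pixels to buildings
    signed = dist_inside - dist_outside
    return np.clip(signed, -truncate, truncate)
```

Clipping (and optionally binning) the distances keeps the label space compact while still encoding how close each pixel is to a building boundary, which is what allows adjacent buildings to be separated at the instance level rather than merged into a single binary blob.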

Numerical Results and Analytical Insights

SegNet emerged as the preferred model, demonstrating superior performance in extracting building footprints with high precision and recall rates. When combined with signed-distance labeling and fused with a model incorporating near-IR data, SegNet delivered notable improvements, achieving an IoU of 0.58 and detecting 84.9% of buildings. These results underscore the value of leveraging multi-scale features and spectral data to improve extraction robustness across diverse terrains.

Furthermore, the paper highlights potential sources of commission errors, such as variations in the radiometric characteristics of the imagery and the effects of terrain, which motivated refining the extraction models with additional training data. The insights offered by this research can profoundly impact urban planning, population modeling, and socioeconomic studies by providing robust, high-resolution building maps at a national scale.

Implications and Future Directions

This research offers significant practical implications by demonstrating that CNN-based frameworks can effectively generate large-scale building maps with minimal post-processing, highlighting the potential for operational efficiency and scalability using GPU clusters. The paper identifies the need for dedicated efforts toward improving model generalization through advanced domain adaptation techniques and optimizing CNN architectures for multi-band inputs.

Future developments could concentrate on leveraging high-performance computing resources for simultaneous training across multiple GPUs, allowing exploration of more complex network architectures while overcoming memory constraints. Additionally, further investigation into representative sample selection and its impact on domain adaptation in CNNs could pave the way for more generalized models capable of maintaining high performance across varied landscapes and radiometric conditions.

Overall, this paper contributes valuable methodologies and insights to the field of remote sensing and machine learning, promising future advancements in large-scale object detection and mapping using deep learning technologies.