MAP-Net: Multi Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery (1910.12060v2)

Published 26 Oct 2019 in cs.CV

Abstract: Accurately and efficiently extracting building footprints from a wide range of remote sensed imagery remains a challenge due to their complex structure, variety of scales and diverse appearances. Existing convolutional neural network (CNN)-based building extraction methods are criticized because they cannot detect tiny buildings, since the spatial information of CNN feature maps is lost during repeated pooling operations, and because large buildings still have inaccurate segmentation edges. Moreover, features extracted by a CNN are always partial, being restricted by the size of the receptive field, and large-scale buildings with low texture are always discontinuous and holey when extracted. This paper proposes a novel multi attending path neural network (MAP-Net) for accurately extracting multiscale building footprints and precise boundaries. MAP-Net learns spatial localization-preserved multiscale features through a multi-parallel path in which each stage is gradually generated to extract high-level semantic features with fixed resolution. Then, an attention module adaptively squeezes channel-wise features from each path for optimization, and a pyramid spatial pooling module captures global dependency for refining discontinuous building footprints. Experimental results show that MAP-Net outperforms state-of-the-art (SOTA) algorithms in boundary localization accuracy as well as continuity of large buildings. Specifically, our method achieved 0.68%, 1.74%, 1.46% precision, and 1.50%, 1.53%, 0.82% IoU score improvement without increasing computational complexity compared with the latest HRNetv2 on the Urban 3D, Deep Globe and WHU datasets, respectively. The TensorFlow implementation is available at https://github.com/lehaifeng/MAPNet.

Summary

The paper introduces a multi-path CNN that preserves spatial details, addressing shortcomings in conventional building footprint extraction from remote sensed imagery.
The attention mechanism refines feature channels; MAP-Net reaches an IoU of up to 90.86% and improves F1-score by 0.88% to 1.50% over HRNetv2 across the evaluated datasets.
The method offers robust segmentation for urban monitoring and planning, effectively handling both small and large building structures.

Overview of MAP-Net for Building Footprint Extraction

Introduction and Motivation:

The paper introduces MAP-Net, a convolutional neural network designed for building footprint extraction from remote sensing imagery. Existing CNN-based approaches often miss small structures because spatial information is lost during repeated pooling, while large buildings suffer from inaccurate edges and discontinuous, holey segmentation. This overview outlines the architecture and benefits of MAP-Net, which uses a multi attending path strategy to learn high-level semantic features while preserving spatial localization.

Technical Approach:

MAP-Net uses a multi-path architecture that extracts features at several resolutions in parallel. Each path keeps a fixed spatial resolution from shallow to deep layers instead of repeatedly pooling its features. This contrasts with conventional deep convolutional neural networks (DCNNs) built on encoder-decoder structures, where downsampling loses spatial resolution and leads to inaccurate localization, especially in semantic segmentation tasks.
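The resolution bookkeeping behind parallel fixed-resolution paths can be illustrated with a minimal NumPy sketch. This is not the authors' TensorFlow implementation: the convolutions inside each path are omitted, the helper names (`downsample2x`, `upsample`, `build_paths`, `fuse_paths`) are hypothetical, and average pooling and nearest-neighbour upsampling are assumed choices made only to keep the example short.

```python
import numpy as np

def downsample2x(x):
    """Halve height and width of a (C, H, W) array with 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def build_paths(x, num_paths=3):
    """Create parallel paths; path k holds features at 1/2**k of the input size.

    Each path would keep this resolution through all of its (omitted) layers.
    Height and width must be divisible by 2**(num_paths - 1).
    """
    paths = [x]
    for _ in range(num_paths - 1):
        paths.append(downsample2x(paths[-1]))
    return paths

def fuse_paths(paths):
    """Upsample every path to the finest resolution and concatenate channels."""
    return np.concatenate(
        [upsample(p, 2 ** k) for k, p in enumerate(paths)], axis=0
    )

features = np.random.default_rng(0).random((4, 16, 16))
paths = build_paths(features)
fused = fuse_paths(paths)
print([p.shape for p in paths], fused.shape)
```

The finest path is never downsampled, so its spatial detail reaches the fused output unchanged, while the coarser paths contribute context from a larger effective field of view.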

The network includes:

Detail-Preserved Multi-path Feature Extraction: Parallel paths keep spatial details intact, which improves localization of small and finely detailed structures.
Attention-based Feature Squeeze and Global Enhancement: A channel attention module adaptively reweights the feature channels from each path, refining the multi-scale features and strengthening the building representation. A pyramid spatial pooling module then captures global dependencies, which addresses discontinuities in the extraction of large buildings.
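Channel attention of this kind is commonly built in the squeeze-and-excitation style: pool each channel to a single value, pass the result through two small dense layers, and scale the channels by the resulting weights. The sketch below assumes that design; the source does not give the exact layer layout of the MAP-Net attention module, and the function name, weight shapes, and reduction ratio here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    Squeeze: global average pooling reduces each channel to one value.
    Excite: two dense layers (ReLU, then sigmoid) yield one weight per channel.
    """
    squeezed = x.mean(axis=(1, 2))            # shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # shape (C // r,)
    weights = sigmoid(w2 @ hidden)            # shape (C,), values in [0, 1]
    return x * weights[:, None, None], weights

rng = np.random.default_rng(0)
x = rng.random((8, 16, 16))
w1 = rng.standard_normal((2, 8))   # assumed reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
reweighted, weights = channel_attention(x, w1, w2)
print(reweighted.shape, weights.round(3))
```

In a trained network `w1` and `w2` are learned, so channels that help separate buildings from background receive weights near 1 and uninformative channels are suppressed.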

Key Results:

MAP-Net achieves an IoU of up to 90.86% on the evaluated datasets (Urban 3D, Deep Globe, and WHU). Compared with the recent HRNetv2, it improves F1-scores by 0.88% to 1.50% across these datasets without an appreciable increase in computational complexity; the abstract reports corresponding IoU gains of 1.50%, 1.53%, and 0.82% and precision gains of 0.68%, 1.74%, and 1.46% on Urban 3D, Deep Globe, and WHU, respectively. These gains reflect MAP-Net's feature extraction and fusion strategy.
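For reference, the metrics quoted above can be computed from pixel-wise counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions on small hand-written binary masks; it assumes pixel-wise evaluation, and the function name and toy masks are illustrative rather than taken from the paper.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise precision, recall, F1 and IoU for binary masks (nested lists)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as building
            elif t and not p:
                fn += 1          # building pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
pred = [[1, 1, 1],
        [1, 0, 0],
        [0, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)  # precision 0.75, recall 0.75, f1 0.75, iou 0.6
```

F1 and IoU are linked by F1 = 2·IoU / (1 + IoU), so the two metrics always rank a pair of predictions on the same image in the same order, although their absolute gains differ.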

Discussion and Implications:

Segmentation of high-resolution imagery is critical for urban monitoring, disaster response, and planning. MAP-Net's gains in boundary localization and in the continuity of large buildings are directly relevant to these applications. The same combination of detail-preserving multi-path features and attention modules could also benefit remote sensing tasks beyond building footprint extraction, such as land cover segmentation.

Conclusion and Future Directions:

MAP-Net represents progress towards reliable multi-scale feature extraction that maintains spatial detail without compromising computational efficiency. Future research may focus on applying similar strategies to broader semantic segmentation tasks, integrating additional data types (e.g., DSM, Lidar), and improving real-time processing for dense urban environments.
