PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images (2111.15491v3)

Published 30 Nov 2021 in cs.CV

Abstract: While most state-of-the-art instance segmentation methods produce binary segmentation masks, geographic and cartographic applications typically require precise vector polygons of extracted objects instead of rasterized output. This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons. The model predicts the connection strength between each pair of vertices using a graph neural network and estimates the assignments by solving a differentiable optimal transport problem. Moreover, the vertex positions are optimized by minimizing a combined segmentation and polygonal angle difference loss. PolyWorld significantly outperforms the state of the art in building polygonization and achieves not only notable quantitative results, but also produces visually pleasing building polygons. Code and trained weights are publicly available at https://github.com/zorzi-s/PolyWorldPretrainedNetwork.

Summary

The paper introduces PolyWorld, an end-to-end method combining CNN-based vertex detection with GNN-based vertex connection to extract building polygons directly from satellite images.
It reports an Average Precision of 63.3 on the CrowdAI Mapping Challenge dataset, along with improved IoU and C-IoU scores compared to state-of-the-art methods.
The integration of differentiable optimal transport refines vertex assignments, significantly enhancing polygon regularity for practical geospatial applications.

Overview of PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images

The paper presents PolyWorld, a novel approach for extracting vectorized building polygons from satellite images using Graph Neural Networks (GNNs). This work targets the challenges of producing accurate polygonal representations of buildings, which are essential for various cartographic applications and remote sensing tasks that require more than mere binary segmentation masks.

PolyWorld processes satellite imagery to extract building vertices directly and to predict the connection strength between them, forming precise polygons. The network employs a Convolutional Neural Network (CNN) to detect vertex candidates and a GNN to aggregate information and refine vertex positions. What distinguishes PolyWorld is its end-to-end architecture, which optimizes the vertex positions by minimizing both a segmentation loss and a polygonal angle difference loss rather than relying on a separate polygonization post-processing stage.
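To make the vertex-candidate step more concrete, here is a minimal sketch of one common way to turn a CNN output into discrete vertex candidates: keep pixels that are local maxima of a score map and exceed a threshold. The text does not describe how PolyWorld does this, so the heatmap representation, the 3x3 neighborhood, the threshold value, and the function name `extract_vertex_candidates` are all assumptions made for illustration.

```python
import numpy as np

def extract_vertex_candidates(heatmap, threshold=0.5):
    """Return (row, col) positions that are 3x3 local maxima above `threshold`.

    Hypothetical helper for illustration; not taken from the PolyWorld code.
    """
    h = np.asarray(heatmap, dtype=float)
    rows, cols = h.shape
    # Pad with -inf so border pixels can still be local maxima.
    padded = np.pad(h, 1, mode="constant", constant_values=-np.inf)
    # Stack the 9 shifted views that make up each pixel's 3x3 neighborhood.
    neighborhood = np.stack([
        padded[dy:dy + rows, dx:dx + cols]
        for dy in range(3)
        for dx in range(3)
    ])
    local_max = neighborhood.max(axis=0)
    mask = (h >= local_max) & (h > threshold)
    ys, xs = np.nonzero(mask)
    return list(zip(ys.tolist(), xs.tolist()))

# Toy score map (assumed values): two clear peaks and one weaker neighbor.
heatmap = np.zeros((5, 5))
heatmap[1, 1] = 0.9
heatmap[1, 2] = 0.6   # suppressed: adjacent to the stronger peak at (1, 1)
heatmap[3, 3] = 0.8

candidates = extract_vertex_candidates(heatmap)
print(candidates)
```

The suppression step keeps only the strongest response in each neighborhood, so a single building corner is less likely to produce several near-duplicate vertices.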

At the core of the approach is a differentiable optimal transport problem, which is solved to find the best vertex assignments and thus to generate valid building polygons. Combining CNN-based feature extraction with GNN-based vertex connection makes the vertex descriptors more distinctive, which is critical for telling vertices apart during the matching process. Optimizing the vertex locations further improves the accuracy and visual quality of the generated polygons.
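The assignment step can be illustrated with Sinkhorn normalization, a standard technique for turning a score matrix into a soft, differentiable assignment. The text does not say which solver PolyWorld uses, so the following is only a sketch under assumptions: the function names `logsumexp` and `sinkhorn`, the toy score matrix, and the reading of each row's largest entry as "the vertex that follows vertex i" are illustrative choices, not the authors' implementation.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along `axis`, keeping dimensions."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))

def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns in log space.

    The result approaches a doubly stochastic matrix (rows and columns sum
    to 1). Every operation is smooth, so gradients can flow through it.
    """
    log_p = np.asarray(scores, dtype=float).copy()
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # rows sum to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)

# Toy connection scores for three vertices (assumed values).
# A high score at [i, j] is read here as "vertex j follows vertex i".
scores = np.array([
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 5.0],
    [5.0, 0.0, 0.0],
])

P = sinkhorn(scores)
next_vertex = P.argmax(axis=1)  # hard assignment read off the soft matrix
print(next_vertex)
```

In this toy case the hard assignment traces the closed path 0, 1, 2, 0, which corresponds to a triangle. Working in log space avoids overflow when the scores are large.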

Key Numerical Results and Claims

The paper reports that PolyWorld significantly outperforms existing state-of-the-art methods in building polygonization. Evaluated on the CrowdAI Mapping Challenge dataset, the method improves intersection-over-union (IoU) scores and achieves better results in terms of both accuracy and polygon regularity. For instance, PolyWorld achieves an Average Precision (AP) of 63.3 with position refinement enabled, and the AP values reported across scales (AP_S, AP_M, AP_L) indicate robustness to varying building sizes.
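For readers unfamiliar with the IoU metric mentioned above, the following sketch computes it for two binary masks: the number of pixels in both masks divided by the number of pixels in either. The function name `mask_iou`, the toy masks, and the convention of returning 0.0 when both masks are empty are assumptions made for this example and are not taken from the paper's evaluation code.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # convention chosen for this sketch: two empty masks
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)

# Toy masks (assumed): the prediction covers 4 pixels, the reference 6,
# and the prediction lies entirely inside the reference.
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:2] = True
gt = np.zeros((4, 4), dtype=bool)
gt[0:2, 0:3] = True

iou = mask_iou(pred, gt)  # intersection 4, union 6
print(iou)
```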

Moreover, the proposed architecture manages to generate polygons with a complexity closer to ground truth, as evidenced by its high scores in complexity-aware IoU (C-IoU) metrics and lower Max Tangent Angle (MTA) Error, suggesting more precise alignment of predicted building contours to reference data.
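The idea behind a complexity-aware IoU can be sketched as follows. The text above does not give the formula, so the definition used here is an assumption: the plain IoU is scaled by one minus the relative difference between the predicted and reference vertex counts. The function name `complexity_aware_iou` and the example numbers are hypothetical.

```python
def complexity_aware_iou(iou, n_pred, n_gt):
    """Scale an IoU value by how well the vertex counts agree.

    Assumed formulation: C-IoU = IoU * (1 - |n_pred - n_gt| / (n_pred + n_gt)).
    """
    total = n_pred + n_gt
    if total == 0:
        return 0.0  # convention chosen for this sketch: no vertices at all
    relative_difference = abs(n_pred - n_gt) / total
    return iou * (1.0 - relative_difference)

# Assumed example: same overlap, different polygon complexity.
same_complexity = complexity_aware_iou(0.8, n_pred=4, n_gt=4)
extra_vertices = complexity_aware_iou(0.8, n_pred=6, n_gt=4)
print(same_complexity, extra_vertices)
```

Under this formulation, a polygon that overlaps the reference well but uses many more or fewer vertices than the reference is penalized, which is why such a metric rewards predictions whose complexity matches the ground truth.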

Implications and Future Perspectives

PolyWorld's ability to efficiently generate vector polygons holds substantial implications for geospatial disciplines and applications. It facilitates precision in tasks such as urban planning, cartographic mapping, and city modeling, where accurate building representations are paramount. The advancement demonstrates the feasibility of using GNNs in conjunction with CNNs within the field of remote sensing, potentially guiding future methodologies in geospatial data analysis.

The approach also suggests further directions for learning-based processing of the large datasets common in satellite imagery. Future work could explore PolyWorld's adaptability to other geographies, its performance at varying image resolutions, or its integration with other real-world data types, such as terrain or vegetation indices, for more holistic environment modeling.

In conclusion, PolyWorld not only showcases an innovative method of achieving high-fidelity building extraction from satellite images but also sets a precedent for incorporating advanced neural architectures in solving complex vectorization problems in computer vision and remote sensing. This paper positions itself as a strong step towards more efficient and precise geospatial data utilization.
