Building Instance Classification Using Street View Images (1802.09026v1)

Published 25 Feb 2018 in cs.CV and eess.IV

Abstract: Land-use classification based on spaceborne or aerial remote sensing images has been extensively studied over the past decades. Such classification is usually a patch-wise or pixel-wise labeling over the whole image. But for many applications, such as urban population density mapping or urban utility planning, a classification map based on individual buildings is much more informative. However, such semantic classification still poses some fundamental challenges, for example, how to retrieve fine boundaries of individual buildings. In this paper, we proposed a general framework for classifying the functionality of individual buildings. The proposed method is based on Convolutional Neural Networks (CNNs) which classify facade structures from street view images, such as Google StreetView, in addition to remote sensing images which usually only show roof structures. Geographic information was utilized to mask out individual buildings, and to associate the corresponding street view images. We created a benchmark dataset which was used for training and evaluating CNNs. In addition, the method was applied to generate building classification maps on both region and city scales of several cities in Canada and the US.

Keywords: CNN, Building instance classification, Street view images, OpenStreetMap

Summary

The paper presents a novel method using street view images and CNNs to classify individual buildings by functionality.
It fine-tunes models like AlexNet, VGG16, and ResNet on façade data to achieve high accuracy in urban building classification.
The resulting maps support automated urban analysis and planning, especially in areas with limited cadastral data.

Building Instance Classification Using Street View Images

The paper introduces a method for classifying individual buildings by function using street view images. It departs from traditional land-use classification, which typically relies on remote sensing imagery and produces patch-wise or pixel-wise labels. Aerial and satellite images mostly capture roof structures, which say little about how an individual building is used. The paper instead applies Convolutional Neural Networks (CNNs) to street view images, which show façade structures and therefore give more direct evidence of building use.

Methodology Overview

The proposed approach analyzes building façades in street view imagery, such as Google StreetView. The methodology has three stages:

Data Collection: The paper utilizes freely accessible street view images and geographic maps to obtain images of building façades. The building footprints and geographic locations necessary for image retrieval are sourced from platforms such as OpenStreetMap and Google Maps.
Image Preprocessing: The street view images are filtered to exclude irrelevant ones. A CNN pre-trained on the Places2 dataset is used for outlier removal, so that the retained images are suitable for façade analysis.
CNN Training and Classification: The researchers built a benchmark dataset of building street view images and used it to fine-tune several CNN architectures, including AlexNet, VGG16, and ResNet. The trained models enable building instance classification at both regional and city scales across eight building categories relevant to urban analysis: apartment, church, garage, house, industrial, office building, retail, and roof structure.
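
The first two stages can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that a street view request is aimed at a building by computing the compass bearing from the camera position to the building's footprint centroid, and that outlier removal keeps an image only when a scene classifier's most likely label is façade-like. The function names, the scene label set, and the 0.5 threshold are hypothetical choices made for illustration.

```python
import math

def bearing_deg(cam_lat, cam_lon, bld_lat, bld_lon):
    """Initial great-circle bearing from camera to building, in degrees [0, 360)."""
    phi1, phi2 = math.radians(cam_lat), math.radians(bld_lat)
    dlon = math.radians(bld_lon - cam_lon)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

# Hypothetical scene labels treated as "shows a building facade".
FACADE_SCENES = {"apartment_building", "church", "house",
                 "office_building", "storefront"}

def is_facade_image(scene_probs, allowed=FACADE_SCENES, min_prob=0.5):
    """Keep an image if its most likely scene is an allowed one with enough confidence.

    scene_probs maps scene label -> probability, as a scene classifier
    (for example one pre-trained on Places2) might output. The rule and
    the threshold are assumptions, not taken from the paper.
    """
    top_label = max(scene_probs, key=scene_probs.get)
    return top_label in allowed and scene_probs[top_label] >= min_prob
```

A bearing computed this way could be passed as the camera heading when requesting an image, and `is_facade_image` would then decide whether the returned image enters the training set.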

Results and Implications

The fine-tuned CNNs classified building function from façade imagery with high accuracy. Of the architectures tested, VGG16 performed best and was used to generate building classification maps for several urban areas in Canada and the US, including Calgary, Boston, and Toronto. The resulting maps give insight into urban structure, indicating residential density and the location of business districts, and can support urban planning and analysis.
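
Turning per-image predictions into a per-building map can be sketched as follows. The summary does not say how predictions from several images of the same building are combined, so averaging class probabilities per building and taking the most likely class is an assumption here. The function name and the input layout are likewise hypothetical.

```python
from collections import defaultdict

# The eight categories listed in the summary.
CLASSES = ["apartment", "church", "garage", "house",
           "industrial", "office building", "retail", "roof structure"]

def label_buildings(image_predictions):
    """Average class probabilities per building and return the top class.

    image_predictions: iterable of (building_id, probs) pairs, where probs
    is a list of class probabilities aligned with CLASSES.
    Returns a dict mapping building_id -> class name.
    """
    sums = defaultdict(lambda: [0.0] * len(CLASSES))
    counts = defaultdict(int)
    for building_id, probs in image_predictions:
        if len(probs) != len(CLASSES):
            raise ValueError("expected one probability per class")
        for i, p in enumerate(probs):
            sums[building_id][i] += p
        counts[building_id] += 1

    labels = {}
    for building_id, total in sums.items():
        mean = [s / counts[building_id] for s in total]
        labels[building_id] = CLASSES[mean.index(max(mean))]
    return labels
```

The returned dictionary can be joined to building footprints by identifier to color each polygon by its predicted class.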

The research highlights potential applications in high-resolution urban analysis, such as population density mapping, urban social structure studies, city economic behavioral analysis, and general urban planning. The approach is especially relevant for cities where access to comprehensive cadastral databases is restricted or absent. Accurate, automated building classification can substantially reduce the manual labor needed to keep urban databases up to date.

Challenges and Future Directions

The authors also discuss challenges such as inconsistent data labeling and image quality that can affect classification accuracy. Some street view images encompass multiple building types or insufficient façade information, complicating classification. To address these issues, future work may involve integrating additional data modalities, such as text from social media or brand identification within images, and improving methods for the automatic extraction of building footprints from remote sensing data.
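
One simple way to surface images that show several building types or too little façade is to flag predictions whose two most likely classes are close together. This is a generic heuristic offered as an illustration, not a technique described in the paper, and the 0.2 margin is an arbitrary assumption.

```python
def top2_margin(probs):
    """Difference between the largest and second-largest class probability."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

def flag_ambiguous(probs, min_margin=0.2):
    """Return True when the classifier cannot clearly separate its top two classes."""
    return top2_margin(probs) < min_margin
```

Flagged images could be excluded from the map or routed to manual review rather than assigned a class.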

The paper points toward fusing multiple data sources, such as social media content and textual image descriptions, and possibly toward refined methods that classify buildings precisely without relying on street-level imagery. Expanding the dataset and integrating richer contextual data could yield more robust classification and broader applicability across diverse urban environments.

In conclusion, this work introduces an innovative methodology for the challenging task of building instance classification using publicly accessible data. The implications for urban planning and geography are substantial, offering new avenues for automated urban analysis and insights into urban landscapes at unprecedented granularity.
