- The paper introduces an unsupervised GAN framework that adapts segmentation models to new aerial domains, raising overall accuracy from 35% to 52%, with the largest gains on classes affected by sensor differences.
- It employs a cyclic GAN with cycle-consistency loss to translate source images into the target domain, preserving structural details for effective fine-tuning.
- The approach reduces reliance on costly labeled data, making semantic segmentation more practical for urban monitoring, traffic management, and public safety applications.
Unsupervised Domain Adaptation for Semantic Segmentation Using GANs
The paper under review addresses unsupervised domain adaptation for semantic segmentation of aerial images using Generative Adversarial Networks (GANs). Semantic segmentation, a core task in image analysis, assigns a class label to every pixel in an image and thus provides comprehensive scene understanding. In aerial imagery, it underpins applications such as urban area monitoring, traffic management, and public safety.
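To make the per-pixel labeling concrete, here is a minimal sketch (in PyTorch, not tied to the paper's specific network) of how a segmentation model's class scores are turned into a label map; the tensor shapes and class count are illustrative assumptions, not values from the paper.

```python
import torch

# Hypothetical output of a segmentation network for one aerial tile:
# one score (logit) per class at every pixel.
num_classes, height, width = 6, 256, 256              # e.g. a six-class aerial label set
logits = torch.randn(1, num_classes, height, width)   # (batch, classes, H, W)

# Semantic segmentation is per-pixel classification: take the arg-max over the
# class dimension to obtain one label per pixel.
label_map = logits.argmax(dim=1)                       # (batch, H, W), values in [0, num_classes)
print(label_map.shape, label_map.unique())
```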
Methodology Overview
The authors propose a GAN-based method to handle the domain shift that arises when a pre-trained segmentation model is deployed in regions not represented in its training data. The core objective is to adapt a model trained on a source domain so that it performs accurately on a target domain with different characteristics, such as resolution, sensor type, and object appearance, without requiring newly labeled data.
The proposed framework operates in several key phases:
- Source Segmentation Model Training: The initial step involves training a semantic segmentation model on the source domain dataset, utilizing a state-of-the-art architecture to achieve high segmentation accuracy.
- GAN-based Domain Translation: A GAN is trained to translate images from the source domain so that they mimic the appearance of the target domain. A cycle-consistency loss keeps the translated images structurally consistent with their source counterparts (a minimal sketch of this loss follows the list).
- Dataset Translation: The newly trained GAN is then used to translate the entire source dataset into the target domain style, resulting in a dataset that preserves the scene structure but reflects the visual characteristics of the target domain.
- Model Fine-tuning: The translated dataset, paired with the original source labels, is used to fine-tune the source-trained segmentation model, improving its accuracy on the target domain without any additional annotation (see the fine-tuning sketch after the list).
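To illustrate the cycle-consistency idea mentioned in the translation phase, the snippet below gives a minimal sketch in PyTorch. It assumes a CycleGAN-like setup with two generators (source-to-target and target-to-source) and an L1 reconstruction penalty; the generator callables, the loss weight, and the function name are assumptions for illustration, not the authors' implementation.

```python
import torch.nn.functional as F_nn

def cycle_consistency_loss(G_st, G_ts, src_batch, tgt_batch, lambda_cyc=10.0):
    """Cycle-consistency term for unpaired source->target image translation.

    G_st: generator mapping source-domain images to the target style (assumed callable).
    G_ts: generator mapping target-domain images back to the source style (assumed callable).
    """
    # Translate source images to the target style, then reconstruct the originals.
    fake_tgt = G_st(src_batch)
    rec_src = G_ts(fake_tgt)
    # Same round trip starting from the target domain.
    fake_src = G_ts(tgt_batch)
    rec_tgt = G_st(fake_src)
    # L1 penalties keep the round-tripped images close to the originals, which is
    # what preserves scene structure during the style translation.
    loss = F_nn.l1_loss(rec_src, src_batch) + F_nn.l1_loss(rec_tgt, tgt_batch)
    return lambda_cyc * loss
```

In a full CycleGAN-style objective this term is combined with the adversarial losses of two domain discriminators; only the cycle term is shown here because it is the part that keeps the source labels valid for the translated images.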
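The dataset-translation and fine-tuning phases can similarly be read as the loop sketched below, again under assumed names (generator_st, seg_model, source_loader) rather than the authors' code. Because the GAN only restyles the imagery while preserving scene structure, the original source annotations can be reused as training targets.

```python
import torch
from torch import nn, optim

def finetune_on_translated(seg_model, generator_st, source_loader,
                           epochs=5, lr=1e-4, device="cuda"):
    """Fine-tune a source-trained segmentation model on translated images.

    generator_st is the trained source->target generator; source_loader yields
    (image, label) pairs from the labeled source dataset.
    """
    seg_model.to(device).train()
    generator_st.to(device).eval()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(seg_model.parameters(), lr=lr)

    for _ in range(epochs):
        for images, labels in source_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                # Phase 3: translate source imagery into the target-domain style.
                translated = generator_st(images)
            # Phase 4: fine-tune the segmentation model on the restyled images
            # while reusing the original source annotations.
            logits = seg_model(translated)
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return seg_model
```

Keeping the generator frozen and updating only the segmentation network keeps this adaptation step cheap compared with retraining the model from scratch.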
Experimental Insights and Results
The method was evaluated on the ISPRS semantic segmentation benchmark, adapting models trained on the Potsdam dataset to the Vaihingen dataset. Overall segmentation accuracy rose from 35% to 52%, with the largest gains on classes most affected by sensor-induced domain shift: accuracy for vegetation types distinguished by spectral data improved from 14% to 61%.
Implications and Future Directions
The practical implications of this research are substantial for real-world applications requiring the deployment of segmentation models across several geographic areas with varying image characteristics. By significantly reducing the domain shift caused by sensor differences, the proposed method facilitates cost-effective model transferability across domains. This approach is particularly advantageous in contexts where collecting labeled data for every new domain is impractical.
Theoretically, this work contributes to the growing body of literature on domain adaptation by providing a robust unsupervised approach tailored for semantic segmentation. It opens avenues for further exploration into leveraging GANs for other types of domain adaptations and mixed-domain challenges.
Future developments could explore extensions of this method to semi-supervised scenarios, where limited labeled data in the target domain could be effectively incorporated to further enhance model adaptation.
In summary, this paper provides a comprehensive approach to addressing domain adaptation challenges in semantic segmentation using GANs, marking a step forward in transferring image analysis models across varied domains without additional costly annotation efforts. The methodology holds promise for broader applications in remote sensing and urban planning, reflecting the authors' successful alignment of state-of-the-art machine learning techniques with practical, actionable outcomes.