- The paper introduces an unsupervised domain adaptation method that leverages adversarial learning to reduce domain bias in road scene segmentation.
- It incorporates static-object priors from temporal Google Street View data to enhance pseudo-label stability across different urban settings.
- The approach achieves performance close to fully-supervised models and demonstrates versatility on both synthetic and real-world datasets.
Cross-City Adaptation of Road Scene Segmenters: An Exploration of Unsupervised Domain Adaptation
This paper addresses the challenge of adapting road scene segmentation models to new urban environments without requiring annotated data for each city. The authors propose an unsupervised domain adaptation approach that markedly improves state-of-the-art road scene segmenters across varied urban landscapes, combining adversarial learning with temporal data mined from Google Street View.
Overcoming Domain Bias in Semantic Segmentation
Semantic segmentation has progressed rapidly with the advent of deep learning and convolutional neural networks (CNNs). Yet a critical obstacle to deploying these models across diverse geographical locales is domain bias: performance degrades when a model is applied to cities absent from its training data.
The paper opens with empirical evidence of this bias, demonstrating pronounced drops in mean intersection over union (mIoU) when a segmenter trained on the Cityscapes dataset is tested on images from Rome, Rio, Tokyo, and Taipei. This observation motivates the proposed unsupervised adaptation framework, which targets cross-city deployment without the cost of city-specific annotation.
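For readers unfamiliar with the metric, mIoU averages the per-class intersection-over-union between predicted and ground-truth label maps. A minimal sketch of the computation (the exact evaluation protocol and ignore label follow Cityscapes conventions; `ignore_index=255` here is the common convention, not taken from this paper):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean intersection-over-union between two per-pixel label maps.

    pred, gt: integer arrays of identical shape holding class ids.
    Pixels whose ground-truth label equals `ignore_index` are skipped,
    as are classes that appear in neither map.
    """
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # class present in at least one map
            ious.append(inter / union)
    return float(np.mean(ious))
```

A large gap between the mIoU on Cityscapes and the mIoU on, say, Taipei imagery is exactly the domain-bias evidence the paper quantifies.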
Methodological Advancements
Central to the authors' approach is the use of domain adversarial neural networks (DANN). The framework performs both global and class-wise domain adaptation, moving beyond aligning global features across domains to refining class-specific feature spaces. This is crucial because the composition of urban elements differs substantially from city to city.
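The standard DANN mechanism is a gradient reversal layer: an identity map in the forward pass whose backward pass negates (and scales) the gradient flowing from the domain discriminator, so the feature extractor is pushed toward domain-confusing, i.e. domain-invariant, features. A minimal conceptual sketch (the scaling factor `lam` is a generic DANN hyperparameter, not a value from this paper):

```python
import numpy as np

class GradientReversal:
    """Gradient reversal layer as used in DANN-style training.

    Forward: identity on the features.
    Backward: multiplies the incoming gradient by -lam, so minimizing
    the discriminator's loss w.r.t. the discriminator simultaneously
    MAXIMIZES it w.r.t. the feature extractor.
    """
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed, scaled gradient
```

In an autograd framework this is typically implemented as a custom function with these exact forward/backward semantics.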
A salient feature of the proposed framework is the integration of static-object priors, derived from Google Street View's time-machine capability. By examining images of the same locations at different times, the framework harnesses temporal information to infer static-object regions, like buildings and roads, without human intervention. This inference aids in stabilizing the pseudo-labels assigned during the adaptation process, thereby refining the model's predictions.
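The intuition behind the prior can be sketched simply: if snapshots of the same location taken at different times agree on a pixel's label, and that label belongs to a static class (road, building, ...), the pixel is a reliable static-object region. The following is an illustrative sketch of that idea, not the authors' exact formulation:

```python
import numpy as np

def static_object_prior(label_maps, static_classes):
    """Infer a static-object mask from segmentations of one location
    captured at different times (e.g. Street View time-machine images).

    label_maps: list of (H, W) integer label maps of the same scene.
    static_classes: set of class ids considered static.
    Returns a boolean (H, W) mask: True where all snapshots agree on
    the same static-class label.
    """
    stack = np.stack(label_maps)                    # (T, H, W)
    consistent = np.all(stack == stack[0], axis=0)  # agreed across time
    is_static = np.isin(stack[0], list(static_classes))
    return consistent & is_static                   # static-object prior
```

Regions selected this way can then stabilize pseudo-labels during adaptation, since temporally consistent static predictions are far less likely to be noise.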
Training proceeds via unsupervised adversarial learning with both a global discriminator and class-wise discriminators. Each class-specific discriminator is trained to distinguish source-domain from target-domain features for that class, while the feature extractor is trained to fool it; this adversarial game minimizes the distributional discrepancy between domains, yielding representations that are invariant to the domain shift.
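One common way to realize a class-wise objective is to weight a binary domain-classification loss by the predicted probability of the class at each spatial location, so only regions likely to contain that class contribute. The sketch below illustrates this soft-masked binary cross-entropy for a single class; it is an assumption-laden illustration of the idea, not the authors' exact loss:

```python
import numpy as np

def classwise_domain_loss(probs_src, probs_tgt, disc_out_src, disc_out_tgt):
    """Soft-masked domain BCE for one class-wise discriminator.

    probs_*:    predicted probability of the class at each cell, used as
                a soft mask (flattened arrays).
    disc_out_*: discriminator sigmoid outputs (prob. of "source domain").
    Source cells carry domain label 1, target cells label 0.
    """
    eps = 1e-8  # numerical guard for log
    loss_src = -(probs_src * np.log(disc_out_src + eps)).sum() / (probs_src.sum() + eps)
    loss_tgt = -(probs_tgt * np.log(1.0 - disc_out_tgt + eps)).sum() / (probs_tgt.sum() + eps)
    return loss_src + loss_tgt
```

The discriminator descends this loss; through gradient reversal, the feature extractor ascends it, driving the per-class feature distributions of the two domains together.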
Noteworthy Results and Implications
The empirical results are compelling: the proposed unsupervised method approaches the performance of a fully-supervised model trained with target-domain ground truth, reflecting the robustness of the adversarial adaptation strategy. Integrating the static-object priors provides a further measurable boost in segmentation quality.
The application of this framework across synthetic to real image datasets underscores its versatility. Specifically, adapting from the synthetic SYNTHIA to the real-world Cityscapes dataset, the framework shows significant improvements in segmentation performance, affirming the potential for broader deployment across diverse sectors in intelligent transportation systems.
Future Directions
The implications of this research extend into more scalable AI systems capable of operating effectively in unstructured environments worldwide. Future work could explore unsupervised adaptation in other domains beyond road scenes, such as indoor environments or any field where annotated data acquisition poses challenges. Additionally, enriching the model with self-supervised learning techniques might further mitigate the need for pseudo-labels and annotation biases.
In summary, this work lays a strong foundation for adapting road scene segmenters across geographically diverse contexts, avoiding laborious manual annotation while retaining high accuracy in complex environment interpretation. The framework's demonstrated success points toward more adaptive and resilient AI systems for dynamic real-world applications.