Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes (1707.09465v5)

Published 29 Jul 2017 in cs.CV and cs.LG

Abstract: During the last half decade, convolutional neural networks (CNNs) have come to dominate semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, training CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban-scene semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies on our approach.

Authors (3)

Yang Zhang
Philip David
Boqing Gong



Summary

This paper presents a novel approach to address the domain adaptation challenge in semantic segmentation of urban scenes, specifically targeting the discrepancies between synthetic and real-world imagery. The authors propose a curriculum domain adaptation strategy that starts with simpler tasks to derive necessary properties of the target domain before addressing the principal task of pixel-wise prediction. This technique leverages the structural idiosyncrasies of urban environments to enhance segmentation performance on real-world data.

Overview

Since CNNs became the dominant approach to tasks such as semantic segmentation, they have depended heavily on large datasets with high-quality, pixel-level annotations. Collecting and annotating such datasets, especially for diverse environments like urban scenes, is cumbersome and labor-intensive. Synthetic imagery with computer-generated labels is far cheaper to produce, but models trained on it perform poorly on real images. This paper uses curriculum-style learning to bridge that domain gap between synthetic and real-world data.

Methodology

The method first solves "easy" tasks that are less affected by the domain gap and uses them to infer properties of the target domain: global label distributions over whole images and local label distributions over landmark superpixels. A global label distribution records the proportion of pixels belonging to each category in an image, while landmark superpixels supply localized label information about salient regions that remain consistent across domains. These inferred properties then guide the training of the segmentation network on real-world images, even though those images have no pixel-wise annotations.
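
To make the two target quantities concrete, here is a minimal sketch (not the authors' code) that computes a label distribution from a pixel-wise label map. The function name label_distribution, the toy 4x4 label map, the superpixel mask, and the class indices (0 = road, 1 = building, 2 = sky) are all hypothetical. With no mask it returns the global, image-level distribution; with a boolean mask selecting one superpixel it returns that region's local distribution. On labeled source images these can be computed directly from ground truth; how the paper estimates them for unlabeled target images is not shown here.

```python
from typing import List, Optional, Sequence


def label_distribution(
    label_map: Sequence[Sequence[int]],
    num_classes: int,
    mask: Optional[Sequence[Sequence[bool]]] = None,
) -> List[float]:
    """Fraction of pixels assigned to each class.

    mask=None gives the global (image-level) label distribution; a boolean mask
    that selects one superpixel gives that superpixel's local label distribution.
    """
    counts = [0] * num_classes
    total = 0
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            if mask is not None and not mask[r][c]:
                continue  # pixel lies outside the selected region
            counts[label] += 1
            total += 1
    if total == 0:
        raise ValueError("the selected region contains no pixels")
    return [n / total for n in counts]


# Hypothetical toy label map; class indices: 0 = road, 1 = building, 2 = sky.
labels = [
    [2, 2, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(label_distribution(labels, 3))  # prints [0.375, 0.25, 0.375]

# A hypothetical 2x2 "superpixel" covering rows 1-2 and columns 1-2.
superpixel = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(4)] for r in range(4)]
print(label_distribution(labels, 3, superpixel))  # prints [0.25, 0.5, 0.25]
```

Real superpixels are irregular regions produced by an over-segmentation algorithm; the rectangular mask above is only a stand-in so the example stays small.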

The network is then trained so that its predictions on target images follow these inferred distributions, which act as a regularizer alongside the supervised loss on labeled synthetic images. This reduces the method's reliance on learned domain-invariant features: it avoids assuming that both domains share a common prediction function in a transformed feature space, an assumption that is especially hard to satisfy for structured prediction problems such as semantic segmentation.
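
The regularized objective can be sketched as follows, assuming the regularizer is a cross-entropy between the inferred target distribution and the network's predicted distribution, where the latter is the per-pixel class probabilities averaged over the image. This is an illustrative sketch, not the paper's implementation: predictions are given as already-normalized per-pixel probability vectors with the pixels flattened into a list, and the function names and the weight value 0.1 are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: pixels are flattened into a list, and each entry is an
# already-normalized class-probability vector (e.g. a softmax output).


def predicted_distribution(pixel_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-pixel class probabilities over a region (whole image or superpixel)."""
    num_pixels = len(pixel_probs)
    num_classes = len(pixel_probs[0])
    return [sum(p[k] for p in pixel_probs) / num_pixels for k in range(num_classes)]


def distribution_loss(inferred: Sequence[float], predicted: Sequence[float],
                      eps: float = 1e-12) -> float:
    """Cross-entropy -sum_k q_k * log(p_k) between the inferred distribution q and the
    predicted distribution p; for a fixed q it is smallest when p equals q."""
    return -sum(q * math.log(p + eps) for q, p in zip(inferred, predicted))


def pixel_cross_entropy(pixel_probs: Sequence[Sequence[float]],
                        labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean per-pixel cross-entropy on labeled source pixels."""
    return -sum(math.log(p[y] + eps) for p, y in zip(pixel_probs, labels)) / len(labels)


def total_loss(src_probs, src_labels, tgt_probs, inferred_dist, weight: float = 0.1) -> float:
    """Supervised source loss plus a weighted target-domain regularizer (weight is hypothetical)."""
    regularizer = distribution_loss(inferred_dist, predicted_distribution(tgt_probs))
    return pixel_cross_entropy(src_probs, src_labels) + weight * regularizer


# Toy example: one labeled source pixel and two unlabeled target pixels whose
# average prediction [0.5, 0.5] already matches the inferred distribution.
loss = total_loss([[0.8, 0.2]], [0], [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
print(round(loss, 4))  # prints 0.2925
```

Under the same assumption, distribution_loss could also be applied to each landmark superpixel, using predicted_distribution over that superpixel's pixels only; whether the paper uses exactly this form for the local term is not confirmed here.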

Experimental Results

The experiments compare the curriculum domain adaptation method against existing baselines. The authors use Cityscapes as the real-world target domain, with the synthetic SYNTHIA dataset as a source domain; according to the abstract, the method outperforms the baselines on two datasets and with two backbone networks, and the paper also reports extensive ablation studies. The results show clear improvements both over baselines trained without domain adaptation and over an existing state-of-the-art domain adaptation method, supporting the use of inferred target-domain properties to regularize network training for urban scene segmentation.

Implications and Future Work

The approach is well suited to domain adaptation problems that require detailed, structured predictions, such as semantic segmentation. More broadly, framing adaptation as a curriculum, in which easier properties of the target domain are inferred before the harder pixel-wise task is tackled, suggests further research on adapting deep models to datasets with large domain shifts.

Looking forward, future research could explore additional "easy" tasks within the curriculum framework and refine the methods for estimating target domain properties. Furthermore, investigating the application of this approach to other structured prediction problems beyond semantic segmentation holds promise for advancing domain adaptation research.

In summary, the paper contributes a domain adaptation approach that exploits the structure of urban scenes to improve semantic segmentation on real-world images when the training labels come from synthetic data.
