Self-Ensembling with GAN-based Data Augmentation for Domain Adaptation in Semantic Segmentation (1909.00589v1)

Published 2 Sep 2019 in cs.CV

Abstract: Deep learning-based semantic segmentation methods have an intrinsic limitation that training a model requires a large amount of data with pixel-level annotations. To address this challenging issue, many researchers give attention to unsupervised domain adaptation for semantic segmentation. Unsupervised domain adaptation seeks to adapt the model trained on the source domain to the target domain. In this paper, we introduce a self-ensembling technique, one of the successful methods for domain adaptation in classification. However, applying self-ensembling to semantic segmentation is very difficult because heavily-tuned manual data augmentation used in self-ensembling is not useful to reduce the large domain gap in the semantic segmentation. To overcome this limitation, we propose a novel framework consisting of two components, which are complementary to each other. First, we present a data augmentation method based on Generative Adversarial Networks (GANs), which is computationally efficient and effective to facilitate domain alignment. Given those augmented images, we apply self-ensembling to enhance the performance of the segmentation network on the target domain. The proposed method outperforms state-of-the-art semantic segmentation methods on unsupervised domain adaptation benchmarks.

Authors (3)
  1. Jaehoon Choi (20 papers)
  2. Taekyung Kim (41 papers)
  3. Changick Kim (75 papers)
Citations (236)

Summary

Self-Ensembling with GAN-based Data Augmentation for Domain Adaptation in Semantic Segmentation

The paper "Self-Ensembling with GAN-based Data Augmentation for Domain Adaptation in Semantic Segmentation" addresses a significant challenge in deep learning-based semantic segmentation—the requirement of extensive labeled datasets for training. Given the prohibitive costs associated with generating large annotated datasets, the focus has shifted towards unsupervised domain adaptation, which seeks to adapt models trained on synthetic data (source domain) to perform effectively on real-world data (target domain).

Core Contributions

The authors introduce a novel framework combining a self-ensembling technique with GAN-based data augmentation to tackle the domain shift in semantic segmentation. The combination of these methodologies aims to align the source and target domain distributions more effectively compared to traditional approaches.

  1. GAN-based Data Augmentation: The paper proposes a data augmentation method using Generative Adversarial Networks (GANs) that generates augmented images whose semantic content is preserved through global and local structural constraints. This method seeks to overcome the limitations of the hand-tuned geometric transformations typically used in self-ensembling, which are unsuitable for reducing domain discrepancies in semantic segmentation.
  2. Self-Ensembling: This approach follows a teacher-student network paradigm. The teacher network's weights are a temporal ensemble (an exponential moving average) of the student's weights; the teacher provides pseudo-labels for unlabeled target data, encouraging the student to produce predictions consistent with the teacher's as the domain gap shrinks.
  3. Integration within a Unified Framework: The paper builds a cohesive framework that integrates the proposed GAN-based augmentation with self-ensembling, showing improved performance for unsupervised domain adaptation in semantic segmentation tasks.
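The teacher update at the heart of self-ensembling can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `ema_update` and the decay `alpha` are assumptions, and real networks would apply the same update per-parameter tensor rather than to flat lists of floats.

```python
# Sketch of the self-ensembling (mean teacher) weight update: after each
# training step, the teacher's weights become an exponential moving
# average (EMA) of the student's weights. The student is trained with a
# supervised loss on augmented source images plus a consistency loss
# against the teacher's predictions on target images (not shown here).

def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Blend each teacher weight toward the matching student weight."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]

# Toy example: with the student held fixed, the teacher drifts toward it.
teacher = [0.0, 0.0]
student = [1.0, 2.0]
for _ in range(3):  # three "training steps"
    teacher = ema_update(teacher, student, alpha=0.5)
print(teacher)  # after 3 steps: [0.875, 1.75]
```

A high `alpha` (e.g. 0.99) makes the teacher a slowly evolving ensemble of past student states, which is what stabilizes its pseudo-labels on the target domain.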

Experimental Results

In experiments using GTA5 and SYNTHIA as source datasets and Cityscapes as the target, the proposed method achieved significant mIoU improvements over baseline models and other state-of-the-art approaches. Specifically, the paper reports mIoU improvements of 14.2% on the GTA5-to-Cityscapes adaptation and 13.1% on the SYNTHIA-to-Cityscapes adaptation, validating the efficacy of the proposed framework.
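For readers unfamiliar with the metric, mean Intersection-over-Union (mIoU) averages per-class IoU over the classes present in the evaluation. A minimal sketch over flattened label maps (the class labels and arrays below are illustrative, not from the paper's benchmarks):

```python
# Sketch of the mIoU metric used in the experiments: for each class,
# IoU = |prediction ∩ ground truth| / |prediction ∪ ground truth|,
# then average over classes that appear in either map.

def miou(pred, target, num_classes):
    """Mean IoU over flattened per-pixel label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 4-pixel example with two classes (0 = road, 1 = car).
pred   = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(miou(pred, target, num_classes=2))  # (1/2 + 2/3) / 2 ≈ 0.583
```

Benchmark implementations (e.g. the Cityscapes evaluation script) accumulate a confusion matrix over the whole dataset rather than per image, but the per-class ratio is the same.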

Implications and Future Directions

The implications of this research are profound for autonomous systems and other domains where pixel-level annotation is costly or unavailable. The integration of GANs for data augmentation coupled with self-ensembling offers a robust route for models to generalize across varying domains without manual intervention.

In future work, the exploration of more refined GAN architectures and additional constraints might further enhance domain alignment. Moreover, adapting this approach to other vision tasks could demonstrate the versatility and efficacy of self-ensembling with GAN-based augmentation. Exploring unsupervised domain adaptation in three-dimensional semantic segmentation and multi-modal transfer learning also offers promising avenues for extending this research.

This contribution underscores the potential of utilizing generative models and ensemble learning to bridge the synthetic-real domain gap, paving the way for more practical and effective machine learning models in scenarios where data annotation remains a bottleneck.