Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation
The paper "Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation" addresses the challenge of bridging the domain gap between synthetic and real-world datasets for semantic segmentation tasks, leveraging the use of synthetic data as a cost-effective substitute for costly manual annotations. The primary focus is the recognition and transformation of texture variances between synthetic and real images, which hinder generalized performance when models trained on synthetic data are deployed in the real domain.
Contributions and Methodology
The central contribution of this work is a framework that improves domain adaptation for semantic segmentation by learning texture-invariant representations. The authors argue that the texture gap is a dominant component of the synthetic-to-real domain shift and address it with a two-stage adaptation process: texture diversification followed by self-training.
- Texture Diversification:
  - The paper diversifies the texture of synthetic images via a style transfer algorithm, generating multiple stylized versions of each source image so that the segmentation model cannot overfit to the homogeneous textures of synthetic data (a minimal sketch of this stage appears after this list).
  - The diversified textures act as a regularizer across the dataset, pushing the model toward invariant cues, such as shape and layout, that delineate content rather than texture.
- Self-Training:
  - After learning texture-invariant features, the model is fine-tuned with pseudo-labels generated on the unlabeled real images of the target domain (see the second sketch after this list).
  - Repeating this process lets the model adapt to the finer nuances and textures specific to the target images through pseudo-supervision rather than ground-truth labels.
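For the texture diversification stage, the code below is a minimal sketch of AdaIN-style feature-statistic alignment, one common way to implement such stylization; the choice of AdaIN, the VGG-19 encoder truncated at relu4_1, and the `stylize_features` helper are illustrative assumptions rather than the paper's exact pipeline, and a trained decoder (not shown) is needed to map stylized features back to an image.

```python
import torch
from torchvision import models

def adain(content_feat, style_feat, eps=1e-5):
    """Shift the channel-wise mean/std of content features to match the style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

# VGG-19 encoder truncated at relu4_1, a common choice for AdaIN-style transfer.
encoder = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:21].eval()

@torch.no_grad()
def stylize_features(content_img, style_img):
    """Encode a synthetic (content) image and a style image, then mix their statistics.
    A trained decoder would turn the result back into a stylized training image."""
    return adain(encoder(content_img), encoder(style_img))
```

Applying this with many different style images per synthetic frame yields the texture-diversified training set on which the segmentation network is trained.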
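The self-training stage can be summarized as follows: the texture-invariant model predicts on unlabeled target images, confident pixels become pseudo-labels, and the model is fine-tuned on them. This is a minimal sketch under common conventions; the 0.9 confidence threshold and the helper names are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = 255  # Cityscapes convention for pixels excluded from the loss

@torch.no_grad()
def generate_pseudo_labels(model, target_img, threshold=0.9):
    """Keep only high-confidence predictions; mask the rest with the ignore index."""
    probs = F.softmax(model(target_img), dim=1)  # (N, C, H, W)
    conf, labels = probs.max(dim=1)              # both (N, H, W)
    labels[conf < threshold] = IGNORE_INDEX
    return labels

def self_training_step(model, optimizer, target_img, pseudo_labels):
    """One fine-tuning step on a target image batch with its pseudo-labels."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(target_img), pseudo_labels,
                           ignore_index=IGNORE_INDEX)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Alternating between regenerating pseudo-labels and fine-tuning gives the iterative adaptation described above.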
Experimental Evaluation
The method's efficacy is demonstrated on the GTA5-to-Cityscapes benchmark, where it achieves state-of-the-art results and outperforms previous CycleGAN-based approaches in both texture handling and overall segmentation performance. The authors further validate robustness through numerical evaluations on corrupted datasets: performance remains competitive under varied noise, underscoring that the learned texture invariance is genuinely robust (the mIoU metric used in these comparisons is sketched below).
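Segmentation quality on Cityscapes is conventionally reported as mean Intersection-over-Union (mIoU) over 19 evaluation classes. The sketch below shows the standard confusion-matrix computation of this metric; it is the conventional formula, not code from the paper.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes confusion matrix, skipping ignored pixels."""
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_iou(conf):
    """IoU = TP / (TP + FP + FN) per class; NaN for classes absent from pred and gt."""
    tp = np.diag(conf).astype(np.float64)
    denom = conf.sum(axis=0) + conf.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        return tp / denom

# Example: sum matrices over a validation set, then average over classes.
# conf = sum(confusion_matrix(p, g, 19) for p, g in zip(all_preds, all_gts))
# print("mIoU:", np.nanmean(per_class_iou(conf)))
```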
Implications and Future Directions
This research underlines the pivotal role of texture in semantic segmentation tasks and offers a scalable, computationally efficient way to tackle the domain shift problem through texture invariance. It lays the groundwork for exploring how shape-dependent representations could be combined with texture-invariant methods to extend this domain adaptation strategy to tasks with stricter shape constraints.
Future research could explore more expressive generative models for texture diversification, self-training mechanisms that adapt dynamically to pseudo-label uncertainty, and applications in other areas of computer vision where synthetic-to-real domain discrepancies persist.
The findings and methodologies presented in this paper chart a path forward for domain adaptation, with practical implications across AI subfields where synthetic data serves as the primary training source.