
Image to Image Translation for Domain Adaptation (1712.00479v1)

Published 1 Dec 2017 in cs.CV

Abstract: We propose a general framework for unsupervised domain adaptation, which allows deep neural networks trained on a source domain to be tested on a different target domain without requiring any training annotations in the target domain. This is achieved by adding extra networks and losses that help regularize the features extracted by the backbone encoder network. To this end we propose the novel use of the recently proposed unpaired image-to-image translation framework to constrain the features extracted by the encoder network. Specifically, we require that the features extracted are able to reconstruct the images in both domains. In addition we require that the distributions of features extracted from images in the two domains be indistinguishable. Many recent works can be seen as specific cases of our general framework. We apply our method for domain adaptation between MNIST, USPS, and SVHN datasets, and Amazon, Webcam and DSLR Office datasets in classification tasks, and also between GTA5 and Cityscapes datasets for a segmentation task. We demonstrate state-of-the-art performance on each of these datasets.

Citations (521)

Summary

  • The paper introduces a novel framework that uses unpaired image-to-image translation for unsupervised domain adaptation.
  • It employs adversarial, identity, and cycle consistency losses to create a shared, domain-agnostic feature space.
  • The method achieves state-of-the-art results on digit classification, object recognition, and semantic segmentation tasks.

Image to Image Translation for Domain Adaptation

Overview

The paper "Image to Image Translation for Domain Adaptation" presents a framework to enhance unsupervised domain adaptation. The core idea is to leverage deep neural networks trained on a labeled source domain to perform effectively on an unlabeled target domain. This framework introduces a novel approach using unpaired image-to-image translation to regularize and constrain the features extracted by the encoder network. The proposed system amalgamates several existing domain adaptation techniques, using a combination of auxiliary networks and loss functions to ensure that the learned representations are domain-agnostic and discriminatively robust.

Methodology

The framework's central concept is to create a shared latent space, wherein features from the source and target domains are domain-agnostic. The approach consists of several key components:

  1. Domain-Agnostic Feature Extraction: Features from the two domains should be indistinguishable. This is enforced through adversarial learning, with a discriminator network attempting to classify which domain a feature came from (a minimal sketch of this component follows the list).
  2. Domain-Specific Reconstruction: Features must be rich enough to be decoded back into both the source and target domains, preserving core content while discarding domain-specific nuisances.
  3. Cycle Consistency: Ensures that translations preserve the semantic content of images across domains, regularizing the possible mappings and preventing collapse into trivial solutions.
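To make item 1 concrete, below is a minimal PyTorch sketch of the adversarial feature-alignment idea: an encoder maps images from either domain into the shared latent space, and a domain discriminator tries to tell source features from target features while the encoder learns to fool it. All module names, architectures, and hyperparameters here are illustrative assumptions, not the paper's actual networks.

```python
# Illustrative sketch of adversarial feature alignment; names and sizes are
# hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Backbone mapping images from either domain into the shared latent space."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
    def forward(self, x):
        return self.net(x)

class DomainDiscriminator(nn.Module):
    """Predicts which domain a latent feature came from (source=1, target=0)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, f):
        return self.net(f)

encoder, disc = FeatureEncoder(), DomainDiscriminator()
bce = nn.BCEWithLogitsLoss()

x_src, x_tgt = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
f_src, f_tgt = encoder(x_src), encoder(x_tgt)

# Discriminator step: learn to separate source features from target features.
d_loss = bce(disc(f_src.detach()), torch.ones(8, 1)) + \
         bce(disc(f_tgt.detach()), torch.zeros(8, 1))

# Encoder step: make target features indistinguishable from source features,
# pushing the latent space toward domain-agnosticism.
g_loss = bce(disc(f_tgt), torch.ones(8, 1))
```

In a full training loop, the encoder's adversarial loss would be combined with the classification and reconstruction terms described next.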

The method ties these elements together with several losses, combined into a single training objective (sketched after this list):

  • Identity Loss for maintaining essential information from both domains.
  • Translation and Cycle-consistency Losses to enforce semantic-aware translation between domains.
  • Classification Loss to keep the features in the latent space discriminative for the labeled source-domain task.
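These terms, together with the adversarial feature-alignment term from item 1 above, are typically combined into a single weighted objective. A hedged sketch follows; the λ coefficients are illustrative trade-off weights, not values from the paper:

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{cls}}
  + \lambda_{\text{id}}\,\mathcal{L}_{\text{id}}
  + \lambda_{\text{tr}}\,\mathcal{L}_{\text{tr}}
  + \lambda_{\text{cyc}}\,\mathcal{L}_{\text{cyc}}
  + \lambda_{\text{adv}}\,\mathcal{L}_{\text{adv}}
```

Tuning these weights balances discriminability on the labeled source domain against feature alignment and reconstruction quality across domains.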

Experiments and Results

The framework is evaluated on several benchmarks: MNIST, USPS, and SVHN (digit classification), the Office datasets (object recognition), and GTA5 to Cityscapes (semantic segmentation). Reported results compare favorably with existing approaches:

  • Digit Classification: Achieved higher accuracy across different domain adaptation tasks, notably improving results on difficult tasks like SVHN → MNIST adaptation.
  • Office Datasets: Outperformed state-of-the-art methods on domain shifts between Amazon, DSLR, and Webcam image sets.
  • Driving Scenes Segmentation: Showed significant improvement in Mean Intersection over Union (mIoU) when adapting from synthetic GTA5 images to real-world Cityscapes images.

The framework effectively combines domain adaptation with image-to-image translation, leading to robust cross-domain performance improvements.

Implications and Future Directions

This research has significant implications for tasks where labeled target-domain data is scarce or unavailable. It demonstrates that integrating domain adaptation with generative models can mitigate domain shift while preserving essential semantic content. The framework sets the stage for more complex adaptation scenarios, such as semi-supervised settings, with possible extensions to video domain adaptation and more intricate network architectures. Future work may refine the translation components and explore new discriminative mechanisms to further improve domain robustness; continuing advances in encoder-decoder architectures could also help carry this methodology into broader computer-vision applications.