Deep Visual Domain Adaptation: A Survey (1802.03601v4)

Published 10 Feb 2018 in cs.CV

Abstract: Deep domain adaption has emerged as a new learning technique to address the lack of massive amounts of labeled data. Compared to conventional methods, which learn shared feature subspaces or reuse important source instances with shallow representations, deep domain adaption methods leverage deep networks to learn more transferable representations by embedding domain adaptation in the pipeline of deep learning. There have been comprehensive surveys for shallow domain adaption, but few timely reviews the emerging deep learning based methods. In this paper, we provide a comprehensive survey of deep domain adaptation methods for computer vision applications with four major contributions. First, we present a taxonomy of different deep domain adaption scenarios according to the properties of data that define how two domains are diverged. Second, we summarize deep domain adaption approaches into several categories based on training loss, and analyze and compare briefly the state-of-the-art methods under these categories. Third, we overview the computer vision applications that go beyond image classification, such as face recognition, semantic segmentation and object detection. Fourth, some potential deficiencies of current methods and several future directions are highlighted.

Citations (1,895)

Summary

The paper presents a comprehensive taxonomy of deep visual domain adaptation scenarios based on data divergence properties.
The paper categorizes deep domain adaptation approaches into discrepancy-based, adversarial-based, and reconstruction-based methods to clarify their mechanisms.
The paper discusses multi-step adaptation strategies and their applications across image classification, face recognition, object detection, and more, highlighting future research directions.

Deep Visual Domain Adaptation: A Survey

Mei Wang and Weihong Deng's survey paper, "Deep Visual Domain Adaptation: A Survey," offers a comprehensive overview of deep domain adaptation (DA) methodologies, with a particular focus on computer vision applications. The survey is timely: it captures the field's rapid progress and outlines the main methods and challenges in applying deep learning to domain adaptation.

Core Contributions

The paper delineates four significant contributions:

Taxonomy of Deep Domain Adaptation Scenarios: The authors classify deep DA scenarios based on data properties that define domain divergence. This classification provides a structured lens through which to examine various DA methodologies.
Categorization of Deep DA Approaches: By focusing on training loss strategies, the survey categorizes deep DA methods into discrepancy-based, adversarial-based, and reconstruction-based approaches.
Analysis of Multi-step DA Methods: The authors study multi-step adaptation mechanisms, categorizing them into hand-crafted, instance-based, and representation-based methodologies.
Survey of Application Domains: Beyond image classification, the survey extends its analysis to face recognition, object detection, semantic segmentation, and person re-identification, highlighting potential deficiencies and future research directions.

Approaches to Deep Domain Adaptation

One-step Domain Adaptation

One-step DA methods are utilized when the source and target domains have directly related distributions. This section is divided into three primary categories:

Discrepancy-Based Approaches: Criteria such as maximum mean discrepancy (MMD) and correlation alignment (CORAL) measure the gap between the feature distributions of the source and target domains, and the network is trained to shrink that gap. Architectures like Deep Adaptation Network (DAN) and Joint Adaptation Network (JAN) exemplify these approaches by embedding MMD-style losses (multi-kernel MMD in DAN, joint MMD in JAN) into convolutional neural networks, reducing domain shift across multiple adaptation layers.
Adversarial-Based Approaches: These methods leverage adversarial training, with models like Domain-Adversarial Neural Network (DANN) and Adversarial Discriminative Domain Adaptation (ADDA) using domain confusion losses to ensure indistinguishable feature representations across domains. Generative models, such as GAN-based CoGAN and pixel-level adaptation, are used to produce synthetic data that align with target distributions.
Reconstruction-Based Approaches: These approaches use encoder-decoder architectures where network layers are optimized for reconstructing input data. Examples include Stacked Denoising Autoencoders (SDA) and Deep Reconstruction Classification Network (DRCN). These models leverage both domain-invariant shared representations and domain-specific reconstruction representations.
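To make the discrepancy-based idea concrete, the following is a minimal sketch, not the implementation used by DAN, JAN, or any other method in the survey. It assumes the simplest possible setting: a linear-kernel MMD (which reduces to the squared distance between feature means) and the CORAL loss written as the squared Frobenius norm of the covariance difference, scaled by 1/(4d^2). The function names and the toy feature batches are hypothetical and exist only for illustration; practical methods typically use Gaussian kernels and apply these losses to deep network activations.

```python
# Illustrative sketch only: linear-kernel MMD and CORAL on small feature batches.
# All names and the toy data below are assumptions, not taken from the survey.

def column_means(rows):
    """Per-dimension mean of a batch of feature vectors."""
    n = len(rows)
    d = len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]

def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: squared distance between feature means."""
    mu_s = column_means(source)
    mu_t = column_means(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

def covariance(rows):
    """Unbiased sample covariance matrix (d x d) of a batch of feature vectors."""
    n = len(rows)
    d = len(rows[0])
    mu = column_means(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def coral_loss(source, target):
    """CORAL loss: squared Frobenius norm of the covariance gap, over 4 * d^2."""
    d = len(source[0])
    cs = covariance(source)
    ct = covariance(target)
    frob2 = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob2 / (4 * d * d)

# Toy "features": a source batch, a mean-shifted copy, and a stretched copy.
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
shifted = [[x + 1.0, y] for x, y in source]
stretched = [[2.0 * x, y] for x, y in source]

print("MMD^2 (shifted):", linear_mmd2(source, shifted))
print("CORAL (shifted):", coral_loss(source, shifted))
print("CORAL (stretched):", coral_loss(source, stretched))
```

In this toy setting the two criteria respond to different kinds of shift: the linear-kernel MMD reacts to the change in means, while CORAL ignores a pure translation and reacts only when second-order statistics change. In a deep DA pipeline, a term like either of these would be added to the classification loss and minimized jointly.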

Multi-step Domain Adaptation

For tasks requiring intermediate steps between vastly different source and target domains, multi-step DA is employed.

Hand-Crafted Approaches: Intermediate domains are manually selected based on domain expertise. For example, using nighttime light intensity as an intermediate proxy for economic activity in remote sensing applications.
Instance-Based Approaches: Methods like Distant Domain Transfer Learning (DDTL) select certain parts of data from auxiliary datasets to form intermediate domains, progressively minimizing reconstruction errors.
Representation-Based Approaches: Progressive networks utilize lateral connections to features of previously learned networks, freezing these learned representations to avoid the catastrophic forgetting of previously learned knowledge.
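The lateral-connection idea behind progressive networks can be sketched in a few lines. This is a toy illustration under stated assumptions, not the architecture from any cited paper: each "column" has a single hidden layer, all weight values are made up, and freezing is mimicked by storing the first column's weights in immutable tuples. The names `W1_HIDDEN_FROZEN`, `W2_HIDDEN`, `W2_OUT`, and `U_LATERAL` are hypothetical.

```python
# Illustrative sketch of a two-column progressive network forward pass.
# Weights are arbitrary toy values; nothing here is trained.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    """Matrix-vector product for a matrix stored as a sequence of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

# Column 1: hidden layer learned on the source task, then frozen
# (tuples stand in for "no further updates").
W1_HIDDEN_FROZEN = ((1.0, -1.0), (0.5, 0.5))

# Column 2: new, trainable parameters for the target task.
W2_HIDDEN = [[0.0, 1.0], [1.0, 0.0]]   # column 2, layer 1
W2_OUT = [[1.0, 1.0]]                  # column 2, layer 2 (own hidden features)
U_LATERAL = [[2.0, 0.0]]               # lateral adapter from column 1's hidden features

def column2_output(x):
    """Output of column 2: its own hidden features plus a lateral
    contribution from the frozen column 1 hidden features."""
    h1 = relu(matvec(W1_HIDDEN_FROZEN, x))   # reused, never updated
    h2 = relu(matvec(W2_HIDDEN, x))          # learned for the target task
    return add(matvec(W2_OUT, h2), matvec(U_LATERAL, h1))

print(column2_output([2.0, 1.0]))
```

Because only the second column's parameters and the lateral adapter would receive gradient updates, the first column's behavior on its original task is unchanged, which is how this design sidesteps catastrophic forgetting.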

Applications in Computer Vision

The applications of deep DA span several key computer vision tasks:

Image Classification: Deep DA techniques like JAN, DAN, and ADDA have shown substantial improvements in image classification tasks on datasets like Office-31, demonstrating the efficacy of reducing domain shifts.
Face Recognition: Techniques such as the SSPP-DAN generate synthetic face images with varying poses and use adversarial training to address variations caused by angles, ethnicity, and sensors.
Object Detection: Methods like Large-Scale Detection through Adaptation (LSDA) and Domain-Adaptive Faster R-CNN leverage weakly labeled data to adapt object detectors across domains.
Semantic Segmentation: FCN-based adaptations, such as those presented by Hoffman et al., utilize global and class-aware adversarial training to enhance semantic segmentation across different urban scenes.
Person Re-identification: Domain adaptation methods like SPGAN translate image styles to preserve self-similarity and domain dissimilarity, aiding in re-identifying persons across different camera feeds.
Image-to-Image Translation: Models like CycleGAN and pix2pix perform image-to-image translation, using adversarial losses so that translated images match the appearance of the target domain, with applications in style transfer and synthetic data generation.

Implications and Future Directions

This survey highlights the practical implications of deep DA for making machine learning models more robust across diverse domains. The authors discuss the limitations of current methodologies, particularly the challenge of heterogeneous DA, where the source and target feature spaces differ significantly. Future research should focus on developing more general approaches that handle differing feature spaces and on extending deep DA beyond its current focus areas to new applications.

Conclusion

Wang and Deng’s survey provides an invaluable resource for researchers in domain adaptation, offering a structured categorization and evaluation of contemporary deep DA methods. The survey not only underscores the advancements but also delineates the challenges and future research directions necessary for the maturation of deep domain adaptation techniques.
