- The paper introduces Residual Transfer Networks that bridge the gap between source and target classifiers through small residual functions.
- It integrates a tensor-based MMD penalty to align multi-layer feature distributions, enhancing cross-domain transfer.
- Empirical results on Office-31 and Office-Caltech datasets reveal that RTNs outperform methods like DAN and RevGrad in accuracy.
Unsupervised Domain Adaptation with Residual Transfer Networks
In the paper "Unsupervised Domain Adaptation with Residual Transfer Networks", Long et al. address the challenge of adapting models trained on a labeled source domain to a target domain that has no labeled data. The authors propose an approach that integrates both feature adaptation and classifier adaptation within a unified deep learning framework.
Motivation and Background
Deep learning models have demonstrated impressive performance across various machine learning tasks. However, their reliance on massive amounts of labeled data for training remains a significant limitation. Domain adaptation offers a means to leverage labeled data from a different but related source domain to train models for a target domain where labeled data is scarce or unavailable. Traditional domain adaptation techniques typically focus on aligning feature representations between domains, often neglecting potential discrepancies in classifiers.
Recent studies have introduced deep networks capable of learning transferable features, thereby improving domain adaptation. Yet, these methods often assume that the classifier trained on the source domain can be directly applied to the target domain—a presumption that may not hold in practice. The authors propose Residual Transfer Networks (RTNs) to address this limitation by explicitly allowing the source and target classifiers to differ through a residual function.
Methodology
The authors introduce a novel deep neural network architecture that jointly learns adaptive classifiers and transferable features. The hallmark of their approach lies in relaxing the shared-classifier assumption and introducing residual functions to capture the difference between source and target classifiers. The primary components of their methodology include:
- Feature Adaptation: The approach employs Maximum Mean Discrepancy (MMD) to align feature distributions between the source and target domains. Rather than applying separate MMD penalties to individual layers, as in prior work, the authors fuse features across multiple layers and apply a single tensor MMD penalty (see the MMD sketch after this list).
- Classifier Adaptation: The key innovation is the incorporation of residual layers to model the difference between the source and target classifiers. The classifier mismatch is handled by assuming the source classifier equals the target classifier plus a small residual function, so that fS(x) = fT(x) + Δf(x), with the residual learned with reference to the target classifier. This is implemented through additional fully connected layers, together with an entropy minimization principle that encourages low-density separation between classes in the target domain.
- Residual Learning Framework: Inspired by the success of deep residual networks, the architecture connects the source classifier to the target classifier by a residual block, thereby learning the small perturbation that bridges the two (see the residual-classifier sketch after this list).
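To make the feature-adaptation component concrete, here is a minimal PyTorch sketch of a fused ("tensor") MMD penalty. The single Gaussian kernel, the fixed bandwidth, and the choice of fusing one feature layer with the softmax predictions are simplifying assumptions for illustration, not the paper's exact multi-kernel formulation.

```python
# Minimal sketch of a fused ("tensor") MMD penalty between source and target batches.
# A single Gaussian kernel with a fixed bandwidth is assumed for simplicity.
import torch
import torch.nn.functional as F

def gaussian_mmd(x, y, bandwidth=1.0):
    """Squared MMD between two batches under one Gaussian kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

def fused_mmd(src_feats, src_logits, tgt_feats, tgt_logits, bandwidth=1.0):
    """Fuse a feature layer with softmax predictions via a per-example outer
    product, then apply a single MMD penalty to the fused representations."""
    def fuse(feats, logits):
        probs = F.softmax(logits, dim=1)                           # [B, C]
        outer = torch.bmm(feats.unsqueeze(2), probs.unsqueeze(1))  # [B, D, C]
        return outer.flatten(1)                                    # [B, D * C]
    return gaussian_mmd(fuse(src_feats, src_logits),
                        fuse(tgt_feats, tgt_logits), bandwidth)
```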
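The classifier-adaptation and residual-learning components can be sketched in the same spirit: the source classifier is expressed as the target classifier plus a small residual block, and target predictions are regularized by entropy minimization. The layer sizes, residual-block depth, and entropy weight below are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of the residual classifier bridge and entropy minimization.
# Layer sizes and the entropy weight are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualClassifier(nn.Module):
    """Target classifier f_T plus a small residual function, so the source
    classifier is f_S(x) = f_T(x) + delta_f(f_T(x))."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.target_fc = nn.Linear(feat_dim, num_classes)   # f_T
        self.residual = nn.Sequential(                       # delta_f
            nn.Linear(num_classes, num_classes), nn.ReLU(),
            nn.Linear(num_classes, num_classes),
        )

    def forward(self, feats):
        f_t = self.target_fc(feats)       # target logits
        f_s = f_t + self.residual(f_t)    # source logits = target + residual
        return f_t, f_s

def classifier_adaptation_loss(clf, src_feats, src_labels, tgt_feats,
                               entropy_weight=0.1):
    """Supervised loss on the source classifier plus an entropy penalty that
    pushes target predictions toward low-density separation between classes."""
    _, src_logits = clf(src_feats)
    tgt_logits, _ = clf(tgt_feats)
    cls_loss = F.cross_entropy(src_logits, src_labels)
    probs = F.softmax(tgt_logits, dim=1)
    entropy = -(probs * torch.log(probs + 1e-6)).sum(dim=1).mean()
    return cls_loss + entropy_weight * entropy
```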
Results
Empirical validation of the proposed RTN model on standard domain adaptation benchmarks shows significant performance improvements over prior methods. Evaluation uses the Office-31 and Office-Caltech datasets, which together encompass a variety of transfer tasks. Notable results include:
- RTN consistently outperformed existing methods such as DAN and RevGrad across diverse transfer tasks, demonstrating superior accuracy, particularly on challenging domain shifts.
- Incorporating classifier adaptation through residual learning provided measurable advantages over previous approaches that relied solely on feature adaptation.
Implications
The introduction of residual transfer networks has both practical and theoretical implications. Practically, the ability to learn adaptive classifiers in conjunction with transferable features enhances the robustness of domain adaptation techniques, making them more applicable to real-world scenarios where source and target domains may exhibit substantial discrepancies. Theoretically, the integration of residual functions within a deep learning framework offers a novel perspective on mitigating domain shift, paving the way for further research on combining various adaptation strategies to improve model generalization across domains.
Future Work
The paper opens several avenues for future research. Extending the framework to semi-supervised domain adaptation, where some labeled data is available in the target domain, could further exploit the potential of adaptive classifiers. Exploring more sophisticated methods for entropy minimization and integrating other forms of regularization might refine the adaptation process. Additionally, applying the RTN framework to other domains, such as language or cross-modal adaptation, could test its versatility and uncover new challenges and opportunities.
In conclusion, Long et al.'s work on Residual Transfer Networks represents a significant advancement in the field of domain adaptation, offering a comprehensive methodology that addresses both feature and classifier adaptation within a deep learning context. Their results showcase the efficacy of this approach, setting the stage for future innovations and applications in unsupervised domain adaptation.