Deep Transfer Learning with Joint Adaptation Networks

Published 21 May 2016 in cs.LG and stat.ML | (1605.06636v2)

Abstract: Deep networks have been successfully applied to learn transferable features for adapting models from a source domain to a different target domain. In this paper, we present joint adaptation networks (JAN), which learn a transfer network by aligning the joint distributions of multiple domain-specific layers across domains based on a joint maximum mean discrepancy (JMMD) criterion. Adversarial training strategy is adopted to maximize JMMD such that the distributions of the source and target domains are made more distinguishable. Learning can be performed by stochastic gradient descent with the gradients computed by back-propagation in linear-time. Experiments testify that our model yields state of the art results on standard datasets.

Summary

The paper introduces Joint Adaptation Networks that leverage a Joint Maximum Mean Discrepancy criterion to align joint distributions between source and target domains.
It employs an adversarial training strategy that maximizes the JMMD measure over a richer function class, improving the transferable representations learned by deep models such as AlexNet and ResNet.
Experimental results show JAN outperforming methods such as DAN and RevGrad, reaching 84.6% average accuracy on Office-31 (JAN-A with ResNet) and 85.8% accuracy on ImageCLEF-DA.

The paper "Deep Transfer Learning with Joint Adaptation Networks" by Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I. Jordan addresses unsupervised domain adaptation by introducing Joint Adaptation Networks (JAN). JAN extends conventional deep networks so that they adapt to shifts in the joint distributions between source and target domains.

Key Contributions

Joint Adaptation Networks (JAN): The paper introduces Joint Adaptation Networks, which align the joint distributions of network activations from multiple domain-specific layers across source and target domains. The alignment is based on a Joint Maximum Mean Discrepancy (JMMD) criterion.
Joint Maximum Mean Discrepancy (JMMD): The paper derives JMMD, a criterion that measures the discrepancy between two joint distributions. It relies on Hilbert space embeddings of distributions: each joint distribution is embedded in a reproducing kernel Hilbert space and the embeddings are compared, so minimizing JMMD reduces the domain discrepancy. JMMD can be estimated in time linear in the sample size, making it suitable for large datasets.
Adversarial Training Strategy: The JAN framework adds an adversarial training strategy in which JMMD is maximized over a richer function class, making the source and target distributions more distinguishable, while the network is trained to reduce the resulting discrepancy. This adversarial variant (JAN-A) improves the model’s ability to learn transferable representations.
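To make the JMMD idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes Gaussian kernels with fixed bandwidths, a joint kernel formed as the product of per-layer kernels, and the simple biased (quadratic-time) estimate. The names `gaussian_kernel`, `joint_kernel`, `jmmd_biased`, and `toy_batch` are hypothetical, and the toy data stands in for real layer activations.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def joint_kernel(sample_a, sample_b, bandwidths):
    """Product of per-layer kernels (assumed form of the joint kernel).

    Each sample is a tuple holding one activation vector per adapted layer.
    """
    value = 1.0
    for layer_a, layer_b, bw in zip(sample_a, sample_b, bandwidths):
        value *= gaussian_kernel(layer_a, layer_b, bw)
    return value


def jmmd_biased(source, target, bandwidths):
    """Biased estimate of squared JMMD; cost is quadratic in the batch size."""
    def mean_kernel(xs, ys):
        total = sum(joint_kernel(x, y, bandwidths) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def toy_batch(rng, n, shift):
    """Hypothetical stand-in for the activations of n examples at two layers."""
    return [
        ([rng.gauss(shift, 1.0) for _ in range(3)],   # e.g. a feature layer
         [rng.gauss(shift, 1.0) for _ in range(2)])   # e.g. a classifier layer
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    source = toy_batch(rng, 32, shift=0.0)
    matched = toy_batch(rng, 32, shift=0.0)
    shifted = toy_batch(rng, 32, shift=2.0)
    bandwidths = (1.0, 1.0)  # assumed fixed; not tuned
    print("same distribution:", jmmd_biased(source, matched, bandwidths))
    print("shifted target:   ", jmmd_biased(source, shifted, bandwidths))
```

Because the per-layer kernels are multiplied, a shift in any single layer changes the joint kernel value and therefore the estimate. In a training loop this quantity would be added to the classification loss as a penalty, which is not shown here.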

Experimental Evaluation

The empirical evaluation covers benchmark datasets such as Office-31 and ImageCLEF-DA, with deep neural networks like AlexNet and ResNet as base models. The experiments span several domain adaptation tasks whose source and target domains differ in their degree of similarity.

Numerical Results

Across multiple transfer tasks, JAN consistently outperformed competing methods such as Transfer Component Analysis (TCA), Geodesic Flow Kernel (GFK), Domain-Adversarial Neural Networks (RevGrad), and Deep Adaptation Networks (DAN). For instance, on the Office-31 dataset using ResNet, JAN-A achieved an average accuracy of 84.6%, surpassing the nearest competitor’s 82.2%. Furthermore, on the ImageCLEF-DA dataset, JAN reached 85.8% accuracy, demonstrating the method's robustness in diverse settings.

Theoretical and Practical Implications

Theoretically, JMMD addresses the interactions within joint distributions that conventional methods overlook. Modeling domain shift at the level of joint distributions, rather than one layer at a time, is intended to make the adapted model more general.

Practically, the strong performance of JAN across benchmarks suggests it is useful in real-world applications where labeled data is scarce or costly to obtain. Because JMMD can be estimated in linear time, the method remains feasible for large-scale applications in industrial and research settings.
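The linear scaling can be illustrated with a pair-based estimate. The sketch below follows the standard linear-time MMD construction, which averages a four-kernel term over disjoint pairs of samples, and applies it to a product of per-layer Gaussian kernels. Whether this matches the paper's exact estimator is an assumption, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def joint_kernel(sample_a, sample_b, bandwidths):
    """Product of per-layer kernels (assumed form of the joint kernel)."""
    value = 1.0
    for layer_a, layer_b, bw in zip(sample_a, sample_b, bandwidths):
        value *= gaussian_kernel(layer_a, layer_b, bw)
    return value


def jmmd_linear(source, target, bandwidths):
    """Pair-based estimate of squared JMMD using four kernel calls per pair.

    Samples are consumed two at a time from each domain, so the total cost
    grows linearly with the batch size instead of quadratically.
    """
    pairs = min(len(source), len(target)) // 2
    if pairs == 0:
        raise ValueError("need at least two samples from each domain")
    total = 0.0
    for i in range(pairs):
        s1, s2 = source[2 * i], source[2 * i + 1]
        t1, t2 = target[2 * i], target[2 * i + 1]
        total += (joint_kernel(s1, s2, bandwidths)
                  + joint_kernel(t1, t2, bandwidths)
                  - joint_kernel(s1, t2, bandwidths)
                  - joint_kernel(s2, t1, bandwidths))
    return total / pairs
```

The trade-off is variance: each sample contributes to only one pair, so the estimate is noisier than a quadratic-time one at the same batch size and can be negative on a given batch.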

Future Developments

Future developments could explore the integration of JAN frameworks with other deep learning architectures to further enhance transferability. Additionally, expanding JAN to semi-supervised and fully supervised scenarios could provide even more utility across different machine learning tasks. The exploration of more sophisticated adversarial strategies and kernel choices for JMMD could further refine the adaptation process.

In conclusion, the paper advances deep transfer learning by introducing Joint Adaptation Networks. It backs the method with experimental validation on standard benchmarks and points to applications and future directions for domain adaptation. The combination of the JMMD criterion and the reported empirical results makes JAN a useful reference for ongoing work on transfer learning.
