
Joint Distribution Optimal Transportation for Domain Adaptation (1705.08848v2)

Published 24 May 2017 in stat.ML and cs.LG

Abstract: This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function $f$ in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domains $\mathcal{P}_s$ and $\mathcal{P}_t$. We propose a solution to this problem with optimal transport that allows one to recover an estimated target $\mathcal{P}^f_t=(X,f(X))$ by optimizing simultaneously the optimal coupling and $f$. We show that our method corresponds to the minimization of a bound on the target error, and provide an efficient algorithmic solution, for which convergence is proved. The versatility of our approach, both in terms of hypothesis classes and loss functions, is demonstrated with real-world classification and regression problems, for which we reach or surpass state-of-the-art results.

Citations (515)

Summary

  • The paper introduces JDOT, a method that uses optimal transport to align the joint feature/label distributions of the source and target domains for domain adaptation, jointly estimating the coupling and the prediction function.
  • It proposes an efficient algorithm with proven convergence and demonstrates strong performance across classification and regression tasks.
  • The work introduces Probabilistic Transfer Lipschitzness, offering new theoretical insights and paving the way for future improvements in transfer learning.

Overview of "Joint Distribution Optimal Transportation for Domain Adaptation"

Introduction

The paper addresses unsupervised domain adaptation (UDA), the setting in which a prediction function must be estimated in a target domain without any labeled samples, drawing on knowledge from a labeled source domain. The authors tackle this challenge by assuming that a nonlinear transformation exists between the joint feature/label distributions of the two domains, and that this transformation can be estimated with optimal transport (OT), an approach grounded in minimizing the discrepancy between distributions.
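
At a high level, JDOT searches for a coupling between the two joint distributions while fitting the target predictor $f$. Up to the exact weighting and notation used in the paper, the empirical objective can be sketched as

$$
\min_{f,\; \gamma \in \Pi(\hat{\mu}_s,\hat{\mu}_t)} \;\sum_{i,j} \gamma_{ij}\, \Big( \alpha\, d\big(x^s_i, x^t_j\big) + \mathcal{L}\big(y^s_i, f(x^t_j)\big) \Big)
$$

where $\gamma$ is a coupling with the empirical source and target marginals $\hat{\mu}_s$ and $\hat{\mu}_t$, $d$ is a feature-space distance, $\mathcal{L}$ is the task loss, and $\alpha$ trades off feature and label discrepancies.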

Key Contributions

  1. Joint Distribution Optimal Transport (JDOT): The method seeks an optimal coupling between the source and target joint distributions by leveraging OT within a nonlinear transformation framework, while simultaneously optimizing the prediction function. The authors show that minimizing the JDOT objective amounts to minimizing a bound on the target error, which grounds the approach theoretically.
  2. Algorithmic Framework: An efficient algorithm implementing JDOT is proposed, together with a convergence proof. Its versatility is showcased through successful applications to classification and regression tasks (a minimal sketch of the alternating scheme appears after this list).
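
A minimal sketch of that alternating scheme for a regression setup, assuming the POT library (`ot`) and scikit-learn are available; the squared loss, the kernel ridge hypothesis class, the trade-off weight `alpha`, and the fixed iteration count are illustrative choices rather than the authors' reference implementation:

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)
from sklearn.kernel_ridge import KernelRidge

def jdot_sketch(Xs, ys, Xt, alpha=1.0, n_iter=10):
    """Alternate between solving the OT coupling (f fixed) and refitting f (coupling fixed)."""
    ns, nt = len(Xs), len(Xt)
    a = np.full(ns, 1.0 / ns)                     # uniform empirical source marginal
    b = np.full(nt, 1.0 / nt)                     # uniform empirical target marginal
    Cx = ot.dist(Xs, Xt)                          # pairwise squared-Euclidean feature cost (ns x nt)
    f = KernelRidge(kernel="rbf").fit(Xs, ys)     # warm-start the predictor on source data

    for _ in range(n_iter):
        # OT step: joint cost = feature distance + squared label loss against f's target predictions.
        Cy = (ys[:, None] - f.predict(Xt)[None, :]) ** 2
        G = ot.emd(a, b, alpha * Cx + Cy)         # exact coupling for the current f
        # f step: with a squared loss this reduces to regressing on barycentric pseudo-labels.
        yt_hat = (G.T @ ys) / G.sum(axis=0)
        f = KernelRidge(kernel="rbf").fit(Xt, yt_hat)
    return f, G
```

For classification the same alternation applies, with the squared loss replaced by the task loss (e.g. hinge or cross-entropy) and the f step becoming a weighted fit driven by the coupling.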

Fundamental Concepts

  • Domain Adaptation: Techniques that transfer knowledge from a labeled source domain to a target domain whose data follow a different distribution, typically by mitigating the discrepancy between the two distributions.
  • Optimal Transport: OT matches probability distributions by minimizing a transportation cost; here it is leveraged to align the joint feature/label distributions of the two domains (a toy coupling computation is sketched after this list).
  • Probabilistic Transfer Lipschitzness (PTL): A novel assumption quantifying the probability that source and target instances close to each other receive consistent labels, which underpins the theoretical analysis of JDOT.
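
To make the OT ingredient concrete, here is a tiny self-contained example (assuming the POT library) that computes an exact coupling between two small point clouds; the data are synthetic and purely illustrative:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
Xs = rng.normal(loc=0.0, scale=1.0, size=(5, 2))   # "source" samples
Xt = rng.normal(loc=2.0, scale=1.0, size=(7, 2))   # "target" samples, shifted

a = np.full(5, 1.0 / 5)      # uniform mass on source points
b = np.full(7, 1.0 / 7)      # uniform mass on target points
M = ot.dist(Xs, Xt)          # squared-Euclidean cost matrix (5 x 7)

G = ot.emd(a, b, M)          # coupling that minimizes the total transport cost
print(G.sum(axis=1))         # rows sum to a: each source point's mass is fully shipped
print((G * M).sum())         # the minimal transport cost itself
```

JDOT replaces this purely feature-based cost M with a joint feature/label cost, which is what ties the coupling to the predictor $f$.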

Experimental Validation

Numerical experiments across visual and textual domains confirm JDOT reaches or exceeds state-of-the-art performance, demonstrating its efficacy in real-world classification and regression scenarios. The method is particularly noted for its adaptability to different problem types by integrating various loss functions and hypothesis classes.

Implications and Future Directions

  • Practical Implications: JDOT offers a robust domain adaptation technique applicable across diverse datasets and problem types, enhancing transfer learning by pairing solid theoretical guarantees with strong empirical performance.
  • Theoretical Implications: The introduction of PTL broadens the understanding of Lipschitz properties in adaptation contexts, providing a basis for further exploration of minimal transport plans and adaptation effectiveness.
  • Future Work: Potential research areas include extending JDOT to semi-supervised contexts, improving computational efficiency through stochastic methods, and exploring the probabilistic foundations of PTL more deeply.

In summary, the paper presents a principled approach to domain adaptation using JDOT, contributing substantial advancements in both the theoretical and practical dimensions of UDA. This work promises continued improvements in handling domain discrepancies across machine learning applications.