- The paper introduces an asymmetric tri-training method where two networks generate pseudo-labels and a third network learns target-specific features.
- The approach employs batch normalization and weight constraints to ensure robust feature learning and mitigate label noise.
- Experimental results demonstrate state-of-the-art performance, including a gain of more than 10% on MNIST-to-SVHN adaptation.
Asymmetric Tri-training for Unsupervised Domain Adaptation
The paper "Asymmetric Tri-training for Unsupervised Domain Adaptation" introduces an innovative method for enhancing unsupervised domain adaptation using deep neural networks. The core idea involves the asymmetric utilization of a tri-training mechanism to assign pseudo-labels to unlabeled target samples, which aids in learning discriminative features for the target domain.
Motivation and Approach
Deep neural networks, particularly CNNs, perform remarkably well when trained on large labeled datasets. However, because data distributions vary significantly across domains, they generalize poorly to new domains unless they are explicitly adapted. Traditional approaches to unsupervised domain adaptation often focus on aligning feature distributions between the source and target domains, but they struggle when the aligned features lack discriminative power for the target domain.
To address these challenges, the authors propose an asymmetric tri-training method. This approach leverages three neural networks with distinct roles: two networks are designated to assign pseudo-labels to the unlabeled target samples, while a third network uses these pseudo-labels to learn target-discriminative representations. The asymmetry is a key departure from traditional tri-training, where networks typically share equal roles in labeling tasks.
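As a rough illustration of this structure, the sketch below lays out one shared feature extractor feeding three classifier heads in a PyTorch style. The module names, layer sizes, and the choice of fully connected layers are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

# Illustrative layout of asymmetric tri-training: a shared feature extractor
# feeds three classifier heads. F1 and F2 assign pseudo-labels to target
# samples; Ft learns target-specific features from those pseudo-labels.
class SharedFeatureExtractor(nn.Module):
    def __init__(self, in_dim=784, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, feat_dim), nn.BatchNorm1d(feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def make_classifier(feat_dim=256, n_classes=10):
    return nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                         nn.Linear(128, n_classes))

shared = SharedFeatureExtractor()
F1 = make_classifier()   # pseudo-labeler 1, trained on labeled (source + pseudo-labeled) data
F2 = make_classifier()   # pseudo-labeler 2, trained on labeled (source + pseudo-labeled) data
Ft = make_classifier()   # target-specific classifier, trained on pseudo-labeled target data
```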
Methodology
The proposed method involves the following key components:
- Pseudo-label Assignment: Two of the networks generate pseudo-labels for target samples only when they agree on the predicted class and the prediction is sufficiently confident. This dual-network agreement mitigates the risk of noisy labels (see the sketch after this list).
- Training with Pseudo-labels: The third network, specialized for the target domain, is trained on these pseudo-labeled target samples, while the losses of all three classifiers back-propagate into a shared feature extractor.
- Batch Normalization and Weight Constraints: Batch normalization aligns the learning process with domain-specific statistics, and a weight constraint penalizes similarity between the two labeling networks' weights so that they classify from diverse views (also illustrated in the sketch below).
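The following is a minimal sketch of the labeling rule and the weight constraint described above, assuming PyTorch. The confidence threshold, the exact acceptance rule, and the function names are illustrative assumptions rather than the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def assign_pseudo_labels(logits1, logits2, threshold=0.9):
    """Accept a target sample only if both labeling networks predict the same
    class and at least one prediction is confident. The threshold value and
    the exact acceptance rule are illustrative, not the paper's."""
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)
    pred1, pred2 = p1.argmax(dim=1), p2.argmax(dim=1)
    agree = pred1 == pred2
    confident = torch.maximum(p1.max(dim=1).values, p2.max(dim=1).values) > threshold
    mask = agree & confident          # which target samples receive pseudo-labels
    return mask, pred1                # pseudo-labels are valid where mask is True

def weight_similarity_penalty(W1, W2):
    """Penalty on |W1^T W2| for the first-layer weights of the two labeling
    networks, encouraging them to rely on different views of the shared features."""
    return torch.sum(torch.abs(W1.t() @ W2))
```

In a full training loop, roughly speaking, the classification losses of the two labeling networks, this penalty, and the target network's loss on accepted pseudo-labels would all be summed and back-propagated through the shared feature extractor.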
Experimental Evaluation
The method was evaluated on several datasets, including digit recognition datasets (MNIST, SVHN, and others) and sentiment analysis tasks on the Amazon Reviews dataset. The results demonstrated state-of-the-art performance, particularly highlighting significant gains in the MNIST to SVHN adaptation scenario, where the performance improved by over 10% compared to existing methods.
The authors also conducted thorough analyses, such as comparing the performance of each network within the tri-training framework and examining the impact of batch normalization and the weight constraint. These experiments confirmed that the model learns effective, discriminative domain-specific representations without requiring specialized architectures or implementation tricks.
Theoretical Insights
The paper extends existing theoretical frameworks on domain adaptation by balancing the minimization of domain divergence against potential labeling errors in the pseudo-labels. Training on pseudo-labels is argued to approximate an entropy-regularization effect, promoting low-density separation between classes.
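For reference, the standard bound this line of analysis builds on (Ben-David et al.) can be restated as below; the notation is the conventional one and may differ from the paper's exact symbols.

```latex
% Target risk bounded by source risk, domain divergence, and the joint error
% of the best hypothesis (Ben-David et al.). Pseudo-labeling stands in for the
% unknown target labels, so the pseudo-label error enters as an extra term.
\[
  R_T(h) \;\le\; R_S(h)
        \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{S},\mathcal{T})
        \;+\; \lambda,
  \qquad
  \lambda \;=\; \min_{h' \in \mathcal{H}} \bigl[ R_S(h') + R_T(h') \bigr]
\]
```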
Implications and Future Directions
This paper contributes to both theoretical and practical advancements in unsupervised domain adaptation, showcasing how asymmetric structures can effectively exploit pseudo-labels. The methodology has potential implications for applications where obtaining labeled data is challenging but domain differentiation is crucial.
Future research may explore refined pseudo-labeling strategies or integrate adaptive threshold mechanisms to optimize label confidence decisions. Additionally, extending this framework to other modalities and more complex domains could further validate its versatility and robustness.
In summary, the paper offers a substantive addition to domain adaptation techniques, with a thoughtful use of tri-training asymmetry to learn more robust, target-discriminative representations, and it provides a foundation for future explorations into unsupervised learning methodologies.