
Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling (1911.07982v1)

Published 18 Nov 2019 in cs.LG, cs.CV, cs.MM, and stat.ML

Abstract: Unsupervised domain adaptation aims to address the problem of classifying unlabeled samples from the target domain whilst labeled samples are only available from the source domain and the data distributions are different in these two domains. As a result, classifiers trained from labeled samples in the source domain suffer from significant performance drop when directly applied to the samples from the target domain. To address this issue, different approaches have been proposed to learn domain-invariant features or domain-specific classifiers. In either case, the lack of labeled samples in the target domain can be an issue which is usually overcome by pseudo-labeling. Inaccurate pseudo-labeling, however, could result in catastrophic error accumulation during learning. In this paper, we propose a novel selective pseudo-labeling strategy based on structured prediction. The idea of structured prediction is inspired by the fact that samples in the target domain are well clustered within the deep feature space so that unsupervised clustering analysis can be used to facilitate accurate pseudo-labeling. Experimental results on four datasets (i.e. Office-Caltech, Office31, ImageCLEF-DA and Office-Home) validate our approach outperforms contemporary state-of-the-art methods.

Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling

In the field of machine learning, one of the persistent challenges is to adapt models trained on one domain (source domain) to perform well on a different, yet related domain (target domain) without any labeled data from the target domain. This challenge is known as Unsupervised Domain Adaptation (UDA). The paper "Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling" by Qian Wang and Toby P. Breckon tackles this problem by proposing a novel method that leverages structured prediction and selective pseudo-labeling to enhance the accuracy of domain adaptation.

The core issue in UDA is the discrepancy in data distributions between the source and target domains, which can significantly degrade the performance of models when applied to target domain data. The proposed method addresses this by introducing a selective pseudo-labeling strategy that employs structured prediction to explore and capture the inherent structure within the target domain. This approach is predicated on the assumption that target domain samples are inherently clustered in the feature space, and unsupervised clustering can aid in precise pseudo-labeling.

The methodology involves learning a domain-invariant subspace through Supervised Locality Preserving Projection (SLPP), trained on both source samples and pseudo-labeled target samples. Selective pseudo-labeling is executed iteratively: each iteration refines the pseudo-labels by clustering the target samples (via k-means) and using structured prediction to match clusters to source domain classes. Notably, the algorithm includes a sample selection strategy that mitigates the risk of error accumulation by excluding low-confidence pseudo-labels from subsequent training iterations.
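The iteration described above can be sketched in simplified form. The snippet below is an illustrative reconstruction, not the authors' implementation: it clusters target features with a basic k-means, matches clusters to source classes one-to-one via the Hungarian algorithm on centroid distances (one plausible reading of the structured prediction step), and keeps only the most confident fraction of pseudo-labels per class. All function and parameter names (`structured_pseudo_label`, `select_frac`) are invented for this sketch; in the paper this all happens inside the SLPP-learned subspace, which is omitted here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def structured_pseudo_label(source_feats, source_labels, target_feats,
                            n_classes, select_frac=0.5, seed=0):
    """Illustrative sketch of structured-prediction-based selective
    pseudo-labeling (names and details are assumptions, not the paper's code)."""
    rng = np.random.default_rng(seed)

    # --- k-means on the target domain with farthest-first initialization ---
    centers = [target_feats[rng.integers(len(target_feats))]]
    for _ in range(n_classes - 1):
        d = np.min([np.linalg.norm(target_feats - c, axis=1) for c in centers], axis=0)
        centers.append(target_feats[d.argmax()])
    centers = np.stack(centers)
    for _ in range(50):
        dists = np.linalg.norm(target_feats[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(n_classes):
            pts = target_feats[assign == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)

    # --- per-class means of the labeled source samples ---
    class_means = np.stack([source_feats[source_labels == c].mean(axis=0)
                            for c in range(n_classes)])

    # --- structured prediction: one-to-one cluster <-> class matching
    #     (Hungarian algorithm on the centroid-distance cost matrix) ---
    cost = np.linalg.norm(centers[:, None] - class_means[None], axis=2)
    cluster_idx, class_idx = linear_sum_assignment(cost)
    cluster_to_class = dict(zip(cluster_idx, class_idx))
    pseudo = np.array([cluster_to_class[a] for a in assign])

    # --- selective step: per class, keep only the select_frac samples
    #     closest to their cluster center (high-confidence pseudo-labels) ---
    dist_to_center = np.linalg.norm(target_feats - centers[assign], axis=1)
    keep = np.zeros(len(target_feats), dtype=bool)
    for c in range(n_classes):
        idx = np.where(pseudo == c)[0]
        if len(idx):
            n_keep = max(1, int(select_frac * len(idx)))
            keep[idx[np.argsort(dist_to_center[idx])[:n_keep]]] = True
    return pseudo, keep
```

In the full method, the selected `(target sample, pseudo-label)` pairs would be fed back into the SLPP subspace learning, and `select_frac` would grow across iterations so that progressively more target samples participate in training.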

The paper substantiates the efficacy of this approach with experimental results across four benchmark datasets: Office-Caltech, Office31, ImageCLEF-DA, and Office-Home. These datasets serve as standard benchmarks within the domain adaptation community, featuring domains with varied visual characteristics. The proposed method demonstrates superior performance over existing state-of-the-art UDA methods, including both traditional feature-based and deep learning-based models, achieving higher classification accuracy across all datasets.

The numerical results indicate that selective pseudo-labeling with structured prediction consistently leads to improved classification accuracy. For instance, on the Office-Caltech dataset, the method achieves an average accuracy of 93.0%, surpassing other leading approaches such as MEDA and CAPLS. Similarly, notable improvements are observed on the Office31 and ImageCLEF-DA datasets, reflecting the robustness of the approach even when compared against deep learning-based methods.

From a theoretical standpoint, this research contributes significantly by marrying the concept of structured prediction with pseudo-labeling in the context of UDA, offering a more refined approach for aligning conditional distributions between domains. Practically, it opens avenues for more effective model adaptation in real-world applications where labeled data in the target domain is scarce or unavailable.

Looking forward, the implications of this work could extend to more diverse domains and larger datasets, potentially involving more sophisticated clustering algorithms and deeper integration with end-to-end domain adaptation frameworks. This could further broaden the applicability and impact of unsupervised domain adaptation techniques across AI-driven sectors. Consequently, future research may explore the scalability of this approach in scenarios involving high-dimensional data or dynamically evolving domains.

Overall, "Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling" makes a substantial contribution to advancing domain adaptation methodologies by addressing key challenges of distribution shifts between domains and enhancing learning through intelligent pseudo-labeling.

Authors (2)
  1. Qian Wang
  2. Toby P. Breckon
Citations (195)