
Learning Domain-Invariant Subspace using Domain Features and Independence Maximization (1603.04535v2)

Published 15 Mar 2016 in cs.CV, cs.AI, and cs.LG

Abstract: Domain adaptation algorithms are useful when the distributions of the training and the test data are different. In this paper, we focus on the problem of instrumental variation and time-varying drift in the field of sensors and measurement, which can be viewed as discrete and continuous distributional change in the feature space. We propose maximum independence domain adaptation (MIDA) and semi-supervised MIDA (SMIDA) to address this problem. Domain features are first defined to describe the background information of a sample, such as the device label and acquisition time. Then, MIDA learns a subspace which has maximum independence with the domain features, so as to reduce the inter-domain discrepancy in distributions. A feature augmentation strategy is also designed to project samples according to their backgrounds so as to improve the adaptation. The proposed algorithms are flexible and fast. Their effectiveness is verified by experiments on synthetic datasets and four real-world ones on sensors, measurement, and computer vision. They can greatly enhance the practicability of sensor systems, as well as extend the application scope of existing domain adaptation algorithms by uniformly handling different kinds of distributional change.

Citations (164)

Summary

Learning Domain-Invariant Subspace using Domain Features and Independence Maximization

Domain adaptation remains a crucial challenge in machine learning, especially when the distributions of the training and test datasets differ. This paper introduces two techniques, Maximum Independence Domain Adaptation (MIDA) and its semi-supervised extension SMIDA, to address instrumental variation and time-varying drift, two problems commonly encountered in sensor and measurement systems. These variations pose significant obstacles to accurate prediction, since sensor responses may differ not only across instruments but also over time.

MIDA and SMIDA learn a domain-invariant subspace by maximizing the statistical independence between the projected features and the domain features. Domain features, which encode a sample's background information such as its device label and acquisition time, distinguish the approach from methods like transfer component analysis (TCA) and stationary subspace analysis (SSA). Independence is measured with the Hilbert-Schmidt Independence Criterion (HSIC), which penalizes the learned subspace for retaining domain-specific biases.
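
To make this concrete, below is a minimal sketch of linear MIDA under the HSIC-based objective described above. The function names, the variance trade-off `mu`, and the subspace dimension `h` are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def hsic(Kx, Kd):
    # Empirical HSIC between two kernel matrices; larger means more dependent.
    n = Kx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    return np.trace(Kx @ H @ Kd @ H) / (n - 1) ** 2

def mida(X, D, h=30, mu=1.0):
    # X: (n, p) feature matrix; D: (n, q) domain-feature matrix.
    # Finds a subspace that preserves variance while minimizing the
    # HSIC-measured dependence of the projected data on D.
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kx = X @ X.T                                     # linear kernel on features
    Kd = D @ D.T                                     # linear kernel on domain features
    M = Kx @ (mu * H - H @ Kd @ H) @ Kx              # variance vs. domain dependence
    _, vecs = np.linalg.eigh(M)                      # eigenvalues ascending
    W = vecs[:, -h:]                                 # top-h eigenvectors span the subspace
    Z = Kx @ W                                       # domain-invariant representation
    return Z, W
```

A nonlinear variant would replace the linear kernels with, for example, RBF kernels on the features.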

The paper demonstrates the efficacy of MIDA and SMIDA on synthetic data and on real-world gas sensor array drift, breath analysis, and spectroscopy datasets. On the synthetic data, MIDA surpassed TCA by better aligning domains across multiple dimensions while maintaining classification accuracy. In the real-world applications, MIDA and SMIDA markedly improved drift-correction results, outperforming established methods such as GFK and SSA. Notably, continuous SMIDA even outperformed models trained specifically on the target domain, highlighting its robustness and adaptability.

Furthermore, SMIDA is flexible across multiple domains and problem settings, handling both classification and regression. In addition to accommodating multiple and continuous distributional changes, SMIDA can integrate label information to further enrich the learned subspace, which proves beneficial in semi-supervised scenarios. This versatility is valuable given the many practical applications where labels are sparse or uncertain.
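
As a sketch of how label information can enter the objective, the following extends the `mida` function above with an HSIC term that rewards dependence on the labels. The weight `gamma` and the zero-rows-for-unlabeled-samples convention are illustrative assumptions.

```python
def smida(X, D, Y, h=30, mu=1.0, gamma=1.0):
    # Y: (n, c) one-hot labels; rows for unlabeled samples are all zeros,
    # so they contribute nothing to the label-dependence term.
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kx, Kd, Ky = X @ X.T, D @ D.T, Y @ Y.T
    # Preserve variance (mu), align with labels (gamma), shed domain info.
    M = Kx @ (mu * H + gamma * H @ Ky @ H - H @ Kd @ H) @ Kx
    _, vecs = np.linalg.eigh(M)
    W = vecs[:, -h:]
    return Kx @ W, W
```

For regression, `Y` could hold centered continuous targets instead of one-hot labels.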

Beyond their utility in sensor systems, these algorithms hold potential in other areas of AI where domain discrepancies significantly degrade performance. Innovations such as feature augmentation and domain feature construction offer pathways for addressing complex adaptation scenarios across fields, notably computer vision, as illustrated in the Office+Caltech domain adaptation experiment.
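
The sketch below illustrates one plausible construction of domain features for the two cases the paper considers (discrete instrumental variation and continuous time drift), along with the feature augmentation step of concatenating them onto the raw features; the helper names are hypothetical.

```python
def discrete_domain_features(device_ids, n_devices):
    # One-hot device labels describe discrete instrumental variation.
    D = np.zeros((len(device_ids), n_devices))
    D[np.arange(len(device_ids)), device_ids] = 1.0
    return D

def continuous_domain_features(times):
    # Acquisition timestamps describe continuous time-varying drift.
    return np.asarray(times, dtype=float).reshape(-1, 1)

def augment(X, D):
    # Feature augmentation: concatenating domain features lets the learned
    # projection shift each sample according to its background.
    return np.hstack([X, D])
```

With this augmentation, a call like `mida(augment(X, D), D)` projects each sample with a background-dependent offset, matching the paper's description of projecting samples according to their backgrounds.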

Future research may extend domain feature definitions to encompass broader contextual information, further enhancing adaptation capabilities. Evaluating the algorithms on more diverse datasets could also yield deeper insight into their adaptability and suggest meaningful improvements. Overall, MIDA and SMIDA represent a promising stride forward in domain adaptation, improving the applicability and practicality of machine learning for real-world challenges.

