Domain Adaptation Optimized for Robustness in Mixture Populations (2407.20073v2)
Abstract: While domain adaptation methods address data shifts, most assume that the target population aligns with at least one source population, neglecting mixtures that combine sources influenced by factors like demographics. Additional challenges in electronic health record (EHR)-based studies include unobserved outcomes and the need to explain population mixtures using broader clinical characteristics than those in standard risk models. To address these challenges under shifts in both covariate distributions and outcome models, we propose a novel framework: Domain Adaptation Optimized for Robustness in Mixture populations (DORM). Leveraging partially labeled source data, DORM constructs an initial target outcome model under a joint source-mixture assumption. To enhance generalizability to future target populations that may deviate from the joint source-mixture approximation, DORM incorporates a group adversarial learning step to derive a final estimate, optimizing its worst-case performance within a convex uncertainty set built around the initial target model. In addition, this robust domain adaptation procedure is assisted by high-dimensional surrogates that enhance transferability in EHR studies. When a small set of gold-standard or noisy labels is available from the target population, a tuning strategy refines the uncertainty set, mitigating conservativeness and further improving performance for the specific target population. The statistical convergence and predictive accuracy of our method are quantified through asymptotic studies. Simulation and real-world studies demonstrate that our method outperforms existing approaches.
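To make the idea of worst-case optimization over source mixtures concrete, the sketch below shows a generic maximin aggregation of linear source models. It is an illustration only and is not the DORM estimator: it assumes linear outcome models with known source coefficient vectors, a known target covariance matrix, and an uncertainty set equal to the full probability simplex, whereas the abstract describes a convex uncertainty set built around an initial target model. All function names (`project_to_simplex`, `gram_matrix`, `maximin_weights`) are hypothetical. Under these assumptions, the worst-case-optimal coefficient vector is a weighted combination of the source coefficients, with weights that minimize a quadratic form over the simplex.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        running += u_j
        t = (running - 1.0) / j
        if u_j - t > 0:  # holds for a prefix of the sorted entries; keep the last t
            theta = t
    return [max(x - theta, 0.0) for x in v]


def gram_matrix(source_coefs, sigma):
    """Gamma[k][l] = b_k^T Sigma b_l for source coefficient vectors b_k."""
    def quad(a, b):
        return sum(a[i] * sigma[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    return [[quad(bk, bl) for bl in source_coefs] for bk in source_coefs]


def maximin_weights(gamma, n_iter=2000):
    """Minimize q^T Gamma q over the simplex by projected gradient descent."""
    n_sources = len(gamma)
    # For a positive semidefinite Gamma the trace bounds the largest eigenvalue,
    # so 1 / (2 * trace) is a safe (conservative) step size for the gradient 2 * Gamma q.
    step = 1.0 / (2.0 * sum(gamma[k][k] for k in range(n_sources)) + 1e-12)
    q = [1.0 / n_sources] * n_sources
    for _ in range(n_iter):
        grad = [2.0 * sum(gamma[k][l] * q[l] for l in range(n_sources))
                for k in range(n_sources)]
        q = project_to_simplex([q[k] - step * grad[k] for k in range(n_sources)])
    return q


# Toy example (assumed values): two sources, two covariates, identity covariance.
source_coefs = [[2.0, 0.0], [0.0, 1.0]]
sigma = [[1.0, 0.0], [0.0, 1.0]]
q = maximin_weights(gram_matrix(source_coefs, sigma))
beta = [sum(q[l] * source_coefs[l][i] for l in range(len(q))) for i in range(2)]
print(q, beta)  # approximately [0.2, 0.8] and [0.4, 0.8]
```

In this toy example the aggregated coefficients give both sources the same explained-variance reward, which is the balancing behavior a worst-case criterion is meant to produce. Restricting the weights to a smaller convex set around an initial estimate would replace the simplex projection with a projection onto that set.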