
Multi-source domain adaptation for regression (2312.05460v1)

Published 9 Dec 2023 in stat.ML and cs.LG

Abstract: Multi-source domain adaptation (DA) aims at leveraging information from more than one source domain to make predictions in a target domain, where different domains may have different data distributions. Most existing methods for multi-source DA focus on classification problems, while regression settings have received only limited investigation. In this paper, we fill this gap through a two-step procedure. First, we extend a flexible single-source DA algorithm for classification through outcome-coarsening to enable its application to regression problems. We then augment our single-source DA algorithm for regression with ensemble learning to achieve multi-source DA. We consider three learning paradigms in the ensemble algorithm, which linearly combines the target-adapted learners trained with each source domain: (i) a multi-source stacking algorithm to obtain the ensemble weights; (ii) a similarity-based weighting where the weights reflect the quality of DA of each target-adapted learner; and (iii) a combination of the stacking and similarity weights. We illustrate the performance of our algorithms with simulations and a data application where the goal is to predict high-density lipoprotein (HDL) cholesterol levels using the gut microbiome. We observe a consistent improvement in prediction performance of our multi-source DA algorithm over the routinely used methods in all these scenarios.
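
As a rough illustration of the ideas named in the abstract, the sketch below shows what outcome-coarsening and a linear ensemble of per-source learners could look like. It is not the authors' implementation: the function names, the quantile binning, the unconstrained least-squares stacking, and the simple normalization of similarity scores are all assumptions made for illustration. The abstract does not specify how paradigm (iii) combines the two sets of weights, so that step is left out.

```python
import numpy as np


def coarsen_outcome(y, n_bins=4):
    """Hypothetical helper: discretize a continuous outcome into quantile bins."""
    # Interior quantile cut points; np.digitize maps each value to a bin 0..n_bins-1.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(y, edges)


def stacking_weights(preds, y):
    """Assumed stacking step: least-squares weights over per-source predictions.

    preds has one column per target-adapted learner (one per source domain).
    """
    w, *_ = np.linalg.lstsq(preds, y, rcond=None)
    return w


def similarity_weights(scores):
    """Assumed similarity step: normalize non-negative quality scores to sum to 1."""
    s = np.asarray(scores, dtype=float)
    return s / s.sum()


def ensemble_predict(preds, weights):
    """Linear combination of the per-source predictions."""
    return preds @ weights


# Toy usage with synthetic numbers (no real data involved).
rng = np.random.default_rng(0)
preds = rng.normal(size=(50, 2))             # predictions from two source learners
y = 0.3 * preds[:, 0] + 0.7 * preds[:, 1]    # outcome built as an exact mixture
w_stack = stacking_weights(preds, y)         # recovers roughly [0.3, 0.7]
w_sim = similarity_weights([1.0, 3.0])       # quality scores are made up
y_hat = ensemble_predict(preds, w_stack)
labels = coarsen_outcome(y, n_bins=4)        # coarse classes usable by a classifier
```

In this toy setup the stacking weights recover the mixture used to build `y`; with real data a constrained or regularized fit may be preferable, which the abstract does not discuss.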

