Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets (2310.07799v3)
Abstract: Emerging diseases present challenges in symptom recognition and timely clinical intervention due to limited available information. An effective prognostic model could assist physicians in making accurate diagnoses and designing personalized treatment plans to prevent adverse outcomes. However, in the early stages of disease emergence, several factors hamper model development: limited data collection, insufficient clinical experience, and privacy and ethical concerns restrict data availability and complicate accurate label assignment. Furthermore, Electronic Medical Record (EMR) data from different diseases or sources often exhibit significant cross-dataset feature misalignment, severely impacting the effectiveness of deep learning models. We present a domain-invariant representation learning method that constructs a transition model between source and target datasets. By constraining the distribution shift of features generated across different domains, we capture domain-invariant features specifically relevant to downstream tasks, developing a unified domain-invariant encoder that achieves better feature representation across various task domains. Experimental results across multiple target tasks demonstrate that our proposed model surpasses competing baseline methods and achieves faster training convergence, particularly when working with limited data. Extensive experiments validate our method's effectiveness in providing more accurate predictions for emerging pandemics and other diseases. Code is publicly available at https://github.com/wang1yuhang/domain_invariant_network.
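The idea of constraining the distribution shift between source-domain and target-domain features can be illustrated with a small sketch. The abstract does not say which discrepancy measure the method uses, so this example assumes a simple one: the squared distance between the mean encoded features of each domain, which is a biased linear-kernel MMD estimate. The function names, the `weight` parameter, and the additive combination with a task loss are all hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence

Vector = Sequence[float]


def feature_means(batch: Sequence[Vector]) -> List[float]:
    """Per-dimension mean over a batch of encoded patient representations."""
    n = len(batch)
    dim = len(batch[0])
    return [sum(row[j] for row in batch) / n for j in range(dim)]


def mean_shift_penalty(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Squared Euclidean distance between source and target feature means.

    ASSUMPTION: this stands in for the paper's shift constraint, whose exact
    form is not given in the abstract.
    """
    mu_s = feature_means(source)
    mu_t = feature_means(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def total_loss(task_loss: float,
               source: Sequence[Vector],
               target: Sequence[Vector],
               weight: float = 1.0) -> float:
    """Downstream task loss plus a weighted shift penalty (hypothetical form)."""
    return task_loss + weight * mean_shift_penalty(source, target)


if __name__ == "__main__":
    # Toy encoder outputs for two patients from each domain.
    src = [[1.0, 2.0], [3.0, 4.0]]
    tgt = [[0.0, 1.0], [2.0, 1.0]]
    print(mean_shift_penalty(src, tgt))
    print(total_loss(0.5, src, tgt, weight=0.1))
```

In this sketch, minimizing the penalty pushes an encoder shared by both domains toward features whose means match across the source and target datasets, while the task loss keeps those features useful for the downstream prediction.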