MOTOR: A Time-To-Event Foundation Model For Structured Medical Records (2301.03150v4)
Abstract: We present a self-supervised, time-to-event (TTE) foundation model called MOTOR (Many Outcome Time Oriented Representations), which is pretrained on timestamped sequences of events in electronic health records (EHR) and health insurance claims. TTE models estimate the probability distribution of the time until a specific event occurs, which is an important task in medical settings. TTE models provide many advantages over classification using fixed time horizons, including naturally handling censored observations, but are challenging to train with limited labeled data. MOTOR addresses this challenge by pretraining on up to 55M patient records (9B clinical events). We evaluate MOTOR's transfer learning performance on 19 tasks across 3 patient databases (a private EHR system, MIMIC-IV, and Merative claims data). Task-specific models adapted from MOTOR improve time-dependent C statistics by 4.6% over state-of-the-art, improve label efficiency by up to 95%, and are more robust to temporal distributional shifts. We further evaluate cross-site portability by adapting our MOTOR foundation model for six prediction tasks on the MIMIC-IV dataset, where it outperforms all baselines. MOTOR is the first foundation model for medical TTE predictions, and we release a 143M parameter pretrained model for research use at [redacted URL].
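To make the abstract's point about censoring concrete, here is a minimal sketch in plain Python. It is not MOTOR's method: it uses the classical Kaplan-Meier estimator as a stand-in TTE model, and the function names (`kaplan_meier`, `fixed_horizon_labels`) and the toy data are assumptions made only for illustration. It shows that a fixed-horizon classifier cannot label a patient censored before the horizon, while a TTE estimator still uses that patient's follow-up time.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (event_time, survival_probability) steps of a Kaplan-Meier curve.

    times[i] is the follow-up time of patient i; observed[i] is True when the
    event occurred at that time and False when the patient was censored.
    """
    order = sorted(zip(times, observed))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        events = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(order) and order[i][0] == t:
            events += int(order[i][1])
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without counting as events.
        at_risk -= leaving
    return curve


def fixed_horizon_labels(
    times: List[float], observed: List[bool], horizon: float
) -> List[Optional[bool]]:
    """Binary labels for 'event by horizon'; None when censoring hides the answer."""
    labels: List[Optional[bool]] = []
    for t, seen in zip(times, observed):
        if t > horizon:
            labels.append(False)  # followed past the horizon without an event
        elif seen:
            labels.append(True)   # event observed before the horizon
        else:
            labels.append(None)   # censored early: label unknown
    return labels


# Hypothetical toy cohort: five patients, two of them censored.
toy_times = [2.0, 3.0, 3.0, 5.0, 8.0]
toy_observed = [True, False, True, True, False]

curve = kaplan_meier(toy_times, toy_observed)
labels = fixed_horizon_labels(toy_times, toy_observed, horizon=4.0)
```

In this toy cohort the patient censored at time 3 receives no label at a horizon of 4 and would have to be dropped or guessed by a classifier, yet that patient still counts in the risk set when the survival curve is computed.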
Authors: Ethan Steinberg, Jason Fries, Yizhe Xu, Nigam Shah