Data augmentation method for modeling health records with applications to clopidogrel treatment failure detection (2402.18046v1)
Abstract: We present a novel data augmentation method that addresses data scarcity when modeling longitudinal patterns in patients' Electronic Health Records (EHR) with NLP algorithms. The method generates augmented data by rearranging the order of medical records within a visit, where the order of elements is not obvious, if there is one at all. Applied to the clopidogrel treatment failure detection task, the method yielded up to a 5.3% absolute improvement in ROC-AUC (from 0.908 without augmentation to 0.961 with augmentation) when used during pre-training. The augmentation also improved performance during fine-tuning, especially when the amount of labeled training data was limited.
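The core idea in the abstract, permuting records inside a visit while leaving the sequence of visits untouched, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the list-of-lists patient representation, and the example codes are all assumptions made for illustration.

```python
import random
from typing import List

# Hypothetical representation: a patient is a chronological list of visits,
# and each visit is a list of medical codes (diagnoses, drugs, procedures).
Patient = List[List[str]]


def shuffle_within_visits(visits: Patient, rng: random.Random) -> Patient:
    """Return a copy of the record with codes shuffled inside each visit.

    The order of visits is preserved; only the within-visit order changes.
    """
    augmented = []
    for visit in visits:
        codes = list(visit)  # copy so the original record is not mutated
        rng.shuffle(codes)
        augmented.append(codes)
    return augmented


def augment_patient(visits: Patient, n_copies: int, seed: int = 0) -> List[Patient]:
    """Generate n_copies augmented versions of one patient record."""
    rng = random.Random(seed)
    return [shuffle_within_visits(visits, rng) for _ in range(n_copies)]


if __name__ == "__main__":
    # Illustrative codes only, not taken from the paper's dataset.
    patient = [["I21", "E11", "clopidogrel"], ["I10"], ["aspirin", "I25"]]
    for copy in augment_patient(patient, n_copies=3, seed=1):
        print(copy)
```

Each augmented copy contains the same codes per visit as the original, so the sequences could in principle be added to a pre-training or fine-tuning corpus without changing the clinical content of any visit.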