Leveraging Pre-trained and Transformer-derived Embeddings from EHRs to Characterize Heterogeneity Across Alzheimer's Disease and Related Dementias (2404.00464v1)
Abstract: Alzheimer's disease is a progressive, debilitating neurodegenerative disease that affects 50 million people globally. Despite this substantial health burden, available treatments for the disease are limited and its fundamental causes remain poorly understood. Previous work has suggested the existence of clinically meaningful sub-types, which may correspond to distinct etiologies, disease courses, and ultimately appropriate treatments. Here, we use unsupervised learning techniques on electronic health records (EHRs) from a cohort of memory disorder patients to characterise heterogeneity in this disease population. Patient EHRs are encoded using pre-trained embeddings for medical codes together with transformer-derived Clinical BERT embeddings of free text. We identify sub-populations on the basis of comorbidities and shared textual features, and discuss their clinical significance.
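To make the encoding-then-clustering idea in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each patient is represented by mean-pooling the embeddings of their medical codes and that sub-populations are found with a small k-means routine; the abstract only says "unsupervised learning techniques", so k-means is a stand-in. The 2-dimensional vectors, code names, and patient records are hypothetical toy values, and the Clinical BERT text embeddings are omitted because they require model weights.

```python
import numpy as np

# Hypothetical toy embeddings standing in for pre-trained medical-code vectors.
code_embeddings = {
    "diabetes": np.array([1.0, 0.0]),
    "hypertension": np.array([0.9, 0.1]),
    "chronic_pain": np.array([0.0, 1.0]),
    "skin_lesion": np.array([0.1, 0.9]),
}

# Hypothetical patients, each a list of codes found in their record.
patients = {
    "p1": ["diabetes", "hypertension"],
    "p2": ["diabetes"],
    "p3": ["chronic_pain", "skin_lesion"],
    "p4": ["skin_lesion"],
}


def encode_patient(codes, table):
    """Mean-pool the embeddings of a patient's codes into one vector."""
    return np.mean([table[c] for c in codes], axis=0)


def kmeans(X, k, n_iter=20):
    """Plain k-means, initialised from the first k rows for reproducibility."""
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance from every patient vector to every cluster centre.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[j] = members.mean(axis=0)
    return labels


patient_ids = list(patients)
X = np.stack([encode_patient(patients[p], code_embeddings) for p in patient_ids])
labels = kmeans(X, k=2)

for pid, label in zip(patient_ids, labels):
    print(pid, "-> cluster", label)
```

In this toy setting the two patients with metabolic and vascular codes end up in one cluster and the two with pain and skin codes in the other. A real analysis would use the actual pre-trained code embeddings, would also incorporate free-text embeddings, and would need a principled choice of clustering method and number of clusters.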
Authors: Matthew West, Colin Magdamo, Lily Cheng, Yingnan He, Sudeshna Das