BEHRT: Transformer for Electronic Health Records (1907.09538v1)

Published 22 Jul 2019 in cs.LG and stat.ML

Abstract: Today, despite decades of developments in medicine and the growing interest in precision healthcare, the vast majority of diagnoses happen once patients begin to show noticeable signs of illness. Early indication and detection of diseases, however, can provide patients and carers with the chance of early intervention, better disease management, and efficient allocation of healthcare resources. The latest developments in machine learning (more specifically, deep learning) provide a great opportunity to address this unmet need. In this study, we introduce BEHRT: a deep neural sequence transduction model for EHR (electronic health records), capable of multitask prediction and disease trajectory mapping. When trained and evaluated on data from nearly 1.6 million individuals, BEHRT shows a striking absolute improvement of 8.0-10.8% in average precision score over the existing state-of-the-art deep EHR models when predicting the onset of 301 conditions. In addition to its superior predictive power, BEHRT provides a personalised view of disease trajectories through its attention mechanism; its flexible architecture enables it to incorporate multiple heterogeneous concepts (e.g., diagnosis, medication, measurements, and more) to improve the accuracy of its predictions; and its (pre-)training results in disease and patient representations that can help us get a step closer to interpretable predictions.

Authors (8)
  1. Yikuan Li (23 papers)
  2. Shishir Rao (10 papers)
  3. Jose Roberto Ayala Solares (3 papers)
  4. Abdelaali Hassaine (9 papers)
  5. Dexter Canoy (9 papers)
  6. Yajie Zhu (5 papers)
  7. Kazem Rahimi (11 papers)
  8. Gholamreza Salimi-Khorshidi (12 papers)
Citations (401)

Summary

BEHRT: Transformer for Electronic Health Records

The paper presents BEHRT, a deep learning model that utilizes the Transformer architecture specifically for Electronic Health Records (EHRs) to facilitate early disease detection and enhance precision healthcare. BEHRT is introduced as a significant leap forward in EHR analysis, capable of multitask prediction and providing a personalized view of disease trajectories.

Summary of BEHRT's Contributions

BEHRT builds on the success of Transformer-based architectures, particularly BERT, in natural language processing, adapting them to the nuances and complexities inherent in EHR data. It addresses several challenges in EHR modeling, such as non-linear interactions, long-term dependencies among events, and the representation of heterogeneous concepts.

Key contributions of BEHRT include:

  • Model Architecture: BEHRT employs a purely feedforward, attention-based architecture that sidesteps the exploding and vanishing gradient problems common in recurrent neural networks (RNNs) and enables efficient parallel training over EHR sequences.
  • Embedding Layer: Incorporates four key embeddings—disease, age, segment, and position—to provide a comprehensive representation of events in the patient's medical history. This enables the model to capture temporal relationships, patient demographics, and care delivery patterns.
  • Multi-Headed Self-Attention Mechanism: This feature allows the model to capture complex interactions across different points in a patient's medical history, facilitating the discovery of significant patterns that might affect disease progression.
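The four-part embedding described above can be sketched as an element-wise sum of lookup tables. The vocabulary sizes, hidden width, and function names below are illustrative assumptions for a toy sketch, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes and hidden width (assumed for illustration).
n_diseases, n_ages, n_segments, max_len, d_model = 301, 112, 2, 64, 288

# One lookup table per input facet, mirroring BEHRT's four embeddings.
disease_emb  = rng.normal(size=(n_diseases, d_model))
age_emb      = rng.normal(size=(n_ages, d_model))
segment_emb  = rng.normal(size=(n_segments, d_model))
position_emb = rng.normal(size=(max_len, d_model))

def behrt_input(disease_ids, age_ids, segment_ids):
    """Sum the four embeddings element-wise, one row per coded event."""
    pos_ids = np.arange(len(disease_ids))
    return (disease_emb[disease_ids]
            + age_emb[age_ids]
            + segment_emb[segment_ids]
            + position_emb[pos_ids])

# A toy history of three coded events across two visits.
x = behrt_input([5, 17, 42], [60, 60, 61], [0, 1, 0])
print(x.shape)  # (3, 288)
```

Summing (rather than concatenating) the facets keeps the model width fixed regardless of how many concept types are added, which is one reason the architecture extends naturally to further modalities.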

Results and Implications

The paper demonstrates BEHRT's superior performance compared to existing state-of-the-art models like RETAIN and DeepR in multi-label prediction of the onset of numerous health conditions. BEHRT achieves an absolute improvement of 8.0-10.8% in Average Precision Score (APS) for the early prediction of 301 conditions. This level of predictive accuracy suggests that BEHRT could play a crucial role in the movement towards precision healthcare, supporting early intervention and efficient resource allocation.
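Average precision for a single condition can be computed as the mean of the precision at each true-positive rank when patients are sorted by predicted risk. This is a sketch of one standard formulation; a multi-label APS over 301 conditions would average such per-condition scores:

```python
import numpy as np

def average_precision(y_true, y_score):
    """Mean precision at each positive hit, with patients ranked by score."""
    order = np.argsort(-np.asarray(y_score))   # sort by descending risk
    y = np.asarray(y_true)[order]
    hits = np.cumsum(y)                        # positives found up to rank k
    precision_at_k = hits / np.arange(1, len(y) + 1)
    return precision_at_k[y == 1].mean()       # average only at positive ranks

# Toy example: two true positives among four patients ranked by risk.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(round(ap, 3))  # 0.833
```

Unlike accuracy, this metric is insensitive to the large number of true negatives typical of rare-condition prediction, which is why it is the headline metric here.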

The paper also highlights BEHRT's potential to generate interpretative insights through its disease embeddings and attention visualizations. By unearthing latent patterns in EHR data through visual clustering and self-attention analysis, BEHRT provides a novel method for understanding disease trajectories and interactions.
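The attention weights behind such visualizations come from scaled dot-product attention. The single-head sketch below (toy dimensions, random matrices standing in for trained parameters) shows the object being inspected: each row of the weight matrix describes how strongly one event attends to every other event in the history:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(X, Wq, Wk):
    """Scaled dot-product attention weights over a sequence of encoded events."""
    Q, K = X @ Wq, X @ Wk
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k))

rng = np.random.default_rng(1)
X  = rng.normal(size=(5, 16))   # 5 encoded events, toy width 16
Wq = rng.normal(size=(16, 8))   # query projection (random, untrained)
Wk = rng.normal(size=(16, 8))   # key projection (random, untrained)
A  = attention_weights(X, Wq, Wk)
print(A.shape)                  # (5, 5); each row is a distribution over events
```

In a trained multi-headed model, visualizing rows of such matrices is what lets the paper link, say, a later diagnosis back to the earlier records it attends to most.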

Future Directions

BEHRT's architecture represents a flexible platform that could be expanded with additional modalities of EHR data, such as medications and laboratory results, without major architectural changes. The paper also suggests the potential for ensemble learning with variations of BEHRT to further improve predictive power.

The possibility of using BEHRT in population-level studies to understand multimorbidity patterns and in individual-level applications for personalized prediction represents a significant advancement in healthcare AI. Future work may also involve fine-grained disease analysis and incorporation of demographic features for enhanced model performance. Moreover, there is an intriguing prospect of deploying BEHRT as a clinical tool to assist healthcare professionals in diagnosis and treatment planning.

In conclusion, BEHRT's methodological rigor and breadth of potential applications mark an important step towards harnessing AI for improved patient outcomes in healthcare. This paper sets a foundation for subsequent research to optimize and integrate deep learning architectures in the healthcare domain.