Pre-training of Graph Augmented Transformers for Medication Recommendation (1906.00346v2)

Published 2 Jun 2019 in cs.AI, cs.CL, and cs.LG

Abstract: Medication recommendation is an important healthcare application. It is commonly formulated as a temporal prediction task. Hence, most existing works only utilize longitudinal electronic health records (EHRs) from a small number of patients with multiple visits, ignoring a large number of patients with a single visit (selection bias). Moreover, important hierarchical knowledge such as diagnosis hierarchy is not leveraged in the representation learning process. To address these challenges, we propose G-BERT, a new model to combine the power of Graph Neural Networks (GNNs) and BERT (Bidirectional Encoder Representations from Transformers) for medical code representation and medication recommendation. We use GNNs to represent the internal hierarchical structures of medical codes. Then we integrate the GNN representation into a transformer-based visit encoder and pre-train it on EHR data from patients only with a single visit. The pre-trained visit encoder and representation are then fine-tuned for downstream predictive tasks on longitudinal EHRs from patients with multiple visits. G-BERT is the first to bring the LLM pre-training schema into the healthcare domain and it achieved state-of-the-art performance on the medication recommendation task.

Citations (264)

Summary

  • The paper presents G-BERT, a novel model that combines GNNs and Transformers to incorporate hierarchical medical code structures.
  • It introduces a pre-training strategy on extensive EHR data that mitigates selection bias and enriches code representations.
  • Experimental results show that G-BERT outperforms baselines like RETAIN and GAMENet on key metrics, demonstrating its clinical potential.

An Overview of "Pre-training of Graph Augmented Transformers for Medication Recommendation"

The paper "Pre-training of Graph Augmented Transformers for Medication Recommendation" introduces a novel approach to medication recommendation tasks in healthcare by developing a model named G-BERT. This model synergistically combines Graph Neural Networks (GNNs) and Bidirectional Encoder Representations from Transformers (BERT) to address limitations in existing approaches, particularly related to selection bias and the underutilization of hierarchical medical knowledge.

Key Concepts and Methodology

The core innovation of this work lies in integrating GNNs with Transformers to capture the hierarchical structure of medical codes embedded in electronic health records (EHRs). Here are the main components of the proposed solution:

  1. Graph Representation of Medical Codes:
    • The model employs GNNs to encapsulate the hierarchical structures inherent in medical code ontologies (e.g., ICD-9 for diagnoses). These hierarchies are crucial as they inform the relationships between different medical codes, which are often structured as tree-like ontologies.
    • The GNNs allow the model to generate embeddings for these codes by learning from their ancestors, yielding much richer representations than flat embeddings (a minimal sketch of this ancestor aggregation appears after this list).
  2. Transformer-Based Visit Encoder:
    • A Transformer-based architecture, inspired by BERT, is used to encode patient visits. However, unlike standard BERT models, which rely on sequential word order, the G-BERT visit encoder adapts to unordered sets of medical codes encountered within a single patient visit.
    • This encoder uses a multi-layer Transformer without position embeddings, reflecting the non-sequential nature of the codes within a visit (see the visit-encoder sketch after this list).
  3. Pre-training Strategies:
    • The pre-training incorporates vast EHR data, including records from patients with a single hospital visit, which are often neglected. This stage utilizes techniques such as masked modeling akin to BERT and introduces a self-prediction and dual-prediction mechanism to recover original codes and predict related sets of diagnoses or medications.
    • Through this pre-training, G-BERT leverages unlabeled data more effectively, preparing the model for subsequent tasks with limited labeled data.
  4. Fine-tuning for Medication Recommendation:
    • Fine-tuning optimizes the model on labeled multi-visit patient records to predict appropriate medication lists from the observed diagnoses; a sketch of the pre-training and fine-tuning objectives follows this list.
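
To make the ontology-embedding idea concrete, the following is a minimal sketch of how a leaf code's embedding could be enriched with information from its ancestors in a tree-shaped ontology. The toy ICD-9 fragment, the attention scheme, and all names (ontology_embedding, base_embed, etc.) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (PyTorch): enrich a leaf code's embedding with its ontology ancestors.
# The toy ICD-9 fragment, dimensions, and attention scheme are illustrative assumptions,
# not the exact G-BERT architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy ontology: each leaf code maps to its list of ancestors (root last).
ontology = {
    "250.00": ["250.0", "250", "249-259", "ICD9_ROOT"],   # diabetes mellitus
    "401.9":  ["401", "401-405", "ICD9_ROOT"],            # essential hypertension
}
vocab = sorted({c for leaf, parents in ontology.items() for c in [leaf] + parents})
code2idx = {c: i for i, c in enumerate(vocab)}

dim = 64
base_embed = nn.Embedding(len(vocab), dim)   # flat "initial" embeddings
attn = nn.Linear(2 * dim, 1)                 # scores (leaf, ancestor) pairs

def ontology_embedding(code: str) -> torch.Tensor:
    """Graph-attention-style aggregation of a code and its ancestors."""
    nodes = [code] + ontology[code]
    idx = torch.tensor([code2idx[c] for c in nodes])
    h = base_embed(idx)                            # (num_nodes, dim)
    leaf = h[0].expand_as(h)                       # broadcast the leaf embedding
    scores = attn(torch.cat([leaf, h], dim=-1))    # (num_nodes, 1)
    weights = F.softmax(scores, dim=0)
    return (weights * h).sum(dim=0)                # enriched leaf embedding, (dim,)

print(ontology_embedding("250.00").shape)          # torch.Size([64])
```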
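
The main architectural difference from vanilla BERT, dropping position embeddings so that a visit is treated as an unordered set of codes, could look roughly like the sketch below. The layer sizes and the use of PyTorch's built-in TransformerEncoder are assumptions made for brevity.

```python
# Sketch: a visit encoder that treats a visit as an unordered *set* of code embeddings.
# Note the absence of positional embeddings; only a [CLS]-style token is prepended
# to summarize the visit. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class VisitEncoder(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4, layers: int = 2):
        super().__init__()
        self.cls = nn.Parameter(torch.randn(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, code_embeds: torch.Tensor) -> torch.Tensor:
        # code_embeds: (batch, num_codes, dim) -- the order of codes is irrelevant
        batch = code_embeds.size(0)
        x = torch.cat([self.cls.expand(batch, -1, -1), code_embeds], dim=1)
        return self.encoder(x)[:, 0]               # visit representation from the [CLS] slot

visits = torch.randn(8, 5, 64)                      # 8 visits, 5 codes each
print(VisitEncoder()(visits).shape)                  # torch.Size([8, 64])
```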
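
Finally, both the pre-training objectives (self-prediction and dual-prediction) and the fine-tuning objective amount to multi-label prediction over code vocabularies. The sketch below shows the shape of those losses; the head names, vocabulary sizes, and the simplified visit representations are assumptions rather than the paper's exact formulation.

```python
# Sketch of the training objectives (all module names are illustrative assumptions):
#  - pre-training: from a visit's diagnosis representation, recover the visit's own
#    diagnosis codes (self-prediction) and predict its medication codes (dual-prediction);
#  - fine-tuning: from the patient representation, predict the medication set (multi-label).
import torch
import torch.nn as nn

n_diag, n_med, dim = 2000, 300, 64

diag_to_diag = nn.Linear(dim, n_diag)   # self-prediction head
diag_to_med = nn.Linear(dim, n_med)     # dual-prediction head
bce = nn.BCEWithLogitsLoss()

def pretrain_loss(diag_visit_repr, diag_targets, med_targets):
    # diag_visit_repr: (batch, dim) visit representation from the diagnosis encoder
    # *_targets: multi-hot vectors over the respective code vocabularies
    return bce(diag_to_diag(diag_visit_repr), diag_targets) + \
           bce(diag_to_med(diag_visit_repr), med_targets)

rec_head = nn.Linear(dim, n_med)        # fine-tuning: medication recommendation head

def finetune_loss(patient_repr, med_targets):
    # patient_repr: (batch, dim) aggregation of current and past visit representations
    return bce(rec_head(patient_repr), med_targets)

reprs = torch.randn(4, dim)
diag_y = torch.randint(0, 2, (4, n_diag)).float()
med_y = torch.randint(0, 2, (4, n_med)).float()
print(pretrain_loss(reprs, diag_y, med_y).item(), finetune_loss(reprs, med_y).item())
```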

Experimental Evaluation and Results

G-BERT was evaluated on the MIMIC-III dataset, where it outperformed several state-of-the-art baseline models, including RETAIN and GAMENet, on metrics such as the Jaccard similarity score and precision-recall AUC (the snippet below shows one generic way these metrics are computed).
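
For reference, here is a generic sketch of the two headline metrics for multi-label medication prediction using scikit-learn; it is not the authors' evaluation script, and the 0.5 threshold and toy arrays are assumptions.

```python
# Generic sketch of the reported metrics: Jaccard similarity between the predicted and
# ground-truth medication sets, and precision-recall AUC over the predicted scores.
import numpy as np
from sklearn.metrics import average_precision_score

def jaccard(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Jaccard score over samples for multi-hot label matrices (n_samples, n_labels)."""
    inter = np.logical_and(y_true, y_pred).sum(axis=1)
    union = np.logical_or(y_true, y_pred).sum(axis=1)
    return float(np.mean(np.where(union > 0, inter / np.maximum(union, 1), 0.0)))

y_true = np.array([[1, 0, 1, 1], [0, 1, 0, 0]])          # ground-truth medication sets
scores = np.array([[0.9, 0.2, 0.7, 0.4], [0.1, 0.8, 0.3, 0.2]])  # model probabilities
y_pred = (scores >= 0.5).astype(int)                      # thresholded predictions

print("Jaccard:", jaccard(y_true, y_pred))
print("PR-AUC:", average_precision_score(y_true, scores, average="samples"))
```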

The experiments also validate the contribution of each component, particularly the integration of hierarchical ontology information and the pre-training phase, which together improve predictive performance over a plain Transformer without these enhancements.

Implications and Future Directions

The proposal of G-BERT opens significant avenues for applying pre-trained model architectures in healthcare, extending their utility beyond conventional NLP tasks to structured, knowledge-rich domains such as EHR data. The approach demonstrates how hybrid models can capture the complex hierarchical and temporal dependencies present in medical data.

Future research could extend this work by exploring additional structural pre-training tasks that further enhance code representations, incorporating dynamic patient data streams, or scaling the approach to more diverse healthcare datasets with heterogeneous modalities. Extending the model's modularity to incorporate external knowledge bases, such as drug-drug interaction databases, more seamlessly could also broaden its applications in clinical decision support systems.

In conclusion, G-BERT serves as a pioneering step towards more contextual and robust medication recommendations, demonstrating how cross-pollination between advanced NLP architectures and domain-specific knowledge graphs can address critical challenges in healthcare informatics.