Matching Patients to Clinical Trials with Large Language Models (2307.15051v5)

Published 27 Jul 2023 in cs.CL and cs.AI

Abstract: Patient recruitment is challenging for clinical trials. We introduce TrialGPT, an end-to-end framework for zero-shot patient-to-trial matching with LLMs. TrialGPT comprises three modules: it first performs large-scale filtering to retrieve candidate trials (TrialGPT-Retrieval); then predicts criterion-level patient eligibility (TrialGPT-Matching); and finally generates trial-level scores (TrialGPT-Ranking). We evaluate TrialGPT on three cohorts of 183 synthetic patients with over 75,000 trial annotations. TrialGPT-Retrieval can recall over 90% of relevant trials using less than 6% of the initial collection. Manual evaluations on 1,015 patient-criterion pairs show that TrialGPT-Matching achieves an accuracy of 87.3% with faithful explanations, close to the expert performance. The TrialGPT-Ranking scores are highly correlated with human judgments and outperform the best-competing models by 43.8% in ranking and excluding trials. Furthermore, our user study reveals that TrialGPT can reduce the screening time by 42.6% in patient recruitment. Overall, these results have demonstrated promising opportunities for patient-to-trial matching with TrialGPT.

Patient-to-Trial Matching Utilizing LLMs: An Analysis of TrialGPT

The paper "Matching Patients to Clinical Trials with Large Language Models" by Jin et al. applies large language models (LLMs) to the problem of matching patients to clinical trials, a process that has historically been slow and inefficient. The authors present TrialGPT, a framework that uses LLMs to improve patient-to-trial matching, with a focus on a patient-centric formulation: given one patient, find and rank the trials that patient may be eligible for.

Overview of TrialGPT Methodology

TrialGPT's methodology centers on assessing clinical trial eligibility criteria on a per-patient basis using an LLM, specifically GPT-4. The framework evaluates patient eligibility criterion by criterion and then aggregates these predictions into a trial-level eligibility assessment. This involves two key tasks: predicting eligibility for each trial criterion, and aggregating those predictions into an overall trial eligibility score.
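The two tasks can be made concrete with a minimal sketch. It is not the authors' implementation: the label names (`met`, `not_met`, `unknown`), the `CriterionPrediction` type, and the scoring rule (fraction of inclusion criteria met minus fraction of exclusion criteria met) are illustrative assumptions standing in for whatever the LLM and the aggregation step actually produce.

```python
from dataclasses import dataclass


@dataclass
class CriterionPrediction:
    """One criterion-level prediction (hypothetical structure)."""
    kind: str   # "inclusion" or "exclusion"
    label: str  # assumed labels: "met", "not_met", "unknown"


def fraction_met(preds):
    """Share of predictions labeled "met"; 0.0 for an empty list."""
    if not preds:
        return 0.0
    return sum(p.label == "met" for p in preds) / len(preds)


def trial_score(preds):
    """Aggregate criterion-level predictions into one trial-level score.

    Assumed rule: inclusion criteria that are met raise the score,
    exclusion criteria that are met lower it. Range is [-1.0, 1.0].
    """
    inclusion = [p for p in preds if p.kind == "inclusion"]
    exclusion = [p for p in preds if p.kind == "exclusion"]
    return fraction_met(inclusion) - fraction_met(exclusion)


def rank_trials(trials):
    """Return trial IDs sorted from most to least promising."""
    return sorted(trials, key=lambda tid: trial_score(trials[tid]), reverse=True)
```

Calling `rank_trials` on a dictionary that maps trial IDs to lists of `CriterionPrediction` objects returns the IDs ordered by score. A simple linear rule like this is only a baseline; the summary below notes that TrialGPT's own aggregation is reported to go beyond it.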

Evaluation and Results

The authors evaluate TrialGPT's performance using three publicly available cohorts of patient-trial data, encompassing 183 synthetic patients and over 75,000 trial annotations. On a manually evaluated set of 1,015 patient-criterion pairs, the model reached a criterion-level prediction accuracy of 87.3%, close to expert performance, which ranges from 88.7% to 90.0%. TrialGPT's aggregated trial-level scores correlated strongly with human judgments and outperformed the best-competing models by 43.8% in ranking and excluding trials. Furthermore, a user study indicated a 42.6% reduction in screening time when TrialGPT was used, underlining its practical value for clinical trial recruitment.
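For readers who want to see how figures of this kind are derived, the sketch below shows the underlying arithmetic: accuracy as the share of criterion-level predictions that agree with reference labels, and relative change as a difference divided by the baseline. The function names and any numbers passed to them are illustrative assumptions, not data or code from the paper.

```python
def accuracy(predicted, reference):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one labeled pair is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def relative_reduction(baseline, with_tool):
    """Relative decrease from baseline, e.g. in screening time."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - with_tool) / baseline


def relative_improvement(baseline, new):
    """Relative increase over a baseline metric, e.g. a ranking score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline
```

As a made-up example, a screening task that takes 100 seconds unaided and 57.4 seconds with assistance gives `relative_reduction(100.0, 57.4)` of roughly 0.426, which is how a reduction would be expressed as 42.6%.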

Technical Contributions

TrialGPT's main technical contribution is its use of LLMs to generate criterion-level predictions together with explanations, addressing a bottleneck in previous models, which lacked support for annotated, instance-level evaluation and for explainability. In addition, TrialGPT's approach to aggregating these predictions into trial-level scores surpasses traditional linear aggregation, illustrating what transformer-based models such as GPT-4 can do in biomedical applications.

Implications and Future Directions

The implications of this paper are multifaceted, promising advancements in the efficiency of clinical trial matching and potentially broadening patient access to experimental treatments. TrialGPT embodies the potential for LLMs to transform biomedical workflows by integrating AI into clinical environments, providing significant time savings and increasing the precision of trial matching processes.

Future research could aim to expand the scope by evaluating the incorporation of multi-modal data, such as lab results or imaging data, and testing the model on wider datasets that consider factors like geographic location and trial recruitment status. Moreover, exploring open-source LLM alternatives could alleviate dependency on commercial models like GPT-4.

Conclusion

TrialGPT's deployment underscores the continued integration of AI into healthcare, specifically in clinical trial settings where patient-to-trial matching can benefit substantially from language processing advancements. This research delineates a promising trajectory for future efforts to refine and expand the capabilities of LLMs in clinical applications, ultimately striving for both improved healthcare delivery and the empowerment of clinical teams through enhanced AI tools.