Lightweight Retrieval-Augmented Generation and Large Language Model-Based Modeling for Scalable Patient-Trial Matching

Published 23 Apr 2026 in cs.CL, cs.AI, and cs.LG | (2604.22061v1)

Abstract: Patient-trial matching requires reasoning over long, heterogeneous electronic health records (EHRs) and complex eligibility criteria, posing significant challenges for scalability, generalization, and computational efficiency. Existing approaches either rely on full-document processing with LLMs, which is computationally expensive, or use traditional machine learning methods that struggle to capture unstructured clinical narratives. In this work, we propose a lightweight framework that combines retrieval-augmented generation and LLM-based modeling for scalable patient-trial matching. The framework explicitly separates two key components: retrieval-augmented generation is used to identify clinically relevant segments from long EHRs, reducing input complexity, while LLMs are used to encode these selected segments into informative representations. These representations are further refined through dimensionality reduction and modeled using lightweight predictors, enabling efficient and scalable downstream classification. We evaluate the proposed approach on multiple public benchmarks (n2c2, SIGIR, TREC 2021/2022) and a real-world multimodal dataset from Mayo Clinic (MCPMD). Results show that retrieval-based information selection significantly reduces computational burden while preserving clinically meaningful signals. We further demonstrate that frozen LLMs provide strong representations for structured clinical data, whereas fine-tuning is essential for modeling unstructured clinical narratives. Importantly, the proposed lightweight pipeline achieves performance comparable to end-to-end LLM approaches with substantially lower computational cost.

Authors (10)

Summary

The paper proposes a modular pipeline that integrates retrieval-augmented generation with lightweight LLM-based modeling to efficiently match patients with clinical trials.
It employs selective EHR segment retrieval, dimensionality reduction, and fine-tuning to achieve competitive Macro-F1 (0.70–0.76) and AUROC (0.75–0.80) scores.
The approach is scalable and privacy-preserving: it outperforms traditional ML methods and matches or exceeds commercial LLM-based methods while substantially reducing computational cost.

Lightweight Retrieval-Augmented Generation and LLM-Based Frameworks for Scalable Patient-Trial Matching

Introduction

The deployment of patient-trial matching systems in clinical research is hindered by the complexity and heterogeneity of real-world EHRs and trial eligibility criteria. Existing methods based on end-to-end LLM processing are computationally intensive, while traditional ML techniques often lack the ability to extract or generalize from unstructured clinical narratives. This paper introduces a modular, resource-efficient pipeline integrating retrieval-augmented generation and locally deployed LLM-based modeling, explicitly designed for scalable patient-trial matching under real-world constraints (2604.22061).

Framework Architecture and Methods

The proposed framework adopts a two-stage paradigm: selective retrieval of clinically relevant EHR segments, followed by representation learning and classification with lightweight predictors. The retrieval module uses embedding-based similarity scoring (with BioBERT) to rank EHR chunks against trial eligibility criteria, which reduces input complexity and minimizes the need for full-document LLM processing. Retrieved segments are then encoded with open-source LLMs (Mistral-7B, Llama3-8B, Falcon-7B), yielding dense representations that are compressed further through dimensionality reduction; both sequence-level and feature-level strategies are investigated.
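The retrieval step described above can be illustrated with a minimal sketch. It assumes each EHR chunk and each eligibility criterion has already been embedded into a fixed-length vector (BioBERT in the paper; toy 3-dimensional vectors here), and it assumes cosine similarity as the scoring function, which the summary does not specify. The names `cosine` and `top_k_chunks` are hypothetical helpers, not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k_chunks(criterion_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the criterion, best first."""
    scored = [(cosine(criterion_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    # Sort by descending similarity; ties are broken by original chunk order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]

# Toy embeddings standing in for real encoder outputs (assumption).
criterion = [1.0, 0.0, 1.0]
chunks = [
    [0.9, 0.1, 0.8],  # closely aligned with the criterion
    [0.0, 1.0, 0.0],  # orthogonal to the criterion
    [0.5, 0.5, 0.5],  # partially aligned
]
selected = top_k_chunks(criterion, chunks, k=2)
print(selected)  # indices of the chunks that would be passed to the LLM encoder
```

Only the selected chunks would go on to the LLM encoder, which is where the reduction in input length comes from.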

Downstream classification uses lightweight heads (multilayer perceptron (MLP), support vector machine (SVM), decision tree (DT), and random forest (RF)), which facilitates efficient deployment and generalization. Fine-tuning is evaluated explicitly in mixed-modality (structured plus free-text) settings and proves necessary for domain adaptation when the input is unstructured. The architecture supports privacy-preserving, scalable modeling in clinical environments where proprietary commercial LLMs are not viable.

Experimental Results

Performance Across Modalities

The paper evaluates the framework on public benchmarks (n2c2, SIGIR 2016, TREC 2021/2022) and a real-world multimodal dataset (MCPMD). The RAG-based selection strategy substantially reduces computational burden with minimal loss of clinically meaningful information. Representations built with dimensionality reduction and sequence-level aggregation outperform last-token representations, with the compression axis and degree of compression chosen empirically. Frozen LLMs deliver robust results on structured data, but without fine-tuning their performance on unstructured text is suboptimal.
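To make the contrast between sequence-level aggregation and last-token representations concrete, here is a small sketch. It assumes the encoder returns one hidden-state vector per token (a T x D list of lists) and uses mean pooling as the aggregation; the summary does not say which aggregation the authors use, so mean pooling is an illustrative assumption, and `mean_pool` and `last_token` are hypothetical helper names.

```python
def mean_pool(token_states):
    """Sequence-level aggregation: average the T token vectors into one D-dim vector."""
    if not token_states:
        raise ValueError("token_states must contain at least one token")
    n_tokens = len(token_states)
    # zip(*rows) iterates over feature columns, so each column is averaged over tokens.
    return [sum(column) / n_tokens for column in zip(*token_states)]

def last_token(token_states):
    """Last-token representation: keep only the final token's hidden state."""
    if not token_states:
        raise ValueError("token_states must contain at least one token")
    return list(token_states[-1])

# Toy hidden states for a 3-token segment with 2 features per token (assumption).
states = [
    [1.0, 0.0],
    [3.0, 2.0],
    [2.0, 4.0],
]
pooled = mean_pool(states)
final = last_token(states)
print(pooled, final)
```

Both functions collapse the sequence axis to a single D-dimensional vector. The pooled vector reflects every token in the segment, while the last-token vector depends on the final position alone.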

Fine-Tuning and Generalization

Fine-tuning consistently enhances performance in Macro-F1 and AUPRC, especially under class imbalance and in mixed-modality settings. Lightweight, locally controlled pipelines match or exceed the performance of commercial end-to-end LLM approaches (e.g., TrialGPT) at dramatically lower resource cost. Cross-dataset and cross-trial results reveal pronounced performance degradation when target trial data are excluded from training, emphasizing the need for comprehensive exposure and domain adaptation to diverse trial protocols.
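The cross-trial setting, in which the target trial is excluded from training, can be sketched as a leave-one-trial-out split. The record fields (`patient_id`, `trial_id`, `eligible`) and the function name are hypothetical, and the paper's exact splitting protocol may differ.

```python
def leave_one_trial_out(records):
    """Yield (held_out_trial, train, test) with the held-out trial absent from train."""
    trial_ids = sorted({record["trial_id"] for record in records})
    for held_out in trial_ids:
        train = [r for r in records if r["trial_id"] != held_out]
        test = [r for r in records if r["trial_id"] == held_out]
        yield held_out, train, test

# Toy patient-trial pairs with binary eligibility labels (illustrative only).
records = [
    {"patient_id": "p1", "trial_id": "T1", "eligible": 1},
    {"patient_id": "p2", "trial_id": "T1", "eligible": 0},
    {"patient_id": "p3", "trial_id": "T2", "eligible": 1},
    {"patient_id": "p4", "trial_id": "T3", "eligible": 0},
]
splits = list(leave_one_trial_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

A model trained on `train` and scored on `test` never sees the held-out trial's criteria or labels, which is the condition under which the degradation described above is reported.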

Comparative Evaluation

The framework surpasses classical ML approaches (RF, DT, SVM), early NLP systems (Criteria2Query, EliIE), and commercial black-box LLM APIs (TrialGPT) on clinical trial matching performance, robustness, and privacy. Notably, the lightweight pipeline achieves classification metrics competitive with or superior to state-of-the-art LLM-based methods, and fine-tuned models consistently outperform zero-shot and frozen settings in both structured and free-text domains.

Numerical Results and Key Findings

Macro-F1 scores reach 0.70–0.76 for mixed EHR modalities with fine-tuning, outperforming several baselines.
AUROC values are sustained at 0.75–0.80 in real-world datasets, reflecting robust discrimination.
Computational cost is reduced by selective RAG chunking and dimensionality reduction, permitting high-throughput deployment without full-context LLM inference.
Cross-trial experiments demonstrate that lack of exposure to trial-specific data yields substantial metric losses (e.g., Macro-F1 reduction >0.25), indicating severe generalization challenges in heterogeneous clinical environments.
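For readers less familiar with the headline metric, Macro-F1 is the unweighted mean of per-class F1 scores, so the minority class counts as much as the majority class. The following is a minimal sketch of the standard definition on toy labels; the numbers are illustrative and are not taken from the paper.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 when the class never appears."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)

# Toy imbalanced example: four eligible (1) and two ineligible (0) patients.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(score)  # class 1 F1 = 0.75, class 0 F1 = 0.5, mean = 0.625
```

Because each class contributes equally, a drop of more than 0.25 in Macro-F1 cannot be explained by errors on the majority class alone being averaged away.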

Implications and Limitations

The findings underscore the necessity for retrieval-based information selection, dimensionality reduction, and task-specific fine-tuning in scalable patient-trial matching pipelines. The architecture advances privacy-preserving deployment, avoids dependency on commercial APIs, and addresses computational bottlenecks in longitudinal EHR processing. Nevertheless, the study is limited by reliance on trial-level annotations, sensitivity to class imbalance and heterogeneous label schemas, and trial-specific distribution shifts. Prospective clinical validation is required to confirm operational utility.

Future Directions

Future priorities include improving cross-trial adaptation, tuning dimensionality-reduction and classifier hyperparameters, and incorporating additional modalities. Robustness under dataset shift and in operational deployment remains critical, as does testing whether the approach generalizes to more diverse patient cohorts and clinical environments.

Conclusion

This paper establishes a principled, modular pipeline for scalable patient-trial matching, combining retrieval-augmented generation, dimensionality compression, and lightweight classification with locally deployed LLMs. The approach demonstrates performance comparable to end-to-end LLMs at a fraction of the computational cost, with enhanced privacy, adaptability, and robustness to real-world EHR complexities. Fine-tuning emerges as necessary for optimal performance on unstructured clinical narratives. The results inform the design and deployment of efficient, privacy-preserving AI systems for clinical trial recruitment and broader healthcare applications.