MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance (2408.01869v1)

Published 3 Aug 2024 in cs.CL, cs.AI, cs.IR, cs.LG, cs.MA, and q-bio.QM

Abstract: In the era of LLMs, given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction and summarization. This paper focuses on the problem of Pharmacovigilance (PhV), where the significance and challenges lie in identifying Adverse Drug Events (ADEs) from diverse text sources, such as medical literature, clinical notes, and drug labels. Unfortunately, this task is hindered by factors including variations in the terminologies of drugs and outcomes, and ADE descriptions often being buried in large amounts of narrative text. We present MALADE, the first effective collaborative multi-agent system powered by LLM with Retrieval Augmented Generation for ADE extraction from drug label data. This technique involves augmenting a query to an LLM with relevant information extracted from text resources, and instructing the LLM to compose a response consistent with the augmented data. MALADE is a general LLM-agnostic architecture, and its unique capabilities are: (1) leveraging a variety of external sources, such as medical literature, drug labels, and FDA tools (e.g., OpenFDA drug information API), (2) extracting drug-outcome association in a structured format along with the strength of the association, and (3) providing explanations for established associations. Instantiated with GPT-4 Turbo or GPT-4o, and FDA drug label data, MALADE demonstrates its efficacy with an Area Under ROC Curve of 0.90 against the OMOP Ground Truth table of ADEs. Our implementation leverages the Langroid multi-agent LLM framework and can be found at https://github.com/jihyechoi77/malade.

Summary

The paper introduces MALADE, a novel multi-agent LLM system with retrieval augmented generation that achieves an AUC of 0.90 in ADE detection.
It employs specialized agents and an Agent-Critic framework to systematically identify drug-outcome associations from FDA drug labels.
The system offers practical improvements in pharmacovigilance and sets a foundation for future integration with EHR data and local LLM deployment.

An Expert Overview of MALADE: Multi-Agent LLM System for Pharmacovigilance

The paper "MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance" introduces a system for Adverse Drug Event (ADE) extraction built on an LLM-agnostic, multi-agent architecture. The system, named MALADE, demonstrates how orchestrated LLM agents, augmented with Retrieval Augmented Generation (RAG), can identify Drug-Outcome Associations (DOAs) from FDA drug labels.

Objectives and Motivation

Pharmacovigilance (PhV) is pivotal in ensuring patient safety by detecting harmful drug reactions after a drug reaches the market. Prompt and accurate ADE detection is difficult because drug and outcome terminologies vary and ADE descriptions are often buried in large amounts of narrative text. MALADE addresses these issues by leveraging the text understanding and generation capabilities of LLMs, aiming to improve PhV practices.

System Architecture

The architecture of MALADE is divided into three main tasks handled by specialized agents:

DrugFinder: Identifies representative drugs within a given category.
DrugAgent: Assesses individual drugs for potential ADEs using FDA drug label data.
CategoryAgent: Synthesizes data from DrugAgents to determine the overall risk profile of a drug category concerning a specific ADE.
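
The three-stage flow above can be sketched in a few lines of Python. This is a minimal illustration, not the MALADE implementation (which is built on Langroid): the function `run_pipeline`, the toy agents, the label strings, and the majority-vote aggregation are all assumptions made for this example, and the real agents are LLM-driven rather than hard-coded.

```python
from collections import Counter
from typing import Callable, Dict, List


def run_pipeline(
    category: str,
    outcome: str,
    drug_finder: Callable[[str], List[str]],
    drug_agent: Callable[[str, str], str],
    category_agent: Callable[[str, str, Dict[str, str]], str],
) -> str:
    """Chain the three roles: find drugs, assess each, aggregate."""
    drugs = drug_finder(category)
    per_drug = {drug: drug_agent(drug, outcome) for drug in drugs}
    return category_agent(category, outcome, per_drug)


# Hypothetical stand-ins for the LLM-backed agents.
def toy_drug_finder(category: str) -> List[str]:
    return ["drug_a", "drug_b", "drug_c"]


def toy_drug_agent(drug: str, outcome: str) -> str:
    # Made-up per-drug labels, used only to exercise the pipeline.
    canned = {"drug_a": "increase", "drug_b": "increase", "drug_c": "no-effect"}
    return canned[drug]


def toy_category_agent(category: str, outcome: str, per_drug: Dict[str, str]) -> str:
    # Assumption: aggregate by majority vote over the per-drug labels.
    return Counter(per_drug.values()).most_common(1)[0][0]


category_label = run_pipeline(
    "category_x", "outcome_y", toy_drug_finder, toy_drug_agent, toy_category_agent
)
print(category_label)
```

Keeping each role behind a plain callable makes it easy to swap a toy function for an LLM-backed agent without changing the orchestration code.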

These agents interact using a refined Agent-Critic pattern, where Critic agents provide iterative feedback to their corresponding primary agents, ensuring the accuracy and reliability of the generated responses.
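
The Agent-Critic interaction can be pictured as a bounded feedback loop. The sketch below is a simplified assumption of how such a loop might look; the names `Feedback`, `agent_critic_loop`, and the toy agent and critic are hypothetical and are not taken from the paper or from Langroid.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    accepted: bool
    comment: str = ""


def agent_critic_loop(
    agent: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Feedback],
    task: str,
    max_rounds: int = 3,
) -> str:
    """Ask the agent, let the critic review, and retry with the critic's comment."""
    feedback: Optional[str] = None
    answer = ""
    for _ in range(max_rounds):
        answer = agent(task, feedback)
        verdict = critic(task, answer)
        if verdict.accepted:
            return answer
        feedback = verdict.comment
    # Round budget exhausted: return the last answer as a best effort.
    return answer


# Hypothetical stand-ins for LLM-backed agent and critic.
def toy_agent(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "increase"
    return "increase (evidence: label warnings section)"


def toy_critic(task: str, answer: str) -> Feedback:
    if "evidence" in answer:
        return Feedback(accepted=True)
    return Feedback(accepted=False, comment="Cite the label section supporting the claim.")


result = agent_critic_loop(toy_agent, toy_critic, "Does drug X affect the risk of outcome Y?")
print(result)
```

The `max_rounds` cap is a design choice for the sketch: it guarantees termination even if the critic never accepts an answer.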

Retrieval Augmented Generation (RAG) enhances the LLM agents by enabling them to access up-to-date external knowledge through document retrieval mechanisms, significantly improving the accuracy and specificity of the data used in ADE detection.
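
As a rough illustration of the RAG idea, the following sketch retrieves the passage that best matches a query and places it in the prompt. It assumes a toy word-overlap retriever and made-up label text; a real system would typically use a vector store or search index, and the helper names `retrieve` and `build_prompt` are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Rank passages by the number of tokens shared with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda passage: len(query_tokens & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Augment the query with retrieved text and a grounding instruction."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Made-up label snippets for illustration only.
passages = [
    "WARNINGS: May increase the risk of gastrointestinal bleeding.",
    "DOSAGE: Take one tablet daily with food.",
    "STORAGE: Store at room temperature.",
]
query = "Does the drug increase the risk of bleeding?"
top = retrieve(query, passages, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

The prompt produced here would then be sent to an LLM, which is instructed to keep its answer consistent with the retrieved context.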

Experimental Evaluation

MALADE’s performance was evaluated using the well-established OMOP Ground Truth table of ADEs. Several metrics were used to quantify MALADE's efficacy, including:

Area Under Curve (AUC) for both effect-based and ADE-specific classifications.
F1 scores for effect-based and ADE-specific classifications.

MALADE achieved an AUC of 0.90 against the OMOP reference, reported as an improvement over approaches that rely solely on off-the-shelf models such as ChatGPT.
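
For readers unfamiliar with the two metrics, here is a small self-contained sketch of how AUC and F1 are computed for a binary task. The labels and scores are made-up toy values, not results from the paper, and the 0.5 decision threshold is an assumption chosen for the example.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels: List[int], preds: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 1 = reference says the association holds, 0 = it does not.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]

auc = roc_auc(labels, scores)
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
f1 = f1_score(labels, preds)
print(auc, f1)
```

AUC is threshold-free and measures ranking quality, while F1 depends on the chosen threshold, which is why the two are usually reported together.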

Implications and Future Directions

The introduction of MALADE presents several practical and theoretical implications:

Practical: MALADE’s architecture, particularly its reliance on RAG and the Agent-Critic interaction pattern, offers a scalable and adaptable framework for ADE detection. This system can assist healthcare providers and policymakers in making informed decisions based on the reliable synthesis of medical knowledge.
Theoretical: The use of orchestrated multi-agent systems presents a promising direction for future developments in AI, particularly in high-stakes domains like healthcare. The system demonstrates a tangible push towards enhancing the interpretability and trustworthiness of LLMs by ensuring they operate within well-defined constraints and are subject to continuous validation through Critic agents.

Future research could focus on:

EHR Data Integration: Extending MALADE to incorporate Electronic Health Records (EHRs) for more granular and real-time ADE detection.
Open-source LLMs: Exploring the deployment of local LLMs to address privacy and cost considerations. Challenges such as instruction adherence and reliable tool use must be tackled for these models to be as effective as proprietary options.
Complex Task Decomposition: Further refining the decomposition of complex queries into manageable sub-tasks to enhance the system’s robustness and extend its applicability to broader medical inquiries.

Conclusion

MALADE advances the field of pharmacovigilance by showcasing how collaborative LLM agents can be orchestrated to tackle the intricate task of ADE extraction. The principles underpinning its design—Agent-Critic interactions, task decomposition, and selective use of LLMs—offer a strong foundation for developing reliable, evidence-based medical AI applications. As AI continues to evolve, systems like MALADE can set the standard for integrating sophisticated models into practical, high-impact tasks in healthcare and beyond.