GraphRAG for Drug Side Effects

Updated 7 January 2026
  • The paper presents GraphRAG, a modular framework combining entity recognition, dynamic Cypher queries, and LLM-based binary classification to achieve near-perfect side-effect retrieval accuracy.
  • Methodologically, the system leverages a Neo4j-powered knowledge graph built from SIDER data and incorporates precise retrieval techniques to address LLM hallucinations.
  • Key experimental results show GraphRAG outperforming conventional text-based RAG methods, attaining accuracy, precision, and recall levels close to 1 with minimal false positives and negatives.

GraphRAG (Graph-based Retrieval-Augmented Generation) represents an advance in pharmacovigilance, enabling highly precise retrieval of drug–side-effect associations by integrating a Neo4j-powered knowledge graph with a state-of-the-art LLM. The framework is optimized to address LLM shortcomings such as hallucinations and lack of specialized biomedical context, supporting robust, explainable side-effect retrieval at scale (Nygren et al., 18 Jul 2025).

1. System Architecture and Workflow

GraphRAG comprises a modular pipeline combining entity recognition, graph traversal, prompt engineering, and LLM-based binary classification.

  • Entity Extraction and Query Construction: An entity recognition module first extracts the drug and side-effect entities from a free-text user query, e.g., “Is headache an adverse effect of metformin?”. A Cypher query is then generated dynamically to interrogate the knowledge graph:
    MATCH (d:Drug {name: "<drug>"})-[:may_cause_side_effect]->(s:SideEffect {name: "<side_effect>"})
    RETURN d, s
  • Graph Retrieval: Neo4j executes an exact edge-existence check, efficiently determining if the specific drug–side-effect association is documented.
  • Contextualization and Prompting: Depending on the retrieval result, a canonical context sentence is formatted—affirmative if the edge exists, negative otherwise. This context, together with the original question, is provided to a Llama 3 8B model via a tightly constrained prompt:
    You are asked to answer the following question with a single word: YES or NO.
    Base your answer strictly on the GraphRAG result provided below.
    Do not speculate beyond the information given.
    Result: <context sentence>
    Question: <original query>
  • LLM Outputs and Orchestration: The model returns YES or NO with a succinct justification. Orchestration runs on AWS Lambda and API Gateway, with the model served through Amazon Bedrock, supporting low-latency, scalable deployment (Nygren et al., 18 Jul 2025).
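The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_cypher`, `edge_exists`, `format_context`, `build_prompt`), the wording of the context sentences, and the in-memory `TOY_EDGES` set standing in for the Neo4j lookup are all hypothetical. The query passes entity names as Cypher parameters (`$drug`, `$side_effect`) rather than interpolating them into the query string, which is one common way to avoid injection when names come from user text.

```python
# Hypothetical sketch of the query -> context -> prompt flow described above.
# TOY_EDGES stands in for the Neo4j graph; a real deployment would send the
# query from build_cypher() to the database instead of checking a Python set.

TOY_EDGES = {("metformin", "nausea")}  # illustrative data only (assumption)

PROMPT_TEMPLATE = (
    "You are asked to answer the following question with a single word: YES or NO.\n"
    "Base your answer strictly on the GraphRAG result provided below.\n"
    "Do not speculate beyond the information given.\n"
    "Result: {context}\n"
    "Question: {question}"
)


def build_cypher(drug: str, side_effect: str) -> tuple[str, dict[str, str]]:
    """Return a parameterized edge-existence query and its parameters."""
    query = (
        "MATCH (d:Drug {name: $drug})-[:may_cause_side_effect]->"
        "(s:SideEffect {name: $side_effect}) RETURN d, s"
    )
    return query, {"drug": drug, "side_effect": side_effect}


def edge_exists(drug: str, side_effect: str) -> bool:
    """Exact, case-insensitive edge lookup against the toy edge set."""
    return (drug.lower(), side_effect.lower()) in TOY_EDGES


def format_context(drug: str, side_effect: str, found: bool) -> str:
    """Build an affirmative or negative context sentence (wording is assumed)."""
    if found:
        return f"{side_effect} is listed as a side effect of {drug}."
    return f"{side_effect} is not listed as a side effect of {drug}."


def build_prompt(question: str, drug: str, side_effect: str) -> str:
    """Combine the retrieval result and the original question into the prompt."""
    context = format_context(drug, side_effect, edge_exists(drug, side_effect))
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    print(build_prompt("Is nausea an adverse effect of metformin?", "metformin", "nausea"))
```

Because the context sentence is derived from a single boolean, the model is only asked to restate the retrieval result, not to recall pharmacology on its own.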

2. Knowledge Graph Construction

The knowledge graph is instantiated from SIDER 4.1, with:

  • Node Types:
    • Drug nodes, each representing a generic drug with standardized nomenclature.
    • SideEffect nodes, using MedDRA Preferred Terms for side-effect granularity.
  • Edge Semantics:
    • Directed “may_cause_side_effect” edges from Drug to SideEffect indicate observed associations in the curated SIDER dataset.
  • Scope and Subsetting:
    • The full graph includes 1,106 drugs, 4,073 side effects, and 141,209 documented associations.
    • Evaluation uses a balanced subset (976 drugs, 3,851 side effects, 19,520 drug–side-effect pairs), ensuring coverage while managing computational demands (Nygren et al., 18 Jul 2025).
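A graph of this shape could be assembled from drug–side-effect pairs roughly as sketched below. The input layout (two tab-separated columns: drug name, side-effect term) is an assumption made for illustration; SIDER's distributed files use their own columns and identifiers and would need mapping first. The helper names `load_pairs`, `graph_stats`, and `merge_statements` are hypothetical.

```python
import csv
import io

# Toy input in an assumed two-column layout: drug name <TAB> side-effect term.
SAMPLE_TSV = (
    "metformin\tNausea\n"
    "metformin\tDiarrhoea\n"
    "aspirin\tNausea\n"
    "metformin\tNausea\n"  # duplicate row, collapsed by the set below
)

MERGE_QUERY = (
    "MERGE (d:Drug {name: $drug}) "
    "MERGE (s:SideEffect {name: $side_effect}) "
    "MERGE (d)-[:may_cause_side_effect]->(s)"
)


def load_pairs(handle) -> set[tuple[str, str]]:
    """Read (drug, side effect) pairs, lower-casing names and dropping duplicates."""
    pairs = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        drug, effect = row[0].strip().lower(), row[1].strip().lower()
        if drug and effect:
            pairs.add((drug, effect))
    return pairs


def graph_stats(pairs: set[tuple[str, str]]) -> dict[str, int]:
    """Count distinct Drug nodes, SideEffect nodes, and edges."""
    return {
        "drugs": len({d for d, _ in pairs}),
        "side_effects": len({s for _, s in pairs}),
        "edges": len(pairs),
    }


def merge_statements(pairs: set[tuple[str, str]]) -> list[tuple[str, dict[str, str]]]:
    """Pair the idempotent MERGE query with parameters for each edge."""
    return [(MERGE_QUERY, {"drug": d, "side_effect": s}) for d, s in sorted(pairs)]


if __name__ == "__main__":
    pairs = load_pairs(io.StringIO(SAMPLE_TSV))
    print(graph_stats(pairs))
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so re-running it does not duplicate nodes or edges.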

3. Mathematical Formulation and Comparison with Other RAG Methods

  • Retrieval Objective:
    • Let $A \in \{0,1\}^{|D| \times |S|}$ denote the adjacency matrix of the knowledge graph.
    • For query drug $d^*$ and side effect $s^*$, retrieval is $r(d^*,s^*) = A_{d^*,s^*}$; this binary result is directly surfaced to the LLM.
  • Evaluation Metrics:
    • Standard classification metrics are used: accuracy, precision, recall (sensitivity), specificity, and F1 score. Definitions follow the usual conventions:
    • $\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$
    • $\mathrm{Precision} = \frac{TP}{TP+FP}$
    • $\mathrm{Recall} = \frac{TP}{TP+FN}$
    • $\mathrm{Specificity} = \frac{TN}{TN+FP}$
    • $\mathrm{F1} = 2 \frac{\mathrm{Precision}\cdot \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$
  • Contrast with Text-RAG:
    • Conventional text-based RAG systems retrieve the top-$k$ textual chunks based on vector-similarity scoring, $score_i = \mathrm{cosine}(\mathrm{emb}(q), \mathrm{emb}(chunk_i))$, which may induce subtle loss of precision or increased hallucination risk.
    • GraphRAG, in contrast, guarantees exact match for supported queries, yielding minimal ambiguity (Nygren et al., 18 Jul 2025).
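The contrast between the two retrieval styles can be made concrete with a toy example. Everything below is hypothetical: the placeholder drug and effect names, the adjacency matrix `A`, and the two-dimensional "embeddings" are invented for illustration and do not come from the paper.

```python
import math

# Toy adjacency matrix A over placeholder entities (assumed data).
DRUG_INDEX = {"drug_a": 0, "drug_b": 1}
EFFECT_INDEX = {"effect_x": 0, "effect_y": 1, "effect_z": 2}
A = [
    [1, 0, 1],  # drug_a -> effect_x, effect_z
    [0, 1, 0],  # drug_b -> effect_y
]


def retrieve(drug: str, effect: str) -> int:
    """Exact lookup r(d*, s*) = A[d*][s*]; returns 1 or 0."""
    return A[DRUG_INDEX[drug]][EFFECT_INDEX[effect]]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int) -> list[str]:
    """Rank chunks by cosine similarity to the query and keep the best k."""
    ranked = sorted(chunks, key=lambda name: cosine(query_vec, chunks[name]), reverse=True)
    return ranked[:k]


# Made-up embeddings: a chunk about a different drug sits close to the query.
CHUNKS = {
    "chunk_about_drug_a": [1.0, 0.1],
    "chunk_about_drug_b": [0.9, 0.3],
    "unrelated_chunk": [0.0, 1.0],
}
QUERY = [1.0, 0.2]

if __name__ == "__main__":
    print(retrieve("drug_a", "effect_z"), retrieve("drug_b", "effect_x"))
    print(top_k(QUERY, CHUNKS, k=2))
```

In this toy setup the similarity search returns a chunk about `drug_b` alongside the one about `drug_a`, leaving the model to sort out which applies, whereas the matrix lookup yields a single unambiguous bit.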

4. Experimental Results and Baseline Comparisons

A comprehensive evaluation on a 19,520-pair benchmark establishes the empirical advantage of GraphRAG:

| Method | Accuracy | F1 | Precision | Recall | Specificity |
|---|---|---|---|---|---|
| Llama 3 8B | 0.529 | 0.164 | N/A | 0.092 | N/A |
| RAG (Format A) | 0.886 | 0.729 | 0.872 | 0.775 | N/A |
| RAG (Format B) | 0.998 | 0.997 | 0.998 | 0.999 | 0.997 |
| GraphRAG | 0.9999 | 0.9999 | 0.9998 | 0.9999 | 0.9998 |

GraphRAG achieves near-perfect performance, virtually eliminating both false positives and false negatives. Results hold across Anatomical Therapeutic Chemical (ATC) classes and MedDRA System Organ Classes. ChatGPT 3.5/4 baselines averaged 0.55–0.63 accuracy on a subset, which is consistent with GraphRAG’s domain-specific advantage (Nygren et al., 18 Jul 2025).
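To give a sense of scale for scores this close to 1, the sketch below computes the five metrics from confusion-matrix counts. The counts are hypothetical: they assume an even split of the 19,520 pairs into positives and negatives with one error of each kind, and they are not the paper's reported confusion matrix.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, specificity, and F1 from counts."""
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) else nan
    recall = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    if (tp + fp) and (tp + fn) and (precision + recall):
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = nan
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts: 9,760 positives and 9,760 negatives, two errors total.
    for name, value in classification_metrics(tp=9759, tn=9759, fp=1, fn=1).items():
        print(f"{name}: {value:.4f}")
```

On a benchmark of this size, each individual misclassification moves accuracy by roughly 0.00005, so scores in the 0.9998 to 0.9999 range correspond to a handful of errors.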

5. Methodological Extensions: Graph Co-Attention and Generalization

While GraphRAG as instantiated utilizes direct edge-existence queries for maximal precision, more expressive retrieval/generation is possible by incorporating molecular and relational structure via graph co-attentive encoding.

The Multi-Head Co-Attentive Drug-Drug Interaction encoder (MHCADDI) integrates atomic features and chemical bonds into learned graph embeddings for each drug. Through T-layer message passing augmented with cross-graph co-attention (K=8 heads per layer), it forms a 32-dimensional joint representation for each drug pair. This approach is reported to outperform traditional graph-based and feature-based models in predicting adverse drug–drug interactions (AUROC 0.882) (Deac et al., 2019). In principle, GraphRAG's direct look-up module could be replaced with an MHCADDI-style encoder that jointly embeds drug pairs and side effects, generalizing beyond explicit knowledge-graph associations to cases where only molecular structure or similarity evidence is available.

A plausible implication is that future GraphRAG instantiations could use multi-hop graph traversals, learned embeddings, or joint representations to support reasoning over indirect associations or emergent pharmacovigilance phenomena (Deac et al., 2019).
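The cross-graph co-attention idea can be illustrated with a heavily simplified sketch. It is not MHCADDI: it uses a single head, plain scaled dot-product scores, no learned projection matrices, and no within-graph message passing, and the atom feature vectors are made up. The function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def co_attend(atoms_a: list[list[float]], atoms_b: list[list[float]]) -> list[list[float]]:
    """For each atom in A, return an attention-weighted sum of B's atom vectors."""
    dim = len(atoms_b[0])
    scale = math.sqrt(dim)
    messages = []
    for h in atoms_a:
        scores = [sum(x * y for x, y in zip(h, g)) / scale for g in atoms_b]
        weights = softmax(scores)
        messages.append([sum(w * g[k] for w, g in zip(weights, atoms_b)) for k in range(dim)])
    return messages


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a list of equal-length vectors into one graph-level vector."""
    count = len(vectors)
    return [sum(v[k] for v in vectors) / count for k in range(len(vectors[0]))]


def pair_representation(atoms_a: list[list[float]], atoms_b: list[list[float]]) -> list[float]:
    """Add cross-graph messages to each atom, pool each drug, and concatenate."""
    updated_a = [[x + m for x, m in zip(h, msg)] for h, msg in zip(atoms_a, co_attend(atoms_a, atoms_b))]
    updated_b = [[x + m for x, m in zip(g, msg)] for g, msg in zip(atoms_b, co_attend(atoms_b, atoms_a))]
    return mean_pool(updated_a) + mean_pool(updated_b)


if __name__ == "__main__":
    drug_a = [[1.0, 0.0], [0.0, 1.0]]  # two atoms, 2-D toy features
    drug_b = [[1.0, 1.0]]              # one atom
    print(pair_representation(drug_a, drug_b))
```

The point of the sketch is only that each drug's atom representations are updated using information from the other drug before pooling, so the pair embedding depends on how the two structures relate rather than on each drug in isolation.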

6. Limitations, Scope, and Future Directions

GraphRAG’s reliance on reported, static associations yields high precision but omits under-reported or emerging side effects. Only one-to-one (drug, side effect) queries are directly supported; class-level or reverse queries require Cypher pattern extensions. The framework currently provides only binary outputs, lacking nuanced indications of severity or frequency, and is sensitive to inconsistencies in entity normalization (e.g., synonyms and misspellings).
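As a rough idea of what such pattern extensions might look like, the sketch below pairs two candidate Cypher patterns with in-memory equivalents over a toy graph. The `atc_class` property on Drug nodes, the placeholder entity names, and the helper functions are assumptions for illustration; the source does not specify how class membership is stored.

```python
# Candidate Cypher patterns (assumed; the atc_class property is hypothetical).
REVERSE_QUERY = (
    "MATCH (d:Drug)-[:may_cause_side_effect]->(s:SideEffect {name: $side_effect}) "
    "RETURN d.name AS drug ORDER BY drug"
)
CLASS_QUERY = (
    "MATCH (d:Drug {atc_class: $atc_class})-[:may_cause_side_effect]->"
    "(s:SideEffect {name: $side_effect}) "
    "RETURN d.name AS drug ORDER BY drug"
)

# Toy graph with placeholder names, used to mirror the queries in plain Python.
TOY_GRAPH = {
    "drug_a": {"atc_class": "C1", "effects": {"effect_x", "effect_y"}},
    "drug_b": {"atc_class": "C1", "effects": {"effect_y"}},
    "drug_c": {"atc_class": "C2", "effects": {"effect_x"}},
}


def drugs_causing(effect: str) -> list[str]:
    """Reverse query: every drug with an edge to the given side effect."""
    return sorted(name for name, node in TOY_GRAPH.items() if effect in node["effects"])


def class_members_causing(atc_class: str, effect: str) -> list[str]:
    """Class query: drugs in one class that have an edge to the side effect."""
    return sorted(
        name
        for name, node in TOY_GRAPH.items()
        if node["atc_class"] == atc_class and effect in node["effects"]
    )


if __name__ == "__main__":
    print(drugs_causing("effect_x"))
    print(class_members_causing("C1", "effect_y"))
```

Both extensions return sets of drugs rather than a single boolean, so the YES/NO prompt format would also need to change to present list-valued results.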

Proposed directions include:

  • Integration of additional data sources (e.g., FAERS, patient-generated data) to detect novel associations.
  • Expansion of Cypher query patterns for indirect association and “class queries.”
  • Incorporation of synonym mapping and MedDRA term expansion to improve recall.
  • Adoption of richer node and edge features, and Graph Neural Networks for multi-hop or embedding-based reasoning.
  • Enabling LLM responses with richer content (e.g., frequencies, confidence intervals, and evidence citations) (Nygren et al., 18 Jul 2025).
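The synonym-mapping direction could start from something as simple as the sketch below, which normalizes a user-supplied term before the graph lookup. The vocabulary, the synonym table, the `normalize` helper, and the 0.8 similarity cutoff are illustrative assumptions; a production system would more likely draw on MedDRA's own term hierarchy than on string similarity.

```python
import difflib
from typing import Optional

# Toy canonical vocabulary and synonym table (assumed data).
VOCAB = ["headache", "nausea", "dizziness"]
SYNONYMS = {"cephalalgia": "headache", "feeling sick": "nausea"}


def normalize(term: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a raw term to a canonical vocabulary entry, or None if no match.

    Order of attempts: exact match, synonym table, then fuzzy match for
    misspellings via difflib's similarity ratio.
    """
    key = " ".join(term.lower().split())  # lower-case and collapse whitespace
    if key in VOCAB:
        return key
    if key in SYNONYMS:
        return SYNONYMS[key]
    close = difflib.get_close_matches(key, VOCAB, n=1, cutoff=cutoff)
    return close[0] if close else None


if __name__ == "__main__":
    for raw in ["Headache", "cephalalgia", "headach", "xyzzy"]:
        print(raw, "->", normalize(raw))
```

Returning `None` for unmatched terms lets the caller distinguish "entity not recognized" from "edge not present", which matters because the exact-match lookup would otherwise report both as a negative.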

7. Broader Implications and Domain Generality

The core paradigm—modeling domain entities and relations as a graph, retrieving precise or subgraph relational evidence, and providing concise factual contexts to LLMs—exhibits wide generality. It is applicable to diverse domains requiring high interpretability and precision, from other biomedical association retrievals (gene–phenotype, drug–drug interactions, trial outcomes) to structured legal or technical document QA (Nygren et al., 18 Jul 2025). Continued integration of joint molecular–textual–graph context embeddings, as in MHCADDI, suggests a path toward retrieval-augmented LLMs capable of compositional, real-world pharmacovigilance and beyond.
