Personalized RAG++ for Clinical Decision Support

Updated 12 November 2025

The paper reports that Personalized RAG++ integrates multi-phase LLM orchestration and structured graph-based retrieval, with its Agentic-RAG pipeline reaching 100% adherence (24/24) to NCCN breast cancer recommendations in the reported evaluation.
Personalized RAG++ is a dual-pipeline system that marries agentic LLM-driven sufficiency loops with graph-based clinical knowledge extraction for tailored decision support.
The system minimizes decision errors by employing iterative verification loops and explicit NCCN citations to maintain transparency and protocol accuracy.

Personalized RAG++ refers to a family of advanced Retrieval-Augmented Generation (RAG) systems in which every module (pre-retrieval, retrieval, and generation) incorporates user- or case-specific information, often orchestrated with explicit iterative decision logic, structured data representations, and agentic control flows. The defining goal is to maximize the precision, transparency, and clinical or task adherence of generated responses while guarding against omission, hallucination, and protocol misalignment. In applied biomedical contexts such as NCCN-guided cancer care, Personalized RAG++ is instantiated as a hybrid pipeline that combines agentic, LLM-driven sufficiency loops with graph-based clinical knowledge extraction, tailoring recommendations to patient attributes and evolving knowledge standards.

1. Formal Definition and Objectives

Personalized RAG++ in the context of NCCN breast cancer treatment recommendation is a dual-pipeline system that extends baseline RAG by:

Integrating multi-phase, LLM-driven, agentic orchestration for complex decision-making—including clinical title selection, deep retrieval, iterative plan sufficiency checks, and prompt-based output refinement.
Employing structured graph representations of the guideline corpus to explicitly encode inter-entity treatment relationships, protocol flow, and context-rich community clusters.
Iteratively refining recommendations through domain-aligned checklist prompts and automated error detection, yielding both high adherence to clinical standards and transparent, reference-anchored outputs (with explicit NCCN document citations).

High-level system goals are to:

Maximize adherence to latest NCCN guidelines for personalized treatment planning.
Provide transparent recommendations with granular source references.
Minimize both omission (missing protocols) and commission (hallucinatory or incorrect therapies).

2. Data Representation: Guideline Ingestion and Preprocessing

Personalized RAG++ commences with conversion of complex medical guidelines into a machine-actionable corpus, following these computational strategies:

Corpus Creation: The NCCN Breast Cancer v3.2024 PDFs (including flowcharts, tables, and free text) are parsed page-wise, converting visual and textual content into structured JSON records.
Document schema:
- page_id: integer
- visual_type: categorical ({flowchart, table, text})
- entities: array (e.g., “Stage IIA”, “Endocrine therapy”)
- relationships: array (edges/arcs from diagrams, inferred protocol transitions)
- raw_text: string (OCR or extracted text)
Preprocessing:
- OCR for text on images; bounding-box clustering for flowchart nodes.
- Entity normalization using SNOMED CT/NCCN-specific ontologies.
- All JSON objects indexed in a vector database (e.g., FAISS), with per-page text/entity embedding for similarity search.

This facilitates both high-precision content retrieval and robust semantic search over the hierarchical structure of medical protocols.
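The document schema above can be sketched as a small Python data model. This is only an illustration under stated assumptions: the class names, the (head, relation, tail) shape of each relationship, and the example record are hypothetical and are not taken from the paper or from NCCN content.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

# Allowed values for visual_type, as listed in the schema.
VISUAL_TYPES = {"flowchart", "table", "text"}


@dataclass
class Relationship:
    # Assumed shape for one edge; the source only says "edges/arcs".
    head: str
    relation: str
    tail: str


@dataclass
class GuidelinePage:
    page_id: int
    visual_type: str
    entities: List[str] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)
    raw_text: str = ""

    def __post_init__(self) -> None:
        # Reject categories outside the documented set.
        if self.visual_type not in VISUAL_TYPES:
            raise ValueError(f"unknown visual_type: {self.visual_type!r}")


def to_json(page: GuidelinePage) -> str:
    """Serialize one page record to a JSON string."""
    return json.dumps(asdict(page), ensure_ascii=False)


# Hypothetical record for illustration only; not clinical guidance.
page = GuidelinePage(
    page_id=12,
    visual_type="flowchart",
    entities=["Stage IIA", "Endocrine therapy"],
    relationships=[Relationship("Stage IIA", "may_lead_to", "Endocrine therapy")],
    raw_text="Placeholder text for illustration.",
)
print(to_json(page))
```

Validating `visual_type` at construction time keeps malformed pages out of the index; the embedding and vector-database steps are deliberately left out of this sketch.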

3. Agentic-RAG: LLM-Orchestrated Multi-Phase Recommendation

The Agentic-RAG pipeline is characterized by staged reasoning, implemented as follows:

Stage 1: Clinical Title Selection
- LLM prompt filters and ranks up to 5 relevant guideline titles based on patient description and question variant.
Stage 2: JSON Retrieval
- Top titles are mapped to JSON corpus indices, retrieving corresponding guideline content.
Stage 3: Treatment Recommendation Generation
- LLM synthesizes a structured treatment plan (regimen, dosage/schedule, sequence/duration, explicit NCCN page references).
Stage 4: Insufficiency Loop
- LLM evaluates the recommendation against a four-item checklist:
  1. Are all pertinent treatments captured?
  2. Is the sequencing correct?
  3. Is the plan appropriate given the patient's comorbidities?
  4. Are references present?
- If the verdict is not “Sufficient,” the recommendation is refined iteratively (the loop is limited to one revision in practical deployment).

Mathematically, retrieval uses cosine similarity between embeddings: $\mathrm{retrieval\_score}(q, d) = \cos(\mathrm{Embed}(q), \mathrm{Embed}(d))$, with the top-$k$ JSON records selected among those satisfying $\mathrm{retrieval\_score}(q, d) \geq \tau$, where $\tau$ is a tuned threshold.
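A minimal sketch of this thresholded top-k retrieval follows, using only the Python standard library. The function names, the toy three-dimensional vectors, and the values of `k` and `tau` are assumptions for illustration; a real deployment would use the embedding model and a vector index such as FAISS.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int,
    tau: float,
) -> List[Tuple[str, float]]:
    """Return up to k (doc_id, score) pairs with score >= tau, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    kept = [(doc_id, score) for doc_id, score in scored if score >= tau]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy vectors standing in for Embed(d); the page ids are hypothetical.
docs = {
    "page_12": [1.0, 0.0, 0.0],
    "page_40": [0.6, 0.8, 0.0],
    "page_77": [0.0, 0.0, 1.0],
}
hits = retrieve([1.0, 0.0, 0.0], docs, k=2, tau=0.5)
print(hits)
```

Applying the threshold before truncating to k means a query can return fewer than k records when few pages are relevant, rather than padding the context with weak matches.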

This agentic loop is designed to enforce adherence and completeness, reducing hallucinations and protocol deviations.
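The generate, check, and revise control flow of Stages 3 and 4 can be sketched as below. This is an assumption-laden illustration: `generate` and `review` stand in for LLM calls, and the `Review` type, function names, and stub behaviour are hypothetical rather than the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# The four checklist items from Stage 4.
CHECKLIST = [
    "All pertinent treatments captured?",
    "Correct sequencing?",
    "Comorbidity appropriateness?",
    "References present?",
]


@dataclass
class Review:
    sufficient: bool
    feedback: str


def recommend_with_sufficiency_loop(
    patient: str,
    context: List[str],
    generate: Callable[[str, List[str], str], str],
    review: Callable[[str, List[str]], Review],
    max_revisions: int = 1,
) -> str:
    """Draft a plan, then revise it at most max_revisions times."""
    plan = generate(patient, context, "")
    for _ in range(max_revisions):
        verdict = review(plan, CHECKLIST)
        if verdict.sufficient:
            break
        # Feed the reviewer's feedback into the next generation call.
        plan = generate(patient, context, verdict.feedback)
    return plan


# Stubs standing in for the LLM; they only mimic the control flow.
def fake_generate(patient: str, context: List[str], feedback: str) -> str:
    plan = f"Plan for {patient} using {len(context)} guideline pages."
    if feedback:
        plan += " [ref: NCCN page 12]"
    return plan


def fake_review(plan: str, checklist: List[str]) -> Review:
    if "[ref:" in plan:
        return Review(True, "")
    return Review(False, "Add explicit NCCN page references.")


final_plan = recommend_with_sufficiency_loop(
    "example vignette", ["page_12"], fake_generate, fake_review
)
print(final_plan)
```

With `max_revisions=1`, matching the single-revision limit noted above, the revised plan is returned without a second review; raising the limit trades extra LLM calls for additional checks.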

4. Graph-RAG: Structured Graph Construction and Graph-Based Querying

Graph-RAG maps the guideline corpus into a medical knowledge graph to enable community-based summarization and retrieval:

Graph Construction:
- Chunks of size ≈512 tokens (from the JSON corpus) are parsed with prompts to extract (head, relation, tail) medical triples.
- Entities and relations are added as graph nodes and labeled edges; Louvain community detection extracts clinical subgraphs (e.g., grouped by therapy subtype or disease stage).
- Each community is summarized by an LLM, forming meta-nodes capturing clinical themes (e.g., “Hormone-receptor positive adjuvant chemo”).
Query Pipeline:
- Patient queries are embedded and used to retrieve the most relevant graph communities (meta-nodes), whose summaries are concatenated for prompt input.
- LLM is then prompted with these community summaries and patient context to synthesize a treatment recommendation, again with references.
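The graph construction step can be sketched with the standard library alone. Note the substitution: this sketch groups nodes by connected components as a simple stand-in for Louvain community detection, which would normally come from a graph library. The triples, relation names, and function names are hypothetical examples, not extracted guideline content.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_graph(
    triples: List[Triple],
) -> Tuple[Dict[str, Set[str]], Dict[Tuple[str, str], str]]:
    """Build an undirected adjacency map plus labels for each (head, tail) edge."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    edge_labels: Dict[Tuple[str, str], str] = {}
    for head, relation, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
        edge_labels[(head, tail)] = relation
    return adjacency, edge_labels


def connected_components(adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Stand-in for Louvain: one community per connected component."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Illustrative triples only; not clinical guidance.
triples = [
    ("HR-positive", "treated_with", "Endocrine therapy"),
    ("Endocrine therapy", "includes", "Tamoxifen"),
    ("HER2-positive", "treated_with", "Trastuzumab"),
]
adjacency, edge_labels = build_graph(triples)
communities = connected_components(adjacency)
print(communities)
```

Connected components only separate fully disconnected subgraphs, whereas Louvain can also split a densely connected graph into modules; the sketch is meant to show the data flow from triples to communities, not to reproduce the clustering quality.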

Pseudocode for graph-based querying:

Input: patient_query Q, meta_graph, k
embedding_Q ← Embed(Q)
ranked_communities ← top_k_communities(embedding_Q, meta_graph, k)
summaries ← [summary(c) for c in ranked_communities]
prompt ← "Based on these community summaries: {summaries}, generate a treatment plan for patient: {Q} with references."
P ← LLM(prompt)   // o1-preview in the source
return P

This approach enables explicit modeling of protocol dependencies and variant paths, supporting complex, relational reasoning beyond flat retrieval.
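The query pipeline can also be written as a small runnable Python sketch. Everything here is a stated assumption: `toy_embed` is a bag-of-words stand-in for a real embedding model, the community summaries are invented, and `llm` is any callable that maps a prompt to text (an echo function in the demo).

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def toy_embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[token] * v[token] for token in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_communities(query: str, summaries: Dict[str, str], k: int) -> List[str]:
    """Rank community ids by similarity of their summary to the query."""
    q = toy_embed(query)
    ranked = sorted(
        summaries,
        key=lambda cid: cosine(q, toy_embed(summaries[cid])),
        reverse=True,
    )
    return ranked[:k]


def graph_rag_query(
    query: str, summaries: Dict[str, str], llm: Callable[[str], str], k: int = 2
) -> str:
    """Concatenate the top-k community summaries into a prompt and call the LLM."""
    ids = top_k_communities(query, summaries, k)
    context = "\n".join(f"[{cid}] {summaries[cid]}" for cid in ids)
    prompt = (
        "Based on these community summaries:\n"
        f"{context}\n"
        f"generate a treatment plan for patient: {query} with references."
    )
    return llm(prompt)


# Hypothetical community summaries; not clinical guidance.
summaries = {
    "c1": "hormone receptor positive adjuvant endocrine therapy",
    "c2": "her2 positive targeted therapy",
    "c3": "radiation after breast conserving surgery",
}
query = "postmenopausal patient hormone receptor positive"
result = graph_rag_query(query, summaries, llm=lambda prompt: prompt, k=2)
print(result)
```

Passing the model as a callable keeps the retrieval logic testable without network access; swapping in a real client only changes the `llm` argument.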

5. Comparative Experimental Evaluation

Personalized RAG++ was evaluated with 64 queries derived from 16 breast cancer patient vignettes (each with 4 clinical questions):

System	Adherence Rate	Missing	Unnecessary
ChatGPT-4	92% (22/24)	2	1
Graph-RAG	92% (23/25*)	4	0
Agentic-RAG	100% (24/24)	0	0

No hallucinations occurred in any system.
Agentic-RAG achieved 100% adherence (24/24), with zero missing or incorrect treatments.
Graph-RAG and the baseline ChatGPT-4 both showed some missing recommendations, which suggests that the sufficiency loop contributes to Agentic-RAG's higher adherence.
Both pipelines provided detailed, reference-anchored outputs, increasing clinical interpretability.

*Slight count difference in Graph-RAG due to threshold-based filtering of low-relevance communities.

6. Limitations, Failure Modes, and Future Directions

Observed limitations:

Graph-RAG may miss nuanced or overlapping branches in protocols when communities are too coarse.
Agentic-RAG incurs computational overhead (multiple LLM calls per query) and is sensitive to prompt engineering.
Evaluation limited to breast cancer v3.2024; untested on other tumor types or guideline domains.

Future research directions:

Expansion to additional tumor domains (e.g., NCCN guidelines for lung, colorectal, prostate cancer).
Integration with real-time EHR for dynamic, profile-adaptive titration of therapy plans.
Active learning to refine retrieval thresholds and community detection parameters.
Rigorous user studies with practicing oncologists to quantify impact on clinical decision latency and recommendation accuracy.
Deeper integration of neuro-symbolic retrieval policies to further reduce omission rates.

7. Clinical and Technical Significance

Personalized RAG++ as instantiated in these pipelines demonstrates that:

Iterative, agentic LLM orchestration (Agentic-RAG) achieved full protocol adherence in this evaluation, with no hallucination or omission errors observed.
Structured graph representations (Graph-RAG) enable granular relational reasoning over complex, visual medical guidelines.
Both approaches succeed in providing highly transparent, reference-rich, and context-sensitive recommendations, a prerequisite for clinical decision support tools with real-world adoption potential.

The complementarity of multi-stage LLM logic and graph-based context representation in Personalized RAG++ represents a substantive advancement for trustworthy, interpretable, and protocol-compliant AI in health care (Mohammed et al., 6 Jan 2025).

References (1)

Developing an Artificial Intelligence Tool for Personalized Breast Cancer Treatment Plans based on the NCCN Guidelines (2025)
