A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning (2506.02470v1)

Published 3 Jun 2025 in cs.AI

Abstract: Misdiagnosis causes significant harm to healthcare systems worldwide, leading to increased costs and patient risks. MedRAG is a smart multimodal healthcare copilot equipped with powerful LLM reasoning, designed to enhance medical decision-making. It supports multiple input modalities, including non-intrusive voice monitoring, general medical queries, and electronic health records. MedRAG provides recommendations on diagnosis, treatment, medication, and follow-up questioning. Leveraging retrieval-augmented generation enhanced by knowledge graph-elicited reasoning, MedRAG retrieves and integrates critical diagnostic insights, reducing the risk of misdiagnosis. It has been evaluated on both public and private datasets, outperforming existing models and offering more specific and accurate healthcare assistance. A demonstration video of MedRAG is available at: https://www.youtube.com/watch?v=PNIBDMYRfDM. The source code is available at: https://github.com/SNOWTEAM2023/MedRAG.

Misdiagnosis is a significant issue in healthcare, leading to increased costs and risks for patients. Traditional AI-assisted diagnostic systems often struggle to integrate diverse information sources and provide robust reasoning, particularly when distinguishing between similar conditions.

To address these limitations, the paper introduces MedRAG (Zhao et al., 3 Jun 2025), a smart multimodal healthcare copilot. MedRAG aims to enhance medical decision-making by combining powerful LLM reasoning with a novel Knowledge Graph (KG)-elicited reasoning approach. It supports multiple input modalities to capture information comprehensively from clinical workflows.

The key features and components of MedRAG include:

  1. Multimodal Input: MedRAG is designed to accept input from three primary sources to mirror real-world clinical interactions:
    • Non-intrusive voice monitoring: Captures doctor-patient conversations in real-time during consultations, enabling context-aware recommendations without disrupting interaction. This is implemented using Google's Speech-to-Text API.
    • General medical queries: Allows doctors to type questions, refine diagnoses, or seek personalized treatment suggestions interactively.
    • Electronic Health Records (EHRs): Supports uploading or processing existing EHRs to analyze past cases and provide data-driven support for complex decisions.
  2. Knowledge Graph-elicited Reasoning RAG: This is the core analytical engine. It enhances the standard Retrieval-Augmented Generation (RAG) approach by integrating structured medical knowledge from a KG.
    • Diagnostic Knowledge Graph: A hierarchical KG is constructed from an EHR database. Diseases with similar manifestations are clustered, and manifestations are decomposed into unique features. GPT-4o is used to augment this KG by expanding unique features to help differentiate similar diseases. Given patient manifestations, MedRAG identifies the most relevant subcategory and extracts associated ⟨disease, relation, feature⟩ triplets as contextual information for the LLM.
    • Retrieval-Augmented Generation: The system retrieves relevant EHRs from a database based on semantic similarity with the patient's input data. OpenAI's text-embedding-3-large API is used for generating embeddings, and cosine similarity measures relevance. The top 3 relevant EHRs are typically selected as additional context for the LLM, mitigating hallucination and grounding responses in case-specific information.
  3. Proactive Question Generation: During voice monitoring or after processing initial input, MedRAG assesses if the available information is sufficient for a confident diagnosis. If not, it consults the diagnostic KG to identify critical unmentioned features that could help differentiate between similar candidate diseases and formulates follow-up questions for the doctor.
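The retrieval step described in item 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed` call that would invoke OpenAI's text-embedding-3-large API is assumed to have already produced the vectors, and the EHR texts are placeholders.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_top_k(query_vec, ehr_vecs, ehr_texts, k=3):
    # Rank stored EHRs by similarity to the patient's input embedding
    # and keep the top-k as additional context for the LLM.
    scores = [cosine_similarity(query_vec, v) for v in ehr_vecs]
    ranked = sorted(zip(scores, ehr_texts), key=lambda p: p[0], reverse=True)
    return [text for _, text in ranked[:k]]
```

In MedRAG the top 3 retrieved records, together with the KG triplets, are placed in the LLM prompt to ground its recommendations.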

MedRAG provides output in the form of diagnostic, treatment, medication, and follow-up questioning recommendations. The system is designed to be adaptable, supporting various open-source and closed-source LLMs as its backbone.
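The proactive questioning logic described above can be illustrated with a simple sketch. The disease-to-feature mapping and the phrasing of questions here are invented for illustration; in MedRAG the features come from the diagnostic KG and the questions are formulated by the LLM.

```python
def propose_follow_up(candidates, kg_features, mentioned):
    """Suggest follow-up questions about unmentioned discriminative features.

    candidates:  diseases still compatible with the current input
    kg_features: disease -> set of characteristic features (from the KG)
    mentioned:   features already captured in the consultation
    """
    # A feature is discriminative if some, but not all, candidates have it.
    all_feats = set().union(*(kg_features[d] for d in candidates))
    discriminative = set()
    for feat in all_feats:
        having = [d for d in candidates if feat in kg_features[d]]
        if 0 < len(having) < len(candidates):
            discriminative.add(feat)
    unasked = sorted(discriminative - set(mentioned))
    return [f"Does the patient present with {feat}?" for feat in unasked]
```

Asking only about discriminative, unmentioned features keeps the follow-up questions focused on separating similar candidate diseases rather than re-confirming shared symptoms.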

The paper evaluates MedRAG on a public dataset (DDXPlus) and a private one (CPDD). A case study demonstrates MedRAG's ability to provide a more accurate and specific diagnosis (lumbar canal stenosis) than general LLMs (Llama3.1, Mixtral, Qwen2.5), which gave broader or less precise suggestions (sciatica, radiculopathy). MedRAG also proactively suggested a relevant follow-up question in this case. Quantitative evaluation on the CPDD dataset shows that MedRAG with GPT-4o as the backbone LLM outperforms the GPT-3.5-turbo variant and maintains performance across text and voice modalities, with a slight drop for voice input that is likely due to speech-to-text errors.

A human evaluation involving four experienced doctors compared MedRAG's performance against GPT-4o on representative cases. MedRAG was assessed using five Human Factors criteria (Clinical Relevance, Trust, etc.) and scored higher than GPT-4o across all criteria, particularly in Adoption Intention, highlighting its evidence-based approach as a key differentiator favored by clinicians.

The system features a user-friendly interface built with Streamlit and CSS, allowing users to interact via speaking, uploading files, or typing. The chat history is maintained for easy access to past consultations.

In conclusion, MedRAG is presented as a promising multimodal healthcare copilot that leverages KG-elicited reasoning and retrieval-augmented generation to improve diagnostic accuracy and decision support. Its multimodal input capabilities and proactive question generation are designed to integrate seamlessly into clinical workflows, validated by both quantitative metrics and positive feedback from doctor evaluations. The code and a demonstration video are publicly available.

Authors (4)
  1. Xuejiao Zhao
  2. Siyan Liu
  3. Su-Yin Yang
  4. Chunyan Miao