
MedDM: LLM-Executable Clinical Guidance Dataset

Updated 27 December 2025
  • MedDM is a large-scale, machine-actionable dataset of 1,202 Clinical Guidance Trees curated from 5,000 clinical documents covering over 500 diseases.
  • It employs a multi-stage pipeline with automated figure extraction, ML-driven flowchart detection, and rigorous expert curation to ensure acyclic, high-quality decision graphs.
  • The dataset supports LLM-driven, multi-turn clinical decision-making by providing structured workflows for differential diagnosis and treatment recommendations.

MedDM is a large-scale dataset of 1,202 LLM-executable Clinical Guidance Trees (CGTs), systematically extracted and curated from 5,000 published clinical practice guidelines, textbooks, and expert consensus documents across 12 hospital departments and encompassing over 500 distinct diseases. MedDM is designed to address the lack of specialized, machine-actionable clinical guidance data directly compatible with LLM frameworks, providing structured, workflow-driven resources for evidence-based clinical decision-making and dialogue systems (Li et al., 2023).

1. Dataset Construction Pipeline

MedDM was constructed via a multi-stage pipeline combining automated figure extraction, machine learning–driven flowchart detection, expert manual curation, and graph-structuring transformations. The process encompassed the following steps:

  1. Literature Collection: 5,000 medical sources spanning internal medicine, surgery, psychiatry, pediatrics, obstetrics & gynecology, emergency, anesthesiology, dermatology, oncology, infectious diseases, preventive care, and otolaryngology were systematically gathered. The target was flowcharts rooted at a symptom or disease, terminating in diagnostic or therapeutic endpoints.
  2. Automatic Figure Extraction: Each PDF page was rendered as an image using PyMuPDF. PaddleOCR’s layout analysis segmented all non-body-text regions, yielding approximately 100,000 figure candidates.
  3. Pre-screening for Flowcharts: A Faster R-CNN model, trained on a synthetic Flowchart BS dataset and real flowchart data (e.g., FR-DETR), was applied to identify images containing 8 or more basic flowchart shapes (Process, Decision, Start/End, Scan, Arrow), reducing the candidate set to roughly 2,300 flowcharts.
  4. Manual Selection: Five medically trained graduates applied strict inclusion/exclusion criteria: starting at a symptom/disease, ending at a diagnosis/treatment, exclusively using five basic shape types, and exhibiting detailed, unambiguous decision logic. This resulted in 1,202 high-quality flowchart images.
  5. Flowchart Recognition Pipeline:
    • Shape Detection: Faster R-CNN (mAP ≈ 97%) identified all relevant shapes.
    • Connector Recognition: After shape masking, OpenCV contour detection and convex hull computation extracted lines; DBSCAN clustering grouped line-endpoints as directed edges.
    • Text Recognition: CnOCR processed Chinese and English text within shapes.
    • Node Integration: Heuristic depth-first search merged bounding boxes with detected text and established parent-child references for the graph.
  6. Cross-Validation ("Calibration"): Eight verifiers conducted two rounds of node text, edge, and split validation, correcting errors and ambiguities.
  7. Flowchart-to-CGT Transformation:
    • Label Reconstruction: Edge "Yes/No" labels were promoted into child-node content.
    • Cycle Elimination: Any detected loop $N_{k+1} \in P = \{N_1,\dots,N_k\}$ resulted in removing the edge $(N_k, N_{k+1})$ and introducing a new leaf node $N_{n+1}$:

    $$P = \{N_1,N_2,\dots,N_k\},\quad\text{if }N_{k+1}\in P\text{ then remove }(N_k,N_{k+1})\text{ and add }N_{n+1}.$$

    • Shared-Child Replication: Nodes sharing children were duplicated to guarantee tree acyclicity.
  8. Final Validation: Seven additional student annotators employed a web interface to correct mislabeling and ensure clinical coherence.
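
The cycle-elimination and shared-child replication rules of step 7 can be sketched as a single depth-first unfolding. This is a minimal illustration and not the authors' code: the input format (a dict from node id to child ids), the function name `flowchart_to_tree`, and the `leaf_<n>` and `_copy<n>` naming are all assumptions made here.

```python
from collections import Counter

def flowchart_to_tree(graph, root):
    """Unfold a rooted flowchart graph into a tree.

    graph: dict mapping node id -> list of child ids (hypothetical input format).
    Returns (root id, tree adjacency dict, list of removed back edges).
    """
    tree = {}           # output adjacency: node id -> list of child ids
    copies = Counter()  # how many times each original node has been emitted
    cut_edges = []      # edges (N_k, N_{k+1}) removed by cycle elimination

    def visit(node, path):
        copies[node] += 1
        # The first visit keeps the original id; later visits are replicas
        # created because the node is shared by several parents.
        node_id = node if copies[node] == 1 else f"{node}_copy{copies[node]}"
        tree[node_id] = []
        path = path + (node,)
        for child in graph.get(node, []):
            if child in path:
                # Loop detected: drop the edge and attach a fresh leaf instead.
                cut_edges.append((node, child))
                leaf_id = f"leaf_{len(cut_edges)}"
                tree[leaf_id] = []
                tree[node_id].append(leaf_id)
            else:
                tree[node_id].append(visit(child, path))
        return node_id

    return visit(root, ()), tree, cut_edges


# Toy flowchart: N4 is shared by N2 and N3, and N3 loops back to N1.
flowchart = {"N1": ["N2", "N3"], "N2": ["N4"], "N3": ["N4", "N1"], "N4": []}
root_id, tree, cut_edges = flowchart_to_tree(flowchart, "N1")
```

In the toy input, the shared child `N4` is emitted twice and the back edge from `N3` to `N1` is replaced by a new leaf, so every node in the result has exactly one parent. What text the new leaf should carry is not specified in the description above, so the sketch leaves it empty.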

2. CGT Representation and Data Schema

Each Clinical Guidance Tree is formulated as a directed acyclic graph $G = (V, E)$ with the following node partitions:

  • $V_{\text{root}}$: Single root node (symptom/disease)
  • $V_{\text{cond}}$: Condition nodes representing branching clinical questions
  • $V_{\text{act}}$: Leaf/action nodes denoting diagnoses or treatments

Edges $E \subset V \times V$ express decision flow. Each condition node $v \in V_{\text{cond}}$ is equipped with a decision function:

$d_v\colon \mathcal H \to \{\textsc{Yes},\textsc{No},\textsc{Undet}\}$

with $\mathcal{H}$ representing the patient–LLM dialogue history.

Node JSON Schema

{
  "id": "string",
  "type": "root" | "condition" | "action",
  "content": "natural-language question or action text",
  "parent_id": "string|null",
  "children": [
    { "label": "Yes" | "No", "id": "child_node_id" }
  ]
}
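
A loader for records in this shape might look like the following sketch. The `Node` class, the `load_cgt` function, and the specific invariants checked (one root, leaf-only action nodes, consistent `parent_id` links) are assumptions derived from the schema above, not part of a published MedDM API.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional

NODE_TYPES = {"root", "condition", "action"}
EDGE_LABELS = {"Yes", "No"}

@dataclass
class Node:
    id: str
    type: str
    content: str
    parent_id: Optional[str] = None
    children: Dict[str, str] = field(default_factory=dict)  # edge label -> child id

def load_cgt(text: str) -> Dict[str, Node]:
    """Parse a JSON array of node records and check basic tree invariants."""
    nodes: Dict[str, Node] = {}
    for rec in json.loads(text):
        if rec["type"] not in NODE_TYPES:
            raise ValueError(f"unknown node type: {rec['type']}")
        children = {}
        for child in rec.get("children", []):
            if child["label"] not in EDGE_LABELS:
                raise ValueError(f"unknown edge label: {child['label']}")
            children[child["label"]] = child["id"]
        nodes[rec["id"]] = Node(rec["id"], rec["type"], rec["content"],
                                rec.get("parent_id"), children)
    if sum(n.type == "root" for n in nodes.values()) != 1:
        raise ValueError("a CGT must have exactly one root node")
    for node in nodes.values():
        if node.type == "action" and node.children:
            raise ValueError(f"action node {node.id} must be a leaf")
        for child_id in node.children.values():
            if child_id not in nodes or nodes[child_id].parent_id != node.id:
                raise ValueError(f"broken parent/child link: {node.id} -> {child_id}")
    return nodes
```

Storing `children` as a label-to-id mapping rather than the list used on disk is a convenience choice for traversal; it assumes each condition node has at most one child per label.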

LLM-Interactive IEET

CGTs are linearized into an If-Elif-Else Tree (IEET) structure for LLM consumption:

If <condition₁>:
    <subtree>
Elif <condition₂>:
    ...
Else:
    <action>
This representation enables direct, multi-turn interaction consistent with clinical reasoning workflows.
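
One way to produce such a linearization from schema-shaped nodes is sketched below. The mapping used here is an assumption, since the exact conversion rule is not given: the "Yes" branch becomes the indented body, a "No" branch that is itself a condition is chained as `Elif`, and a "No" branch that is an action becomes the `Else` body. The function name `linearize` and the toy node contents are hypothetical.

```python
def linearize(nodes, node_id, indent=0, keyword="If"):
    """Return the IEET lines for the subtree rooted at node_id."""
    node = nodes[node_id]
    pad = "    " * indent
    if node["type"] == "action":
        return [pad + node["content"]]
    branch = {c["label"]: c["id"] for c in node["children"]}
    lines = [f"{pad}{keyword} {node['content']}:"]
    if "Yes" in branch:
        # The "Yes" subtree is nested one level deeper.
        lines += linearize(nodes, branch["Yes"], indent + 1)
    if "No" in branch:
        no_id = branch["No"]
        if nodes[no_id]["type"] == "action":
            lines += [pad + "Else:", "    " * (indent + 1) + nodes[no_id]["content"]]
        else:
            # A condition on the "No" side continues the chain at the same level.
            lines += linearize(nodes, no_id, indent, keyword="Elif")
    return lines


# Toy tree with placeholder contents (not taken from the dataset).
toy_tree = {
    "N1": {"type": "condition", "content": "fever",
           "children": [{"label": "Yes", "id": "N2"}, {"label": "No", "id": "N3"}]},
    "N2": {"type": "action", "content": "A", "children": []},
    "N3": {"type": "condition", "content": "chest pain",
           "children": [{"label": "Yes", "id": "N4"}, {"label": "No", "id": "N5"}]},
    "N4": {"type": "action", "content": "B", "children": []},
    "N5": {"type": "action", "content": "C", "children": []},
}
ieet_text = "\n".join(linearize(toy_tree, "N1"))
```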

3. Dataset Statistics and Coverage

MedDM comprises 1,202 CGTs: 443 Differential-Diagnosis (DD) trees and 759 Treatment-Recommendation (TR) trees.

Departmental and Tree Breakdown

| Department | DD Trees, n (%) | TR Trees, n (%) |
|---|---|---|
| Internal Medicine | 167 (37.6%) | 36 (4.7%) |
| Surgery | 59 (13.3%) | 6 (0.8%) |
| Pediatrics | 5 (1.1%) | 52 (6.8%) |
| OB/GYN | 7 (1.5%) | 131 (17.2%) |
| Emergency | 72 (16.3%) | 12 (1.6%) |
| Psychiatry | 2 (0.5%) | 18 (2.4%) |
| Anesthesiology | 28 (6.3%) | 221 (29.1%) |
| Dermatology | 2 (0.5%) | 1 (0.1%) |
| ENT ("Five Senses") | 79 (17.8%) | 119 (15.7%) |
| Oncology | 10 (2.3%) | 110 (14.5%) |
| Infectious Diseases | 7 (1.6%) | 30 (3.9%) |
| Preventive Care | 5 (1.1%) | 23 (3.0%) |

  • Minimum nodes per tree: 8
  • Maximum observed: ≈ 24
  • Mean nodes/tree: ≈ 12.3
  • Disease/syndrome coverage: >500 distinct entities

This distribution demonstrates broad representation across standard clinical subspecialties, with focused depth in anesthesiology, oncology, ENT, OB/GYN, and internal medicine.
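
As a quick arithmetic check, the per-department counts can be summed and compared with the stated totals. The snippet below is only a sanity check on the numbers listed in this section; the variable and function names are chosen here for illustration.

```python
# Per-department tree counts copied from the breakdown above.
dd = {"Internal Medicine": 167, "Surgery": 59, "Pediatrics": 5, "OB/GYN": 7,
      "Emergency": 72, "Psychiatry": 2, "Anesthesiology": 28, "Dermatology": 2,
      "ENT": 79, "Oncology": 10, "Infectious Diseases": 7, "Preventive Care": 5}
tr = {"Internal Medicine": 36, "Surgery": 6, "Pediatrics": 52, "OB/GYN": 131,
      "Emergency": 12, "Psychiatry": 18, "Anesthesiology": 221, "Dermatology": 1,
      "ENT": 119, "Oncology": 110, "Infectious Diseases": 30, "Preventive Care": 23}

dd_total = sum(dd.values())
tr_total = sum(tr.values())

def share(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)

largest_tr = max(tr, key=tr.get)
print(dd_total, tr_total, dd_total + tr_total)
print(largest_tr, share(tr[largest_tr], tr_total))
```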

4. Representative Content Structure and Examples

Each CGT encodes a rigorously validated clinical reasoning workflow. For example, the dyspnea differential-diagnosis tree employs a multi-level condition-action structure.

Dyspnea CGT (JSON excerpt):

[
  {"id": "N1", "type": "root", "content": "Dyspnea", ...},
  {"id": "N2", "type": "condition", "content": "Have you had any fever symptoms?", ...},
  {"id": "N3", "type": "condition", "content": "Do you feel any chest pain?", ...},
  {"id": "N4", "type": "action", "content": "Diagnosis: pneumonia. Recommend chest CT and start antibiotics.", ...},
  {"id": "N5", "type": "action", "content": "Consider pulmonary embolism. Order D-dimer, CTPA.", ...},
  {"id": "N6", "type": "action", "content": "Likely chronic heart failure. Obtain echocardiogram.", ...},
  {"id": "N7", "type": "action", "content": "Unable to determine chest pain—ask additional symptoms.", ...}
]
Linearized IEET Structure:

If you have had any fever symptoms?
  If you feel any chest pain?
    - Yes: Diagnosis: pneumonia; recommend chest CT + antibiotics
    - No: Diagnosis: pulmonary embolism; order D-dimer, CTPA
    - Else: “Please tell me whether you have cough with sputum.”
Elif you have not had fever
  “Do you experience leg swelling?”
Else
  “I’m not sure—please describe any other symptoms.”
Further trees address acute ischemic stroke treatment and additional differential/treatment logic following similar conventions.

5. Reasoning Methods and Patient–LLM Dialogue Framework

MedDM operationalizes a Clinical Decision Making (CDM) engine, executing the CGT as a multi-turn LLM–patient system:

  • At each node $v$, a prompt is constructed from (a) the patient’s history $\mathcal{H}$ and (b) $v$’s clinical condition text $c_v$.
  • The LLM infers a verdict from $\{\textsc{Yes},\textsc{No},\textsc{Undetermined}\}$, directing traversal to the corresponding child node or, if undetermined, triggering a follow-up question for clarification and updating $\mathcal{H}$.
  • At an action node, the system outputs the final diagnostic or therapeutic decision.

Pseudocode:

function Diagnose(G, History):
  node ← G.root
  while node.type != "action":
    verdict ← LLM_infer(prompt=History + node.content)
    if verdict == "Yes":
      node ← node.child["Yes"]
    elif verdict == "No":
      node ← node.child["No"]
    else:
      q ← LLM_generate_question(node.content)
      History += Patient_answer(q)
      continue
  return node.content
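
A runnable Python rendering of this loop is sketched below, with the LLM and the patient replaced by injected callables so the traversal logic can be exercised on its own. Everything beyond the loop structure is an assumption: `toy_infer` and `toy_ask` are keyword-matching stand-ins rather than real model calls, the `max_turns` guard is an addition that is not in the pseudocode, and the tree contents are placeholders.

```python
def diagnose(nodes, root_id, history, infer, ask, max_turns=20):
    """Walk a CGT from the root until an action node is reached.

    infer(history, condition) -> "Yes" | "No" | anything else (undetermined)
    ask(condition)            -> the patient's answer to a follow-up question
    """
    node = nodes[root_id]
    history = list(history)  # work on a copy so the caller's list is untouched
    turns = 0
    while node["type"] != "action":
        if turns >= max_turns:
            raise RuntimeError("no action node reached within max_turns")
        turns += 1
        verdict = infer(history, node["content"])
        if verdict in ("Yes", "No"):
            branch = {c["label"]: c["id"] for c in node["children"]}
            node = nodes[branch[verdict]]
        else:
            # Undetermined: gather more information and retry the same node.
            history.append(ask(node["content"]))
    return node["content"], history


def toy_infer(history, condition):
    # Stand-in for an LLM verdict: looks for an explicit "<condition>=yes/no" fact.
    if f"{condition}=yes" in history:
        return "Yes"
    if f"{condition}=no" in history:
        return "No"
    return "Undetermined"

def toy_ask(condition):
    # Stand-in for the patient: always answers the follow-up with "no".
    return f"{condition}=no"

toy_cgt = {
    "N1": {"type": "condition", "content": "fever",
           "children": [{"label": "Yes", "id": "N2"}, {"label": "No", "id": "N5"}]},
    "N2": {"type": "condition", "content": "chest pain",
           "children": [{"label": "Yes", "id": "N3"}, {"label": "No", "id": "N4"}]},
    "N3": {"type": "action", "content": "A", "children": []},
    "N4": {"type": "action", "content": "B", "children": []},
    "N5": {"type": "action", "content": "C", "children": []},
}
decision, final_history = diagnose(toy_cgt, "N1", ["fever=yes"], toy_infer, toy_ask)
```

Passing `infer` and `ask` as parameters keeps the traversal testable without network access; in a real deployment they would wrap model and user-interface calls.
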
For CGT retrieval, patient histories are converted to professional medical narratives (using, e.g., ChatGPT), vectorized, and matched against the CGT database via cosine similarity:

$$t' = \arg\max_{i} \frac{q \cdot t_i}{\|q\| \|t_i\|}$$

This retrieval step is intended to keep the interaction anchored to the evidence-based workflow logic of the retrieved tree, which limits hallucination and keeps each recommendation traceable to accepted medical guidelines.
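
The cosine-similarity matching can be illustrated with plain Python lists. In this sketch `q` is assumed to be the embedding of the rewritten patient narrative and each `t_i` the embedding of one CGT; the embedding model is out of scope here, so the vectors and tree names below are made-up two-dimensional placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, tree_vecs):
    """Return (tree id, score) for the CGT whose vector best matches the query."""
    scored = ((tree_id, cosine(query_vec, vec)) for tree_id, vec in tree_vecs.items())
    return max(scored, key=lambda pair: pair[1])


# Placeholder embeddings; real ones would come from an embedding model.
tree_vecs = {"dyspnea_dd": [1.0, 0.0], "stroke_tr": [0.0, 1.0]}
best_id, best_score = retrieve([0.9, 0.1], tree_vecs)
```

For a database of about 1,200 trees a linear scan like this is cheap; a vector index becomes useful mainly when the collection or the query volume grows.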

6. Dataset Accessibility and Integration

MedDM is available for research and educational purposes under the CC-BY-NC 4.0 license. Upon public release (GitHub URL forthcoming), the repository will include:

  • All 1,202 CGTs in JSON
  • IEET-formatted text files
  • Extraction and transformation code

Typical Download and Setup:

git clone https://github.com/YourOrg/MedDM.git
cd MedDM
pip install -r requirements.txt
Integration Workflow:

  • Indexing: Vectorize each CGT file using embedding models (e.g., OpenAI, Sentence-BERT) and load vectors into a retrieval DB (e.g., FAISS, Weaviate).
  • Inference Loop: Collect patient prompt, rewrite to clinical form with LLM, embed, retrieve top CGT match. Load CGT, convert to IEET as needed, and invoke the diagnostic function.
  • Prompt Design: Instructions such as “You are a clinical decision-support assistant. Follow the tree logic exactly.” are prepended to the IEET logic.
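
The prompt-design step can be sketched as a small string builder. The instruction sentence is the one quoted above; the section labels ("Guidance tree:", "Patient history:"), the function name `build_prompt`, and the sample inputs are assumptions made for illustration.

```python
SYSTEM_INSTRUCTION = (
    "You are a clinical decision-support assistant. "
    "Follow the tree logic exactly."
)

def build_prompt(ieet_text, history):
    """Prepend the instruction to the IEET logic and append the dialogue so far."""
    dialogue = "\n".join(f"- {turn}" for turn in history) or "- (no history yet)"
    return (
        f"{SYSTEM_INSTRUCTION}\n\n"
        f"Guidance tree:\n{ieet_text}\n\n"
        f"Patient history:\n{dialogue}\n"
    )


# Placeholder IEET text and history, not taken from the dataset.
prompt = build_prompt("If fever:\n    A\nElse:\n    B", ["I have felt short of breath since yesterday."])
```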

By adhering to this structure, practitioners can enable LLM frameworks to conduct multi-turn, guideline-constrained diagnostic and treatment reasoning in alignment with medically validated standards (Li et al., 2023).
