RadGraph Annotations: Structural Clinical Insights
- RadGraph annotations follow a formal schema that extracts structured clinical information from radiology reports by identifying key entities and their relationships.
- They distinguish between anatomical and observational entities with assigned polarity, linking them through specific relations like modify and located_at.
- RadGraph2 enhances the original schema with hierarchical change annotations, improving clinical change tracking and enabling advanced image-to-report generation.
RadGraph Annotations provide a formal schema for extracting structured clinical content from radiology reports, focusing on entities and their relations to facilitate information extraction, downstream natural language processing, and multi-modal learning involving imaging data. The RadGraph family of schemas, originating with RadGraph and later extended in RadGraph2, offers entity/relation taxonomies, annotation guidelines, and models designed for high-fidelity capture of findings, anatomy, diagnostic inferences, and disease/device progression. These resources underpin both standalone NLP tasks and content backbones in radiology report generation frameworks (Jain et al., 2021, Khanna et al., 2023, Yan et al., 2023).
1. Schema and Entity Taxonomy
RadGraph structures radiology report text by annotating two major classes of entities:
- Anatomical entities (ANAT): Text spans denoting anatomic structures (e.g., “lungs,” “pleural space,” “carina”). These are always considered present and are not assigned a polarity.
- Observational entities (OBS): Spans encoding observable findings or diagnoses, each with a polarity attribute:
- OBS-DP (“descriptor present”/“definitely present”): Finding confirmed (e.g., “opacity”).
- OBS-DA (“descriptor absent”/“definitely absent”): Finding explicitly negated (e.g., “no effusion”).
- OBS-DU (“descriptor uncertain”/“uncertain”): Finding hedged or marked as possible (e.g., “possible collapse”).
RadGraph2 expands this taxonomy via a hierarchical schema:
- Introduces a new top-level class, CHAN (Change), to capture explicit change statements and device progression, subdivided into:
- CHAN-NC: No change
- CHAN-CON: Condition change (with subtypes: appearance, worsening, improvement, resolution)
- CHAN-DEV: Device change (appearance, placement, disappearance)
Every annotated entity is linked to its minimal span in the report and labeled with origin (Findings or Impression section) (Jain et al., 2021, Yan et al., 2023, Khanna et al., 2023).
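The two-level taxonomy above can be sketched as a plain Python mapping. This is an illustrative representation, not the official release format (the released annotations are JSON); leaf abbreviations other than CHAN-NC and CHAN-CON-WOR are hypothetical shorthand for the subtypes listed above.

```python
# Hypothetical encoding of the RadGraph2 entity taxonomy as parent -> leaves.
# Suffixes for the CHAN subtypes (AP, WOR, IMP, RES, PLA, DIS) are
# illustrative abbreviations, not official label names.
TAXONOMY = {
    "ANAT": ["ANAT-DP"],
    "OBS": ["OBS-DP", "OBS-DA", "OBS-DU"],
    "CHAN": [
        "CHAN-NC",
        "CHAN-CON-AP", "CHAN-CON-WOR", "CHAN-CON-IMP", "CHAN-CON-RES",
        "CHAN-DEV-AP", "CHAN-DEV-PLA", "CHAN-DEV-DIS",
    ],
}

def parent_of(leaf: str) -> str:
    """Return the top-level class for a leaf label."""
    for parent, leaves in TAXONOMY.items():
        if leaf in leaves:
            return parent
    raise KeyError(leaf)

# RadGraph2 has 12 leaf types in total (1 ANAT + 3 OBS + 8 CHAN).
leaf_count = sum(len(leaves) for leaves in TAXONOMY.values())
```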
2. Relation Types and Graph Construction
RadGraph encodes three semantic relations as directed, labeled edges:
- modify: Connects a modifier (e.g., adjective, qualifier) to the entity it qualifies (e.g., “patchy” → “opacity”).
- located_at: Links an observational entity to the anatomical region it affects (e.g., “effusion” → “left lower lobe”).
- suggestive_of: Captures inferential or diagnostic links, representing when one entity suggests or implies another (e.g., “consolidation” → “pneumonia”) (Jain et al., 2021, Yan et al., 2023).
Formally, the schema is instantiated as a directed labeled graph $G = (V, E)$: each node $v \in V$ is assigned a type $t(v) \in \{\text{ANAT}, \text{OBS}\}$ and, if observational, a polarity $p(v) \in \{\text{DP}, \text{DA}, \text{DU}\}$; each edge $(u, v) \in E$ carries a relation label $r(u, v) \in \{\textit{modify}, \textit{located\_at}, \textit{suggestive\_of}\}$ (Yan et al., 2023).
RadGraph2 maintains these relations, extending “modify” to connect change entities (CHAN) to findings/devices and enforcing a hierarchical parent–child taxonomy (Khanna et al., 2023).
3. Annotation Guidelines and Quality Standards
Annotation is performed by board-certified radiologists using platforms such as Datasaur.ai, under a detailed protocol:
- Span selection: Minimal spans, covering every mention of anatomy or observation, with explicit polarity assignment.
- Relation assignment: Modifiers are always labeled as entities; their links to head entities are designated via “modify.” “Located_at” attaches observations to anatomical regions; “suggestive_of” is used for explicit or hedged diagnostic inferences.
- Section focus: Annotation is restricted to Findings and Impression report sections to maximize clinical content and minimize noise (Yan et al., 2023).
RadGraph2’s protocol adds:
- Explicit annotation of change statements (CHAN), with re-labeling where needed.
- Marking all mentions of change, even across repeated sentences.
- Hierarchical span selection for modifiers and device changes.
Inter-annotator agreement, measured by Cohen’s κ, is high: e.g., κ for NER of 0.974 (MIMIC-CXR test) and 0.829 (CheXpert test); for RadGraph2, median pairwise κ = 0.9963 (Jain et al., 2021, Khanna et al., 2023).
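As a refresher on the agreement metric, Cohen’s κ compares observed agreement against chance agreement expected from each annotator’s label distribution. A stdlib-only sketch (the label sequences are a toy example, not data from the datasets above):

```python
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(a) == len(b) and a
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled the same.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: dot product of the two empirical label distributions.
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (po - pe) / (1 - pe)

# Toy example: two annotators labeling five spans.
r1 = ["OBS-DP", "OBS-DA", "ANAT-DP", "OBS-DP", "OBS-DU"]
r2 = ["OBS-DP", "OBS-DA", "ANAT-DP", "OBS-DP", "OBS-DP"]
```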
4. Dataset Composition and Statistics
The original RadGraph dataset comprises:
- Dev set: 500 reports, 12,388 entities and 9,251 relations.
- Test set: 100 reports (each for MIMIC-CXR and CheXpert), ~1,300–1,470 entities and ~900–1,100 relations per split.
RadGraph2 expands this:
- 800 expert-annotated reports (23,457 entity spans, 17,373 relations).
- Entity type expansion: From 4 leaves in RadGraph ({ANAT-DP, OBS-DP, OBS-DU, OBS-DA}) to 12 (adding 8 CHAN subtypes).
- Relation count increases by ~20% due to the inclusion of change-entity arcs.
A massive inference set (220,763 MIMIC-CXR reports, ~6M entities; 500 CheXpert reports) was automatically annotated using the RadGraph Benchmark model, enabling multi-modal and large-scale learning (Jain et al., 2021, Khanna et al., 2023).
5. Model Architectures and Information Extraction
RadGraph and RadGraph2 annotations are produced and consumed by neural models:
- DyGIE++ forms the basis, with BERT-based span encoders, entity/relation modules, and multi-label classifiers.
- RadGraph Benchmark achieved a micro F1 of 0.82 (MIMIC-CXR) and 0.73 (CheXpert) on relation extraction (Jain et al., 2021).
- HGIE (Hierarchical Graph IE, RadGraph2): Introduces a hierarchical recognition (HR) head with conditional fine-to-coarse training and a depth-dependent loss $\mathcal{L}_{\text{HR}} = \sum_{d=1}^{D} \lambda_d \, \ell\big(\hat{y}_d, y_d\big)$, summed over the gold ancestor path $(y_1, \dots, y_D)$ of each leaf label with depth weights $\lambda_d$. The total objective combines the base span/relation losses with the HR term: $\mathcal{L} = \mathcal{L}_{\text{base}} + \mathcal{L}_{\text{HR}}$.
- Performance: HGIE surpasses baseline models on RadGraph2 (micro-F1 for MIMIC-CXR: 0.879; CheXpert: 0.739) (Khanna et al., 2023).
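The fine-to-coarse idea can be illustrated with a pure-Python sketch that scores predicted probabilities against the gold ancestor path using depth-dependent weights. The `lambdas` values and the dash-delimited path convention are illustrative assumptions, not the paper’s exact formulation.

```python
import math

def ancestor_path(leaf: str) -> list[str]:
    """Prefix path of a dash-delimited leaf label, e.g.
    "CHAN-CON-WOR" -> ["CHAN", "CHAN-CON", "CHAN-CON-WOR"]."""
    parts = leaf.split("-")
    return ["-".join(parts[: i + 1]) for i in range(len(parts))]

def hierarchical_loss(pred_probs: dict[str, float], gold_leaf: str,
                      lambdas: tuple[float, ...] = (1.0, 0.5, 0.25)) -> float:
    """Depth-weighted negative log-likelihood along the gold ancestor path.

    pred_probs maps taxonomy nodes to predicted probabilities; lambdas are
    hypothetical depth weights (coarse levels weighted more heavily here).
    """
    loss = 0.0
    for d, node in enumerate(ancestor_path(gold_leaf)):
        loss += lambdas[d] * -math.log(pred_probs.get(node, 1e-12))
    return loss
```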
RadGraph annotations have been integrated as the intermediate “content” layer in two-stage report generation: extraction of RadGraph from X-ray images precedes style-conditioned report generation via LLMs (e.g., GPT-3.5-Turbo) (Yan et al., 2023).
6. Applications and Illustrative Examples
RadGraph is used for:
- Radiology report mining: Extraction of findings, spatial reasoning, and inferences from unstructured text for research and clinical applications.
- Image-to-report generation: Employed as a structured “backbone” for generation systems, decoupling clinical content extraction from free-text “style” generation; this separation improves both clinical accuracy and stylistic adaptation (Yan et al., 2023).
- Disease and device tracking: RadGraph2 enables explicit modeling of disease progression, stability, and device changes, supporting longitudinal studies.
Example (RadGraph2):
- Sentence: “Compared to prior, the small left pleural effusion has increased slightly and the nasogastric tube is unchanged.”
- Annotations:
- Entities: “small” (OBS-DP), “left” (ANAT-DP), “pleural” (ANAT-DP), “effusion” (OBS-DP), “increased” (CHAN-CON-WOR), “slightly” (OBS-DP), “unchanged” (CHAN-NC), “nasogastric” (OBS-DP), “tube” (OBS-DP)
- Relations: modify(“small”, “effusion”), modify(“increased”, “effusion”), etc. (Khanna et al., 2023)
In image-to-text pipelines, RadGraph is serialized into a compact span-labeled string: the weakly connected components of the graph are linearized while preserving entity order, and during generation absent or uncertain polarity is prepended as “no” or “maybe” (Yan et al., 2023).
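The serialization step can be sketched as follows. This is a simplified illustration of the scheme described above (components grouped via union-find, polarity mapped to “no”/“maybe” prefixes); the exact token format in the released pipeline may differ.

```python
def serialize(entities: list[tuple[str, str]],
              relations: list[tuple[int, int, str]]) -> str:
    """Serialize entities and relations into a compact span string.

    entities: (text, label) pairs in report order, e.g. ("effusion", "OBS-DA").
    relations: (head_idx, tail_idx, relation) edges.
    Polarity suffixes map to prefixes: DA -> "no", DU -> "maybe".
    """
    # Union-find over entity indices: edges merge weakly connected components.
    parent = list(range(len(entities)))
    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for head, tail, _ in relations:
        parent[find(head)] = find(tail)

    groups: dict[int, list[int]] = {}
    for i in range(len(entities)):
        groups.setdefault(find(i), []).append(i)

    prefix = {"DA": "no ", "DU": "maybe "}
    pieces = []
    # Preserve original entity order within and across components.
    for comp in sorted(groups.values(), key=min):
        words = []
        for i in sorted(comp):
            text, label = entities[i]
            words.append(prefix.get(label.rsplit("-", 1)[-1], "") + text)
        pieces.append(" ".join(words))
    return ", ".join(pieces)
```

For example, an isolated negated finding serializes with its polarity prefix, while a connected component such as “patchy” → “opacity” → “lungs” is emitted as one span group.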
7. Impact and Extensions
RadGraph provides a scalable, high-agreement, radiologist-verified foundation for radiology report structuring. The dataset’s linkage to imaging via DICOM StudyInstanceUID allows direct application in vision-language pretraining and cross-modal learning. RadGraph2 demonstrates that imposing a hierarchical taxonomy yields consistent gains in entity and relation extraction and extensible representations for clinical change analysis, while maintaining compatibility and performance with earlier schemas. The annotation guidelines, taxonomy, and neural architectures are generalizable to domains requiring fine-grained, hierarchical information extraction from clinical or scientific text (Khanna et al., 2023, Jain et al., 2021).