RadGenome-Chest CT: Region-Guided Dataset

Updated 26 March 2026
  • RadGenome-Chest CT is a comprehensive chest CT dataset featuring region-guided anatomical segmentation of 197 structures paired with grounded radiology report sentences.
  • It comprises over 25,000 3D CT volumes, 665,000 grounded report sentences, and 1.3 million VQA pairs, with segmentation and report links validated by radiologists.
  • The dataset supports multimodal tasks like segmentation, visual question answering, and radiogenomics, providing a robust benchmark for advanced medical imaging research.

RadGenome-Chest CT is a large-scale, region-guided chest computed tomography (CT) dataset that pairs fine-grained anatomical segmentation masks with language groundings for multimodal machine learning and computer vision research in thoracic imaging, particularly the development, evaluation, and benchmarking of explainable foundation models for medical vision-language tasks. Built upon the public CT-RATE collection, it uniquely links volumetric organ-level masks with grounded report sentences and region-specific visual question answering (VQA) pairs, enabling anatomically explicit interpretation and supporting research in segmentation, diagnosis, radiogenomics, question answering, and report generation in chest CT imaging (Zhang et al., 2024).

1. Dataset Structure, Acquisition, and Annotation

RadGenome-Chest CT comprises 25,692 non-contrast 3D chest CT volumes from 21,304 unique patients, with every image paired to its corresponding radiology report. Inherited from CT-RATE, the train/validation split encompasses 24,128/1,564 volumes (20,000/1,304 patients, respectively). All volumes are resampled to a uniform 1×1×3 mm voxel spacing (anisotropic, with coarser spacing along the z-axis) and stored in either NIfTI or original DICOM format. Associated metadata—including windowing, anonymized demographics, and orientation—are provided per volume; the CT-Agent study reports only basic preprocessing and provides no further demographic or scanner information (Mao et al., 22 May 2025).

Annotation is multi-tiered:

  • Organ-level segmentation: Each scan is labeled for 197 hierarchically organized anatomical structures (e.g., lobes, vertebrae, vessels, heart chambers, mediastinum, pleura, bones, breasts, abdominal organs), with SAT (“Segment Anything for medical Tasks”) generating 3D instance masks via promptable segmentation. All 197 classes receive separate .nii.gz masks per volume (Zhang et al., 2024).
  • Grounded reports: Every sentence of each FINDINGS section is mapped to one or more anatomical regions through sentence–region associations, with the validation set verified by radiologists. For the ten major regions designated in the CT-Agent pipeline (Trachea & Bronchi, Thyroid, Lung, Heart, Mediastinum, Pleura, Esophagus, Abdomen, Bone, Breast), region-level grounded reports are generated using automated GPT protocols for both training and test splits (Mao et al., 22 May 2025).
  • VQA pairs: Over 1.3 million region-level VQA pairs are constructed by filling four primary question templates (Abnormality, Presence, Location, Size), each referencing a specific region’s mask. The validation set is fully audited for annotation fidelity (Zhang et al., 2024).

Manual auditing of RadGenome-Chest CT confirms >94.6% accuracy on sentence–region links and >93% answer accuracy for validation-set VQA pairs.
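The four VQA templates can be illustrated with a minimal sketch. The Abnormality and Presence wordings below match examples quoted elsewhere in this article; the Location and Size wordings, the JSON field names, and the `make_vqa_pair` helper are illustrative assumptions, not the dataset's released strings or schema.

```python
# Hypothetical sketch of instantiating region-level VQA pairs from the
# four question templates (Abnormality, Presence, Location, Size).
# Template wordings for Location/Size and all field names are assumptions.

QUESTION_TEMPLATES = {
    "Abnormality": "What abnormality is present in the {region}?",
    "Presence": "Is there any abnormality in the {region}?",
    "Location": "Where is the {finding} located?",
    "Size": "What is the size of the {finding} in the {region}?",
}

def make_vqa_pair(template_name: str, region: str, mask_path: str,
                  finding: str = "", answer: str = "") -> dict:
    """Fill one question template and attach the region's mask reference."""
    question = QUESTION_TEMPLATES[template_name].format(region=region,
                                                        finding=finding)
    return {"question": question, "region": region,
            "mask": mask_path, "answer": answer}

pair = make_vqa_pair("Presence", "right lung upper lobe",
                     "masks/right_lung_upper_lobe.nii.gz", answer="Yes")
print(pair["question"])  # → Is there any abnormality in the right lung upper lobe?
```

Each generated pair references a specific region mask file, mirroring the dataset's design of tying every question to explicit anatomical evidence.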

| Data Element | Total | Training Set | Validation Set |
|---|---|---|---|
| CT Volumes | 25,692 | 24,128 | 1,564 |
| Unique Patients | 21,304 | 20,000 | 1,304 |
| FINDINGS/IMPRESSION Reports | 25,692 | 24,128 | 1,564 |
| Segmentation Categories | 197 | — | — |
| Grounded Report Sentences | 665,000 | 651,292 | 13,708 |
| Region-level VQA Pairs | 1,338,000 | 1,338,000 | 85,000 |
| Case-level VQA Pairs | 25,692 | 24,128 | 1,564 |

2. Construction and Grounding Methodology

Segmentation masks covering 197 organ-level categories are automatically generated using SAT, pre-trained across 72 datasets (498 classes), and guided by text prompts specific to RadGenome’s anatomical taxonomy. Region–report grounding proceeds in two steps: (1) GPT-4 annotates validation sentences, yielding >95% accuracy; (2) a fine-tuned GPT-2 model assigns sentences to regions at 94.6% accuracy based on this reference annotation (Zhang et al., 2024).

Named-entity recognition further extracts “anatomy,” “abnormality,” and “non-abnormality” tokens from sentences, and region-level abnormality labels are filtered and standardized via GPT-4 prompts. All grounded outputs in the validation set undergo 100% manual review by radiologists for quality assurance.
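The sentence-to-region assignment step can be sketched schematically. The actual pipeline fine-tunes GPT-2 on GPT-4-annotated references; the keyword matcher below is only an illustrative stand-in showing the input/output contract, and the region names and keyword lists are assumptions for demonstration.

```python
# Illustrative stand-in for sentence-to-region grounding. The real
# pipeline uses a fine-tuned GPT-2 classifier (94.6% accuracy); this
# keyword matcher only demonstrates the mapping's shape.

REGION_KEYWORDS = {
    "Lung": ["lung", "lobe", "pulmonary"],
    "Heart": ["heart", "cardiac", "ventricle", "atrium"],
    "Pleura": ["pleura", "pleural"],
    "Bone": ["vertebra", "rib", "osseous"],
}

def ground_sentence(sentence: str) -> list[str]:
    """Return all regions whose keywords appear in the sentence."""
    text = sentence.lower()
    return [region for region, kws in REGION_KEYWORDS.items()
            if any(kw in text for kw in kws)]

print(ground_sentence("A small pleural effusion is noted in the left lung base."))
# → ['Lung', 'Pleura']
```

A sentence can ground to multiple regions simultaneously, which is why the dataset stores sentence–region associations as one-to-many links.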

The CT-Agent study employs a streamlined variant, annotating only ten major thoracic regions and producing GPT-generated region-level “grounded reports” and binary labels for presence/abnormality detection, relying entirely on automated protocols without reported inter-rater statistics (Mao et al., 22 May 2025).

No further patient demographics, scanner manufacturers/models, or explicit inclusion criteria are provided in these sources.

3. Data Organization, Access, and Preprocessing

Each RadGenome-Chest CT volume is delivered in a folder containing:

  • The CT image in NIfTI (or DICOM) format, with separate JSON metadata.
  • Segmentation masks for all 197 structures, each as a .nii.gz file named by structure.
  • Grounded report JSONL mapping sentences to regions and specifying anatomy/abnormality entities.
  • VQA JSONL, each entry specifying a question, corresponding region mask, and text answer (Zhang et al., 2024).
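A minimal reader for the grounded-report JSONL described above might look as follows. The field names (`sentence`, `regions`) are assumptions about the schema, not confirmed keys from the dataset release.

```python
import json

# Hypothetical reader for the grounded-report JSONL. Field names
# ("sentence", "regions") are assumed, not taken from the release.

def read_grounded_report(jsonl_text: str) -> dict[str, list[str]]:
    """Map each anatomical region to the sentences grounded in it."""
    by_region: dict[str, list[str]] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        for region in entry["regions"]:
            by_region.setdefault(region, []).append(entry["sentence"])
    return by_region

sample = "\n".join([
    '{"sentence": "Heart size is normal.", "regions": ["Heart"]}',
    '{"sentence": "No pleural effusion.", "regions": ["Pleura", "Lung"]}',
])
print(read_grounded_report(sample)["Pleura"])  # → ['No pleural effusion.']
```

Grouping sentences by region in this way is the natural input format for region-guided report generation, where each region's sentences serve as supervision targets.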

Recommended preprocessing includes Hounsfield Unit windowing to [–1,000, +400], z-score normalization, resampling to the provided 1×1×3 mm grid, and cropping to the nonzero bounding box of segmentation masks. The CT-Agent variant resamples all volumes to (1.5 mm in z, 1.0 mm × 1.0 mm in-plane) and crops each scan to 240 slices of 512 × 512, aligning with the CLIP ViT-B/16 input requirement (256 patches per slice); DICOM metadata is used to ensure precise HU conversion (Mao et al., 22 May 2025).
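The recommended preprocessing steps can be sketched in NumPy, assuming resampling to the provided 1×1×3 mm grid has already been done upstream (e.g., with SimpleITK); the `preprocess` helper and its toy inputs are illustrative, not the dataset's reference implementation.

```python
import numpy as np

# Sketch of the recommended preprocessing: HU windowing to [-1000, 400],
# z-score normalization, and cropping to the mask's nonzero bounding box.
# Resampling to the 1x1x3 mm grid is assumed done upstream.

def preprocess(volume: np.ndarray, mask: np.ndarray) -> np.ndarray:
    vol = np.clip(volume.astype(np.float32), -1000.0, 400.0)  # HU window
    vol = (vol - vol.mean()) / (vol.std() + 1e-8)             # z-score
    nz = np.nonzero(mask)                                     # mask bbox
    slices = tuple(slice(ax.min(), ax.max() + 1) for ax in nz)
    return vol[slices]

volume = np.full((8, 8, 8), -2000.0)  # toy volume, below the HU window
mask = np.zeros((8, 8, 8), dtype=np.uint8)
mask[2:5, 2:5, 2:5] = 1               # toy 3x3x3 region mask
print(preprocess(volume, mask).shape)  # → (3, 3, 3)
```

Cropping to the mask's bounding box is what makes region-guided training tractable: only the subvolume relevant to a given anatomical structure is fed to the model.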

4. Example Use Cases and Workflows

RadGenome-Chest CT enables regionally grounded model development for both conventional and foundation model paradigms:

  • Grounded report generation: Models take a CT volume and a specific organ/lobe mask as input and synthesize an anatomically linked summary sentence, directly supervised by paired mask–sentence groundings (Zhang et al., 2024).
  • Region-guided VQA: Models process the CT image, a region mask, and a specific question template (e.g., “What abnormality is present in the right lung upper lobe?”) to generate short, evidence-grounded answers, supporting both textual and segmentation outputs.
  • Case-level summarization: The IMPRESSION section text and case-level QA enable global diagnoses; this global output may be fused with regionally grounded generators for compound interpretability pipelines.
  • Radiogenomics/recurrence pipeline (specialized): Prior work using radiogenomics datasets analogous to RadGenome-Chest CT combines segmentation, deep feature extraction, and genomic label prediction (e.g., EGFR mutation status, or recurrence risk in NSCLC) (Navarrete et al., 2022, Aonpong et al., 2021).

5. Integration with Multimodal Agents: The Case of CT-Agent

CT-Agent is a multimodal, LLM-augmented agent optimized for regionally grounded 3D CT interpretation using RadGenome-Chest CT as its principal supervision source (Mao et al., 22 May 2025). The framework’s main design features include:

  • Anatomy-aware action space: A dedicated LoRA adapter for each of the ten anatomical regions. During training, cropped region subvolumes (via segmentation masks) are fed through a frozen CLIP ViT-B/16 backbone.
  • Token compression: Two-path strategy—global aggregation (mean-pooled, MoE-refined tokens across slices) and local selection (top-K via CLS-attention, remaining pooled by key similarity)—produces a condensed token matrix for efficiency and spatial sensitivity.
  • Template-driven QA/reporting: Region-specific QA is constructed by tokenizing both the preprocessed image region and standard question template (e.g., “Is there any abnormality in the Lung?”), facilitating direct LLM conditioning.
  • Parallel tool invocation: For report generation, region-specific tools synthesize short diagnostic statements in parallel. A sentence-embedding index (built on historical RadGenome reports) aids retrieval of exemplar findings, which the LLM then aggregates to form a complete, global report.
  • Supervision: Cropping and target text derive from RadGenome’s segmentation masks and grounded sentences, respectively.
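The two-path token compression above can be sketched schematically in NumPy. CT-Agent operates on CLIP ViT-B/16 features with an MoE refiner and similarity-based pooling of the remainder; the dimensions, mean-pooling of the remainder, and attention scores below are toy stand-ins, not the published implementation.

```python
import numpy as np

# Schematic two-path token compression: a global path (mean over all
# slice tokens) plus a local path (top-K tokens by CLS attention, with
# the remainder mean-pooled). Real CT-Agent refines the global path with
# an MoE and pools the remainder by key similarity; this is a toy sketch.

def compress_tokens(tokens: np.ndarray, cls_attn: np.ndarray,
                    k: int) -> np.ndarray:
    """tokens: (n_slices, n_patches, dim); cls_attn: (n_slices, n_patches)."""
    global_tok = tokens.mean(axis=(0, 1))                 # global path
    flat = tokens.reshape(-1, tokens.shape[-1])
    attn = cls_attn.reshape(-1)
    top = np.argsort(attn)[::-1][:k]                      # local: top-K
    rest = np.setdiff1d(np.arange(flat.shape[0]), top)
    pooled_rest = flat[rest].mean(axis=0)                 # pool remainder
    return np.vstack([global_tok, flat[top], pooled_rest])  # (k + 2, dim)

tokens = np.random.default_rng(0).normal(size=(4, 16, 8))
attn = np.random.default_rng(1).random((4, 16))
print(compress_tokens(tokens, attn, k=6).shape)  # → (8, 8)
```

The condensed matrix keeps one globally aggregated token, the K most attended patch tokens, and one summary of everything else, trading token count for retained spatial detail.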

CT-Agent has demonstrated particularly strong region-by-region performance gains on smaller, more heterogeneous structures; e.g., abnormality F1 for Trachea & Bronchi increases from 0.018 (baseline) to 0.790 (CT-Agent) (Mao et al., 22 May 2025).

6. Quantitative Benchmarks and Statistical Properties

Segmentation mask quality was validated by manual review of 200 slices (across 20 volumes): mean Dice coefficient >0.88, mean IoU >0.80 for the 20 most frequent organs. Sentence–region linkage reaches 94.6% accuracy in validation; VQA answer accuracy is >93% on audited validation data. Within the training set, 58% of scans have ≥1 abnormality; abnormality prevalence per structure varies (e.g., right lung upper lobe abnormal in 12%, pleura 8%, spinal canal 3%) (Zhang et al., 2024).

In multimodal QA, CT-Agent results (on a combined CT-RATE/RadGenome test set of 119,200 QA pairs) demonstrate the following average per-region performance (Mao et al., 22 May 2025):

| Task | Baseline (Precision / Recall / F1) | CT-Agent (Precision / Recall / F1) |
|---|---|---|
| Presence Detection | 0.728 / 0.526 / 0.589 | 0.734 / 0.615 / 0.646 |
| Abnormality Identification | 0.665 / 0.297 / 0.393 | 0.722 / 0.465 / 0.532 |
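Note that the table reports per-region metrics averaged directly, so applying the harmonic-mean F1 formula to the averaged precision and recall will not reproduce the averaged F1 column. A minimal F1 helper for reference (the input values below are arbitrary, not from the benchmark):

```python
# F1 is the harmonic mean of precision and recall. Averaging per-region
# F1 values is not the same as computing F1 of averaged precision/recall.

def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.8, 0.5), 4))  # → 0.6154
```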

The largest F1 improvements are observed for small/highly variant structures.

No additional metrics (e.g., accuracy, ROC-AUC) are reported for CT-Agent on RadGenome-Chest CT, and report-generation metrics are available only for CT-RATE.

7. Comparative Analysis and Future Directions

RadGenome-Chest CT surpasses all prior public chest CT datasets in both anatomical granularity and vision-language task diversity. CT-RATE includes only 18 global abnormality labels for 20,000 patients without segmentation or sentence-level grounding. RadGenome-Chest CT provides 197 organ-level masks, 665,000 region–sentence associations, and over 1.3 million regionally grounded VQA pairs, with robust validation and open-source release scheduled (Zhang et al., 2024).

Radiogenomics studies have used related protocols to predict molecular signatures and recurrence (e.g., EGFR status, microarray-predicted recurrence) from CT images via hybrid pipelines blending classical radiomics, deep features, and neural surrogates for gene expression (Navarrete et al., 2022, Aonpong et al., 2021). These pipelines achieved segmentation Dice scores of up to 75.26% (RA-Seg), classification macro-F1 of up to 0.95 (random forest/LDA for EGFR), and recurrence-prediction accuracy of up to 83.28% by integrating “hallucinated” genotype vectors derived from image data.

A plausible implication is that RadGenome-Chest CT, by combining anatomical, linguistic, and—via protocol extension—molecular-level annotation, is positioned to serve as the reference standard for 3D chest CT multimodal learning, both for anatomically explicit reporting and for noninvasive radiogenomic risk stratification.

Future directions suggested include scaling models via multi-task and attention-enhanced surrogates, targeted expansion into specific molecular pathology use-cases, and more comprehensive external validation in prospective trials (Aonpong et al., 2021). The dataset’s open-source release is intended to facilitate these directions and broader adoption within the multimodal medical AI community.
