
RSNA 2023 Abdominal Trauma Dataset

Updated 28 January 2026
  • RSNA 2023 Abdominal Trauma Dataset is the largest annotated collection of adult abdominal CT scans for traumatic injuries, curated from 23 institutions worldwide.
  • It provides comprehensive labels at study, image, and voxel levels, enabling automated detection, classification, and segmentation for machine learning applications.
  • The dataset underpins benchmarking of advanced methods like nnU-Net and 2D-VoCo, addressing challenges such as class imbalance and inter-rater variability.

The RSNA 2023 Abdominal Trauma Dataset, formally the RSNA Abdominal Traumatic Injury CT (RATIC) dataset, constitutes the largest publicly available collection of adult abdominal CT studies annotated for traumatic injuries. Curated across 23 institutions spanning 14 countries and 6 continents, RATIC provides a multi-level labeled resource for automated detection, classification, and segmentation of abdominal trauma. The dataset, released for non-commercial use via Kaggle, underpins both the RSNA 2023 Abdominal Trauma Detection competition and a series of subsequent analyses aimed at advancing machine learning for traumatic injury assessment (Rudie et al., 2024).

1. Composition, Provenance, and Acquisition Protocols

RATIC consists of 4,274 adult patient studies, each comprising a de-identified abdominal CT scan. These originate from 23 institutions, including 11 in the United States, four in Canada, three in Australia, two in Germany, and single sites in Spain, Chile, Thailand, Taiwan, Malta, Ireland, Morocco, Bosnia and Herzegovina, Turkey, and Brazil. Each institution contributed studies acquired under diverse protocols to foster robustness and generalization for downstream development.

Inclusion criteria required adult patients (age ≥ 18), a single CT study per patient, and a contrast-enhanced series (portal venous, late arterial, split-bolus, or multiphasic); non-contrast and delayed-phase acquisitions were excluded. All scans were cropped to the abdominopelvic region using an automated anatomical pipeline and have a maximum slice thickness of 5 mm, with thinner reconstructions preferred. Heterogeneity across platforms, manufacturers, and imaging settings was intentionally preserved, and DICOM metadata (slice thickness, kVp, pixel spacing, etc.) was retained to maximize real-world applicability. The contrast phase of each series was determined from Hounsfield unit (HU) measurements of aortic attenuation (Rudie et al., 2024).
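The inclusion rules above can be expressed as a simple filter. The following is a minimal sketch, not the curators' actual pipeline: the record fields (`patient_id`, `age`, `phase`, `slice_thickness_mm`) and the function names are hypothetical, and keeping the thinnest eligible record per patient is one possible reading of "thinner reconstructions preferred".

```python
# Hypothetical record layout; field names are illustrative assumptions.
ALLOWED_PHASES = {"portal venous", "late arterial", "split-bolus", "multiphasic"}
MAX_SLICE_THICKNESS_MM = 5.0
MIN_AGE_YEARS = 18


def is_eligible(study):
    """Apply the age, contrast-phase, and slice-thickness criteria."""
    return (
        study["age"] >= MIN_AGE_YEARS
        and study["phase"] in ALLOWED_PHASES
        and study["slice_thickness_mm"] <= MAX_SLICE_THICKNESS_MM
    )


def select_studies(studies):
    """Keep at most one eligible record per patient, preferring thinner slices."""
    best = {}
    for study in studies:
        if not is_eligible(study):
            continue
        current = best.get(study["patient_id"])
        if current is None or study["slice_thickness_mm"] < current["slice_thickness_mm"]:
            best[study["patient_id"]] = study
    return list(best.values())


candidates = [
    {"patient_id": 1, "age": 34, "phase": "portal venous", "slice_thickness_mm": 5.0},
    {"patient_id": 1, "age": 34, "phase": "portal venous", "slice_thickness_mm": 1.25},
    {"patient_id": 2, "age": 16, "phase": "late arterial", "slice_thickness_mm": 2.5},
    {"patient_id": 3, "age": 52, "phase": "delayed", "slice_thickness_mm": 3.0},
    {"patient_id": 4, "age": 70, "phase": "multiphasic", "slice_thickness_mm": 7.0},
]
selected = select_studies(candidates)
```

In this toy input only patient 1 survives: patient 2 is under 18, patient 3 has a delayed-phase scan, and patient 4 exceeds the 5 mm limit.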

2. Annotation Schema, Labeling Protocols, and Injury Grading

Annotations were performed by volunteer, board-certified radiologists affiliated with the American Society of Emergency Radiology (ASER) and the Society of Abdominal Radiology (SAR), employing a consensus workflow to mitigate inter-rater variability.

Labeling granularity comprises:

  • Study-level injury grades: Assigned to the liver, spleen, left kidney, and right kidney, based on the AAST Organ Injury Scale (OIS, Grades I–V), subsequently grouped into “low grade” (I–III) and “high grade” (IV–V) categories.
  • Image-level binary annotations: Presence/absence of bowel or mesenteric injury and vascular injury (active extravasation).
  • Voxel-level segmentations: Manual annotations of five abdominal organs (liver, spleen, right kidney, left kidney, and bowel) provided as NIfTI masks for a subset of 206 series.

The annotation process consisted of independent triple grading of solid organ injuries (majority vote for consensus, committee adjudication in cases of maximal disagreement), dual grading for bowel/mesenteric injury (positive if either annotator marked it), and triple grading for active extravasation (positive if at least two annotators agreed). Segmentations were first generated via nnU-Net models trained on TotalSegmentator and then curated by board-certified radiologists. These practices address the acknowledged high inter-rater variability of AAST grading and favor consensus labels over single-expert assessments (Rudie et al., 2024).
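The grade grouping and consensus rules described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the annotators' tooling: the function names are hypothetical, grade 0 is assumed to encode "no injury", and "maximal disagreement" is interpreted here as the case where no two of the three raters agree.

```python
from collections import Counter


def group_grade(aast_grade):
    """Map an AAST OIS grade (assumed 0 = no injury, 1-5 = Grades I-V) to a category."""
    if aast_grade == 0:
        return "healthy"
    if 1 <= aast_grade <= 3:
        return "low"
    if 4 <= aast_grade <= 5:
        return "high"
    raise ValueError(f"unexpected AAST grade: {aast_grade}")


def consensus_solid_organ(labels):
    """Majority vote over three raters; None means 'send to committee adjudication'."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None


def bowel_positive(rater_a, rater_b):
    """Dual grading: positive if either rater marked an injury."""
    return rater_a or rater_b


def extravasation_positive(votes):
    """Triple grading: positive if at least two raters marked extravasation."""
    return sum(bool(v) for v in votes) >= 2


example = consensus_solid_organ([group_grade(g) for g in (2, 3, 4)])
```

Here the three raters assign Grades II, III, and IV, which group to low, low, and high, so the majority label is "low".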

3. Data Structure, Organization, and Access

The dataset is structured in a hierarchy of DICOM series, NIfTI segmentations, and multiple tabular metadata files:

  • DICOM files: Organized as [patient_id]/[series_id]/[instance_number].dcm.
  • Segmentations: Labeled NIfTI files indexed by Series Instance UID; anatomical mask values are numerically encoded (1: liver, 2: spleen, 3: left kidney, 4: right kidney, 5: bowel).
  • Tabular metadata:
    • train_demographics_2024.csv: age, sex, patient_id
    • train_series_meta_2024.csv: series_id, aortic HU, incomplete organ flag
    • train_2024.csv: summary injury labels per organ, bowel/extravasation binary labels, any injury
    • image_level_labels_2024.csv: image-level annotations per slice
    • train_dicom_tags_2024.parquet: extracted DICOM header metadata

Metadata encompasses patient demographics (age, sex), imaging protocol details (phases, slice thickness, pixel spacing, kVp), and clinical injury annotations.
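To make the layout concrete, the sketch below builds a DICOM path following the `[patient_id]/[series_id]/[instance_number].dcm` pattern and tallies voxels per organ using the mask encoding listed above. The root directory name, the identifiers, and the tiny nested-list mask are hypothetical, and label 0 is assumed to mean background; a real workflow would load NIfTI volumes with a dedicated library.

```python
from collections import Counter
from pathlib import PurePosixPath

# Mask encoding as described in the text; 0 is assumed to be background.
ORGAN_LABELS = {1: "liver", 2: "spleen", 3: "left kidney", 4: "right kidney", 5: "bowel"}


def dicom_path(root, patient_id, series_id, instance_number):
    """Build [root]/[patient_id]/[series_id]/[instance_number].dcm."""
    return PurePosixPath(root) / str(patient_id) / str(series_id) / f"{instance_number}.dcm"


def voxel_counts(mask):
    """Count voxels per organ in a mask given as nested lists [slice][row][column]."""
    counts = Counter(value for slice_ in mask for row in slice_ for value in row)
    return {name: counts.get(label, 0) for label, name in ORGAN_LABELS.items()}


# Toy 2 x 2 x 3 mask standing in for a real segmentation volume.
toy_mask = [
    [[0, 1, 1], [0, 2, 5]],
    [[1, 1, 0], [3, 4, 5]],
]
path = dicom_path("data_root", 101, 202, 7)
counts = voxel_counts(toy_mask)
```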

Licensing restricts use to non-commercial research, and access requires a Kaggle account and acceptance of the dataset terms. The dataset is distributed in train, public test (404 cases), and private test (723 cases) partitions (Rudie et al., 2024).
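As a quick consistency check, the size of the training partition can be inferred by subtraction. This assumes the three partitions are disjoint and together account for all 4,274 studies, which the text above does not state explicitly:

```latex
N_{\mathrm{train}} = 4274 - 404 - 723 = 3147
```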

4. Applications and Baseline Methodologies

RATIC is explicitly designed to support research in:

  • Automated detection, localization, and grading of traumatic abdominal injuries
  • Multi-organ segmentation and injury mapping
  • Model robustness benchmarking across heterogeneous acquisition parameters
  • Assessment of class imbalance mitigation and comparative weak/strong supervision strategies

The primary segmentation baseline is nnU-Net trained on TotalSegmentator, with further manual refinement for ground truth mask curation. Detection and grading baselines remain unreported, with the expectation that subsequent work will benchmark on this dataset (Rudie et al., 2024).

Preprocessing recommendations include DICOM key tag harmonization, FOV cropping via anatomical atlas mapping, intensity normalization (windowing to soft-tissue), and strategies for class imbalance such as over/under-sampling and synthetic augmentation. Standard 3D augmentations—random rotations, scaling, elastic deformation, intensity jitter—are also encouraged (Rudie et al., 2024).
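Two of these recommendations, soft-tissue windowing and oversampling for class imbalance, can be sketched as follows. The window center and width (50 HU and 400 HU) are common illustrative values chosen here as an assumption, not settings taken from the dataset paper, and inverse-frequency weighting is only one of several possible oversampling schemes.

```python
import random
from collections import Counter


def window_hu(hu, center=50.0, width=400.0):
    """Clip a Hounsfield value to a window and rescale it to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


def oversampling_weights(labels):
    """Inverse-frequency weights so each class has equal total sampling mass."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


labels = ["healthy", "healthy", "healthy", "high"]
weights = oversampling_weights(labels)

# Draw a resampled index list; the rare "high" case is picked far more often
# than its raw frequency would suggest. The seed is fixed only for repeatability.
rng = random.Random(0)
resampled = rng.choices(range(len(labels)), weights=weights, k=8)
```

With these weights every class contributes the same total probability mass, so the single "high" case is drawn about as often as the three "healthy" cases combined.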

5. Benchmarking and Downstream Evaluation: 2D-VoCo Example

The dataset has been leveraged in the development of computationally efficient self-supervised learning, notably in the 2D-VoCo framework for multi-organ injury classification (Chiu et al., 21 Jan 2026). 2D-VoCo adapts volume contrastive learning to operate at the slice (2D) level, avoiding prohibitive memory costs of 3D-volume approaches.

  • Pre-training: EfficientNetV2 backbone is trained using a momentum-based student-teacher scheme. Each CT volume yields 32 sampled slices, each partitioned into “base” crops and a random reference crop. The pre-training objective comprises intra-slice loss ($\mathcal{L}_{\mathrm{intra}}$), inter-patient alignment ($\mathcal{L}_{\mathrm{inter}}$), and a regularization term to prevent representational collapse, explicitly formalized as:

$$
\begin{aligned}
\mathcal{L}_{\mathrm{intra}}(f_c, f_{b_i}) &= -\log\!\bigl(1 - \bigl|\,r_i - \cos(f_c, f_{b_i})\bigr|\bigr) \\
\mathcal{L}_{\mathrm{inter}} &= -\log\!\bigl(1 - \bigl|\cos(f_c, f_{b_i}) - \cos(f'_c, f'_{b_i})\bigr|\bigr) \\
\mathcal{L}_{\mathrm{reg}} &= \sum_{i<j} \bigl|\cos(f_{b_i}, f_{b_j})\bigr| \\
\mathcal{L}_{\mathrm{2D\text{-}VoCo}} &= \sum_i \mathcal{L}_{\mathrm{intra}} + \sum_i \mathcal{L}_{\mathrm{inter}} + \sum_{i<j} \mathcal{L}_{\mathrm{reg}}
\end{aligned}
$$

  • Fine-tuning: The pre-trained backbone’s slice-wise features are integrated into a bidirectional LSTM for multi-organ classification (kidney, liver, spleen; each with 3 classes: healthy, low, high).
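The pre-training loss terms can be illustrated numerically on plain Python lists. This is a minimal sketch under several assumptions, not the authors' implementation: $r_i$ is treated as a given target similarity in $[0, 1]$ for base crop $i$, the primed features are treated as the corresponding features from a second patient, the regularization sum is counted once in the total, and a small floor `EPS` is added inside the logarithm to avoid $\log 0$. All function names are hypothetical, and feature vectors must be non-zero.

```python
import math

EPS = 1e-6  # assumed numerical floor for the log argument


def cos_sim(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closeness_loss(difference):
    """-log(1 - |difference|), with the log argument floored at EPS."""
    return -math.log(max(1.0 - abs(difference), EPS))


def intra_loss(f_c, f_b, r):
    """Penalize the gap between the target r and cos(f_c, f_b)."""
    return closeness_loss(r - cos_sim(f_c, f_b))


def inter_loss(f_c, f_b, f_c_other, f_b_other):
    """Penalize differing reference-to-base similarities across two patients."""
    return closeness_loss(cos_sim(f_c, f_b) - cos_sim(f_c_other, f_b_other))


def reg_loss(bases):
    """Sum of |cos| over all distinct pairs of base-crop features."""
    return sum(
        abs(cos_sim(bases[i], bases[j]))
        for i in range(len(bases))
        for j in range(i + 1, len(bases))
    )


def voco_2d_loss(f_c, bases, targets, f_c_other, bases_other):
    """Total loss: intra terms + inter terms + regularization (counted once)."""
    intra = sum(intra_loss(f_c, b, r) for b, r in zip(bases, targets))
    inter = sum(
        inter_loss(f_c, b, f_c_other, b_o) for b, b_o in zip(bases, bases_other)
    )
    return intra + inter + reg_loss(bases)


# Orthogonal base features with matching targets give a loss of zero.
bases = [[1.0, 0.0], [0.0, 1.0]]
total = voco_2d_loss([1.0, 0.0], bases, [1.0, 0.0], [1.0, 0.0], bases)
```

The toy case shows the intended optimum: when each similarity matches its target, both patients agree, and the base features are mutually orthogonal, every term vanishes.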

Comparison to ImageNet-only and single-organ models demonstrates that 2D-VoCo consistently improves RSNA score, mAP, precision, and recall, with further generalization gains achieved by incorporating extra unlabeled data (FLARE23). For example, using EfficientNetV2-T, the RSNA score (a weighted log-loss metric, so lower is better) decreased from 0.4133 (ImageNet) to 0.3818 (2D-VoCo RATIC) and further to 0.3777 with additional FLARE23 data; mAP, precision, and recall followed the same trend. The multi-organ model also outperformed single-organ models, confirming the benefit of contextual learning (Chiu et al., 21 Jan 2026).
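To put the reported scores in proportion, the relative reduction against the ImageNet baseline can be computed directly. The sketch below uses only the three values quoted above and assumes that a lower RSNA score is better; the helper name and dictionary keys are hypothetical.

```python
def relative_reduction(baseline, improved):
    """Fractional decrease of a lower-is-better score relative to a baseline."""
    return (baseline - improved) / baseline


# RSNA scores quoted in the text for EfficientNetV2-T.
scores = {
    "ImageNet": 0.4133,
    "2D-VoCo (RATIC)": 0.3818,
    "2D-VoCo (RATIC + FLARE23)": 0.3777,
}

reductions = {
    name: relative_reduction(scores["ImageNet"], score)
    for name, score in scores.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.4f} ({100 * reductions[name]:.1f}% below ImageNet)")
```

This works out to a reduction of roughly 7.6% for RATIC-only pre-training and roughly 8.6% with the additional FLARE23 data.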

6. Limitations and Future Development

Identified limitations of RATIC include:

  • Persistent class imbalance, with lower prevalence of certain injury types than anticipated despite targeted enrichment.
  • Lack of delayed phase imaging, leading to under-representation of collecting system injuries.
  • High inter-rater variability in AAST injury grading, partially mitigated by consensus labeling but not eliminated.
  • Segmentations are limited to five organ classes; other injury types (hematomas, fractures, thoracic injuries) are present in images but not annotated.
  • 141 studies exhibit uncommon DICOM PixelData encoding; technical workarounds are documented on Kaggle.

Planned or anticipated future work (not explicitly detailed in the primary dataset publication) includes the potential addition of delayed phase scans, broader annotation of injuries (hematomas, fractures, thoracic trauma), and prospective curation of clinical datasets with state-of-the-art thin-slice multi-planar reformats and associated clinical priors (Rudie et al., 2024).

7. Significance and Impact

RATIC establishes a foundational resource for developing and benchmarking machine learning models in abdominal trauma diagnosis, combining the largest publicly available collection of annotated adult abdominal trauma CT studies with labels at three levels of granularity. Its diverse, multi-institutional origin and detailed labeling at voxel, slice, and study levels directly address major limitations in previous datasets, supporting methodological advances in supervised, weakly-supervised, and self-supervised approaches to clinical CT interpretation. Applications span automated triage, detection, grading of traumatic injuries, and investigation of domain generalization and robustness. Initial benchmarking with 2D-VoCo demonstrates gains in RSNA score, mAP, precision, and recall for multi-organ injury classification, underscoring its relevance for both medical imaging and clinical machine learning research (Rudie et al., 2024, Chiu et al., 21 Jan 2026).
