ImageTBAD Dataset for 3D CT Segmentation
- ImageTBAD is a benchmark dataset of 100 pre-operative 3D CTA studies of Stanford Type-B aortic dissection, with voxel-level annotations for the true lumen (TL), false lumen (FL), and false lumen thrombus (FLT).
- A two-stage 3D U-Net baseline with dedicated preprocessing reaches Dice scores of 0.85 for TL and 0.78 for FL, while FLT remains difficult at 0.52.
- Annotations follow a two-pass protocol (manual delineation followed by expert review), providing ground truth for FLT segmentation, a task critical for clinical decision-making in endovascular therapy.
ImageTBAD is a curated collection of three-dimensional computed tomography angiography (CTA) studies of Stanford Type-B aortic dissection (TBAD), annotated at the voxel level for true lumen (TL), false lumen (FL), and false lumen thrombus (FLT). It is the first publicly available dataset to provide complete manual segmentations for all three key substructures in TBAD. This enables algorithmic research into FLT segmentation, a challenging problem that is critical for accurate morphological assessment and for clinical decision-making in endovascular therapy and prognosis (Yao et al., 2021, Mikhailapov et al., 19 Dec 2025).
1. Dataset Structure and Composition
- Cohort and Imaging Acquisition: The ImageTBAD dataset comprises 100 pre-operative CTA volumes from patients with Stanford Type-B aortic dissection, collected between January 2013 and April 2015 at Guangdong Provincial People’s Hospital. Each volume represents a single patient study.
- Scanning Parameters: Scans were acquired on Siemens SOMATOM Force and Philips 256-slice Brilliance iCT systems. Volumes have a 512 × 512 in-plane matrix, 135–416 slices per study, and a typical reconstructed voxel size of approximately 0.25 × 0.25 × 0.25 mm³.
- Class Prevalence: All volumes contain TL and FL; 68 out of 100 exhibit at least one FLT region, while 32 are FLT-negative.
- Data Formats: The CTAs and their corresponding segmentation masks are distributed in the NIfTI-1 (.nii/.nii.gz) format. Preprocessing for neural network input includes intensity normalization, isotropic resampling, zero-padding to standardized volume cubes, and region-of-interest cropping (Yao et al., 2021, Mikhailapov et al., 19 Dec 2025).
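The class prevalence above matters when the cohort is split for training and validation, since a naive random split can leave a fold with few FLT-positive cases. The following is a minimal sketch of a split stratified on FLT presence. The function name `stratified_folds`, the seed, and the round-robin dealing are illustrative assumptions, not the authors' splitting code.

```python
import random

def stratified_folds(flt_positive, n_folds=3, seed=0):
    """Split case indices into folds with a near-equal share of FLT-positive cases.

    flt_positive: list of booleans, one per case (True if the case contains FLT).
    Returns a list of n_folds lists of case indices.
    """
    rng = random.Random(seed)
    positives = [i for i, flag in enumerate(flt_positive) if flag]
    negatives = [i for i, flag in enumerate(flt_positive) if not flag]
    rng.shuffle(positives)
    rng.shuffle(negatives)

    folds = [[] for _ in range(n_folds)]
    # Deal positives first, then negatives, round-robin across folds,
    # so every fold receives a similar class mix.
    for k, idx in enumerate(positives + negatives):
        folds[k % n_folds].append(idx)
    return folds

# Hypothetical cohort matching the reported prevalence:
# 68 FLT-positive and 32 FLT-negative cases.
flags = [True] * 68 + [False] * 32
folds = stratified_folds(flags)
for fold in folds:
    n_pos = sum(flags[i] for i in fold)
    print(len(fold), n_pos)
```

With these counts, each fold holds 33 or 34 cases, of which 22 or 23 are FLT-positive.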
2. Annotation Protocol and Quality Control
- Segmentation Classes:
- True lumen (TL): The aorta’s original flow channel connected to the left ventricle.
- False lumen (FL): The secondary flow channel created by the dissection, excluding thrombus.
- False lumen thrombus (FLT): Coagulated blood regions (thrombus) within the false lumen, possibly partial or confluent.
- Procedure: Annotation followed a two-pass scheme carried out by cardiovascular radiologists: one delineated the structures manually on each slice, and a second reviewed and corrected the contours. Average annotation time was 1–1.5 hours per case.
- Annotation Characteristics: FLT is highly variable in spatial extent, shape, and location. It may present as small, discontiguous islands or as large, continuous masses spanning the FL. Approximately 1–5% of total aortic voxels in FLT-positive cases are labeled as FLT (Yao et al., 2021).
- Inter-annotator Agreement: Not explicitly reported in the literature; subsequent studies refer to the original methodology for annotation details (Mikhailapov et al., 19 Dec 2025).
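To make the reported FLT share of aortic voxels concrete, the sketch below computes that fraction from a label volume. The integer label encoding (0 = background, 1 = TL, 2 = FL, 3 = FLT) is an assumption made for illustration and should be checked against the released masks.

```python
import numpy as np

# Assumed label encoding (hypothetical; verify against the released masks).
BACKGROUND, TL, FL, FLT = 0, 1, 2, 3

def flt_fraction(mask):
    """Return the share of aortic voxels (TL + FL + FLT) that are labeled FLT."""
    aortic = np.isin(mask, (TL, FL, FLT)).sum()
    if aortic == 0:
        return 0.0
    return float((mask == FLT).sum() / aortic)

# Toy 4x4x4 volume: 20 TL voxels, 11 FL voxels, 1 FLT voxel, rest background.
toy = np.zeros((4, 4, 4), dtype=np.uint8)
flat = toy.reshape(-1)  # view onto the same memory as toy
flat[:20] = TL
flat[20:31] = FL
flat[31] = FLT
print(flt_fraction(toy))  # 1 / 32 = 0.03125
```

A value of about 3% in this toy case falls inside the 1–5% range quoted above, which illustrates how small the FLT class is relative to the rest of the aorta.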
3. Preprocessing and Data Handling
- Preprocessing Steps: All volumes are intensity-windowed/leveled, isotropically resampled, and spatially normalized via cropping or padding. Region-of-interest extraction is achieved through a coarse segmentation pass, after which volumes are resampled to standardized dimensions (e.g., 64³ or 96³ for baseline segmentation; 128³ in semi-supervised setups).
- Data Augmentation: Random rotations, scaling, elastic deformations, and intensity jittering are consistently applied, reflecting best practices for 3D medical image model training (e.g., following Payer et al. parameters).
- Intensity Processing: Secondary pipelines may include Hounsfield value clipping, morphological erosion, mean–variance standardization, exponential contrast transformations, and min-max normalization (Mikhailapov et al., 19 Dec 2025).
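Two of the steps listed above, Hounsfield value clipping with min-max normalization and zero-padding to a cube, can be sketched as follows. The window bounds (-200 to 600 HU) and the function names are assumptions chosen for illustration; the source does not state the values used. Isotropic resampling is omitted here.

```python
import numpy as np

def clip_and_normalize(volume_hu, hu_min=-200.0, hu_max=600.0):
    """Clip Hounsfield units to [hu_min, hu_max], then min-max scale to [0, 1]."""
    clipped = np.clip(volume_hu.astype(np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

def pad_to_cube(volume, fill=0.0):
    """Pad a 3D array symmetrically so every axis matches the longest one."""
    side = max(volume.shape)
    pad = []
    for n in volume.shape:
        before = (side - n) // 2
        pad.append((before, side - n - before))
    return np.pad(volume, pad, mode="constant", constant_values=fill)

# Synthetic stand-in for a CT volume (random HU values, not real data).
rng = np.random.default_rng(0)
ct = rng.integers(-1000, 1500, size=(6, 8, 5)).astype(np.int16)
cube = pad_to_cube(clip_and_normalize(ct))
print(cube.shape)
```

Padding after normalization means the padded border takes the value 0, which coincides with the lower bound of the intensity window in this sketch.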
4. Baseline Segmentation Architectures and Quantitative Results
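- Two-Stage 3D U-Net Pipeline (Reference implementation): A coarse first pass localizes the region of interest, and a second 3D U-Net segments the cropped volume using one of two strategies:
- Multi-task prediction: Simultaneous aorta, TL, and FL estimation, with FLT derived via set subtraction (aorta minus TL and FL); or
- Direct 3-class segmentation: Explicit prediction of TL, FL, and FLT.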
- Architecture: Four resolution stages, each with 3 × 3 × 3 convolutions, batch-norm, ReLU, and 2 × 2 × 2 down/upsampling. Channel progression per stage is N, 2N, 4N, 8N (N = 32 or 64).
- Training Regimen: Optimization uses a composite Dice and cross-entropy loss, Adam optimizer (β₁ = 0.9, β₂ = 0.999, lr = 1e−3), early stopping, and 3-fold cross-validation stratified to balance FLT presence (Yao et al., 2021).
- Performance Metrics (best-performing configuration, reported as Approach B, with input size S = 96, i.e., 96³ volumes):
- Aorta Dice: 0.91 ± 0.04
- TL Dice: 0.85 ± 0.07
- FL Dice: 0.78 ± 0.21
- FLT Dice: 0.52 ± 0.40
- Hausdorff distances are also reported to characterize boundary error; specific values are not given here.
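- Evaluation Notation: For a predicted voxel set $P$, a ground-truth voxel set $G$, and Euclidean distance $d(\cdot,\cdot)$, the standard definitions are:
- Dice: $\mathrm{Dice}(P, G) = \dfrac{2\,|P \cap G|}{|P| + |G|}$
- Jaccard: $\mathrm{Jaccard}(P, G) = \dfrac{|P \cap G|}{|P \cup G|}$
- Hausdorff: $\mathrm{HD}(P, G) = \max\left\{ \sup_{p \in P} \inf_{g \in G} d(p, g),\; \sup_{g \in G} \inf_{p \in P} d(p, g) \right\}$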
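The three metrics named above can be computed on binary masks as in the sketch below, which uses the standard textbook definitions. The handling of empty masks (returning 1.0 for Dice and Jaccard, infinity for Hausdorff) is an assumed convention; it matters for FLT-negative cases and is not specified in the source. The brute-force Hausdorff computation is only practical for small masks.

```python
import numpy as np

def dice(pred, gt):
    """Dice = 2|P ∩ G| / (|P| + |G|); returns 1.0 if both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)

def jaccard(pred, gt):
    """Jaccard = |P ∩ G| / |P ∪ G|; returns 1.0 if both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(np.logical_and(pred, gt).sum() / union)

def hausdorff(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between foreground voxel sets (brute force)."""
    p = np.argwhere(pred) * np.asarray(spacing)
    g = np.argwhere(gt) * np.asarray(spacing)
    if len(p) == 0 or len(g) == 0:
        return float("inf")  # assumed convention when either set is empty
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy example: a 2x2x2 cube and the same cube shifted by one voxel along axis 0.
gt = np.zeros((5, 5, 5), dtype=np.uint8)
gt[1:3, 1:3, 1:3] = 1
pred = np.zeros_like(gt)
pred[2:4, 1:3, 1:3] = 1
print(dice(pred, gt), jaccard(pred, gt), hausdorff(pred, gt))
```

In the toy case the masks share 4 of their 8 voxels each, giving a Dice of 0.5, a Jaccard of 1/3, and a Hausdorff distance of one voxel.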
FLT segmentation remains the main limitation of the baseline (mean Dice ≈ 0.52, with a standard deviation of 0.40), in contrast to near state-of-the-art TL and FL segmentation (Yao et al., 2021).
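For the multi-task variant described in this section, where FLT is obtained by set subtraction rather than predicted directly, the derivation step might look like the sketch below. The function name `derive_flt` and the use of boolean masks are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def derive_flt(aorta, tl, fl):
    """FLT = aorta minus (TL ∪ FL), computed on boolean masks of equal shape."""
    aorta, tl, fl = (np.asarray(m, dtype=bool) for m in (aorta, tl, fl))
    return aorta & ~(tl | fl)

# Toy 1x1x6 volume: voxels 0-4 belong to the aorta, voxel 5 is background.
aorta = np.zeros((1, 1, 6), dtype=bool)
aorta[..., 0:5] = True
tl = np.zeros_like(aorta)
tl[..., 0:2] = True
fl = np.zeros_like(aorta)
fl[..., 2:4] = True

flt = derive_flt(aorta, tl, fl)
print(np.argwhere(flt))  # only voxel index 4 remains
```

One consequence of this design is that any error in the aorta, TL, or FL prediction carries over into the derived FLT mask, which is a small structure to begin with.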
5. Usage in Subsequent Research and Expanded Experiments
- Semi-Supervised Learning Extensions: Recent work evaluates semi-supervised learning with CNNs featuring multiple output branches, which segment TL, FL, and FLT simultaneously. These approaches leverage both labeled (manually annotated) and unlabeled ImageTBAD volumes through consistency regularization under rotations and flips; they are architecture-agnostic and do not rely on probabilistic soft assignments.
- Relevant Losses: Advanced training uses a composite of Generalized Dice Loss (GDL) and Focal Loss, parameterized for class imbalance; the combined expression and the GDL and Focal terms are defined in (Mikhailapov et al., 19 Dec 2025). Exponential moving average (EMA) updates provide pseudo-label targets.
- Experimental Configuration: In these studies, volumes are resampled to 128 × 128 × 128 grids, and data splits typically follow an 80/20 train-validation strategy, with repeated random partitions to assess robustness (Mikhailapov et al., 19 Dec 2025).
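As an illustration of how a composite of Generalized Dice Loss and Focal Loss can be assembled, the sketch below uses the commonly cited formulations of both terms: inverse squared class-volume weights for GDL and a focusing exponent `gamma` for the focal term. The weighting `lam`, the value `gamma=2.0`, and the unweighted sum are assumptions; the parameterization in the cited work may differ.

```python
import numpy as np

def generalized_dice_loss(probs, onehot, eps=1e-6):
    """GDL with inverse squared class-volume weights.

    probs, onehot: arrays of shape (C, N) for C classes and N voxels.
    """
    w = 1.0 / (onehot.sum(axis=1) ** 2 + eps)
    inter = (w * (probs * onehot).sum(axis=1)).sum()
    total = (w * (probs + onehot).sum(axis=1)).sum()
    return 1.0 - 2.0 * inter / (total + eps)

def focal_loss(probs, onehot, gamma=2.0, eps=1e-6):
    """Multi-class focal loss: mean over voxels of -(1 - p_t)^gamma * log(p_t)."""
    p_t = np.clip((probs * onehot).sum(axis=0), eps, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

def composite_loss(probs, onehot, lam=1.0):
    # lam is a hypothetical weighting; the source does not state the coefficients.
    return generalized_dice_loss(probs, onehot) + lam * focal_loss(probs, onehot)

# Toy example: 8 voxels, 4 classes (background, TL, FL, FLT) with imbalanced counts.
labels = np.array([0, 0, 0, 0, 1, 1, 2, 3])
onehot = np.eye(4)[labels].T          # shape (4, 8)
uniform = np.full_like(onehot, 0.25)  # uninformative prediction

print(composite_loss(onehot, onehot))   # near zero for a perfect prediction
print(composite_loss(uniform, onehot))  # clearly larger for a uniform prediction
```

The inverse squared volume weights raise the contribution of rare classes such as FLT, while the focal term down-weights voxels that are already classified with high confidence. Classes absent from a batch receive a very large weight in this sketch and would need separate handling in practice.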
6. Distribution, Licensing, and Access
- Data Release: The ImageTBAD dataset, including CTAs and NIfTI-1 segmentations, is available via Zenodo and the project's GitHub repository (Yao et al., 2021).
- Licensing: The dataset is released under CC BY-NC 4.0, which permits academic, non-commercial use with attribution. No additional restrictions apply beyond citation.
- Code Availability: Baseline code for the reference 3D U-Net segmentation pipeline accompanies the dataset. Additional preprocessing and experiment code is found in later publications (Yao et al., 2021, Mikhailapov et al., 19 Dec 2025).
7. Research Applications and Challenges
- Clinical Applications: The dataset enables quantification of TL, FL, and especially FLT, which is critical for stratification of rupture risk, surgical planning (e.g., stent graft sizing and placement), and longitudinal assessment of remodeling after thoracic endovascular aortic repair (TEVAR).
- Methodological Research: FLT segmentation is difficult because of low tissue contrast, irregular geometry, and extreme class imbalance, which makes ImageTBAD a demanding testbed for developing advanced segmentation models. Potential approaches include:
- Multi-scale attention and boundary-aware networks for small, irregular targets
- Anatomical priors or graph-based regularization for spatial consistency
- Adversarial or contrastive loss augmentation targeting the FLT-vs-FL boundary
- Extension Opportunities: Morphological quantification (volume, surface area, length) supports novel quantitative endpoints for outcome prediction and patient stratification (Yao et al., 2021).
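Of the morphological quantities mentioned above, volume is the simplest to derive from a segmentation mask: it is the voxel count multiplied by the physical voxel volume. The sketch below shows this under an assumed label encoding (1 = TL, 2 = FL, 3 = FLT) and an illustrative voxel spacing; surface area and length need additional tooling and are not shown.

```python
import numpy as np

def class_volumes_ml(mask, spacing_mm, labels):
    """Volume per class in millilitres: voxel count x voxel volume (1 mL = 1000 mm^3)."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return {
        name: float((mask == value).sum()) * voxel_mm3 / 1000.0
        for name, value in labels.items()
    }

labels = {"TL": 1, "FL": 2, "FLT": 3}  # assumed encoding, for illustration only
mask = np.zeros((10, 10, 10), dtype=np.uint8)
mask[:5] = 1   # 500 voxels of TL
mask[5:8] = 2  # 300 voxels of FL
mask[8:] = 3   # 200 voxels of FLT

# Hypothetical spacing of 0.5 x 0.5 x 2.0 mm, i.e. 0.5 mm^3 per voxel.
vols = class_volumes_ml(mask, (0.5, 0.5, 2.0), labels)
print(vols)
```

In practice the spacing would be read from the NIfTI header of each study rather than hard-coded.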
ImageTBAD is a benchmark dataset specifically designed to catalyze research on automated, fine-grained quantification of Type-B aortic dissection anatomy with reliable, public segmentation ground truth for TL, FL, and FLT. Its open availability and rigorous annotation protocol address a longstanding bottleneck in cardiovascular image analysis (Yao et al., 2021, Mikhailapov et al., 19 Dec 2025).