Longitudinal CT Dataset Insights
- Longitudinal-CT datasets are collections of CT scans acquired at multiple time-points, enabling quantitative assessment of disease progression and treatment response.
- They incorporate advanced annotation, acquisition, and preprocessing methods to support robust deep learning, radiomics, and clinical outcome evaluations.
- Key challenges include spatial misregistration, annotation burden, and data heterogeneity, driving the need for innovative harmonization techniques.
Longitudinal-CT (Computed Tomography) datasets, defined as collections of CT imaging acquired on the same subject at two or more distinct time-points, enable quantitative assessment of disease progression, therapy response, body composition change, and prognostic biomarker dynamics. These datasets are foundational for deep learning, radiomics, and clinical studies requiring spatio-temporal imaging data. Common use cases include oncology lesion tracking, COVID-19 infection evolution, stroke prediction, body composition assessment, and clinical trial outcome evaluation. This article surveys the principal design, annotation, and analytic properties of state-of-the-art longitudinal-CT datasets as documented in recent literature.
1. Cohort Design and Acquisition Protocols
Longitudinal-CT datasets are characterized by repeated imaging at defined or variable intervals, typically in disease-monitoring or population studies. The cohort structure, scan intervals, and acquisition protocols are driven by clinical context:
- Oncology trials (e.g., MONALEESA): N = 203 metastatic breast cancer (mBC) subjects, each with screening and three follow-up chest CTs at weeks 8, 16, and 24 (total ≈812 scans); RECIST-compliant annotation for target lesions (Mukherjee, 12 Jan 2025).
- COVID-19 progression: 38 RT-PCR–confirmed adults (mean age 64±18 years), 2–3 low-dose non-contrast chest CTs each, with mean interval 17±10 days; exclusion for motion artifact or incomplete coverage (Kim et al., 2021).
- Stroke imaging: ISLES 2024, training set n=150, test set n=100; each case with acute CT trilogy (non-contrast, angiography, perfusion) on admission and follow-up MRI at 2–9 days; centers used multiple CT vendor platforms with wide-ranging parameters (Riedel et al., 2024).
- Abdominal CT in aging: BLSA, up to 12 scans over 15 years per subject, predominantly single 2D abdominal slices (n=1,033 subjects, n=4,223 images) at irregular intervals; typical protocol yields median 3 visits over two decades (Yu et al., 2023).
- Synthetic and challenge datasets: Synthetic paired CTs (n=2,625 simulated patients, 2 time-points each) emulate anatomical drift and lesion evolution for pretraining; challenge benchmarks (e.g., autoPET/CT IV: n=300 patients, 2 whole-body CTs each) provide real and synthetic pairs, standardized for multi-lesion promptable segmentation (Kirchhoff et al., 30 Aug 2025).
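The anatomy-informed augmentation idea behind such synthetic pairs can be sketched with a smooth random deformation: a baseline volume is warped by a Gaussian-smoothed displacement field to emulate anatomical drift between time-points. This is a minimal illustration with numpy/scipy, not the actual pipeline of any cited study; the function name and parameters are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def synthesize_followup(volume, max_disp=3.0, smooth_sigma=8.0, seed=0):
    """Warp a baseline CT volume with a smooth random displacement field
    to emulate anatomical drift between time-points (illustrative only)."""
    rng = np.random.default_rng(seed)
    shape = volume.shape
    # One random displacement component per spatial axis, smoothed so the
    # deformation is spatially coherent rather than voxel-wise noise.
    disp = [gaussian_filter(rng.standard_normal(shape), smooth_sigma)
            for _ in shape]
    # Scale each component so the largest displacement is max_disp voxels.
    disp = [d / (np.abs(d).max() + 1e-8) * max_disp for d in disp]
    coords = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    warped = [c + d for c, d in zip(coords, disp)]
    return map_coordinates(volume, warped, order=1, mode="nearest")

# Example: warp a small toy volume containing a synthetic "lesion".
baseline = np.zeros((32, 32, 32), dtype=np.float32)
baseline[12:20, 12:20, 12:20] = 1.0
followup = synthesize_followup(baseline)
```

Real pipelines additionally simulate lesion growth/shrinkage and contrast or attenuation shifts on top of the spatial deformation.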
Scan parameters vary by setting but generally include axial acquisition, 120 kVp tube voltage, slice thickness 0.9–5 mm (thinner in chest, thicker in abdomen), and institutional or study-specific protocols for reconstruction and positioning.
2. Annotation and Ground Truth Curation
Annotation protocols are tailored to the intended analytic endpoint:
- Voxel-level segmentation: Manual annotation by expert thoracic radiologists (COVID-19, chest) in ImFusion Labels, with explicit labeling for healthy parenchyma, ground-glass opacity, consolidation, and pleural effusion. The process involves slice-by-slice contouring, cross-validated in three orthogonal planes, with manual post-hoc consistency checks. No semi-automated propagation was employed in these studies (Kim et al., 2021).
- Lesion tracking: For mBC trials, up to 3 RECIST target lesions per patient are manually segmented at each time-point. The RAMAC algorithm provides correspondence by rigid registration plus the Hungarian algorithm for 3D centroid matching (Mukherjee, 12 Jan 2025).
- Atlas-based propagation: Non-baseline lesion masks in spatio-temporal lesion segmentation datasets are propagated via rigid and non-rigid registration from baseline delineations and expert-reviewed (Kim et al., 13 Apr 2025).
- Single-slice and body composition: In abdominal datasets, 13 tissue classes (organs, muscle, fat depots, body masks) are either manually delineated (multi-organ, muscle, wall), semi-automatically expanded (abdominal wall), or clustered with unsupervised fuzzy c-means (fat, body mask). Supervised models (DeepLab-v3, U-Net) are deployed on large-scale annotated training subsets for pixelwise segmentation (Yu et al., 2022).
- Weakly supervised labels: Disease progression is sometimes labeled via radiology report extraction of summary measurements (e.g., SUV_max in PET/CT). Categories (progression, stable, resolution) are defined by change thresholds (e.g., ±25%) applied to these extracted values (Joshi et al., 2021).
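The centroid-matching step described for RAMAC (rigid registration followed by Hungarian assignment of 3D lesion centroids) can be sketched as follows. This is a generic illustration using scipy's assignment solver, not the published implementation; the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_lesions(centroids_t0, centroids_t1):
    """Pair lesion centroids across two time-points by minimizing total
    Euclidean distance (Hungarian algorithm). Assumes the scans were
    already rigidly registered; returns (index at t0, index at t1) pairs."""
    c0 = np.asarray(centroids_t0, dtype=float)
    c1 = np.asarray(centroids_t1, dtype=float)
    # Pairwise distance matrix between baseline and follow-up centroids.
    cost = np.linalg.norm(c0[:, None, :] - c1[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Example: three lesions with mild displacement, listed in shuffled order.
t0 = [(10, 10, 10), (40, 35, 22), (80, 60, 15)]
t1 = [(79, 61, 16), (11, 9, 10), (41, 36, 21)]
pairs = match_lesions(t0, t1)
```

In practice unmatched lesions (new or resolved) are handled by padding the cost matrix or applying a distance gate before assignment.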
Inter-rater variability is reported in some datasets, particularly ISLES 2024 (Dice = 0.86–0.90 for stroke DWI segmentation with two neuroradiologists), but not universally measured.
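The inter-rater agreement figures above use the Dice similarity coefficient, 2|A∩B| / (|A| + |B|), which can be computed directly from two binary masks. A minimal numpy sketch:

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks:
    2|A∩B| / (|A| + |B|). Returns 1.0 when both masks are empty."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom

# Two slightly different rater annotations of the same lesion.
rater1 = np.zeros((20, 20), dtype=bool)
rater1[5:15, 5:15] = True            # 100 px
rater2 = np.zeros((20, 20), dtype=bool)
rater2[6:16, 5:15] = True            # same size, shifted one row
agreement = dice(rater1, rater2)     # 2*90 / (100+100) = 0.9
```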
3. Data Formatting, Structure, and Preprocessing
Longitudinal-CT datasets employ standardized and study-specific data organization schemas:
- File format: Raw images are commonly stored in DICOM, with derived analytic datasets in NIfTI (for 3D/4D volumes) or PNG/TIFF (for 2D slices). Sidecar JSON files encapsulate metadata (e.g., acquisition date, days since symptom onset, scanner model, reconstruction parameters) (Kim et al., 2021, Kirchhoff et al., 30 Aug 2025).
- Directory layout: Hierarchical by patient ID and time-point, e.g.,
```
dataset_root/
├── patient_001/
│   ├── T0.nii.gz
│   ├── T1.nii.gz
│   └── T2.nii.gz
```
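A layout like this is straightforward to index programmatically. The following sketch (pathlib-based, with a hypothetical `index_timepoints` helper; it assumes single-digit `T*` naming, so zero-padding would be needed beyond ten time-points) builds a patient-to-scans map:

```python
from pathlib import Path
import tempfile

def index_timepoints(dataset_root):
    """Map each patient directory to its chronologically sorted
    time-point volumes, following the T0/T1/... naming shown above."""
    root = Path(dataset_root)
    return {
        patient.name: sorted(p.name for p in patient.glob("T*.nii.gz"))
        for patient in sorted(root.iterdir()) if patient.is_dir()
    }

# Example: recreate the layout above in a temporary directory.
root = Path(tempfile.mkdtemp())
for pid, n_scans in [("patient_001", 3), ("patient_002", 2)]:
    pdir = root / pid
    pdir.mkdir()
    for t in range(n_scans):
        (pdir / f"T{t}.nii.gz").touch()
index = index_timepoints(root)
```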
- Preprocessing: Includes resampling to a fixed isotropic grid (e.g., 64×64×64 voxels), intensity normalization (min-max to [0,1] within an HU range, e.g., –1024 to 2000), random cropping and augmentation (scale, rotation, flipping, HU shift), and explicit registration to harmonize field-of-view and lesion orientation (Kim et al., 13 Apr 2025, Mukherjee, 12 Jan 2025).
- Spatial harmonization: For axial slices with varied acquisition levels, algorithms (e.g., Body-Part Regression, C-SliceGen) assign a continuous "abdominal level" score or synthesize standard-level slices to reduce artifactual variance in body composition metrics (Yu et al., 2023).
- Synthetic pipeline: Anatomy-informed augmentations generate paired images with simulated anatomic drift and lesion morphodynamics, controlled by Gaussian deformation fields and contrast/attenuation shifts (Kirchhoff et al., 30 Aug 2025).
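The resampling and intensity-normalization steps listed above can be sketched as follows, using the HU window cited in the text (–1024 to 2000). This is a generic illustration with scipy's `zoom`, not any study's exact pipeline; the function name and target spacing are assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

HU_MIN, HU_MAX = -1024.0, 2000.0  # HU window cited in the text

def preprocess(volume_hu, spacing_mm, target_spacing_mm=1.0):
    """Resample a CT volume to isotropic spacing, then clip and
    min-max normalize intensities from [HU_MIN, HU_MAX] to [0, 1]."""
    factors = [s / target_spacing_mm for s in spacing_mm]
    iso = zoom(np.asarray(volume_hu, dtype=np.float32), factors, order=1)
    clipped = np.clip(iso, HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)

# Example: a toy 2 mm-isotropic volume upsampled to 1 mm spacing.
vol = np.full((16, 16, 16), -1024.0, dtype=np.float32)  # air background
vol[4:12, 4:12, 4:12] = 40.0                            # soft-tissue block
out = preprocess(vol, spacing_mm=(2.0, 2.0, 2.0))
```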
4. Quantitative Progression and Variability Metrics
Longitudinal-CT datasets are used to compute volumetric, morphological, radiomic, and clinical metrics:
- Voxelwise temporal change: Volumetric class quantification (e.g., for COVID-19, the volume V_c(t) of each tissue class c at time t; the infected lung fraction f(t) = V_infected(t)/V_lung(t) and its change Δf = f(t_i+1) − f(t_i)), together with per-voxel progression maps comparing class labels across time-points, to assess disease dissemination or resolution (Kim et al., 2021).
- Radiomics and delta-radiomics: Extraction of 98 features per lesion and time-point (shape, first-order, GLCM, GLRLM, GLSZM, NGTDM, GLDM); delta radiomics (the change in each feature relative to baseline, ΔR = R(t_k) − R(t_0)) captures temporal dynamics (Mukherjee, 12 Jan 2025).
- Body composition: In abdominal work, visceral fat area (VFA) is computed as pixel count × pixel area within the HU-defined adiposity mask; normalized mutual information (NMI) and coefficient of variation (CV) quantify longitudinal stability (Yu et al., 2023).
- Longitudinal reproducibility: For BLSA abdominal CTs, intraclass correlation coefficient (ICC) and CV were computed for tissue area and mean intensity over repeated measures approximately 2 years apart. Muscle and fat measures exhibit high ICC, while small organs or those sensitive to slice position display low ICC (Yu et al., 2022).
- Outcome modeling: For survival analysis in cancer, Cox proportional hazards models (additive, L1-penalized) and joint longitudinal-survival models are trained with patient-level features aggregating all tracked lesions and time-points, with model performance assessed by concordance index (C-index) and information criteria (DIC, WAIC, LPML) (Mukherjee, 12 Jan 2025).
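The voxelwise temporal-change metrics above can be sketched from two co-registered label volumes: the infected fraction at each time-point, its change Δf, and a per-voxel progression map. The class codes and function names below are illustrative, not those of the cited study.

```python
import numpy as np

def infected_fraction(labels, infected_classes=(1, 2)):
    """Fraction of lung voxels labeled infected (e.g., GGO=1,
    consolidation=2) relative to all lung voxels (label > 0).
    Class codes are illustrative."""
    lung = labels > 0
    infected = np.isin(labels, infected_classes)
    return infected.sum() / lung.sum()

def progression_map(labels_t0, labels_t1, infected_classes=(1, 2)):
    """Per-voxel progression: +1 newly infected, -1 resolved, 0 unchanged.
    Assumes the two volumes are spatially registered."""
    inf0 = np.isin(labels_t0, infected_classes).astype(np.int8)
    inf1 = np.isin(labels_t1, infected_classes).astype(np.int8)
    return inf1 - inf0

# Toy example: a GGO patch grows between two time-points.
t0 = np.full((10, 10), 3, dtype=np.int8)  # healthy lung = 3
t0[2:4, 2:4] = 1                          # small GGO patch (4 voxels)
t1 = t0.copy()
t1[2:6, 2:6] = 1                          # patch has grown (16 voxels)
delta_f = infected_fraction(t1) - infected_fraction(t0)  # 0.16 - 0.04
pm = progression_map(t0, t1)
```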
5. Data Access, Availability, and Ethical Frameworks
Provenance, licensing, and sharing constraints for longitudinal-CT datasets are heterogeneous:
- Public datasets: ISLES 2024 (stroke, multi-modal CT + DWI) and autoPET/CT IV (oncology, lesion tracking) provide openly accessible annotated and challenge-structured training data, with withheld test sets for benchmarking (Riedel et al., 2024, Kirchhoff et al., 30 Aug 2025).
- Synthetic generation: The LesionLocator synthetic dataset and its codebase are licensed under CC BY-NC, supporting reproducible methods development for temporal segmentation/tracking (Kirchhoff et al., 30 Aug 2025).
- Requestable datasets: COVID-19 chest CT, OncoNet PET/CT, MONALEESA mBC studies require investigator application, IRB approval, and/or data-use agreements for access (Kim et al., 2021, Joshi et al., 2021, Mukherjee, 12 Jan 2025).
- Internal-only: Proprietary datasets remain internal to clinical institutions or industry sponsors, as in the case of the OmniMamba4D lesion cohort (Kim et al., 13 Apr 2025).
- Anonymization and compliance: All released datasets are de-identified. Additional anonymization steps include DICOM metadata stripping, facial defacing (e.g., TotalSegmentator, HD-BET), and noise injection into sensitive variables. Usage is strictly for academic/non-commercial research, with terms restricting re-identification or redistribution.
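The metadata-stripping step can be illustrated on a sidecar-metadata dict. The key list below is a small illustrative subset; real de-identification follows the DICOM confidentiality profiles and institutional policy (dates, for instance, are often shifted rather than dropped).

```python
# Illustrative subset of identifying fields; real pipelines follow
# DICOM PS3.15 confidentiality profiles and institutional policy.
IDENTIFYING_KEYS = {"PatientName", "PatientID", "PatientBirthDate",
                    "InstitutionName", "ReferringPhysicianName"}

def scrub_metadata(metadata):
    """Return a copy of a sidecar-metadata dict with identifying fields
    removed, keeping acquisition fields needed for analysis."""
    return {k: v for k, v in metadata.items() if k not in IDENTIFYING_KEYS}

sidecar = {"PatientName": "DOE^JANE", "PatientID": "12345",
           "KVP": 120, "SliceThickness": 1.0, "StudyDate": "20210301"}
clean = scrub_metadata(sidecar)
```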
6. Technical and Practical Challenges
Several critical design and analytic challenges are evident across longitudinal-CT dataset studies:
- Longitudinal alignment: Slice positioning and spatial misregistration between time-points are persistent sources of measurement variability, particularly in single-slice or small-organ analyses (Yu et al., 2022, Yu et al., 2023). Approaches include registration-based harmonization, atlas propagation, and conditional generative harmonization (C-SliceGen).
- Annotation burden: Manual segmentation of high-resolution 3D or 4D volumes at multiple time-points is labor-intensive. Semi-automatic methods (atlas-based propagation, deep ensemble pre-labeling) are increasingly deployed, but consistently high inter-rater agreement is challenging to establish across large cohorts (Kim et al., 13 Apr 2025, Riedel et al., 2024).
- Weak supervision and sparse labels: When expert-annotated masks are unavailable, response labels derived from reports (e.g., ΔSUV_max) or from partial masks (e.g., only baseline provided for autoPET/CT IV) can support model development but may limit spatial precision (Joshi et al., 2021, Kirchhoff et al., 30 Aug 2025).
- Slice position confounds: In single-slice BLSA abdominal sets, inter-scan variability is principally driven by vertebral-level inconsistency rather than anatomic change, necessitating harmonization or robust QC metrics (e.g., monitoring ICC) (Yu et al., 2023, Yu et al., 2022).
- Data heterogeneity: Multi-center datasets (e.g., ISLES 2024) aggregate images from varied scanners, protocols, and acquisition windows, requiring sophisticated normalization and batch-effect correction procedures (Riedel et al., 2024).
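The weak-supervision scheme noted above (response categories derived from relative change in a summary measurement such as SUV_max, thresholded at ±25%) reduces to a simple rule; a minimal sketch, with the function name and example values assumed:

```python
def label_response(value_t0, value_t1, threshold=0.25):
    """Weakly supervised response label from a summary measurement
    (e.g., SUV_max): relative change beyond the +/-25% threshold
    cited in the text maps to progression/resolution."""
    if value_t0 == 0:
        raise ValueError("baseline measurement must be non-zero")
    rel_change = (value_t1 - value_t0) / value_t0
    if rel_change > threshold:
        return "progression"
    if rel_change < -threshold:
        return "resolution"
    return "stable"

labels = [label_response(8.0, 12.0),  # +50%   -> progression
          label_response(8.0, 7.0),   # -12.5% -> stable
          label_response(8.0, 4.0)]   # -50%   -> resolution
```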
In summary, longitudinal-CT datasets underpin modern research in temporal image analysis across many fields, offering rigorously annotated, often large-scale, time-resolved data for benchmarking spatio-temporal algorithms, quantifying progression, and linking imaging changes to clinical outcomes. Their construction and use demand specialized preprocessing, harmonization, and annotation strategies tailored to disease, organ system, and study objective. The landscape is evolving toward open, challenge-driven distributions, synthetic/real hybrid datasets, and increased methodological focus on mitigating spatial and temporal heterogeneity.