Valencia Oncology Institute Foundation (IVO)
- The IVO is a clinical and research institution offering high-quality, annotated mpMRI datasets for validated computer-aided diagnosis in prostate cancer.
- Its dataset comprises 221 pre-biopsy mpMRI studies with manual lesion segmentations, PI-RADS scores, and biopsy-proven Gleason labels for robust clinical assessments.
- IVO’s workflow integrates multimodal data harmonization with an advanced 3D Retina U-Net framework to set new standards in automated lesion detection, segmentation, and Gleason grading.
The Valencia Oncology Institute Foundation (IVO) is a clinical and research institution that has become a structured source of prostate mpMRI datasets and expert-annotated prostate cancer ground truth data for computer-aided diagnosis (CAD) research. Its data were central to the paper "Deep Learning for fully automatic detection, segmentation, and Gleason Grade estimation of prostate cancer in multiparametric Magnetic Resonance Images" by Pellicer-Valero et al., which established new technical standards for automated analysis of clinically significant prostate cancer using deep learning on multiparametric MRI (Pellicer-Valero et al., 2021).
1. IVO mpMRI Dataset Characteristics
The IVO dataset consists of 221 pre-biopsy multiparametric MRI studies, one per patient. Lesion annotation and signal quality assurance were performed by radiologists with 2–7 years of experience in mpMRI interpretation. Each case includes manual lesion segmentations and PI-RADS (Prostate Imaging Reporting and Data System) scores. Most studies contain 1–2 annotated lesions per patient (mean 1.04). Gleason scores, used as ground truth labels for clinical significance (Gleason Grade Group, GGG), were established by transperineal fusion-guided plus systematic template biopsy (20–30 cores per patient, 2–3 cores per ROI).
MRI was performed using a General Electric 1.5 Tesla platform with the following sequences:
- T2-weighted (available in all cases)
- Diffusion-weighted imaging (DWI) with b=100, b=500, b=1000 s/mm² (in 1.36% of cases b=1400 instead of b=1000)
- Apparent diffusion coefficient (ADC) maps (95.48% completeness)
- Dynamic contrast-enhanced (DCE) T1-weighted images (57.47% completeness; temporally sampled at 30 timepoints)
This dataset is notable for its biopsy-proven clinical endpoints and for including cases with missing sequences, which creates a realistic, heterogeneous testing environment (Pellicer-Valero et al., 2021).
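The completeness percentages listed above can be translated back into approximate study counts. The short Python sketch below does that arithmetic; the counts are inferred here by rounding and are not stated in the source.

```python
# Convert the reported completeness percentages into approximate study counts.
# The counts are inferred by rounding; they are not given in the source.
N_STUDIES = 221

reported_percent = {
    "ADC available": 95.48,
    "DCE available": 57.47,
    "DWI with b=1400 instead of b=1000": 1.36,
}


def approx_count(percent: float, total: int = N_STUDIES) -> int:
    """Nearest whole number of studies corresponding to a percentage."""
    return round(percent / 100 * total)


counts = {name: approx_count(p) for name, p in reported_percent.items()}

for name, n in counts.items():
    # Recompute the percentage from the integer count as a sanity check.
    print(f"{name}: ~{n} of {N_STUDIES} ({100 * n / N_STUDIES:.2f}%)")
```

Under this rounding, roughly 211 studies include ADC maps, 127 include DCE, and 3 use b=1400 in place of b=1000.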
2. Pre-processing and Multimodal Data Harmonization
Volume crops of 160×160×24 voxels, centered on the prostate and resampled to 0.5×0.5×3 mm, were generated using B-spline interpolation (order 3; Gaussian for label masks). Each channel was then intensity-normalized independently, using a rescaling defined in terms of percentiles of that channel's intensities.
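The normalization formula itself is not reproduced in the text above, so the following Python sketch shows only one plausible percentile-based rescaling. The function names, the choice of lower and upper percentiles, and the mapping of those percentiles to 0 and 1 are all assumptions made for illustration, not the authors' stated method.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolated between sorted samples."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def normalize_channel(values: Sequence[float],
                      lower_q: float = 0.0,
                      upper_q: float = 99.0) -> List[float]:
    """Rescale one channel so the lower_q percentile maps to 0 and the
    upper_q percentile maps to 1. ASSUMED form; percentiles are hypothetical."""
    lo = percentile(values, lower_q)
    hi = percentile(values, upper_q)
    if hi == lo:
        return [0.0 for _ in values]  # constant channel (e.g. zero-filled)
    return [(v - lo) / (hi - lo) for v in values]


# Toy channel with one bright outlier; the 75th percentile is used only
# because the example has five samples.
channel = [0.0, 10.0, 20.0, 30.0, 1000.0]
normalized = normalize_channel(channel, lower_q=0.0, upper_q=75.0)
```

Using an upper percentile rather than the maximum keeps a few very bright voxels from compressing the rest of the intensity range; values above that percentile simply end up greater than 1.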
To harmonize across different acquisition protocols and to align with the ProstateX challenge dataset, modalities were mapped and zero-filled for missing sequences. DCE images were subsampled at t=10, 20, 30s. DWI b-values were organized into two canonical channels by mapping b=500 (IVO)/b=400 (ProstateX) to one, and b=1000/b=1400 (IVO)/b=800 (ProstateX) to the other. This yielded channel-aligned, intensity-normalized, 3D multichannel volumes suitable for robust CNN ingestion (Pellicer-Valero et al., 2021).
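The channel mapping described above can be sketched in Python as follows. The channel names, their order, the `DWI_b<value>` and `DCE_t<value>` key convention, and the flat-list volume representation are hypothetical choices for this example; only the b-value groupings, the three DCE samples, and the zero-filling of missing sequences come from the text.

```python
from typing import Dict, List, Sequence

# Assumed channel order; the source does not specify one.
CANONICAL_CHANNELS = ["T2", "DWI_low", "DWI_high", "ADC",
                      "DCE_t10", "DCE_t20", "DCE_t30"]

# b-values (s/mm^2) that feed each canonical DWI channel, per dataset.
DWI_LOW_B = {"IVO": [500], "ProstateX": [400]}
DWI_HIGH_B = {"IVO": [1000, 1400], "ProstateX": [800]}
DCE_TIMEPOINTS = [10, 20, 30]


def harmonize(study: Dict[str, Sequence[float]], dataset: str,
              n_voxels: int) -> Dict[str, List[float]]:
    """Map a study's sequences onto canonical channels, zero-filling gaps."""

    def volume_or_zeros(key: str) -> List[float]:
        vol = study.get(key)
        return list(vol) if vol is not None else [0.0] * n_voxels

    def dwi_channel(accepted_b: List[int]) -> List[float]:
        # Use the first accepted b-value that is present in the study.
        for b in accepted_b:
            key = f"DWI_b{b}"
            if key in study:
                return list(study[key])
        return [0.0] * n_voxels

    channels = {
        "T2": volume_or_zeros("T2"),
        "DWI_low": dwi_channel(DWI_LOW_B[dataset]),
        "DWI_high": dwi_channel(DWI_HIGH_B[dataset]),
        "ADC": volume_or_zeros("ADC"),
    }
    for t in DCE_TIMEPOINTS:
        channels[f"DCE_t{t}"] = volume_or_zeros(f"DCE_t{t}")
    return {name: channels[name] for name in CANONICAL_CHANNELS}


# Toy IVO study with b=1400 instead of b=1000 and no DCE acquisition.
ivo_study = {"T2": [1.0, 2.0], "DWI_b500": [3.0, 4.0],
             "DWI_b1400": [5.0, 6.0], "ADC": [7.0, 8.0]}
harmonized = harmonize(ivo_study, "IVO", n_voxels=2)
```

Every study then yields the same number of channels in the same order, so studies with and without DCE can share one network input layout.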
3. Automated Detection and Segmentation Framework
A 3D Retina U-Net architecture was trained on the harmonized dataset. It consists of:
- A 5-level encoder–decoder 3D U-Net backbone with residual and dense blocks (ResNet-101, batch normalization)
- RetinaNet feature pyramid detection heads for anchor-based lesion detection across pyramid scales (anchor aspect ratio 1:1:1, scales from 4×4×1 to 28×28×9 voxels)
- Voxel-wise segmentation head for lesion masks.
Detection heads apply a 3×3×3 convolutional subnetwork per feature map level for Gleason group (GGG) classification (via softmax) and bounding-box regression (smooth-$L_1$ loss).
Detection and segmentation outputs are ensembled by weighted box clustering (intersection over union threshold $10^{-5}$) and voxel-wise averaging of segmentations, using five-fold cross-validation models and retaining the top-five epochs from each for a 25-model virtual ensemble (Pellicer-Valero et al., 2021).
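The ensembling step can be illustrated with a simplified Python sketch. It is not the full weighted box clustering procedure: here boxes are grouped greedily by 3D IoU and merged with a score-weighted average, which is a deliberate simplification. All function names and the toy boxes are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float, float, float]  # x1, y1, z1, x2, y2, z2

N_MODELS = 5 * 5  # five folds times the top five epochs per fold


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for lo_a, lo_b, hi_a, hi_b in zip(a[:3], b[:3], a[3:], b[3:]):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0
        inter *= overlap
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def cluster_boxes(boxes: Sequence[Box], scores: Sequence[float],
                  iou_threshold: float = 1e-5) -> List[Tuple[Box, float]]:
    """Greedy clustering: the highest-scoring unused box seeds a cluster and
    absorbs every unused box overlapping it above the threshold.
    Assumes positive scores and boxes with non-zero volume."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    used = set()
    merged = []
    for i in order:
        if i in used:
            continue
        members = [j for j in order
                   if j not in used and iou_3d(boxes[i], boxes[j]) > iou_threshold]
        used.update(members)
        total = sum(scores[j] for j in members)
        box = tuple(sum(scores[j] * boxes[j][k] for j in members) / total
                    for k in range(6))
        merged.append((box, total / len(members)))
    return merged


def average_masks(masks: Sequence[Sequence[float]]) -> List[float]:
    """Voxel-wise mean of per-model segmentation probabilities."""
    return [sum(voxel) / len(masks) for voxel in zip(*masks)]


boxes = [(0, 0, 0, 10, 10, 4), (1, 1, 0, 11, 11, 4), (50, 50, 10, 60, 60, 14)]
scores = [0.9, 0.6, 0.8]
merged = cluster_boxes(boxes, scores)
mean_mask = average_masks([[0.0, 1.0], [1.0, 1.0]])
```

With a threshold as low as the one quoted, almost any overlap places two boxes in the same cluster, so each lesion candidate tends to collapse into a single consensus box across the ensemble.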
4. Training Objectives and Loss Functions
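The Retina U-Net in this context employed a composite loss:
- Soft Dice loss for segmentation, in its common form:
$$\mathcal{L}_{\text{Dice}} = 1 - \frac{2\sum_i p_i\, g_i}{\sum_i p_i + \sum_i g_i}$$
with $p_i$ and $g_i$ as the predicted and reference probabilities for voxel $i$.
- Detection (classification) focal loss:
$$\mathrm{FL}(p_t) = -\alpha\,(1 - p_t)^{\gamma}\,\log(p_t)$$
where $p_t$ is the probability assigned to the true class, with a weighting factor $\alpha$ and a focusing parameter $\gamma$ for hard example focus.
- Bounding-box regression (smooth-$L_1$):
$$\operatorname{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
where $x$ is the difference between a predicted and a target box coordinate.
- Multi-class cross-entropy for GGG classification.
Loss weights followed Jaeger et al. (2020).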
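The loss terms listed above can be sketched in plain Python using their standard textbook forms. The exact variants, smoothing constants, and hyperparameter values used in the paper are not given here, so `eps`, `beta`, and the focal-loss values in the example (`alpha=0.25`, `gamma=2.0`, the defaults commonly cited from the original focal loss work) are assumptions.

```python
import math
from typing import Sequence


def soft_dice_loss(pred: Sequence[float], ref: Sequence[float],
                   eps: float = 1e-7) -> float:
    """1 - soft Dice overlap between predicted and reference probabilities.
    eps is an assumed smoothing term that avoids division by zero."""
    inter = sum(p * g for p, g in zip(pred, ref))
    denom = sum(pred) + sum(ref)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def focal_loss(p_t: float, alpha: float, gamma: float) -> float:
    """Focal loss for one example; p_t is the true-class probability (0 < p_t <= 1)."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Quadratic near zero, linear for large residuals (beta=1 is assumed)."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


# A confidently correct example contributes far less than a hard one.
easy = focal_loss(0.9, alpha=0.25, gamma=2.0)
hard = focal_loss(0.1, alpha=0.25, gamma=2.0)
```

The `(1 - p_t) ** gamma` factor is what shifts the training signal toward hard examples: with `gamma=0` the expression reduces to a weighted cross-entropy term.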
5. Prostate Zonal Segmentation and Spatial Registration
A cascade of two 3D U-Nets produced detailed gland and zonal segmentations:
- Whole-prostate mask estimated from T2-weighted images.
- Central gland (CG) mask derived from joint input of T2 and whole-gland mask.
- Peripheral zone (PZ) mask defined as prostate minus CG.
Segmentation quality was evaluated on two test sets: mean Dice scores for prostate/CG/PZ were 0.94/0.93/0.87 on a private test set and 0.89/0.86/0.69 on the NCI-ISBI test set.
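The peripheral-zone definition and the Dice metric can be illustrated with a few lines of Python. Masks are represented as flat lists of 0/1 values, which is a simplification for this example; the function names and toy masks are hypothetical.

```python
from typing import List, Sequence


def peripheral_zone(prostate: Sequence[int], central_gland: Sequence[int]) -> List[int]:
    """PZ mask = voxels inside the prostate mask but outside the CG mask."""
    return [int(bool(p) and not c) for p, c in zip(prostate, central_gland)]


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 2.0 * inter / size if size else 1.0


prostate_mask = [0, 1, 1, 1, 1, 0]
cg_mask = [0, 0, 1, 1, 0, 0]
pz_mask = peripheral_zone(prostate_mask, cg_mask)

# Compare against a toy reference PZ that has one extra voxel.
reference_pz = [0, 1, 0, 0, 1, 1]
pz_dice = dice(pz_mask, reference_pz)
```

Because the PZ is obtained by subtraction, errors in either the whole-gland or the CG mask propagate into it, which is consistent with the PZ having the lowest Dice of the three structures.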
Non-rigid registration aligned the T2 and ADC sequences using a B-spline grid deformation with Mattes mutual information as the similarity metric. For each case, 50 random initializations were tried, and the result with the highest mean correlation between T2 and ADC within the CG and PZ was selected (Pellicer-Valero et al., 2021).
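The selection step can be sketched as follows, leaving out the registration itself. The sketch assumes each candidate is an already-warped ADC volume flattened to a list, and it interprets "mean correlation within CG and PZ" as the average of the two per-zone Pearson correlations; both points are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def masked(values: Sequence[float], mask: Sequence[int]) -> List[float]:
    return [v for v, m in zip(values, mask) if m]


def zone_score(t2, adc, cg_mask, pz_mask) -> float:
    """Average of the T2/ADC correlations inside CG and inside PZ (assumed)."""
    r_cg = pearson(masked(t2, cg_mask), masked(adc, cg_mask))
    r_pz = pearson(masked(t2, pz_mask), masked(adc, pz_mask))
    return (r_cg + r_pz) / 2.0


def select_best(t2, candidates, cg_mask, pz_mask) -> Tuple[int, float]:
    """Index and score of the candidate registration with the highest score."""
    scores = [zone_score(t2, adc, cg_mask, pz_mask) for adc in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]


t2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cg_mask = [1, 1, 1, 0, 0, 0]
pz_mask = [0, 0, 0, 1, 1, 1]
candidates = [
    [3.0, 2.0, 1.0, 6.0, 5.0, 4.0],  # anti-correlated in both zones
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # perfectly correlated
    [1.0, 3.0, 2.0, 4.0, 6.0, 5.0],  # partially correlated
]
best_idx, best_score = select_best(t2, candidates, cg_mask, pz_mask)
```

In practice the candidate list would hold the 50 registration outputs for one case rather than three toy vectors.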
6. Evaluation Metrics and Empirical Performance
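At the GGG ≥ 2 threshold, the IVO test set (8 lesions and 9 patients at or above that threshold) yielded the following results, where "max" and "bal" denote the maximum-sensitivity and balanced operating points: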
| Level | AUC | Sens (max) | Spec (max) | Sens (bal) | Spec (bal) |
|---|---|---|---|---|---|
| Lesion | 0.945 | 1.000 | 0.800 | 0.875 | 0.920 |
| Patient | 0.910 | 1.000 | 0.762 | 0.889 | 0.810 |
PI-RADS ≥ 4 by IVO radiologists achieved:
- Lesion-level: Sens 0.882, Spec 0.558
- Patient-level: Sens 0.850, Spec 0.576
The automated system thus demonstrated higher specificity at comparable or superior sensitivity versus expert readers. No formal statistical significance (p-values, confidence intervals) was reported for these differences (Pellicer-Valero et al., 2021).
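For reference, the metrics reported in this section can be computed as in the Python sketch below. The scores and labels are synthetic and do not come from the IVO test set; the two thresholds only illustrate how a maximum-sensitivity and a more balanced operating point trade sensitivity against specificity.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold counts as positive."""
    tp = sum(1 for s, l in zip(scores, labels) if l and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if not l and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if not l and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic example: three positives (GGG >= 2) and four negatives.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]

sens_bal, spec_bal = sensitivity_specificity(scores, labels, threshold=0.5)
sens_max, spec_max = sensitivity_specificity(scores, labels, threshold=0.3)
area = auc(scores, labels)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is the same trade-off visible between the "max" and "bal" columns of the table.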
7. Clinical and Research Implications
The IVO dataset enabled rigorous, fully-automatic CAD system training and validation, using realistic, end-to-end clinical pipelines (including non-rigid registration, prostate and zonal segmentation, and multimodal harmonization). The demonstrated performance (AUC, sensitivity, specificity) indicates its potential as:
- A triage tool to flag high-risk cases, expediting diagnosis.
- A second-reader or safety-net for reducing observer variability.
- A provider of quantitative lesion maps and Gleason predictions to guide targeted biopsy.
- A testbed for future prospective clinical trials and integration into radiology workflows.
This suggests that the IVO dataset and workflow provide a reference benchmark for developing, testing, and clinically translating AI-based mpMRI CAD in prostate cancer, particularly in settings with expert-verified annotations and biopsy-proven endpoints (Pellicer-Valero et al., 2021).