Sunnybrook Cardiac Dataset

Updated 30 March 2026

The Sunnybrook Cardiac Dataset is a cine-MRI collection annotated for left ventricular segmentation, with expert-delineated contours and stratified patient cohorts.
It comes with a standardized split for training, validation, and testing; studies that use it typically apply preprocessing and augmentation protocols to cope with the limited amount of data.
The dataset has served as the evaluation benchmark for segmentation methods such as GBU-Net, MS-FCN, and DPM, which report high accuracy on its heterogeneous imaging data.

The Sunnybrook Cardiac Dataset is a rigorously annotated cine-MRI benchmark developed to enable the quantitative analysis of left ventricular structure and function. Originally assembled at Sunnybrook Health Sciences Centre for the MICCAI 2009 Left Ventricle Segmentation Challenge, it has served as a gold standard for algorithmic development and comparison in automated cardiac image segmentation. The dataset comprises expert-delineated short-axis cine-MRI studies, stratified by pathophysiological cohort and supported by controlled acquisition protocols and thoroughly specified ground-truth contours. Because it is compact yet heterogeneous, it has pushed authors toward machine-learning approaches tailored to limited-data regimes.

1. Dataset Composition and Ground-Truth Protocols

The dataset contains 45 cardiac cine MRI cases, totaling 805–1,200 2D short-axis images, depending on the source count (reflecting sampling granularity or inclusion of both ED and ES frames) (Chu et al., 4 Jan 2026, Mo et al., 2017). Each case was acquired as a stack of 6–12 slices at end-diastole and end-systole, with each slice reconstructed onto a 256×256 grid at 1.3–1.4 mm in-plane resolution and 8 mm slice thickness. Patient cohorts were defined for comparative disease analysis: 12 cases with heart failure and infarction, 12 with heart failure without infarction, 12 with left ventricular hypertrophy, and 9 healthy controls (Chu et al., 4 Jan 2026). Between 12 and 28 slices per patient were acquired, with corresponding DICOM headers embedding additional protocol parameters.

Expert delineation was performed by experienced cardiologists or cardiac radiologists. The ground truth comprises both endocardial and, where available, epicardial contours, traced manually for each short-axis slice at two principal cardiac phases (end-diastole and end-systole). Each contour is typically encoded as a list of vertices or as a binary mask (value 1 within region of interest; 0 outside) for computational evaluation (Mo et al., 2017).
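The conversion from a vertex list to a binary mask can be sketched as follows. This is a minimal illustration, assuming a contour is given as (x, y) vertex pairs in pixel coordinates and that a pixel belongs to the region when its integer coordinate lies inside the polygon; the function name `contour_to_mask` and the even-odd ray-casting rule are illustrative choices, not part of the dataset's tooling.

```python
import numpy as np

def contour_to_mask(vertices, shape=(256, 256)):
    """Rasterize a closed polygon of (x, y) vertices into a 0/1 mask (even-odd rule)."""
    verts = np.asarray(vertices, dtype=float)
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    px = cols.ravel().astype(float)  # x is the column index
    py = rows.ravel().astype(float)  # y is the row index
    inside = np.zeros(px.shape, dtype=bool)

    x0, y0 = verts[:, 0], verts[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)  # each vertex paired with its successor
    for xa, ya, xb, yb in zip(x0, y0, x1, y1):
        # Does this edge straddle the horizontal ray through the pixel?
        straddles = (ya > py) != (yb > py)
        # x-coordinate at which the edge meets that ray (unused for horizontal edges).
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = xa + (py - ya) * (xb - xa) / (yb - ya)
        # Toggle "inside" for every edge crossed to the right of the pixel.
        inside ^= straddles & (px < x_cross)
    return inside.reshape(shape).astype(np.uint8)

# Hypothetical square contour covering pixels 11..20 in both directions.
square = [(10.5, 10.5), (20.5, 10.5), (20.5, 20.5), (10.5, 20.5)]
mask = contour_to_mask(square, shape=(32, 32))
```

Masks produced this way take the value 1 inside the region of interest and 0 outside, which is the form the overlap metrics expect.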

2. Data Partitioning and Preprocessing

The standardized split for algorithm development partitions the dataset into 15 training, 15 validation, and 15 “online” (test) cases (“15:15:15” split), maintaining close alignment with the MICCAI challenge protocol (Chu et al., 4 Jan 2026, Kang et al., 2018). In most studies, training is performed solely on the 15 training cases, the next 15 cases are used for hyperparameter tuning, and performance is reported on both the validation and the held-out test sets. Aggregating results over all 45 cases is common for state-of-the-art comparison.

Preprocessing protocols vary but generally include spatial normalization and intensity scaling. All images were resampled to 256×256 pixels (Mo et al., 2017, Kang et al., 2018). Some approaches perform linear intensity normalization to [0,1] per volume (Mo et al., 2017), while others rely on in-network normalization layers. Extensive spatial augmentation is often necessary due to the dataset’s limited size; affine translation, rotation (by 90°, 180°, 270°), flipping, elastic deformation, and cropping (center-extraction of 108×108 or 64×64 patches) are standard to increase anatomical variability and mitigate class imbalance (Chu et al., 4 Jan 2026, Kang et al., 2018).
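A few of these steps can be sketched with NumPy alone. The example below assumes slices are already on a 256×256 grid and covers only per-volume intensity scaling, center cropping, right-angle rotations, and flips; the function names are hypothetical, and elastic deformation and translation are left out for brevity.

```python
import numpy as np

def normalize_volume(volume):
    """Linearly rescale all intensities of one volume to [0, 1]."""
    v = np.asarray(volume, dtype=np.float32)
    lo, hi = v.min(), v.max()
    if hi == lo:  # constant volume: avoid division by zero
        return np.zeros_like(v)
    return (v - lo) / (hi - lo)

def center_crop(image, size):
    """Extract a size x size patch around the image center (last two axes)."""
    h, w = image.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    return image[..., top:top + size, left:left + size]

def rotations_and_flips(image, mask):
    """Yield 8 variants: rotations by 0/90/180/270 degrees, each with and without a flip.

    The same transform is applied to image and mask so that they stay aligned.
    """
    for k in range(4):
        img_r, msk_r = np.rot90(image, k), np.rot90(mask, k)
        yield img_r, msk_r
        yield np.fliplr(img_r), np.fliplr(msk_r)

# Synthetic stand-in for one patient volume (2 slices of 256 x 256).
volume = np.arange(2 * 256 * 256).reshape(2, 256, 256)
norm = normalize_volume(volume)
patch = center_crop(norm[0], 108)
patch_mask = (patch > 0.25).astype(np.uint8)  # placeholder mask for illustration
variants = list(rotations_and_flips(patch, patch_mask))
```

Applying identical geometric transforms to the image and its mask is the essential design point: augmenting only the image would silently misalign the ground truth.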

3. Methodological Applications and Representative Algorithms

The Sunnybrook Cardiac Dataset has formed the core of evaluation for several advanced left ventricular segmentation pipelines:

GBU-Net: A group-batch-normalized U-Net derivative, incorporating a fully convolutional encoder–decoder, skip connections, and ensemble inference. Augmentations include affine and elastic transformations; normalization is handled through batch/group or hybrid layers to accommodate variable batch sizes. The architecture leverages Cropping2D and drop-connections for spatial and regularization efficiency, achieving Dice = 0.97 (std = 0.02), sensitivity = 0.98, and APD = 1.88 mm on held-out test cases (Chu et al., 4 Jan 2026).
MS-FCN: A multi-scale fully convolutional network with an encoder that incorporates a Multi-Scale Pooling Module (MSPM) and a decoder with dense connection upsampling. Key design elements include four-stride parallel max-pooling streams with learnable deconvolution for context aggregation, dense decoder fusion for boundary refinement, and skip connections for spatial detail preservation. Augmentation includes translation, cropping, rotation, and flipping for 40× data expansion. MS-FCN achieves Dice = 0.93 (endocardium), 0.96 (epicardium), with APD of 1.61 mm and a good contour rate (APD < 5 mm) near 98% (Kang et al., 2018).
Deep Poincaré Map (DPM): An iterative policy-based segmentation leveraging a CNN regressor to guide a virtual agent along vector fields derived from customized dynamical priors. Ground-truth contours define a vector field whose limit cycle matches the anatomical boundary, with stopping via a Poincaré map-based criterion. DPM attains Dice = 0.92 (endo), 0.95 (epi), with APD ≈ 1.76 mm and >97% good contour rate on the Sunnybrook dataset (Mo et al., 2017).
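The limit-cycle idea behind DPM can be illustrated with a toy example. In the sketch below the boundary is assumed to be a circle and the vector field is written analytically, whereas DPM predicts the field with a CNN from image patches; `toy_field`, `trace_limit_cycle`, the gain, the step size, and the tolerance are all hypothetical. The stopping rule compares successive crossings of a fixed half-line (a Poincaré section) and stops once they agree.

```python
import numpy as np

def toy_field(p, center, radius, gain=1.0):
    """Tangential motion plus a radial pull toward a circle of the given radius."""
    d = p - center
    r = np.linalg.norm(d)
    radial = d / r                               # unit vector pointing away from the center
    tangent = np.array([-radial[1], radial[0]])  # counter-clockwise unit tangent
    return tangent + gain * (radius - r) * radial

def trace_limit_cycle(start, center, radius, dt=0.5, tol=1e-3, max_steps=10_000):
    """Follow the field with Euler steps until two successive section crossings agree."""
    center = np.asarray(center, dtype=float)
    p = np.asarray(start, dtype=float)
    path, crossings, last_cross = [p.copy()], [], 0
    for _ in range(max_steps):
        q = p + dt * toy_field(p, center, radius)
        path.append(q.copy())
        # Poincaré section: the half-line from the center along +x, crossed upward.
        crossed = p[1] < center[1] <= q[1] and q[0] > center[0]
        if crossed:
            crossings.append(float(np.linalg.norm(q - center)))
            if len(crossings) >= 2 and abs(crossings[-1] - crossings[-2]) < tol:
                # The points between the last two crossings form the closed contour.
                return np.array(path[last_cross:]), crossings
            last_cross = len(path) - 1
        p = q
    raise RuntimeError("trajectory did not settle within max_steps")

# Start well inside the circle; the agent spirals outward onto the boundary.
contour, crossings = trace_limit_cycle(start=(131.0, 128.0), center=(128.0, 128.0), radius=20.0)
radii = np.linalg.norm(contour - np.array([128.0, 128.0]), axis=1)
```

Because the trajectory is attracted to the cycle from any nearby starting point, the traced contour does not depend strongly on initialization, which is the property the method exploits.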

4. Evaluation Metrics and Comparative Benchmarks

Performance assessment relies on region overlap and boundary metrics:

Dice coefficient ( $\mathrm{Dice}(A, B) = \frac{2|A \cap B|}{|A| + |B|}$ ).
Jaccard index ( $\mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}$ ).
Average Perpendicular Distance (APD), quantifying mean boundary error in millimeters.
Hausdorff distance for maximal contour deviation.
Good contour rate, defined as the proportion of slices with APD below a threshold (commonly APD < 5 mm).
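The overlap metrics, the Hausdorff distance, and the good contour rate can be sketched as below. The sketch assumes binary masks of equal shape and contours given as arrays of points; the function names and the strict inequality in the good-contour test are assumptions, and APD itself is taken as a precomputed per-slice input rather than reimplemented.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

def hausdorff(p, q):
    """Symmetric Hausdorff distance between two point sets of shape (n, 2) and (m, 2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # all pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def good_contour_rate(apd_per_slice_mm, threshold_mm=5.0):
    """Fraction of slices whose APD falls below the threshold."""
    apd = np.asarray(apd_per_slice_mm, dtype=float)
    return float((apd < threshold_mm).mean())

# Two overlapping 8-pixel bands on a 4 x 4 grid share 4 pixels.
a = np.zeros((4, 4), dtype=np.uint8); a[:2, :] = 1
b = np.zeros((4, 4), dtype=np.uint8); b[1:3, :] = 1
d_ab, j_ab = dice(a, b), jaccard(a, b)
```

The two overlap scores are monotonically related by $\mathrm{Dice} = 2\,\mathrm{Jaccard} / (1 + \mathrm{Jaccard})$, so they rank methods identically and differ only in scale.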

Recent approaches report:

Method	Dice (Endo)	Dice (Epi)	APD (Endo) [mm]	APD (Epi) [mm]	Good Contours (%)
GBU-Net	0.97	—	1.39–1.88	—	—
MS-FCN	0.93	0.96	1.61	1.61	98.35–98.51
DPM	0.92	0.95	1.75	1.78	97.5–97.7

These results indicate ongoing incremental improvements, generally reflecting architectural advances in deep learning that exploit multi-scale context and boundary-focused fusion (Chu et al., 4 Jan 2026, Kang et al., 2018, Mo et al., 2017).

5. Influence of Dataset Characteristics on Methodological Design

The modest sample size (805–1,200 images) and slice heterogeneity (12–28 slices per patient, variable pathology) have shaped algorithmic choices. Data augmentation is essential for anatomical diversity and regularization. Normalization strategies—ranging from intensity scaling to hybrid group-batch normalization—address statistical instability in small-batch training. Fully convolutional architectures with skip connections (U-Net, MS-FCN) natively accommodate variable input sizes and preserve spatial fidelity. Nonlinear activation (ELUs) and drop-connections/regularization further mitigate overfitting and enhance convergence (Chu et al., 4 Jan 2026).
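Why group-wise normalization helps with small batches can be shown in a few lines. The sketch below implements plain group normalization in NumPy, without the learnable scale and shift and without the hybrid group-batch scheme attributed to GBU-Net; the function name and the tensor layout (N, C, H, W) are assumptions made for the example.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize each sample over groups of channels; x has shape (N, C, H, W)."""
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    # Statistics are computed per sample and per group, never across the batch axis.
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 16, 16))
out_batch4 = group_norm(features, num_groups=4)
out_batch1 = group_norm(features[:1], num_groups=4)
```

Here `out_batch4[0]` and `out_batch1[0]` coincide, since the statistics of one sample do not depend on what else is in the batch; batch normalization offers no such guarantee when batches are small.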

Boundary-based learning approaches, such as DPM, leverage prior knowledge and sparse dynamical fields to offset limited annotation data and exploit anatomical invariants. Because the dataset was designed around comprehensive expert contouring across clinically relevant disease states, it allows methods to be evaluated on both healthy and pathological cardiac function, although conclusions drawn from 45 cases should be generalized with care.

6. Transferability, Generalization, and Limitations

Models trained on the Sunnybrook Cardiac Dataset show partial transferability to external cohorts, with clear degradation under domain shift (e.g., changes in scanner or patient population). For DPM, transfer to the STACOM 2011 challenge data reduced Dice from 0.92 to 0.74, while specificity (0.99) and sensitivity (0.84) remained comparatively high; the learned representations and policies therefore carry over to some degree, but with a substantial loss in overlap accuracy (Mo et al., 2017).
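A lower Dice can coexist with very high specificity because specificity is dominated by the large background region of a short-axis slice. The synthetic example below illustrates this with a misplaced square; the sizes and the 8-pixel shift are arbitrary choices, not values taken from the cited study, and `overlap_scores` is a hypothetical helper that assumes both masks are non-empty.

```python
import numpy as np

def overlap_scores(pred, truth):
    """Dice, sensitivity, and specificity from two binary masks of equal shape."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # foreground correctly labeled
    fp = np.sum(pred & ~truth)   # background labeled as foreground
    fn = np.sum(~pred & truth)   # foreground missed
    tn = np.sum(~pred & ~truth)  # background correctly labeled
    return {
        "dice": float(2 * tp / (2 * tp + fp + fn)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }

# Ground truth: a 40 x 40 region on a 256 x 256 slice.
truth = np.zeros((256, 256), dtype=bool)
truth[100:140, 100:140] = True
# Prediction: the same region shifted by 8 pixels along the columns.
pred = np.zeros_like(truth)
pred[100:140, 108:148] = True

scores = overlap_scores(pred, truth)
```

In this example Dice and sensitivity are both 0.8 while specificity stays above 0.99, so specificity alone says little about boundary quality when the structure occupies a small fraction of the image.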

Most segmentation failures occur in apical/basal slices characterized by poor tissue contrast and ambiguous boundaries, indicating dataset-intrinsic challenges. The field has explored, but not exhaustively validated, extensions to 3D volumetric architectures and the incorporation of temporal continuity priors (e.g., bidirectional recurrent or attention-based modules), which may further improve robustness against the unique spatiotemporal noise profiles of cardiac cine MRI (Kang et al., 2018).

7. Role in the Research Ecosystem

The Sunnybrook Cardiac Dataset remains a primary point of reference for method development and benchmarking in left ventricular MRI segmentation. Its clinical stratification, diverse anatomical representations, and mature annotation pipeline have made it a definitive testbed for innovations in deep learning, agent-based, and hybrid segmentation strategies. Comparative reporting over the entire dataset is standard for establishing state-of-the-art and for evaluating methodological improvements across network architectures, loss functions, and regularization paradigms (Chu et al., 4 Jan 2026, Kang et al., 2018, Mo et al., 2017).

The dataset’s legacy is its catalytic role in advancing automated cardiac MRI analysis, illuminating the interaction between dataset properties, architectural innovations, and clinical translational significance.