LIMUC Dataset: Lumbar MRI Segmentation

Updated 2 April 2026

The LIMUC dataset is a large, multi-center collection of sagittal lumbar spine MRI studies with detailed 3D annotations of the vertebrae, intervertebral discs, and spinal canal.
It spans heterogeneous imaging protocols from four Dutch hospitals. It was labeled with an iterative, semi-automatic annotation workflow designed to improve labeling reliability.
Baseline IIS and nnU-Net models reached Dice scores of up to 0.93, which gives automated segmentation research a set of reference results.

The LIMUC dataset is a large, multi-center, fully annotated collection of sagittal lumbar spine magnetic resonance imaging (MRI) studies designed to support the development, benchmarking, and comparison of automatic segmentation algorithms targeting vertebrae, intervertebral discs (IVDs), and the spinal canal. It was assembled to address limitations in prior lumbar imaging resources, particularly regarding scale, annotation quality, and the availability of a standardized public benchmark for spine segmentation (Graaf et al., 2023).

1. Dataset Composition and Source

The LIMUC dataset comprises 447 sagittal MRI series from 218 unique patients, all with a documented history of low back pain. Each patient has up to three associated series:
- a T1-weighted scan;
- a standard-resolution T2-weighted scan;
- in the University Medical Center (UMC) subset, where available, a high-resolution T2 SPACE acquisition.

Data were collected retrospectively from four Dutch hospitals: UMC, two regional hospitals, and one orthopedic hospital. Most patients are female (63%). Age and additional clinical metadata are included where available. Images were acquired under each institution's standard clinical protocols, and T1- and T2-weighted sequences were acquired at every site (Graaf et al., 2023).
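Each patient contributes up to three series, so any training/validation split should be drawn per patient rather than per series. Otherwise the T1 and T2 scans of one patient could land on both sides of the split. The sketch below assumes the metadata can be reduced to `(series_id, patient_id)` pairs. Those field names and the function `patient_level_split` are hypothetical and do not come from the dataset's own tooling.

```python
import random
from collections import defaultdict

def patient_level_split(series, val_fraction=0.18, seed=0):
    """Split (series_id, patient_id) pairs so that no patient appears in both subsets.

    The tuple layout is a hypothetical stand-in for the real metadata columns.
    """
    by_patient = defaultdict(list)
    for series_id, patient_id in series:
        by_patient[patient_id].append(series_id)

    patients = sorted(by_patient)            # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    n_val = round(len(patients) * val_fraction)
    val_patients = set(patients[:n_val])

    # All series of a patient follow that patient into the same subset.
    train = [s for p in patients if p not in val_patients for s in by_patient[p]]
    val = [s for p in patients if p in val_patients for s in by_patient[p]]
    return train, val
```

With `val_fraction=0.18` the split mirrors the 82%/18% ratio described for the public data. The ratio is then counted in patients, not in series.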

2. Imaging Protocols and Acquisition Variability

MRI acquisition parameters, including voxel size, varied substantially across centers. This reflects routine clinical heterogeneity and makes the dataset useful for domain adaptation research. Voxel dimensions for the standard T1/T2 sequences ranged from approximately 3.15 × 0.24 × 0.24 mm³ to 9.63 × 1.06 × 1.23 mm³. The UMC T2 SPACE series form a high-resolution subset with near-isotropic 0.90 × 0.47 × 0.47 mm³ voxels. The metadata gives the acquisition ranges for each center, summarized below:

Center	Voxel size range (mm)	Female (%)
UMC	(3.24 × 0.27 × 0.47) – (3.34 × 0.59 × 0.85)*	55
RH1	(0.46 × 0.46 × 4.20) – (9.63 × 1.06 × 1.06)	58
RH2	(0.46 × 0.46 × 4.20) – (5.17 × 1.00 × 1.23)	59
OH	(3.15 × 0.24 × 0.24) – (3.39 × 0.83 × 1.02)	68

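*Regular T1/T2 series only. UMC T2 SPACE series have 0.90 × 0.47 × 0.47 mm³ voxels. RH1 and RH2 are the two regional hospitals, and OH is the orthopedic hospital (Graaf et al., 2023).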
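Voxel volume and anisotropy can put numbers on the gap between the "near-isotropic" SPACE subset and the thick-slice clinical series. Here anisotropy is the largest spacing divided by the smallest. The helper `voxel_stats` is a hypothetical illustration. The spacings are copied from the text and table above.

```python
from math import prod

def voxel_stats(spacing_mm):
    """Return (voxel volume in mm^3, anisotropy = largest spacing / smallest spacing)."""
    return prod(spacing_mm), max(spacing_mm) / min(spacing_mm)

examples = {
    "UMC T2 SPACE": (0.90, 0.47, 0.47),
    "smallest regular T1/T2": (3.15, 0.24, 0.24),
    "largest regular T1/T2": (9.63, 1.06, 1.23),
}
for name, spacing in examples.items():
    volume, ratio = voxel_stats(spacing)
    print(f"{name}: {volume:.3f} mm^3, anisotropy {ratio:.1f}")
```

The SPACE voxels differ by less than a factor of two across axes. In the regular series the through-plane spacing can be roughly 9 to 13 times the in-plane spacing.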

3. Annotation Classes and Labeling Workflow

Every visible lumbar vertebra (excluding the sacrum), every IVD, and the full spinal canal were segmented in 3D. Instances are labeled in the volume masks by integer index, counting from the most caudal vertebra. Transitional vertebrae cause anatomical variation between patients, so these numerical labels should not be naively mapped to L1–L5.
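Because indices count upward from the most caudal vertebra, a cheap sanity check is to confirm that vertebra centroids move cranially as the label value increases. This sketch deliberately does not assign L1–L5 names. The label values, the superior-inferior array axis, and its direction are all caller-supplied assumptions: they are not the dataset's documented label scheme, and `check_caudal_first` is a hypothetical helper.

```python
import numpy as np

def check_caudal_first(mask, vertebra_labels, si_axis, cranial_increases):
    """Check that vertebra label values increase from caudal to cranial.

    vertebra_labels: label values assumed (hypothetically) to denote vertebrae.
    si_axis: array axis assumed to run superior-inferior in this volume.
    cranial_increases: True if larger indices along si_axis are more cranial.
    Returns (labels ordered caudal -> cranial by position, labels_are_consistent).
    """
    present = [lab for lab in vertebra_labels if np.any(mask == lab)]
    # Mean voxel coordinate of each vertebra along the superior-inferior axis.
    centroid = {lab: np.argwhere(mask == lab)[:, si_axis].mean() for lab in present}
    by_position = sorted(present, key=lambda lab: centroid[lab], reverse=not cranial_increases)
    return by_position, by_position == sorted(present)
```

A failed check usually points to a wrong orientation assumption rather than a labeling error. Either way it is worth resolving before any level-specific analysis.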

Annotation followed an iterative, semi-automatic workflow that made labeling the full dataset tractable:

Initial training set: Twenty high-resolution T2 SPACE series were annotated manually in 3D Slicer, slice by slice in all planes.
Baseline algorithm: The Iterative Instance Segmentation (IIS) model, a 3D patch-based U-Net derivative. Its input combines an image channel with “instance memory” channels for vertebrae and IVDs. The model operates on 64 × 192 × 192 voxel patches at 2 × 0.6 × 0.6 mm³ resolution.
Iterative loop: In each of four rounds, the IIS model was trained on the annotations available at that point and then predicted masks for further series. An expert corrected these masks manually, at about 1 hour per study. Corrected studies were added to the training set until the entire dataset was covered (Graaf et al., 2023).
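The train, predict, and correct cycle can be sketched as a loop over rounds. `train`, `predict`, and `correct` are hypothetical hooks standing in for IIS training, IIS inference, and the expert's manual correction. The even spread of unlabeled series across rounds is an assumption; the source does not state how many series were added per round.

```python
from typing import Any, Callable, Dict, List

def iterative_annotation(
    labeled: Dict[str, Any],                       # series_id -> mask, starting from the seed set
    unlabeled: List[str],                          # series still without annotations
    train: Callable[[Dict[str, Any]], Any],        # hypothetical: fit a model on current labels
    predict: Callable[[Any, str], Any],            # hypothetical: model proposes a draft mask
    correct: Callable[[str, Any], Any],            # hypothetical: expert returns corrected mask
    rounds: int = 4,
) -> Dict[str, Any]:
    labeled = dict(labeled)
    remaining = list(unlabeled)
    per_round = -(-len(remaining) // rounds)       # ceiling division (assumed even batches)
    for _ in range(rounds):
        if not remaining:
            break
        model = train(labeled)                     # retrain on everything corrected so far
        batch, remaining = remaining[:per_round], remaining[per_round:]
        for series_id in batch:
            draft = predict(model, series_id)
            labeled[series_id] = correct(series_id, draft)
    return labeled
```

Each round trains on a strictly larger set than the last. Later drafts should therefore need less correction, which is the point of bootstrapping from a small manual seed.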

4. Baseline Algorithms, Experimental Setup, and Metrics

Two segmentation baselines were established:

IIS Baseline: Used to bootstrap and iteratively annotate the dataset.
nnU-Net Baseline: The self-configuring nnU-Net framework for 3D segmentation (Isensee et al., Nat. Methods 2020) was trained in full-resolution mode with 5-fold cross-validation.

Preprocessing included resampling all volumes to 2 × 0.6 × 0.6 mm³ voxels, rigid alignment to a common orientation, and standard augmentations (elastic deformations, Gaussian noise and smoothing). The public data were split 82%/18% into training and validation sets. A completely sequestered test set (39 patients, 97 series) was reserved for blinded benchmarking on the Grand Challenge platform.
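Images are usually resampled with linear or higher-order interpolation. Segmentation masks need nearest-neighbour interpolation so that no new, invalid label values appear. The source does not state which interpolation was used, so this is an assumption. The sketch below resamples a label volume to the 2 × 0.6 × 0.6 mm³ grid with plain NumPy. It assumes `spacing` is listed in the same axis order as the array and ignores origin and direction metadata. `resample_labels_nearest` is a hypothetical helper.

```python
import numpy as np

def resample_labels_nearest(mask, spacing, new_spacing=(2.0, 0.6, 0.6)):
    """Nearest-neighbour resampling of a 3D label volume (voxel centres aligned)."""
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    in_shape = np.array(mask.shape)
    # Keep the physical extent: new_count = old_count * old_spacing / new_spacing.
    out_shape = np.maximum(1, np.round(in_shape * spacing / new_spacing).astype(int))
    index_per_axis = []
    for ax in range(3):
        # Centre of each output voxel, mapped back into input index coordinates.
        coords = (np.arange(out_shape[ax]) + 0.5) * new_spacing[ax] / spacing[ax] - 0.5
        index_per_axis.append(np.clip(np.round(coords).astype(int), 0, in_shape[ax] - 1))
    return mask[np.ix_(*index_per_axis)]
```

With the IIS settings, one 64 × 192 × 192 patch on this grid spans 128 × 115.2 × 115.2 mm.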

Segmentation quality was evaluated by:

Dice coefficient: $\mathrm{Dice}(X, Y) = \frac{2|X \cap Y|}{|X| + |Y|}$
Intersection over Union: $\mathrm{IoU}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$
Average absolute surface distance (ASD, mm)
Detection rate (% of labeled structures correctly segmented)
Completeness classification accuracy (%)
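Dice and IoU follow directly from the formulas above. Surface distance has several variants. The sketch below uses one common symmetric definition: the mean over both boundaries of each boundary voxel's distance to the nearest boundary voxel of the other mask. The challenge's evaluation code may define it differently. The function names are hypothetical. The brute-force pairwise distance is only suitable for small masks; full volumes would normally use a distance transform.

```python
import numpy as np

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)

def iou(a, b):
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

def surface_points(mask, spacing):
    """Physical coordinates (mm) of boundary voxels of a 3D binary mask."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    inner = (slice(1, -1),) * 3
    eroded = m[inner].copy()
    for axis in range(3):
        for shift in (-1, 1):
            eroded &= np.roll(m, shift, axis=axis)[inner]   # 6-neighbourhood erosion
    boundary = m[inner] & ~eroded
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)

def average_surface_distance(a, b, spacing=(1.0, 1.0, 1.0)):
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        return float("nan")
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)  # all pairwise distances
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pa) + len(pb)))
```

Take two 4 × 4 × 4 cubes offset by one voxel. They share 48 of 64 voxels each, so Dice = 96/128 = 0.75 and IoU = 48/80 = 0.6.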

5. Benchmark Results and Public Challenge

On the sequestered test set, IIS and nnU-Net achieved comparable scores:

Structure	Dice, IIS (mean ± SD)	Dice, nnU-Net (mean ± SD)	ASD, IIS (mm, mean ± SD)	ASD, nnU-Net (mm, mean ± SD)	Detection rate, IIS (%)
Vertebrae	0.93 ± 0.05	0.92 ± 0.05	0.48 ± 0.95	0.43 ± 0.78	99.8
IVDs	0.84 ± 0.10	0.86 ± 0.09	0.54 ± 0.45	0.54 ± 1.04	98.9
Canal	0.92 ± 0.04	0.92 ± 0.03	0.39 ± 0.45	0.37 ± 0.22	100.0
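The table reports Dice only. For a single pair of masks, IoU is fully determined by Dice, as follows from the two definitions in Section 4. Write $I = |X \cap Y|$ and $S = |X| + |Y|$:

```latex
\begin{aligned}
|X \cup Y| &= S - I, \qquad \mathrm{Dice} = \frac{2I}{S}, \qquad \mathrm{IoU} = \frac{I}{S - I} \\
\mathrm{IoU} &= \frac{\mathrm{Dice}\,S/2}{S - \mathrm{Dice}\,S/2} = \frac{\mathrm{Dice}}{2 - \mathrm{Dice}}
\end{aligned}
```

Applied to the mean Dice values above, this gives 0.93 → 0.869, 0.92 → 0.852, 0.86 → 0.754, and 0.84 → 0.724. The map is convex, so converting a mean Dice yields a lower bound on the mean per-case IoU rather than its exact value.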

A continuous evaluation challenge is hosted at https://spider.grand-challenge.org/, where participants submit segmentations for the hidden test set. Submissions are scored with fixed evaluation code and the ranking is updated as new results arrive. Because the test labels are never released, overfitting to the test set is limited and results stay comparable over time (Graaf et al., 2023).

6. Data Access, Licensing, and Metadata

The LIMUC dataset is distributed under a Creative Commons Attribution 4.0 (CC-BY 4.0) license. Data can be accessed via Zenodo at https://doi.org/10.5281/zenodo.10159290. The release includes:

MRI volumes (MHA format) and corresponding 3D segmentation masks.
Metadata listing the data splits (training, validation), basic demographics, and scanner and sequence parameters.
Per-IVD radiological gradings (Modic changes, Schmorl’s nodes, spondylolisthesis, herniation, narrowing, bulging, Pfirrmann grade) (Graaf et al., 2023).
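MHA files are usually read with SimpleITK (`SimpleITK.ReadImage`, then `SimpleITK.GetArrayFromImage`, which returns arrays indexed z, y, x). For environments without it, the hypothetical `read_mha` below is a minimal sketch of a MetaImage reader. It supports only a subset of the format: single-file `.mha` with `ElementDataFile = LOCAL`, one channel, optional zlib compression, and the element types listed. Whether the LIMUC files use compression is not stated in the source, so both cases are handled.

```python
import zlib
import numpy as np

# MetaImage ElementType -> NumPy dtype (subset only).
_DTYPES = {
    "MET_UCHAR": np.uint8, "MET_CHAR": np.int8,
    "MET_USHORT": np.uint16, "MET_SHORT": np.int16,
    "MET_UINT": np.uint32, "MET_INT": np.int32,
    "MET_FLOAT": np.float32, "MET_DOUBLE": np.float64,
}

def read_mha(path):
    """Return (array indexed [z, y, x], spacing as listed in the header, i.e. x, y, z)."""
    with open(path, "rb") as f:
        raw = f.read()
    header, pos = {}, 0
    while True:                                    # "Key = Value" lines until ElementDataFile
        end = raw.index(b"\n", pos)
        key, _, value = raw[pos:end].decode("ascii").partition("=")
        header[key.strip()] = value.strip()
        pos = end + 1
        if key.strip() == "ElementDataFile":
            break
    if header["ElementDataFile"] != "LOCAL":
        raise ValueError("only single-file .mha with LOCAL data is supported")
    dims = [int(v) for v in header["DimSize"].split()]
    spacing = tuple(float(v) for v in header.get("ElementSpacing", "1 " * len(dims)).split())
    data = raw[pos:]
    if header.get("CompressedData", "False").lower() == "true":
        data = zlib.decompress(data)
    msb = header.get("BinaryDataByteOrderMSB", header.get("ElementByteOrderMSB", "False"))
    dtype = np.dtype(_DTYPES[header["ElementType"]]).newbyteorder(">" if msb.lower() == "true" else "<")
    arr = np.frombuffer(data, dtype=dtype, count=int(np.prod(dims)))
    return arr.reshape(dims[::-1]), spacing        # x varies fastest on disk
```

Note the axis order: `DimSize` and `ElementSpacing` are listed x, y, z, while the returned array is indexed z, y, x. The same mismatch applies to SimpleITK's `GetSpacing()` versus its NumPy arrays.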

7. Applications and Best Practices

The LIMUC dataset enables:

Development and benchmarking of deep models for lumbar vertebra, IVD, and canal segmentation.
Quantitative morphometric and volumetric analyses in low back pain research.
Radiomics and imaging biomarker studies (e.g., relating image features to the provided Pfirrmann grades).
Training and evaluating computer-aided diagnosis tools for clinical decision support.
Transfer learning and domain adaptation experiments, leveraging the diversity in imaging protocols and anatomy.
Comparative studies against CT-based segmentation datasets (e.g., VerSe).

Recommended practice is to:
- use the metadata carefully;
- handle the caudal-first labeling convention explicitly;
- validate model generalization across centers and protocols, because acquisitions are heterogeneous.

With public, high-quality, multi-center 3D annotations, LIMUC is a comprehensive resource for advancing automated lumbar spine MRI analysis (Graaf et al., 2023).
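One way to test generalization across acquisition protocols is leave-one-center-out evaluation: train on three centers and test on the fourth. The sketch assumes the center (or an equivalent scanner grouping) can be recovered for each series. The record keys `series_id` and `center` and the function name are hypothetical. Each patient was imaged at a single center, so these folds are also patient-disjoint.

```python
from collections import defaultdict

def leave_one_center_out(records):
    """Yield (held_out_center, train_series_ids, test_series_ids) for each center.

    records: iterable of dicts with hypothetical keys "series_id" and "center".
    """
    by_center = defaultdict(list)
    for r in records:
        by_center[r["center"]].append(r["series_id"])
    for center in sorted(by_center):
        test = list(by_center[center])
        train = [s for c, ids in by_center.items() if c != center for s in ids]
        yield center, train, test
```

A large drop on one held-out center, compared with in-distribution validation, flags a protocol-specific failure that a random split would hide.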


References

Graaf et al. (2023). Lumbar spine segmentation in MR images: a dataset and a public benchmark.
