fastMRI Dataset Overview
- fastMRI Dataset is a large-scale, public repository of raw and reconstructed MRI data spanning knee, brain, prostate, and breast, with diverse acquisition protocols.
- It provides multi-coil imaging data alongside clinical annotations and standardized evaluation splits to support reproducible accelerated MRI research.
- The dataset underpins advances in machine learning-based MRI reconstruction by simulating undersampling and offering detailed meta-data for quantitative benchmarks.
The fastMRI Dataset is a large-scale, public repository of raw and reconstructed magnetic resonance imaging (MRI) data, spanning multiple anatomical regions and MRI contrasts, designed to advance research in accelerated MRI reconstruction and clinically relevant machine learning tasks. Originating as a collaboration between Facebook AI Research (Meta AI) and NYU Langone Health, it has expanded to include a growing collection of multicoil brain, knee, prostate, and breast MRI data, now annotated with clinical labels, pathology localizations, and extensive acquisition meta-data. The dataset combines vendor-diverse acquisition protocols, large sample sizes, and rigorously defined evaluation standards, and is considered the reference benchmark for machine learning–based MRI reconstruction, evaluation, and clinical validation (Zbontar et al., 2018, Muckley et al., 2020, Zhao et al., 2021, Tibrewala et al., 2023, Solomon et al., 7 Jun 2024).
1. Dataset Scope and Anatomical Coverage
The fastMRI repository encompasses several major anatomical domains:
- Knee: Multi-coil, single-coil, and clinical DICOM data sets using coronal proton-density (PD) and T2-weighted protocols. The core multi-coil knee set contains 1,594 volumes, all acquired using a 15-channel Siemens array, with additional DICOM-only clinical volumes numbering over 10,000 (Zbontar et al., 2018).
- Brain: Multi-coil axial acquisitions (6,970 volumes) across T1, T1 post-contrast, T2, and FLAIR, predominantly 1.5 T and 3 T Siemens systems, with transfer-test sets from GE and Philips (Zbontar et al., 2018, Muckley et al., 2020).
- Prostate: Biparametric MRI (312 subjects) with both T2-weighted turbo-spin-echo (TSE) and diffusion-weighted EPI; provides raw k-space, reference reconstructions, and slice-level PI-RADS v2.1 labels (Tibrewala et al., 2023).
- Breast: Free-breathing, golden-angle radial 3D dynamic contrast-enhanced (DCE) exams (300 cases) with detailed case-level malignancy labels and full k-space (Solomon et al., 7 Jun 2024).
- Annotation Extensions (fastMRI+): Pathology bounding boxes and study-level clinical labels for knee and brain, enabling object-detection and lesion-aware reconstruction studies (Zhao et al., 2021).
Acquisitions are strictly de-identified, HIPAA-compliant, and supported by detailed meta-data and usage agreements permitting free academic research.
2. Acquisition Protocols and Data Formats
Summary Table: Sequence, Field Strength, and File Structure
| Domain | Sequences / Contrasts | Field Strength | Coils / Vendor | File Format |
|---|---|---|---|---|
| Knee | Coronal PD, PD-FS | 1.5 T, 3 T | 15 ch Siemens | HDF5 (raw) |
| Brain | Axial T1, T1POST, T2, FLAIR | 1.5 T, 3 T | 32 ch Siemens, GE, Philips | HDF5 (raw) |
| Prostate | T2w TSE, DWI EPI | 3 T | 10–30 ch Siemens | HDF5/ISMRMRD |
| Breast | 3D DCE golden-angle radial | 3 T | 16 ch Siemens | HDF5 (radial) |
- Raw k-space: Complex-valued arrays per coil, slice, and readout direction. ISMRMRD-compliant headers retain pulse sequence and scanner details.
- Reconstruction (“ground-truth”): Root-sum-of-squares (RSS) magnitude images for multi-coil data; emulated single-coil (ESC) inverse-FFT reference images for single-coil data (Zbontar et al., 2018).
- Annotations/Labels: CSV files for pathology boxes, study-level findings (fastMRI+), and diagnostic grades (PI-RADS for prostate).
- DICOM: Vendor-standardized reference reconstructions for clinical and compatibility purposes.
For each anatomical region, the acquisition matrix, FOV, resolution, TR/TE, and other critical protocol fields match clinical standards and are recorded in the ISMRMRD header and, for breast, also in per-case HDF5 attributes (Muckley et al., 2020, Tibrewala et al., 2023, Solomon et al., 7 Jun 2024).
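The RSS reference described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the dataset's reference code: the helper names `ifft2c` and `rss`, the orthonormal FFT scaling, and the `(coils, rows, cols)` layout of the synthetic slice are all assumptions made for the example.

```python
import numpy as np


def ifft2c(kspace: np.ndarray) -> np.ndarray:
    """Centered, orthonormal 2D inverse FFT over the last two axes."""
    shifted = np.fft.ifftshift(kspace, axes=(-2, -1))
    image = np.fft.ifft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(image, axes=(-2, -1))


def rss(coil_images: np.ndarray, coil_axis: int = 0) -> np.ndarray:
    """Root-sum-of-squares combination across the coil axis."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))


# Synthetic stand-in for one multi-coil slice (assumed layout: coils, rows, cols).
rng = np.random.default_rng(0)
kspace = rng.standard_normal((4, 32, 32)) + 1j * rng.standard_normal((4, 32, 32))

# Per-coil inverse FFT followed by RSS gives a real, non-negative magnitude image.
image = rss(ifft2c(kspace))
```

Because the combination discards phase, the result is a magnitude image only, which is consistent with the magnitude-only references the dataset provides.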
3. Masking, Undersampling, and Forward Models
The dataset emphasizes retrospective undersampling of fully sampled k-space to simulate accelerated acquisition:
- Knee: Variable-density random masks with a fully sampled central 8% (4×) or 4% (8×) of phase-encode lines; outer lines randomly discarded (Zbontar et al., 2018, Pezzotti et al., 2020).
- Brain: Pseudo-equispaced (pseudo-regular) masks with a fully sampled low-frequency center; exact 4× and 8× factors enforced. The 2020 challenge used only equispaced masks, with no random or probabilistic masks (Muckley et al., 2020, Tavaf et al., 2021).
- Prostate: T2 under-sampled via interleaved odd/even lines; DWI via GRAPPA (R=2); explicit phase-correction lines are provided (Tibrewala et al., 2023).
- Breast: Golden-angle stack-of-stars (radial) with full 288 spokes per segment; flexible temporal binning supports dynamic/granular reconstruction (Solomon et al., 7 Jun 2024).
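The two Cartesian masking schemes in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the official mask generator: the function names are hypothetical, the 368-column width is an arbitrary example, and the equispaced version uses a fixed integer stride, so it does not enforce an exact acceleration factor once the center lines are added.

```python
import numpy as np


def center_block(num_cols: int, center_fraction: float) -> tuple[int, int]:
    """Start index and width of the fully sampled low-frequency block."""
    num_low = int(round(num_cols * center_fraction))
    start = (num_cols - num_low + 1) // 2
    return start, num_low


def random_mask(num_cols, center_fraction, acceleration, rng):
    """Random phase-encode mask with a fully sampled center (knee-style)."""
    start, num_low = center_block(num_cols, center_fraction)
    # Choose the outer-line probability so the expected total is num_cols / acceleration.
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    mask = rng.uniform(size=num_cols) < prob
    mask[start:start + num_low] = True
    return mask


def equispaced_mask(num_cols, center_fraction, acceleration, offset=0):
    """Every `acceleration`-th line plus a fully sampled center (brain-style)."""
    start, num_low = center_block(num_cols, center_fraction)
    mask = np.zeros(num_cols, dtype=bool)
    mask[offset::acceleration] = True
    mask[start:start + num_low] = True
    return mask


rng = np.random.default_rng(0)
knee_4x = random_mask(368, 0.08, 4, rng)   # 8% center at 4x
knee_8x = random_mask(368, 0.04, 8, rng)   # 4% center at 8x
brain_4x = equispaced_mask(368, 0.08, 4)
```

Each mask is one boolean per phase-encode line; broadcasting it along the readout axis zeroes the discarded lines of a fully sampled k-space array.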
Forward Model (multi-coil, Cartesian):
$$y_c = M F S_c x + n_c, \qquad c = 1, \dots, C,$$
where $x$ is the unknown image, $S_c$ the coil sensitivity (channel-wise), $F$ the 2D discrete Fourier transform, $M$ the sampling mask (Cartesian or radial), $y_c$ the sampled k-space, and $n_c$ complex white noise (Zbontar et al., 2018, Muckley et al., 2020, Sanda et al., 25 Sep 2025).
For radial (breast):
$$y_c = F_{\mathrm{NU}} S_c x + n_c,$$
with $F_{\mathrm{NU}}$ denoting the non-uniform FFT (NUFFT) operator, as determined by golden-angle trajectories (Solomon et al., 7 Jun 2024).
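The Cartesian multi-coil forward model (coil weighting, 2D Fourier transform, masking, additive noise) can be simulated with a short NumPy sketch. The function names, the orthonormal FFT scaling, the column-wise mask, and the choice to add noise only at sampled locations are assumptions made for illustration; the radial NUFFT case is not covered here.

```python
import numpy as np


def fft2c(image: np.ndarray) -> np.ndarray:
    """Centered, orthonormal 2D FFT over the last two axes."""
    shifted = np.fft.ifftshift(image, axes=(-2, -1))
    kspace = np.fft.fft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(kspace, axes=(-2, -1))


def forward(x, sens, mask, noise_std=0.0, rng=None):
    """Simulate masked multi-coil k-space from image x.

    x:    (H, W) image
    sens: (C, H, W) coil sensitivity maps
    mask: (W,) boolean mask over phase-encode columns
    """
    kspace = fft2c(sens * x[None])                  # Fourier transform of each coil image
    if noise_std > 0:
        if rng is None:
            rng = np.random.default_rng()
        noise = rng.standard_normal(kspace.shape) + 1j * rng.standard_normal(kspace.shape)
        kspace = kspace + noise_std * noise / np.sqrt(2)
    return kspace * mask                            # unsampled columns are set to zero


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
sens = np.ones((2, 16, 16), dtype=complex)          # trivial sensitivities for the demo
mask = np.zeros(16, dtype=bool)
mask[::2] = True                                    # keep every other column
y = forward(x, sens, mask, noise_std=0.01, rng=rng)
```

Masking after adding noise keeps unsampled entries exactly zero, which matches the convention that only acquired samples carry measurement noise.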
4. Evaluation, Splits, and Benchmarks
The dataset is accompanied by rigorously defined and locked training, validation, test, and challenge (“finalheldout”) splits to enable reproducibility and blind benchmarking.
- Knee:
- Multi-coil train/val/test/challenge (973/199/118/104 volumes)
- Single-coil: emulated from multi-coil (Zbontar et al., 2018, Pezzotti et al., 2020)
- Brain:
- Siemens: 4,469/1,378 train/val, 558 test (held-out)
- Transfer: 329 test cases (GE, Philips) only for challenge phase
- Each with detailed pathology representation (tumors, stroke, microvascular disease)
- Only pathologically diverse cases included in final held-out splits (Muckley et al., 2020)
- Prostate:
- 218/48/46 subjects (train/val/test), each with T2w and DWI, slice-level PI-RADS (Tibrewala et al., 2023)
- Breast:
- All 300 cases have label splits (train/test), with lesion status and histologic subtype (Solomon et al., 7 Jun 2024)
For all canonical reconstruction tasks, the reference is provided as RSS magnitude images, without phase. Quantitative evaluation uses NMSE, PSNR, and SSIM, optionally supplemented by radiological review (Zbontar et al., 2018, Muckley et al., 2020).
Example Metrics (Brain):
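| Metric | Definition (magnitude images; $\hat v$ reconstruction, $v$ reference) |
|---|---|
| SSIM | $\dfrac{(2\mu_{\hat v}\mu_v + c_1)(2\sigma_{\hat v v} + c_2)}{(\mu_{\hat v}^2 + \mu_v^2 + c_1)(\sigma_{\hat v}^2 + \sigma_v^2 + c_2)}$, averaged over local windows |
| PSNR | $10 \log_{10}\!\big(\max(v)^2 / \mathrm{MSE}(\hat v, v)\big)$ |
| NRMSE | $\lVert \hat v - v \rVert_2 / \lVert v \rVert_2$ |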
Special note: improper background masking can depress SSIM by 0.3 in brain, motivating post-hoc exclusion of air pixels from metrics (Muckley et al., 2020).
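The pixel-wise metrics, and the effect of restricting them to foreground pixels, can be sketched as below. The definitions follow the common conventions (error norm over reference norm for NMSE/NRMSE, reference maximum as the PSNR data range), which are assumptions here; the `masked` helper is hypothetical and applies only to pixel-wise metrics, not to window-based SSIM.

```python
import numpy as np


def nmse(gt: np.ndarray, pred: np.ndarray) -> float:
    """Normalized mean squared error: ||gt - pred||^2 / ||gt||^2."""
    return float(np.linalg.norm(gt - pred) ** 2 / np.linalg.norm(gt) ** 2)


def nrmse(gt: np.ndarray, pred: np.ndarray) -> float:
    """Normalized root mean squared error: ||gt - pred|| / ||gt||."""
    return float(np.linalg.norm(gt - pred) / np.linalg.norm(gt))


def psnr(gt: np.ndarray, pred: np.ndarray, data_range=None) -> float:
    """Peak signal-to-noise ratio in dB; data_range defaults to max(gt)."""
    data_range = gt.max() if data_range is None else data_range
    mse = np.mean((gt - pred) ** 2)
    return float(10 * np.log10(data_range ** 2 / mse))


def masked(metric, gt, pred, foreground):
    """Evaluate a pixel-wise metric on foreground pixels only (air excluded)."""
    return metric(gt[foreground], pred[foreground])


# Tiny worked example: constant reference of 2.0, constant error of 0.2.
gt = np.full((4, 4), 2.0)
pred = gt + 0.2
scores = {"nmse": nmse(gt, pred), "nrmse": nrmse(gt, pred), "psnr": psnr(gt, pred)}
```

For this toy case the error-to-reference ratio is 0.2 / 2.0, so NRMSE is 0.1, NMSE is 0.01, and PSNR is 20 dB.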
5. Annotation Resources and Clinical Labeling
The fastMRI+ resource provides multi-level annotation:
- Knee: 16,154 image-level bounding boxes and 13 study-level labels over 22 pathology categories (meniscal tears, cartilage defect, etc.) (Zhao et al., 2021).
- Brain: 7,570 bounding boxes, 643 study-level labels, 30 pathology classes (e.g., tumor, edema, stroke, post-surgical change), annotated by a subspecialty neuroradiologist.
- Prostate: Slice-level PI-RADS v2.1 scores linking each image to clinical suspicion of malignancy (Tibrewala et al., 2023).
- Breast: Lesion status and type, menopause/age, repeat exam flags.
Annotations facilitate clinically relevant evaluation—disease-specific sensitivity, object detection, and robustness analysis—extending beyond global image similarity metrics.
6. Access, Data Use, and Tools
- Data Hosting: Primary distribution at https://fastmri.med.nyu.edu/ (registration and data-use agreement required); open-source metadata and annotations via associated GitHub repositories for fastMRI+ and prostate/breast extensions (Zhao et al., 2021).
- File Structure: HDF5 files per volume/case, with standard keys "kspace", "reconstruction_rss", "mask" (test), and acquisition parameter meta-data; DICOM exports available for clinical integrations (Zbontar et al., 2018, Tibrewala et al., 2023).
- Loading and Preprocessing: Provided PyTorch and ISMRMRD loaders, mask generation code, standard normalization routines, and usage guidelines supporting exact protocol replication (Zbontar et al., 2018).
- Classical and Learning-based Baselines: BART toolbox for GRAPPA/TV, provided baseline U-Net implementations, and reference scripts for end-to-end benchmarking.
- Reconstruction Extension: Example Python code for radial DCE breast using GRASP and ESPIRiT with GPU-accelerated gridding (Solomon et al., 7 Jun 2024).
Licensing is non-commercial, research- and education-only, with all supporting scripts and files released under permissive open-source terms.
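A per-volume HDF5 file with the keys listed in this section can be inspected with `h5py` roughly as follows. This is a hedged sketch: `describe_volume` is a hypothetical helper, and the synthetic file, its array shapes, the assumed `(slices, coils, rows, cols)` k-space layout, and the `acquisition` attribute value are placeholders standing in for a real downloaded volume.

```python
import os
import tempfile

import h5py
import numpy as np


def describe_volume(path: str) -> dict:
    """Summarize the datasets and attributes stored in one volume file."""
    with h5py.File(path, "r") as f:
        summary = {"keys": sorted(f.keys()), "attrs": dict(f.attrs)}
        summary["kspace_shape"] = f["kspace"].shape
        summary["kspace_dtype"] = str(f["kspace"].dtype)
        if "reconstruction_rss" in f:      # reference images, when present
            summary["rss_shape"] = f["reconstruction_rss"].shape
        if "mask" in f:                    # sampling mask, present in test files
            summary["mask"] = f["mask"][()]
    return summary


# Write a small synthetic file so the example runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic_volume.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("kspace", data=np.zeros((2, 4, 8, 8), dtype=np.complex64))
        f.create_dataset("reconstruction_rss", data=np.zeros((2, 8, 8), dtype=np.float32))
        f.attrs["acquisition"] = "SYNTHETIC"   # illustrative attribute only
    info = describe_volume(path)
```

Checking for optional keys before reading them lets the same helper handle training files (with references) and test files (with masks).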
7. Research Impact, Limitations, and Future Directions
The fastMRI Dataset has substantially driven progress in machine learning for MRI reconstruction, providing rigorously defined, vendor-diverse, HIPAA-compliant, and annotation-rich samples across major clinical domains. Its standardized splits and accompanying codebases have defined the reference for quantitative and qualitative assessment. Many recent advances—including patch-based diffusion solvers, GRAPPA-GANs, and data-efficient reconstruction—have benchmarked on fastMRI, frequently reporting radiologist-preferred images and improvements in SSIM/PSNR/NRMSE over classical baselines (Sanda et al., 25 Sep 2025, Tavaf et al., 2021).
Limitations include the absence of a phase reference (only magnitude images are provided), limited vendor and field-strength diversity outside Siemens, and a primary focus on 2D rather than volumetric trajectories in early releases (addressed in extensions for breast and prostate) (Muckley et al., 2020, Solomon et al., 7 Jun 2024, Tibrewala et al., 2023). All slice-level and study-level annotations in fastMRI+ are single-rater; consensus or inter-rater statistics are not provided (Zhao et al., 2021).
Future community benchmarks are expected to build on pathology-level metrics, joint reconstruction-classification objectives, and multisequence harmonization, leveraging the continuously expanding anatomical and annotation scope of the dataset.