fastMRI Dataset Overview

Updated 21 November 2025

The fastMRI Dataset is a large-scale, public repository of raw and reconstructed MRI data spanning knee, brain, prostate, and breast, with diverse acquisition protocols.
It provides multi-coil imaging data alongside clinical annotations and standardized evaluation splits to support reproducible accelerated MRI research.
The dataset underpins advances in machine learning-based MRI reconstruction by supporting retrospective (simulated) undersampling of fully sampled data and offering detailed meta-data for quantitative benchmarks.

The fastMRI Dataset is a large-scale, public repository of raw and reconstructed magnetic resonance imaging (MRI) data, spanning multiple anatomical regions and MRI contrasts, designed to advance research in accelerated MRI reconstruction and clinically relevant machine learning tasks. Originating as a collaboration between Facebook AI Research (Meta AI) and NYU Langone Health, it has expanded to include a growing collection of multicoil brain, knee, prostate, and breast MRI data, now annotated with clinical labels, pathology localizations, and extensive acquisition meta-data. The dataset combines vendor-diverse acquisition protocols, large sample sizes, and rigorously defined evaluation standards, and is considered the reference benchmark for machine learning–based MRI reconstruction, evaluation, and clinical validation (Zbontar et al., 2018, Muckley et al., 2020, Zhao et al., 2021, Tibrewala et al., 2023, Solomon et al., 2024).

1. Dataset Scope and Anatomical Coverage

The fastMRI repository encompasses several major anatomical domains:

Knee: Multi-coil, single-coil, and clinical DICOM data sets using coronal proton-density (PD) and T2-weighted protocols. The core multi-coil knee set contains 1,594 volumes, all acquired using a 15-channel Siemens array, with additional DICOM-only clinical volumes numbering over 10,000 (Zbontar et al., 2018).
Brain: Multi-coil axial acquisitions (6,970 volumes) across T1, T1 post-contrast, T2, and FLAIR, predominantly 1.5 T and 3 T Siemens systems, with transfer-test sets from GE and Philips (Zbontar et al., 2018, Muckley et al., 2020).
Prostate: Biparametric MRI (312 subjects) with both T2-weighted turbo-spin-echo (TSE) and diffusion-weighted EPI; provides raw k-space, reference reconstructions, and slice-level PI-RADS v2.1 labels (Tibrewala et al., 2023).
Breast: Free-breathing, golden-angle radial 3D dynamic contrast-enhanced (DCE) exams (300 cases) with detailed case-level malignancy labels and full k-space (Solomon et al., 2024).
Annotation Extensions (fastMRI+): Pathology bounding boxes and study-level clinical labels for knee and brain, enabling object-detection and lesion-aware reconstruction studies (Zhao et al., 2021).

Acquisitions are strictly de-identified, HIPAA-compliant, and supported by detailed meta-data and usage agreements permitting free academic research.

2. Acquisition Protocols and Data Formats

Summary Table: Sequence, Field Strength, and File Structure

Domain	Sequences / Contrasts	Field Strength	Coils / Vendor	File Format
Knee	Coronal PD, PD-FS	1.5 T, 3 T	15 ch Siemens	HDF5 (raw)
Brain	Axial T1, T1POST, T2, FLAIR	1.5 T, 3 T	32 ch Siemens, GE, Philips	HDF5 (raw)
Prostate	T2w TSE, DWI EPI	3 T	10–30 ch Siemens	HDF5/ISMRMRD
Breast	3D DCE golden-angle radial	3 T	16 ch Siemens	HDF5 (radial)

Raw k-space: Complex-valued arrays per coil, slice, and readout direction. ISMRMRD-compliant headers retain pulse sequence and scanner details.
Reconstruction (“ground-truth”): Root-sum-of-squares (RSS) magnitude images for multi-coil data; inverse-FFT magnitude images of the emulated single-coil (ESC) k-space for single-coil data (Zbontar et al., 2018).
Annotations/Labels: CSV files for pathology boxes, study-level findings (fastMRI+), and diagnostic grades (PI-RADS for prostate).
DICOM: Vendor-standardized reference reconstructions for clinical and compatibility purposes.
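The relationship between multi-coil k-space and the RSS reference can be sketched as follows. This is a minimal illustration, not the dataset's reference code: the array layout (coils, rows, columns), the centered orthonormal FFT convention, and the helper names `ifft2c` and `rss` are assumptions, and the k-space here is synthetic random data.

```python
import numpy as np

def ifft2c(kspace):
    """Centered, orthonormal 2D inverse FFT over the last two axes."""
    shifted = np.fft.ifftshift(kspace, axes=(-2, -1))
    image = np.fft.ifft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(image, axes=(-2, -1))

def rss(coil_images, coil_axis=0):
    """Root-sum-of-squares combination across the coil axis."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))

# Synthetic stand-in for one slice of multi-coil k-space: (coils, rows, cols).
rng = np.random.default_rng(0)
kspace = rng.standard_normal((4, 32, 32)) + 1j * rng.standard_normal((4, 32, 32))

coil_images = ifft2c(kspace)   # one complex image per coil
recon = rss(coil_images)       # real, non-negative magnitude image
```

Because RSS discards phase, the resulting reference is a magnitude image only.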

For each anatomical region, the acquisition matrix, FOV, resolution, TR/TE, and other critical protocol fields match clinical standards and are recorded in the ISMRMRD header and, for breast, also in per-case HDF5 attributes (Muckley et al., 2020, Tibrewala et al., 2023, Solomon et al., 2024).

3. Masking, Undersampling, and Forward Models

The dataset emphasizes retrospective undersampling of fully sampled k-space to simulate accelerated acquisition:

Knee: Variable-density random masks with a fully sampled central 8% (4×) or 4% (8×) of phase-encode lines; outer lines randomly discarded (Zbontar et al., 2018, Pezzotti et al., 2020).
Brain: Pseudo-equispaced (pseudo-regular) masks with a fully sampled low-frequency center; exact 4× and 8× factors are enforced. The 2020 challenge used only equispaced masks, with no random or probabilistic masks (Muckley et al., 2020, Tavaf et al., 2021).
Prostate: T2 under-sampled via interleaved odd/even lines; DWI via GRAPPA (R=2); explicit phase-correction lines are provided (Tibrewala et al., 2023).
Breast: Golden-angle stack-of-stars (radial) with full 288 spokes per segment; flexible temporal binning supports dynamic/granular reconstruction (Solomon et al., 2024).
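The two Cartesian mask families described above can be sketched as 1D masks over phase-encode lines. This is a simplified illustration under stated assumptions: the function names, the line count of 368, and the rounding of the center width are hypothetical choices, and the equispaced version does not adjust the spacing to hit the exact target factor as the challenge masks do.

```python
import numpy as np

def random_mask(num_cols, center_fraction, acceleration, rng):
    """Random mask: fully sampled center, outer lines kept at random."""
    num_low = int(round(num_cols * center_fraction))
    # Probability chosen so the expected number of kept lines is num_cols / acceleration.
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    mask = rng.uniform(size=num_cols) < prob
    pad = (num_cols - num_low + 1) // 2
    mask[pad:pad + num_low] = True  # fully sampled low-frequency band
    return mask

def equispaced_mask(num_cols, center_fraction, acceleration, offset=0):
    """Equispaced mask: every `acceleration`-th line plus a fully sampled center."""
    num_low = int(round(num_cols * center_fraction))
    mask = np.zeros(num_cols, dtype=bool)
    mask[offset::acceleration] = True
    pad = (num_cols - num_low + 1) // 2
    mask[pad:pad + num_low] = True
    return mask

rng = np.random.default_rng(0)
mask_4x = random_mask(368, 0.08, 4, rng)   # 8% center at 4x
mask_8x = random_mask(368, 0.04, 8, rng)   # 4% center at 8x
mask_eq = equispaced_mask(368, 0.08, 4)
```

Applying a mask amounts to multiplying the fully sampled k-space by it along the phase-encode axis, which zeroes the discarded lines.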

Forward Model (multi-coil, Cartesian):

$y = P\,F\,S\,x + \epsilon$

where $x$ is the unknown image, $S$ the coil sensitivity operator (applied channel-wise), $F$ the 2D discrete Fourier transform, $P$ the Cartesian sampling mask, $y$ the sampled k-space, and $\epsilon$ complex white noise (Zbontar et al., 2018, Muckley et al., 2020, Sanda et al., 2025).
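A minimal NumPy sketch of this Cartesian forward model is given below. The function names, array shapes, and the centered orthonormal FFT convention are assumptions made for illustration. Noise is added before masking, which matches $P F S x + \epsilon$ on the sampled locations and leaves unsampled locations at zero.

```python
import numpy as np

def fft2c(image):
    """Centered, orthonormal 2D FFT over the last two axes."""
    shifted = np.fft.ifftshift(image, axes=(-2, -1))
    kspace = np.fft.fft2(shifted, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(kspace, axes=(-2, -1))

def forward(x, sens, mask, noise_std=0.0, rng=None):
    """Simulate y = P F S x + eps for one slice.

    x:    (rows, cols) complex image
    sens: (coils, rows, cols) complex coil sensitivities
    mask: (cols,) boolean mask over phase-encode lines
    """
    coil_images = sens * x          # S x
    kspace = fft2c(coil_images)     # F S x
    if noise_std > 0:
        if rng is None:
            rng = np.random.default_rng()
        noise = rng.standard_normal(kspace.shape) + 1j * rng.standard_normal(kspace.shape)
        kspace = kspace + noise_std * noise
    return kspace * mask            # P(...): zero the unsampled lines

# Synthetic example: 2 coils, 16x16 image, every other line sampled.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
sens = rng.standard_normal((2, 16, 16)) + 1j * rng.standard_normal((2, 16, 16))
mask = np.zeros(16, dtype=bool)
mask[::2] = True
y = forward(x, sens, mask)
```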

For radial (breast):

$y = F_r\,x + n$

with $F_r$ denoting the non-uniform FFT (NUFFT) operator, as determined by golden-angle trajectories (Solomon et al., 2024).
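To make the radial operator concrete, the sketch below builds a 2D golden-angle trajectory and applies a direct (slow) non-uniform DFT as a stand-in for a NUFFT. It is an illustration only: it covers one in-plane slice rather than the full stack-of-stars geometry, omits coil sensitivities as the equation above does, and the function names and coordinate scaling (cycles per pixel) are assumptions.

```python
import numpy as np

# Golden angle: pi divided by the golden ratio, about 111.25 degrees.
GOLDEN_ANGLE = np.pi / ((1 + np.sqrt(5)) / 2)

def golden_angle_trajectory(num_spokes, num_samples):
    """k-space coordinates, shape (num_spokes, num_samples, 2), in [-0.5, 0.5)."""
    angles = (np.arange(num_spokes) * GOLDEN_ANGLE) % np.pi
    radius = (np.arange(num_samples) - num_samples // 2) / num_samples
    kx = radius[None, :] * np.cos(angles)[:, None]
    ky = radius[None, :] * np.sin(angles)[:, None]
    return np.stack([kx, ky], axis=-1)

def nudft2(image, traj):
    """Direct non-uniform DFT: y(k) = sum_r x(r) exp(-2*pi*i k.r)."""
    n_rows, n_cols = image.shape
    rows = np.arange(n_rows) - n_rows // 2
    cols = np.arange(n_cols) - n_cols // 2
    k = traj.reshape(-1, 2)
    phase_cols = np.exp(-2j * np.pi * np.outer(k[:, 0], cols))  # (K, n_cols)
    phase_rows = np.exp(-2j * np.pi * np.outer(k[:, 1], rows))  # (K, n_rows)
    y = np.einsum("kr,rc,kc->k", phase_rows, image, phase_cols)
    return y.reshape(traj.shape[:-1])

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
traj = golden_angle_trajectory(num_spokes=8, num_samples=16)
y = nudft2(image, traj)   # shape (8, 16): one row of samples per spoke
```

The direct sum costs on the order of (number of samples) × (number of pixels), which is why practical reconstructions rely on gridding-based NUFFT implementations instead.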

4. Evaluation, Splits, and Benchmarks

The dataset is accompanied by rigorously defined and locked training, validation, test, and challenge (“finalheldout”) splits to enable reproducibility and blind benchmarking.

Knee:
- Multi-coil train/val/test/challenge (973/199/118/104 volumes)
- Single-coil: emulated from multi-coil (Zbontar et al., 2018, Pezzotti et al., 2020)
Brain:
- Siemens: 4,469/1,378 train/val, 558 test (held-out)
- Transfer: 329 test cases (GE, Philips) only for challenge phase
- Each with detailed pathology representation (tumors, stroke, microvascular disease)
- Only pathologically diverse cases included in final held-out splits (Muckley et al., 2020)
Prostate:
- 218/48/46 subjects (train/val/test), each with T2w and DWI, slice-level PI-RADS (Tibrewala et al., 2023)
Breast:
- All 300 cases are assigned to train/test splits and are labeled with lesion status and histologic subtype (Solomon et al., 2024)

For all canonical reconstruction tasks, the reference is provided as RSS magnitude images, without phase. Models are evaluated quantitatively with NMSE, PSNR, and SSIM, optionally supplemented by radiologist review (Zbontar et al., 2018, Muckley et al., 2020).

Example Metrics (Brain):

Metric	Definition (magnitude images)
SSIM	$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}$
PSNR	$20\log_{10}\left(\frac{\max|x|}{\mathrm{RMSE}(x,y)}\right)$
NRMSE	$\|x-y\|_2/\|x\|_2$

Special note: improper background masking can depress SSIM by 0.3 in brain, motivating post-hoc exclusion of air pixels from metrics (Muckley et al., 2020).
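The PSNR and NRMSE definitions in the table, together with an optional foreground mask for excluding air pixels, can be sketched as below. This is a minimal illustration that treats $x$ as the reference and $y$ as the reconstruction. The function names and the `mask` argument are assumptions, and SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np

def nrmse(ref, rec, mask=None):
    """||ref - rec||_2 / ||ref||_2, optionally restricted to mask."""
    if mask is not None:
        ref, rec = ref[mask], rec[mask]
    return np.linalg.norm(ref - rec) / np.linalg.norm(ref)

def psnr(ref, rec, mask=None):
    """20 * log10(max|ref| / RMSE(ref, rec)), optionally restricted to mask."""
    if mask is not None:
        ref, rec = ref[mask], rec[mask]
    rmse = np.sqrt(np.mean((ref - rec) ** 2))
    return 20 * np.log10(np.max(np.abs(ref)) / rmse)

# Toy magnitude images: constant reference, constant offset error.
ref = np.full((4, 4), 2.0)
rec = ref + 0.2

# Hypothetical foreground mask covering the central 2x2 region.
foreground = np.zeros((4, 4), dtype=bool)
foreground[1:3, 1:3] = True

nrmse_all = nrmse(ref, rec)
psnr_all = psnr(ref, rec)
nrmse_fg = nrmse(ref, rec, mask=foreground)
```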

5. Annotation Resources and Clinical Labeling

The fastMRI+ resource provides multi-level annotation:

Knee: 16,154 image-level bounding boxes and 13 study-level labels over 22 pathology categories (meniscal tears, cartilage defect, etc.) (Zhao et al., 2021).
Brain: 7,570 bounding boxes, 643 study-level labels, 30 pathology classes (e.g., tumor, edema, stroke, post-surgical change), annotated by a subspecialty neuroradiologist.
Prostate: Slice-level PI-RADS v2.1 scores linking each image to clinical suspicion of malignancy (Tibrewala et al., 2023).
Breast: Lesion status and type, menopause/age, repeat exam flags.

Annotations facilitate clinically relevant evaluation—disease-specific sensitivity, object detection, and robustness analysis—extending beyond global image similarity metrics.
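As an illustration of how bounding-box annotations might be consumed, the sketch below groups boxes by volume and slice using only the Python standard library. The column names (`file`, `slice`, `x`, `y`, `width`, `height`, `label`), the file identifiers, and the label strings are hypothetical placeholders; the actual CSV schema should be checked against the annotation repository.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV content standing in for an annotation file.
SAMPLE_CSV = """file,slice,x,y,width,height,label
volume_a,12,100,120,30,25,example_label_1
volume_a,12,60,80,10,10,example_label_2
volume_b,3,40,50,20,20,example_label_1
"""

def load_boxes(handle):
    """Map (file, slice) to a list of ((x, y, width, height), label) tuples."""
    boxes = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["file"], int(row["slice"]))
        box = tuple(int(row[col]) for col in ("x", "y", "width", "height"))
        boxes[key].append((box, row["label"]))
    return boxes

boxes = load_boxes(io.StringIO(SAMPLE_CSV))
```

Indexing by (file, slice) makes it straightforward to restrict an image-quality metric to the annotated region of a given slice.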

6. Access, Data Use, and Tools

Data Hosting: Primary distribution at https://fastmri.med.nyu.edu/ (registration and data-use agreement required); open-source metadata and annotations via associated GitHub repositories for fastMRI+ and prostate/breast extensions (Zhao et al., 2021).
File Structure: HDF5 files per volume/case, with standard keys: "kspace", "reconstruction_rss", "mask" (test), and acquisition parameter meta-data; DICOM exports available for clinical integrations (Zbontar et al., 2018, Tibrewala et al., 2023).
Loading and Preprocessing: Provided PyTorch and ISMRMRD loaders, mask generation code, standard normalization routines, and usage guidelines supporting exact protocol replication (Zbontar et al., 2018).
Classical and Learning-based Baselines: BART toolbox for GRAPPA/TV reconstructions, baseline U-Net implementations, and reference scripts for end-to-end benchmarking.
Reconstruction Extension: Example Python code for radial DCE breast using GRASP and ESPIRiT with GPU-accelerated gridding (Solomon et al., 2024).
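Reading one volume with the standard keys listed above can be sketched with `h5py`. This is a hedged example: it writes a small synthetic file in memory rather than touching real data, and the array layout (slices, coils, rows, columns), the attribute name `acquisition`, and the helper `read_volume` are assumptions for illustration.

```python
import h5py
import numpy as np

def read_volume(h5file):
    """Return k-space, RSS target (if any), mask (if any), and attributes."""
    kspace = h5file["kspace"][()]
    target = h5file["reconstruction_rss"][()] if "reconstruction_rss" in h5file else None
    mask = h5file["mask"][()] if "mask" in h5file else None
    return kspace, target, mask, dict(h5file.attrs)

# In-memory synthetic file (nothing is written to disk).
with h5py.File("synthetic_volume.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("kspace", data=np.zeros((2, 4, 16, 16), dtype=np.complex64))
    f.create_dataset("reconstruction_rss", data=np.zeros((2, 16, 16), dtype=np.float32))
    f.attrs["acquisition"] = "EXAMPLE"  # placeholder attribute value
    kspace, target, mask, attrs = read_volume(f)
```

For a downloaded file, the same function would be called on `h5py.File(path, "r")`; the `mask` key is expected only in test-style files, so it is treated as optional here.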

The data are licensed for non-commercial research and educational use only, while the supporting scripts and files are released under permissive open-source terms.

7. Research Impact, Limitations, and Future Directions

The fastMRI Dataset has substantially driven progress in machine learning for MRI reconstruction, providing rigorously defined, vendor-diverse, HIPAA-compliant, and annotation-rich samples across major clinical domains. Its standardized splits and accompanying codebases have defined the reference for quantitative and qualitative assessment. Many recent advances, including patch-based diffusion solvers, GRAPPA-GANs, and data-efficient reconstruction, have been benchmarked on fastMRI, frequently reporting radiologist-preferred images and improvements in SSIM/PSNR/NRMSE over classical baselines (Sanda et al., 2025, Tavaf et al., 2021).

Limitations include the absence of a phase reference (only magnitude images are provided), limited vendor and field-strength diversity beyond Siemens, and a primary focus on 2D rather than volumetric trajectories in early releases (addressed in the breast and prostate extensions) (Muckley et al., 2020, Solomon et al., 2024, Tibrewala et al., 2023). All slice-level and study-level annotations in fastMRI+ are single-rater; consensus or inter-rater statistics are not provided (Zhao et al., 2021).

Future community benchmarks are expected to build on pathology-level metrics, joint reconstruction-classification objectives, and multisequence harmonization, leveraging the continuously expanding anatomical and annotation scope of the dataset.