MMCMR-427K: Multi-Coil CMR Imaging Database

Updated 1 January 2026
  • MMCMR-427K Database is a comprehensive open-access repository featuring 427,465 multi-coil CMR k-space acquisitions across 12 modalities and 17 ICD-10 CVD categories.
  • Data is acquired using varied scanner models and modalities, with a unified preprocessing pipeline (coil compression, noise normalization, cropping) ensuring consistency across heterogeneous clinical environments.
  • The database underpins advanced CMR reconstruction research, enabling generalist foundation models to achieve ultra-fast imaging with robust zero-shot performance on diverse clinical data.

The MMCMR-427K Database is the largest and most diverse open-access resource of raw, multi-coil cardiovascular magnetic resonance (CMR) k-space data, curated to enable generalist foundation models for ultra-fast, high-quality CMR imaging across heterogeneous clinical environments. Encompassing 427,465 multi-coil k-space acquisitions from 1,504 participants across 13 international centers, 12 imaging modalities, 15 scanner models, and 17 ICD-10 CVD categories, MMCMR-427K provides an unprecedented and richly structured data substrate paired with harmonized metadata, comprehensive quantitative metrics, and rigorously defined data splits for training, validation, and testing applications (Wang et al., 25 Dec 2025).

1. Dataset Composition and Scope

The MMCMR-427K database aggregates k-space data from 6,120 scans, corresponding to 1,504 unique participants distributed across three continents (Asia, Europe, North America). Data are stratified by institution: eight “internal” centers support model training/validation, while five “external” centers are reserved for zero-shot generalization testing. The dataset encompasses 12 distinct CMR modalities, with the following k-space counts:

| Modality | k-space acquisitions | Example parameters |
|---|---|---|
| Cine | 207,126 | TR ≈ 2.8 ms, TE ≈ 1.2 ms, 1.3×1.3 mm² |
| 2D flow | 7,454 | |
| Late Gadolinium Enhancement (LGE) | 8,580 | TR ≈ 3.5 ms, TE ≈ 1.4 ms, 1.4×1.4 mm², TI ≈ 250–350 ms |
| T1 Mapping | 54,301 | Variable TI, 1.5×1.5 mm² |
| T2 Mapping | 15,944 | Variable TE, 1.5×1.5 mm² |
| T1ρ Mapping | 68 | |
| First-pass Perfusion | 46,441 | |
| Aortic Imaging | 46,836 | |
| Black-blood | 1,484 | |
| T1-weighted | 3,370 | |
| T2-weighted | 3,810 | |
| Tagging | 31,188 | |

Scanner hardware diversity includes 15 models from four vendors: Siemens, United Imaging (UIH), GE, and Philips. Field strengths span 0.55 T, 1.5 T, 3 T, and 5 T. Participant diagnoses cover 17 ICD-10 categories (e.g., I42.0 for dilated cardiomyopathy, I25 for coronary artery disease, I40 for myocarditis, Q20–Q28 for congenital heart disease).

Structured metadata covers both participant (age, sex, height, weight, BMI, diagnostic codes) and scan (center code, vendor/model, field strength, sequence, anatomical view, spatial resolution, TR, TE, flip angle, number of coils) levels, with each k-space file paired to a harmonized CSV metadata record.

2. Acquisition Protocols and Preprocessing Workflow

Acquisition protocols are tailored to each modality. Representative scan parameters are, for cine: TR ≈ 2.8 ms, TE ≈ 1.2 ms, in-plane resolution ≈ 1.3×1.3 mm², slice thickness 8 mm, and 30 cardiac phases; for LGE: TR ≈ 3.5 ms, TE ≈ 1.4 ms, resolution ≈ 1.4×1.4 mm², inversion time 250–350 ms; for T1/T2 mapping: variable TI/TE, resolution ≈ 1.5×1.5 mm². All raw data undergo coil compression to 10 virtual channels via the method of Zhang et al. (Mag Reson Med 2013), which standardizes the channel count across institutions.

Three retrospective k-space undersampling patterns are included:

  • Uniform Cartesian
  • Random Cartesian
  • Radial

The acceleration factor $R$ for undersampling is defined as:

$$R = \frac{N_{\mathrm{full}}}{N_{\mathrm{acquired}}}$$

where $N_{\mathrm{full}}$ is the number of phase-encoding lines in the fully sampled k-space and $N_{\mathrm{acquired}}$ is the number of acquired lines, excluding autocalibration (ACS) lines.

Undersampling masks are generated on the fly for $R$ ranging from $4\times$ to $24\times$, with ACS regions defined as either 20 central lines (Cartesian) or a $20 \times 20$ central block.
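The on-the-fly mask generation described above can be sketched in NumPy for the two Cartesian patterns. This is a minimal illustration, not the dataset's own code: the function names, the `seed` argument, and the reading of "excludes ACS lines" as "count only the lines of the undersampling pattern, before the ACS band is added" are all assumptions. Radial masks are not covered.

```python
import numpy as np

def cartesian_mask(n_lines, R, n_acs=20, pattern="uniform", seed=0):
    """Return (mask, base): boolean vectors over phase-encoding lines.

    base is the undersampling pattern alone; mask is base plus the ACS band.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    base = np.zeros(n_lines, dtype=bool)
    if pattern == "uniform":
        base[::R] = True                               # every R-th line
    elif pattern == "random":
        rng = np.random.default_rng(seed)
        keep = rng.choice(n_lines, size=n_lines // R, replace=False)
        base[keep] = True                              # n_lines // R random lines
    else:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    mask = base.copy()
    start = (n_lines - n_acs) // 2
    mask[start:start + n_acs] = True                   # fully sampled central band
    return mask, base

def acceleration_factor(base):
    """R = N_full / N_acquired, counting pattern lines only (no ACS lines)."""
    return base.size / np.count_nonzero(base)

mask, base = cartesian_mask(n_lines=240, R=8, pattern="uniform")
print(acceleration_factor(base), int(mask.sum()))
```

Because the ACS band adds lines on top of the pattern, the fraction of k-space actually acquired is larger than $1/R$; keeping `base` separate from `mask` makes both quantities available.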

A unified preprocessing pipeline ensures data consistency:

  1. Coil compression of raw k-space to $N = 10$ virtual channels,
  2. Noise normalization (per-coil mean subtraction, division by standard deviation on nonzero readouts),
  3. Cropping/padding to a standard $512 \times 246$ matrix,
  4. Storage in MATLAB ".mat" format, with metadata in a parallel CSV file.

An example preprocessing pseudocode, as provided, is:

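function preprocess(raw_kspace, meta_dict):
    # Compress to 10 virtual coils; C is the compression matrix
    [k10, C] = coil_compress(raw_kspace, 10)
    # Per-coil noise normalization: subtract the mean, divide by the standard deviation
    for c = 1:10
        μ = mean(k10[c,:,:]); σ = std(k10[c,:,:])
        k10[c,:,:] = (k10[c,:,:] - μ) / σ
    end
    # Crop (or pad) to the standard matrix size
    k10 = center_crop(k10, [512, 246])
    # Store the k-space array and its metadata record side by side
    save_mat("sample.mat", k10)
    save_csv("sample.csv", meta_dict)
    return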
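A runnable NumPy sketch of the same three array steps may help make the pseudocode concrete. Several things here are assumptions rather than the dataset's actual pipeline: plain SVD-based compression stands in for the Zhang et al. method, the normalization statistics are taken over nonzero samples only (one reading of "on nonzero readouts"), the input is assumed to have at least 10 coils, and all function names are hypothetical. File output is omitted.

```python
import numpy as np

def svd_coil_compress(kspace, n_virtual=10):
    """Compress [coils, nx, ny] k-space to n_virtual channels with a plain SVD.

    Stand-in for the compression method cited in the text; assumes coils >= n_virtual.
    """
    n_coils = kspace.shape[0]
    flat = kspace.reshape(n_coils, -1)                   # [coils, samples]
    u, _, _ = np.linalg.svd(flat, full_matrices=False)   # u: [coils, coils]
    w = u[:, :n_virtual].conj().T                        # [n_virtual, coils]
    compressed = w @ flat
    return compressed.reshape(w.shape[0], *kspace.shape[1:])

def normalize_per_coil(kspace):
    """Per coil: subtract the mean and divide by the std of the nonzero samples."""
    out = kspace.copy()
    for c in range(out.shape[0]):
        nz = out[c] != 0
        if nz.any():
            vals = out[c][nz]
            sigma = vals.std()
            out[c][nz] = (vals - vals.mean()) / (sigma if sigma > 0 else 1.0)
    return out

def center_crop_or_pad(kspace, target=(512, 246)):
    """Center-crop or zero-pad the last two axes to the target matrix size."""
    out = np.zeros(kspace.shape[:-2] + tuple(target), dtype=kspace.dtype)
    src, dst = [], []
    for size, tgt in zip(kspace.shape[-2:], target):
        n = min(size, tgt)
        s0, d0 = (size - n) // 2, (tgt - n) // 2
        src.append(slice(s0, s0 + n))
        dst.append(slice(d0, d0 + n))
    out[(..., *dst)] = kspace[(..., *src)]
    return out

def preprocess(raw_kspace):
    """Coil compression, noise normalization, then crop/pad to 512 x 246."""
    k = svd_coil_compress(raw_kspace, 10)
    k = normalize_per_coil(k)
    return center_crop_or_pad(k, (512, 246))
```

Normalizing before padding keeps the zero-filled border out of the per-coil statistics, which is consistent with restricting them to nonzero samples.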

3. Metadata Design and Harmonization

Each scan is annotated with a standardized set of fields, stored per-instance in matched CSV files:

  • Center_id, subject_id, scan_id, date
  • Vendor, model, field_strength_T, coils, sequence name, anatomical view, in-plane FOV, matrix_size, slice_thickness_mm, TR_ms, TE_ms, flip_angle_deg
  • Clinical: age, sex, height, weight, BMI, ICD10_codes (semicolon-separated)

Vendor-specific DICOM tags are remapped to a unified ontology, achieving harmonization across institutional protocols. Metadata are leveraged for conditioning text-aware foundation models (e.g., CardioMM), and support subgroup analyses by scan type, field strength, population demographics, and pathology. The metadata schema facilitates complex queries such as extracting all 2D flow scans at 3 T from Siemens devices for hypertrophic cardiomyopathy in subjects over 50 years.
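The example query above could look roughly as follows in pandas. This is a sketch under stated assumptions: the rows are made up for illustration, the exact column names and casing (`vendor`, `sequence`, `age`) and the sequence label `"2D flow"` are guesses based on the field list, and hypertrophic cardiomyopathy is matched through the ICD-10 codes I42.1 and I42.2.

```python
import pandas as pd

# Hypothetical metadata rows; real records come from the per-scan CSV files.
meta = pd.DataFrame({
    "subject_id": ["sub-001", "sub-002", "sub-003", "sub-004"],
    "vendor": ["Siemens", "Siemens", "GE", "Siemens"],
    "field_strength_T": [3.0, 3.0, 3.0, 1.5],
    "sequence": ["2D flow", "cine", "2D flow", "2D flow"],
    "age": [62, 58, 70, 66],
    "ICD10_codes": ["I42.1;I10", "I42.2", "I42.1", "I42.2"],
})

HCM_CODES = {"I42.1", "I42.2"}  # ICD-10 codes for hypertrophic cardiomyopathy

def has_any_code(codes, wanted):
    """True if the semicolon-separated code string contains any wanted code."""
    return bool(wanted & {c.strip() for c in str(codes).split(";")})

selected = meta[
    (meta["sequence"] == "2D flow")
    & (meta["field_strength_T"] == 3.0)
    & (meta["vendor"] == "Siemens")
    & (meta["age"] > 50)
    & meta["ICD10_codes"].apply(lambda s: has_any_code(s, HCM_CODES))
]
print(selected["subject_id"].tolist())
```

Splitting the code string into a set before matching avoids false hits from substring comparison, for example a prefix match on a longer code.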

4. Quantitative Metrics and Dataset Statistics

Reconstruction and image quality are quantitatively assessed using several established metrics:

  • PSNR: $20 \cdot \log_{10}(\mathrm{MAX}_I / \mathrm{RMSE})$, with $\mathrm{MAX}_I$ as the maximum pixel magnitude.
  • SSIM: as defined by Wang et al. (IEEE TIP 2004).
  • SNR: $|\mu_{\mathrm{signal}}| / \sigma_{\mathrm{noise}}$, calculated as mean myocardial magnitude over background noise standard deviation.
  • g-factor mapping for parallel imaging: $g(r) = \sqrt{[(E^H \Psi^{-1} E)^{-1}]_{rr}\,[E^H \Psi^{-1} E]_{rr}}$, where $E$ is the encoding matrix, $\Psi$ the coil noise covariance, and $^H$ denotes the conjugate transpose.
  • $\ell_2$ reconstruction error: $\|x_{\mathrm{recon}} - x_{\mathrm{ref}}\|_2$
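Three of the metrics above (PSNR, SNR, and the $\ell_2$ error) are simple enough to sketch directly in NumPy. The function names and the mask arguments are hypothetical, and $\mathrm{MAX}_I$ is assumed to be taken from the reference image; SSIM and g-factor maps are left out.

```python
import numpy as np

def psnr(recon, ref):
    """20 * log10(MAX_I / RMSE), with MAX_I the maximum magnitude of the reference."""
    rmse = np.sqrt(np.mean(np.abs(recon - ref) ** 2))
    return 20.0 * np.log10(np.abs(ref).max() / rmse)

def l2_error(recon, ref):
    """Euclidean norm of the difference between reconstruction and reference."""
    return np.linalg.norm((recon - ref).ravel())

def snr(image, signal_mask, noise_mask):
    """|mean magnitude in the signal region| / std of magnitude in the noise region."""
    mag = np.abs(image)
    return np.abs(mag[signal_mask].mean()) / mag[noise_mask].std()

# Toy example: constant reference, reconstruction offset by 0.2 everywhere.
ref = np.full((4, 4), 2.0)
recon = ref + 0.2
print(psnr(recon, ref), l2_error(recon, ref))
```

In practice the signal mask would cover the myocardium and the noise mask a background region, as the SNR definition above describes; how those regions are drawn is not specified here.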

Dataset-wide summary statistics:

| Demographic/Parameter | Value |
|---|---|
| Age (mean ± SD, range) | 54 ± 16 years (5–85) |
| Sex distribution | 53% male, 47% female |
| Field strengths | 0.55 T (2%), 1.5 T (40%), 3 T (50%), 5 T (8%) |
| Acceleration factors | 4×–8× (30%), 8×–16× (40%), 12×–24× (30%) |

This scale and diversity plausibly make the dataset a sound empirical basis for large-scale generalization studies in CMR reconstruction.

5. Organization, Access, and Usage

The repository employs a clear directory tree, separating data for training, validation, internal test, and external zero-shot generalization test as follows:

MMCMR-427K/
  train/
    center01/
      sub-001_LGE_uniform_8x.mat
      sub-001_LGE_uniform_8x.csv
      ...
    center02/...
  val/
  test_internal/
  test_external/

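The file naming scheme is given as {centerID}_{subID}_{sequence}_{pattern}_{R}x.mat/.csv; in the example paths shown in this article, the center appears as a directory name rather than as a file-name prefix. The data splits are: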

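| Subset | k-space files | Scans | Unique participants |
|---|---|---|---|
| Training | 241,526 | 3,400 | 789 |
| Validation | 26,836 | not reported | not reported |
| Internal Test | 75,753 | 1,495 | 320 |
| External Test | 110,186 | 1,225 | 395 |

The training, internal test, and external test rows sum to the dataset totals (427,465 k-space files, 6,120 scans, 1,504 participants); scan and participant counts for the validation subset are not given.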
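The directory layout and file names above lend themselves to a small path parser. The sketch below follows the example paths in this article (`split/center/sub-XXX_sequence_pattern_Rx.ext`) rather than the stated scheme with a center prefix. The regular expression, the function name, and the pattern token `radial` are assumptions, since only `uniform` and `random` appear in the examples.

```python
import re
from pathlib import PurePosixPath

# Assumed file-name shape, taken from the example paths: sub-001_LGE_uniform_8x.mat
NAME_RE = re.compile(
    r"^(?P<subject>sub-\d+)_(?P<sequence>.+)_"
    r"(?P<pattern>uniform|random|radial)_(?P<R>\d+)x\.(?P<ext>mat|csv)$"
)

def parse_sample_path(path):
    """Split a sample path into split, center, subject, sequence, pattern, and R."""
    p = PurePosixPath(path)
    m = NAME_RE.match(p.name)
    if m is None:
        raise ValueError(f"unrecognized file name: {p.name}")
    info = m.groupdict()
    info["R"] = int(info["R"])               # acceleration factor as an integer
    info["center"] = p.parent.name           # e.g. center01
    info["split"] = p.parent.parent.name     # e.g. train
    return info

info = parse_sample_path("MMCMR-427K/train/center01/sub-001_LGE_uniform_8x.mat")
print(info)
```

Reading the split and center from the directory names keeps the parser independent of whether a center prefix is present in the file name.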

A standardized code example for loading one sample in Python/PyTorch is provided:

import scipy.io
import pandas as pd
import torch

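def load_sample(mat_path, csv_path):
    """Load one k-space file and its paired metadata record."""
    # Note: scipy.io.loadmat does not read MATLAB v7.3 (HDF5-based) files;
    # those need an HDF5 reader such as h5py.
    m = scipy.io.loadmat(mat_path)
    kspace = m['kspace']   # shape [coils, nx, ny], complex64
    meta = pd.read_csv(csv_path)  # single-row DataFrame
    # Split the complex array into real and imaginary float32 tensors
    real = torch.from_numpy(kspace.real).float()
    imag = torch.from_numpy(kspace.imag).float()
    x = torch.stack([real, imag], dim=0)  # [2, coils, nx, ny]
    return x, meta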

k, meta = load_sample(
    "MMCMR-427K/train/center01/sub-001_cine_random_16x.mat",
    "MMCMR-427K/train/center01/sub-001_cine_random_16x.csv"
)

6. Applications and Research Significance

MMCMR-427K serves as the primary substrate for developing and evaluating generalist CMR reconstruction and analysis models across heterogeneous and accelerated imaging scenarios. Notably, it has been used for training the CardioMM foundation model, which leverages semantic contextualization with physics-informed data consistency for robust reconstruction across diverse protocols, scanners, and patient pathologies. Comprehensive benchmarking demonstrates state-of-the-art reconstruction performance and strong zero-shot adaptation to unseen clinical environments, even at acceleration factors up to $24\times$, with retention of clinically salient cardiac phenotypes and quantitative myocardial biomarkers. These properties position MMCMR-427K as a critical bridge toward scalable, high-throughput, and clinically accessible CMR diagnostics (Wang et al., 25 Dec 2025).
