MMCMR-427K: Multi-Coil CMR Imaging Database

Updated 1 January 2026

The MMCMR-427K Database is an open-access repository of 427,465 multi-coil CMR k-space acquisitions spanning 12 modalities and 17 ICD-10 cardiovascular disease (CVD) categories.
Data are acquired with varied scanner models and modalities, and a unified preprocessing pipeline (coil compression, noise normalization, cropping) keeps them consistent across heterogeneous clinical environments.
The database supports CMR reconstruction research, in particular generalist foundation models that target ultra-fast imaging with robust zero-shot performance on diverse clinical data.

The MMCMR-427K Database is described as the largest and most diverse open-access resource of raw, multi-coil cardiovascular magnetic resonance (CMR) k-space data, curated to support generalist foundation models for ultra-fast, high-quality CMR imaging across heterogeneous clinical environments. It comprises 427,465 multi-coil k-space acquisitions from 1,504 participants across 13 international centers, 12 imaging modalities, 15 scanner models, and 17 ICD-10 CVD categories. The k-space data are paired with harmonized metadata, quantitative metrics, and predefined splits for training, validation, and testing (Wang et al., 25 Dec 2025).

1. Dataset Composition and Scope

The MMCMR-427K database aggregates k-space data from 6,120 scans, corresponding to 1,504 unique participants distributed across three continents (Asia, Europe, North America). Data are stratified by institution: eight “internal” centers support model training/validation, while five “external” centers are reserved for zero-shot generalization testing. The dataset encompasses 12 distinct CMR modalities, with the following k-space counts:

Modality	k-space acquisitions	Example Parameters
Cine	207,126	TR ≈ 2.8 ms, TE ≈ 1.2 ms, 1.3×1.3 mm²
2D flow	7,454
Late Gadolinium Enhancement (LGE)	8,580	TR ≈ 3.5 ms, TE ≈ 1.4 ms, 1.4×1.4 mm², TI ≈ 250–350 ms
T1 Mapping	54,301	Variable TI, 1.5×1.5 mm²
T2 Mapping	15,944	Variable TE, 1.5×1.5 mm²
T1ρ Mapping	68
First-pass Perfusion	46,441
Aortic Imaging	46,836
Black-blood	1,484
T1-weighted	3,370
T2-weighted	3,810
Tagging	31,188
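
As a quick consistency check on the table above, the per-modality counts can be tallied and compared with the headline figure of 427,465. This is only an arithmetic sketch; the dictionary keys are shorthand labels chosen here, and the source does not explain the small gap the tally shows.

```python
# Per-modality k-space counts, copied from the table above.
modality_counts = {
    "cine": 207_126,
    "2d_flow": 7_454,
    "lge": 8_580,
    "t1_mapping": 54_301,
    "t2_mapping": 15_944,
    "t1rho_mapping": 68,
    "perfusion": 46_441,
    "aortic": 46_836,
    "black_blood": 1_484,
    "t1_weighted": 3_370,
    "t2_weighted": 3_810,
    "tagging": 31_188,
}

headline_total = 427_465                      # figure quoted in the overview
table_total = sum(modality_counts.values())   # sum of the 12 rows
gap = headline_total - table_total            # rows not accounted for in the table

print(f"table total: {table_total:,}  gap vs headline: {gap:,}")
```

The twelve rows sum to 426,602, which is 863 fewer than the headline count, so the table should be treated as approximate until checked against the released files.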

Scanner hardware diversity includes 15 models from four vendors (Siemens, United Imaging UIH, GE, Philips) spanning 0.55 T, 1.5 T, 3 T, and 5 T field strengths. Participant diagnoses cover 17 ICD-10 categories (e.g., I42.0 for dilated cardiomyopathy, I25 for coronary artery disease, I40 for myocarditis, Q20–Q28 for congenital heart disease).

Structured metadata covers both participant (age, sex, height, weight, BMI, diagnostic codes) and scan (center code, vendor/model, field strength, sequence, anatomical view, spatial resolution, TR, TE, flip angle, number of coils) levels, with each k-space file paired to a harmonized CSV metadata record.

2. Acquisition Protocols and Preprocessing Workflow

Acquisition protocols are tailored to each modality. Representative parameters are: for cine, TR ≈ 2.8 ms, TE ≈ 1.2 ms, in-plane resolution ≈ 1.3×1.3 mm², slice thickness 8 mm, and 30 cardiac phases; for LGE, TR ≈ 3.5 ms, TE ≈ 1.4 ms, resolution ≈ 1.4×1.4 mm², and inversion time 250–350 ms; for T1/T2 mapping, variable TI/TE and resolution ≈ 1.5×1.5 mm². All raw data undergo coil compression to 10 virtual channels via the method of Zhang et al. (Mag Reson Med 2013), so every acquisition has the same channel count regardless of the receive array used at each institution.

Three retrospective k-space undersampling patterns are included:

Uniform Cartesian
Random Cartesian
Radial

The acceleration factor $R$ for undersampling is defined as:

$R = \frac{N_{\mathrm{full}}}{N_{\mathrm{acquired}}}$

where $N_{\mathrm{full}}$ is the number of phase-encoding lines in the fully sampled k-space, and $N_{\mathrm{acquired}}$ excludes autocalibration signal (ACS) lines.

Undersampling masks are generated on-the-fly for $R$ ranging from $4\times$ to $24\times$, with ACS regions defined as either 20 central lines (Cartesian) or a $20 \times 20$ central block.
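
The mask definition above can be illustrated with a minimal sketch for the two Cartesian patterns. Everything named here is an assumption for illustration: the function `cartesian_mask`, the choice of 246 as the number of phase-encoding lines (taken from the 512×246 matrix mentioned in the preprocessing pipeline), and the reading of the $R$ definition as "count sampled lines before the ACS block is added". It is not the dataset's own mask generator, and the radial pattern is not covered.

```python
import numpy as np

def cartesian_mask(n_lines, accel, pattern="uniform", n_acs=20, seed=None):
    """Return (mask, nominal_R) for a 1D phase-encoding mask.

    nominal_R follows R = N_full / N_acquired, with N_acquired counted
    before the central ACS lines are switched on (an interpretation).
    """
    mask = np.zeros(n_lines, dtype=bool)
    if pattern == "uniform":
        mask[::accel] = True                       # every accel-th line
    elif pattern == "random":
        rng = np.random.default_rng(seed)
        n_keep = max(1, round(n_lines / accel))    # same line budget as uniform
        mask[rng.choice(n_lines, size=n_keep, replace=False)] = True
    else:
        raise ValueError(f"unsupported pattern: {pattern!r}")

    n_acquired = int(mask.sum())                   # ACS lines not yet included
    start = (n_lines - n_acs) // 2
    mask[start:start + n_acs] = True               # fully sample the ACS block
    return mask, n_lines / n_acquired

mask, nominal_r = cartesian_mask(246, 8, "uniform")
print(int(mask.sum()), round(nominal_r, 2))
```

With 246 lines and a step of 8, 31 lines are sampled before the ACS block, so the nominal factor is 246/31 ≈ 7.94 rather than exactly 8; adding the 20 ACS lines (two of which were already sampled) brings the total to 49 lines.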

A unified preprocessing pipeline ensures data consistency:

Coil compression of raw k-space to $N = 10$ virtual channels,
Noise normalization (per-coil mean subtraction, then division by the standard deviation computed on nonzero readouts),
Cropping/padding to a standard $512 \times 246$ matrix,
Storage in MATLAB ".mat" format, with metadata in a parallel CSV file.

The pipeline can be summarized in pseudocode:

function preprocess(raw_kspace, meta_dict):
    [k10, C] = coil_compress(raw_kspace, 10)    # 10 virtual coils; C holds the compression weights
    for c = 1:10
        μ = mean(k10[c,:,:]); σ = std(k10[c,:,:]);
        k10[c,:,:] = (k10[c,:,:] - μ)/σ;        # per-coil normalization
    end
    k10 = center_crop(k10, [512,246])
    save_mat("sample.mat", k10)
    save_csv("sample.csv", meta_dict)
    return
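
For readers who want something executable, the following NumPy sketch mirrors the three pipeline steps. It rests on several assumptions: coil compression is done with a plain SVD across the coil dimension, which is a simpler stand-in and not the Zhang et al. method cited above; the helper names (`svd_coil_compress`, `normalize_per_coil`, `center_crop_or_pad`) are hypothetical; and the input is assumed to be a complex array of shape [coils, nx, ny]. File output is omitted.

```python
import numpy as np

def svd_coil_compress(kspace, n_virtual=10):
    """Project [coils, nx, ny] k-space onto its leading coil-space singular vectors."""
    n_coils = kspace.shape[0]
    flat = kspace.reshape(n_coils, -1)                  # [coils, samples]
    u, _, _ = np.linalg.svd(flat, full_matrices=False)  # u: [coils, coils]
    weights = u[:, :n_virtual].conj().T                 # [virtual, coils]
    return (weights @ flat).reshape(weights.shape[0], *kspace.shape[1:])

def normalize_per_coil(kspace):
    """Subtract the mean and divide by the std of each coil's nonzero samples."""
    out = kspace.copy()
    for c in range(out.shape[0]):
        nonzero = out[c] != 0
        if nonzero.any():
            vals = out[c][nonzero]
            sd = vals.std()
            if sd > 0:
                out[c][nonzero] = (vals - vals.mean()) / sd
    return out

def center_crop_or_pad(kspace, shape=(512, 246)):
    """Center-crop dimensions that are too large and zero-pad those too small."""
    out = np.zeros(kspace.shape[:1] + tuple(shape), dtype=kspace.dtype)
    src, dst = [], []
    for have, want in zip(kspace.shape[1:], shape):
        n = min(have, want)
        src.append(slice((have - n) // 2, (have - n) // 2 + n))
        dst.append(slice((want - n) // 2, (want - n) // 2 + n))
    out[(slice(None), *dst)] = kspace[(slice(None), *src)]
    return out

def preprocess(raw_kspace, n_virtual=10, shape=(512, 246)):
    k = svd_coil_compress(raw_kspace, n_virtual)
    k = normalize_per_coil(k)
    return center_crop_or_pad(k, shape)

# Small synthetic input (random numbers, not real data) to exercise the steps.
rng = np.random.default_rng(0)
raw = rng.standard_normal((16, 64, 40)) + 1j * rng.standard_normal((16, 64, 40))
out = preprocess(raw, shape=(32, 48))
print(out.shape)
```

The toy call crops the first spatial dimension (64 to 32) and pads the second (40 to 48), so both branches of the crop/pad step are exercised.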

3. Metadata Design and Harmonization

Each scan is annotated with a standardized set of fields, stored per-instance in matched CSV files:

Identifiers: center_id, subject_id, scan_id, date
Acquisition: vendor, model, field_strength_T, coils, sequence name, anatomical view, in-plane FOV, matrix_size, slice_thickness_mm, TR_ms, TE_ms, flip_angle_deg
Clinical: age, sex, height, weight, BMI, ICD10_codes (semicolon-separated)

Vendor-specific DICOM tags are remapped to a unified ontology, achieving harmonization across institutional protocols. Metadata are leveraged for conditioning text-aware foundation models (e.g., CardioMM), and support subgroup analyses by scan type, field strength, population demographics, and pathology. The metadata schema facilitates complex queries such as extracting all 2D flow scans at 3 T from Siemens devices for hypertrophic cardiomyopathy in subjects over 50 years.
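
A query of the kind just described could look like the pandas sketch below. The column names (`sequence`, `field_strength_T`, `vendor`, `age`, `ICD10_codes`) are assumed from the field list above and may not match the released CSV headers exactly; the four rows are invented purely to exercise the filter; and `select_scans` is a hypothetical helper. Hypertrophic cardiomyopathy is matched through ICD-10 codes I42.1 and I42.2.

```python
import pandas as pd

def select_scans(meta, sequence, field_strength_t, vendor, icd10_prefixes, min_age):
    """Filter a metadata table by scan type, scanner, age, and diagnosis codes."""
    prefixes = tuple(icd10_prefixes)
    # ICD10_codes is semicolon-separated, so split it before matching.
    codes = meta["ICD10_codes"].fillna("").str.split(";")
    has_code = codes.apply(
        lambda cs: any(c.strip().startswith(prefixes) for c in cs)
    )
    keep = (
        (meta["sequence"] == sequence)
        & (meta["field_strength_T"] == field_strength_t)
        & (meta["vendor"] == vendor)
        & (meta["age"] > min_age)
        & has_code
    )
    return meta[keep]

# Invented rows for illustration only; not taken from the database.
meta = pd.DataFrame({
    "scan_id": ["a", "b", "c", "d"],
    "sequence": ["2D flow", "2D flow", "cine", "2D flow"],
    "field_strength_T": [3.0, 3.0, 3.0, 1.5],
    "vendor": ["Siemens", "Siemens", "Siemens", "Siemens"],
    "age": [62, 45, 70, 66],
    "ICD10_codes": ["I42.1;I10", "I42.2", "I42.1", "I42.1"],
})

hits = select_scans(meta, "2D flow", 3.0, "Siemens", ["I42.1", "I42.2"], min_age=50)
print(hits["scan_id"].tolist())
```

In practice the per-scan CSV records would first be concatenated into one table (for example with `pd.concat`) before filtering.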

4. Quantitative Metrics and Dataset Statistics

Reconstruction and image quality are quantitatively assessed using several established metrics:

PSNR: $20 \cdot \log_{10} ( \mathrm{MAX}_I / \mathrm{RMSE} )$, with $\mathrm{MAX}_I$ as the maximum pixel magnitude.
SSIM: as defined by Wang et al. (IEEE TIP 2004).
SNR: $|\mu_{\mathrm{signal}}|/\sigma_{\mathrm{noise}}$, calculated as mean myocardial magnitude over background noise standard deviation.
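g-factor mapping for parallel imaging: $g(r) = \sqrt{[(E^H \Psi^{-1} E)^{-1}]_{rr} \, [E^H \Psi^{-1} E]_{rr}}$, where $E$ is the encoding matrix, $\Psi$ the coil noise covariance, and $E^H$ the conjugate transpose of $E$; by this standard SENSE definition $g(r) \geq 1$.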
$\ell_2$ reconstruction error: $\|x_{\mathrm{recon}} - x_{\mathrm{ref}}\|_2$
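
Three of the metrics above are simple enough to sketch directly in NumPy. The function names are illustrative, $\mathrm{MAX}_I$ is taken here as the maximum magnitude of the reference image (the source does not say which image it is measured on), and the signal and noise regions are passed in as boolean masks that the caller must supply.

```python
import numpy as np

def psnr(recon, ref):
    """20 * log10(MAX_I / RMSE), with MAX_I taken from the reference image."""
    rmse = np.sqrt(np.mean(np.abs(recon - ref) ** 2))
    return 20 * np.log10(np.abs(ref).max() / rmse)

def snr(image, signal_mask, noise_mask):
    """Mean magnitude in the signal region over the std of the noise region."""
    mag = np.abs(image)
    return np.abs(mag[signal_mask].mean()) / mag[noise_mask].std()

def l2_error(recon, ref):
    """Euclidean norm of the difference between reconstruction and reference."""
    return np.linalg.norm((recon - ref).ravel())

# Toy check: a constant image with a uniform offset of 0.1.
ref = np.ones((4, 4))
recon = ref + 0.1
print(psnr(recon, ref), l2_error(recon, ref))
```

For this toy pair the RMSE is 0.1 and the peak is 1, so the PSNR is 20 dB and the $\ell_2$ error is $\sqrt{16 \cdot 0.01} = 0.4$.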

Dataset-wide summary statistics:

Demographic/Parameter	Value
Age (mean ± SD, range)	54 ± 16 years (5–85)
Sex distribution	53% male, 47% female
Field strengths	0.55 T (2%), 1.5 T (40%), 3 T (50%), 5 T (8%)
Acceleration factors	4×–8× (30%), 8×–16× (40%), 12×–24× (30%)

A plausible implication is that this scale and diversity establish an empirically robust substrate for large-scale generalization studies in CMR reconstruction.

5. Organization, Access, and Usage

The repository employs a clear directory tree, separating data for training, validation, internal test, and external zero-shot generalization test as follows:

MMCMR-427K/
  train/
    center01/
      sub-001_LGE_uniform_8x.mat
      sub-001_LGE_uniform_8x.csv
      ...
    center02/...
  val/
  test_internal/
  test_external/
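
The example paths in the tree above encode the split, center, subject, sequence, undersampling pattern, and acceleration factor. A small parser, sketched below, can recover those fields. The regular expression is inferred from the example file names only (`sub-001_LGE_uniform_8x.mat`), and `parse_sample_path` is a hypothetical helper, so real file names may need a looser pattern.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from the example names: sub-XXX_<sequence>_<pattern>_<R>x
_NAME = re.compile(
    r"^(?P<subject>sub-\d+)_(?P<sequence>.+)_(?P<pattern>[A-Za-z]+)_(?P<accel>\d+)x$"
)

def parse_sample_path(path):
    """Split a sample path into split, center, subject, sequence, pattern, accel."""
    p = PurePosixPath(path)
    match = _NAME.match(p.stem)
    if match is None:
        raise ValueError(f"unrecognized file name: {p.name}")
    info = match.groupdict()
    info["accel"] = int(info["accel"])
    info["center"] = p.parent.name          # e.g. center01
    info["split"] = p.parent.parent.name    # e.g. train
    return info

info = parse_sample_path("MMCMR-427K/train/center01/sub-001_LGE_uniform_8x.mat")
print(info)
```

Because the sequence group is matched greedily and the last two underscore-separated fields are anchored to the end, sequence names that themselves contain underscores would still parse.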

Files are named {subID}_{sequence}_{pattern}_{R}x.mat/.csv and grouped under a {centerID} directory within each split. Data splits consist of:

Subset	k-space files	Scans	Unique Participants
Training	241,526	3,400	789
Validation	26,836	—	—
Internal Test	75,753	1,495	320
External Test	110,186	1,225	395
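
The split table can be cross-checked against the headline totals with a few sums. The sketch below only restates the table; the observation that the validation files seem to be counted inside the training figure is an inference from the arithmetic, not something the source states.

```python
# Values copied from the split table; None marks cells the table leaves blank.
splits = {
    "train":         {"files": 241_526, "scans": 3_400, "participants": 789},
    "val":           {"files": 26_836,  "scans": None,  "participants": None},
    "test_internal": {"files": 75_753,  "scans": 1_495, "participants": 320},
    "test_external": {"files": 110_186, "scans": 1_225, "participants": 395},
}

def split_total(field, skip=()):
    """Sum one column over all splits, ignoring blanks and any skipped splits."""
    return sum(
        row[field]
        for name, row in splits.items()
        if name not in skip and row[field] is not None
    )

files_without_val = split_total("files", skip=("val",))
files_with_val = split_total("files")
scans_total = split_total("scans")
participants_total = split_total("participants")

print(files_without_val, files_with_val, scans_total, participants_total)
```

Training plus the two test sets gives exactly 427,465 files, 6,120 scans, and 1,504 participants, matching the figures quoted earlier. Adding the validation row pushes the file count to 454,301, which suggests the validation files are a subset of the training count rather than an additional partition.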

A standardized code example for loading one sample in Python/PyTorch is provided:

import scipy.io
import pandas as pd
import torch

def load_sample(mat_path, csv_path):
    """Load one k-space sample and its metadata record."""
    m = scipy.io.loadmat(mat_path)
    kspace = m['kspace']   # shape [coils, nx, ny], complex64
    meta = pd.read_csv(csv_path)  # single-row DataFrame
    # Split the complex array into real and imaginary parts for PyTorch.
    real = torch.from_numpy(kspace.real).float()
    imag = torch.from_numpy(kspace.imag).float()
    x = torch.stack([real, imag], dim=0)  # [2, coils, nx, ny]
    return x, meta

k, meta = load_sample(
    "MMCMR-427K/train/center01/sub-001_cine_random_16x.mat",
    "MMCMR-427K/train/center01/sub-001_cine_random_16x.csv"
)

6. Applications and Research Significance

MMCMR-427K serves as the primary substrate for developing and evaluating generalist CMR reconstruction and analysis models across heterogeneous and accelerated imaging scenarios. Notably, it has been used to train the CardioMM foundation model, which combines semantic contextualization with physics-informed data consistency for robust reconstruction across diverse protocols, scanners, and patient pathologies. Reported benchmarking shows state-of-the-art reconstruction performance and strong zero-shot adaptation to unseen clinical environments, even at acceleration factors up to $24\times$, with retention of clinically salient cardiac phenotypes and quantitative myocardial biomarkers. These properties position MMCMR-427K as a step toward scalable, high-throughput, and clinically accessible CMR diagnostics (Wang et al., 25 Dec 2025).

References

Wang et al. (2025). Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with a Generalist Foundation Model and Multimodal Database.
