MMCMR-427K: Multi-Coil CMR Imaging Database
- MMCMR-427K Database is a comprehensive open-access repository featuring 427,465 multi-coil CMR k-space acquisitions across 12 modalities and 17 ICD-10 CVD categories.
- Data is acquired using varied scanner models and modalities, with a unified preprocessing pipeline (coil compression, noise normalization, cropping) ensuring consistency across heterogeneous clinical environments.
- The database underpins advanced CMR reconstruction research, enabling generalist foundation models to achieve ultra-fast imaging with robust zero-shot performance on diverse clinical data.
The MMCMR-427K Database is the largest and most diverse open-access resource of raw, multi-coil cardiovascular magnetic resonance (CMR) k-space data, curated to enable generalist foundation models for ultra-fast, high-quality CMR imaging across heterogeneous clinical environments. Encompassing 427,465 multi-coil k-space acquisitions from 1,504 participants across 13 international centers, 12 imaging modalities, 15 scanner models, and 17 ICD-10 CVD categories, MMCMR-427K provides an unprecedented and richly structured data substrate paired with harmonized metadata, comprehensive quantitative metrics, and rigorously defined data splits for training, validation, and testing applications (Wang et al., 25 Dec 2025).
1. Dataset Composition and Scope
The MMCMR-427K database aggregates k-space data from 6,120 scans, corresponding to 1,504 unique participants distributed across three continents (Asia, Europe, North America). Data are stratified by institution: eight “internal” centers support model training/validation, while five “external” centers are reserved for zero-shot generalization testing. The dataset encompasses 12 distinct CMR modalities, with the following k-space counts:
| Modality | k-space acquisitions | Example Parameters |
|---|---|---|
| Cine | 207,126 | TR ≈ 2.8 ms, TE ≈ 1.2 ms, 1.3×1.3 mm² |
| 2D flow | 7,454 | |
| Late Gadolinium Enhancement (LGE) | 8,580 | TR ≈ 3.5 ms, TE ≈ 1.4 ms, 1.4×1.4 mm², TI ≈ 250–350 ms |
| T1 Mapping | 54,301 | Variable TI, 1.5×1.5 mm² |
| T2 Mapping | 15,944 | Variable TE, 1.5×1.5 mm² |
| T1ρ Mapping | 68 | |
| First-pass Perfusion | 46,441 | |
| Aortic Imaging | 46,836 | |
| Black-blood | 1,484 | |
| T1-weighted | 3,370 | |
| T2-weighted | 3,810 | |
| Tagging | 31,188 | |
Scanner hardware diversity includes 15 models from four vendors (Siemens, United Imaging Healthcare, GE, Philips) spanning 0.55 T, 1.5 T, 3 T, and 5 T field strengths. Participant diagnoses cover 17 ICD-10 categories (e.g., I42.0 for dilated cardiomyopathy, I25 for coronary artery disease, I40 for myocarditis, Q20–Q28 for congenital heart disease).
Structured metadata is recorded at two levels: participant (age, sex, height, weight, BMI, diagnostic codes) and scan (center code, vendor/model, field strength, sequence, anatomical view, spatial resolution, TR, TE, flip angle, number of coils). Each k-space file is paired with a harmonized CSV metadata record.
2. Acquisition Protocols and Preprocessing Workflow
Data acquisition protocols are tailored per modality. Representative scan parameters are, for cine: TR ≈ 2.8 ms, TE ≈ 1.2 ms, in-plane resolution ≈ 1.3×1.3 mm², slice thickness 8 mm, and 30 cardiac phases; for LGE: TR ≈ 3.5 ms, TE ≈ 1.4 ms, resolution ≈ 1.4×1.4 mm², and inversion time 250–350 ms; for T1/T2 mapping: variable TI/TE with resolution ≈ 1.5×1.5 mm². All raw data are compressed to 10 virtual coil channels using the method of Zhang et al. (Magn Reson Med 2013), which standardizes the channel count across institutions.
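Coil compression projects the physical receive channels onto a smaller set of virtual coils that carry most of the signal energy. The sketch below is a minimal illustration, assuming k-space stored as `[coils, nx, ny]`. It deliberately uses the simpler single-matrix SVD (PCA) compression rather than the geometric-decomposition method of Zhang et al., and the function name `svd_coil_compress` is hypothetical.

```python
import numpy as np

def svd_coil_compress(kspace: np.ndarray, n_virtual: int = 10):
    """Compress [coils, nx, ny] k-space to n_virtual virtual coils with one global SVD.

    Simplified PCA-style compression (not Zhang et al.'s geometric method).
    Returns the compressed k-space and the [n_virtual, coils] compression matrix.
    """
    n_coils = kspace.shape[0]
    if n_virtual >= n_coils:
        # Nothing to compress: keep the physical coils unchanged.
        return kspace.copy(), np.eye(n_coils, dtype=kspace.dtype)
    X = kspace.reshape(n_coils, -1)                    # coils x samples
    U, _, _ = np.linalg.svd(X, full_matrices=False)    # left singular vectors, sorted by energy
    A = U[:, :n_virtual].conj().T                      # rows = dominant coil combinations
    compressed = (A @ X).reshape((n_virtual,) + kspace.shape[1:])
    return compressed, A
```

The returned matrix plays the role of `C` in the preprocessing pseudocode. On real data, the discarded singular components carry some residual energy, so the choice of 10 virtual coils trades fidelity for memory and compute.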
Three retrospective k-space undersampling patterns are included:
- Uniform Cartesian
- Random Cartesian
- Radial
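Radial is the only non-Cartesian pattern of the three. When radial undersampling is emulated retrospectively on Cartesian k-space, one common approach is a "pseudo-radial" mask that marks grid points lying on spokes through the k-space center. The sketch below assumes uniformly spaced spoke angles, and its function name is hypothetical. It is not necessarily how the MMCMR-427K radial masks were produced.

```python
import numpy as np

def pseudo_radial_mask(shape, n_spokes: int) -> np.ndarray:
    """Boolean [ny, nx] mask with n_spokes straight lines through the k-space center."""
    ny, nx = shape
    cy, cx = ny // 2, nx // 2
    radius = np.hypot(ny, nx) / 2                                   # long enough to cross the grid
    t = np.linspace(-radius, radius, int(np.ceil(4 * radius)) + 1)  # sub-pixel steps along a spoke
    mask = np.zeros(shape, dtype=bool)
    for theta in np.arange(n_spokes) * np.pi / n_spokes:             # uniform angles in [0, pi)
        ys = np.round(cy + t * np.sin(theta)).astype(int)
        xs = np.round(cx + t * np.cos(theta)).astype(int)
        inside = (ys >= 0) & (ys < ny) & (xs >= 0) & (xs < nx)
        mask[ys[inside], xs[inside]] = True
    return mask
```

An effective acceleration for such a mask can be reported as `mask.size / mask.sum()`. This is one convention among several, not necessarily the one used for the dataset.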
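The acceleration factor for undersampling is defined as:

$$R = \frac{N_{\text{PE}}}{N_{\text{acq}}},$$

where $N_{\text{PE}}$ is the number of phase-encoding lines in the fully-sampled k-space, and $N_{\text{acq}}$, the number of acquired lines, excludes autocalibration (ACS) lines.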
Undersampling masks are generated on the fly over a range of acceleration factors $R$ (their distribution is summarized in Section 4), with the ACS region defined as either the 20 central lines (Cartesian) or a central k-space block.
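The sketch below generates 1D phase-encoding masks for the uniform and random Cartesian patterns. It always keeps the 20 central ACS lines and evaluates $R$ with ACS lines excluded from the acquired-line count, following the definition above. The function names, the random-line count `n_pe // R`, and the use of a single phase-encoding axis are illustrative assumptions, not the dataset's published mask generator.

```python
import numpy as np

def acs_lines(n_pe: int, n_acs: int = 20) -> np.ndarray:
    """Indices of the n_acs central phase-encoding lines."""
    start = n_pe // 2 - n_acs // 2
    return np.arange(start, start + n_acs)

def cartesian_mask(n_pe: int, R: int, pattern: str = "uniform",
                   n_acs: int = 20, seed=None) -> np.ndarray:
    """1D phase-encoding mask (True = acquired line); the ACS block is always included."""
    mask = np.zeros(n_pe, dtype=bool)
    acs = acs_lines(n_pe, n_acs)
    if pattern == "uniform":
        mask[::R] = True                                 # every R-th line
    elif pattern == "random":
        outside = np.setdiff1d(np.arange(n_pe), acs)     # candidate non-ACS lines
        rng = np.random.default_rng(seed)
        n_keep = n_pe // R                               # so that N_PE / N_acq == R
        mask[rng.choice(outside, size=n_keep, replace=False)] = True
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    mask[acs] = True
    return mask

def acceleration_factor(mask: np.ndarray, n_acs: int = 20) -> float:
    """R = N_PE / N_acq, where N_acq counts acquired lines outside the ACS block."""
    acs = np.zeros_like(mask)
    acs[acs_lines(mask.size, n_acs)] = True
    n_acq = int(np.count_nonzero(mask & ~acs))
    return mask.size / n_acq
```

Assuming phase encoding runs along the last axis of a `[coils, nx, ny]` array, a mask is applied as `kspace * mask[None, None, :]`. Because every-R-th sampling also places some lines inside the ACS block, the uniform mask's $R$ under this definition comes out slightly above its nominal value.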
A unified preprocessing pipeline ensures data consistency:
- Coil compression of raw k-space to 10 virtual coils;
- Noise normalization (per-coil mean subtraction and division by the standard deviation, computed on nonzero readouts);
- Cropping/padding to a standard matrix size;
- Storage in MATLAB ".mat" format, with metadata in a parallel CSV file.
An example of the preprocessing steps, in pseudocode:

```
function preprocess(raw_kspace, meta_dict)
    [k10, C] = coil_compress(raw_kspace, 10)      % 10 virtual coils; C = compression matrix
    for c = 1:10
        μ = mean(k10[c,:,:]); σ = std(k10[c,:,:])
        k10[c,:,:] = (k10[c,:,:] - μ) / σ          % per-coil noise normalization
    end
    k10 = center_crop(k10, [512, 246])
    save_mat("sample.mat", k10)
    save_csv("sample.csv", meta_dict)
end
```
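A runnable NumPy sketch of the normalization and crop/pad steps follows. It assumes that "nonzero readouts" means the statistics are computed over nonzero k-space samples (zero-filled samples stay zero) and that cropping/padding acts on the last two axes. Both function names are hypothetical. Saving could then use `scipy.io.savemat` and `pandas.DataFrame.to_csv`.

```python
import numpy as np

def normalize_per_coil(k: np.ndarray) -> np.ndarray:
    """Per-coil mean subtraction and std division, using nonzero samples only (assumption)."""
    out = np.zeros_like(k)
    for c in range(k.shape[0]):
        coil = k[c]
        nz = coil != 0
        if not nz.any():
            continue                                  # empty coil: leave as zeros
        mu, sigma = coil[nz].mean(), coil[nz].std()
        sigma = sigma if sigma > 0 else 1.0           # guard against constant coils
        out[c][nz] = (coil[nz] - mu) / sigma          # zero-filled samples stay zero
    return out

def center_crop_or_pad(k: np.ndarray, shape) -> np.ndarray:
    """Center-crop or zero-pad the last two axes of k to `shape`."""
    out = np.zeros(k.shape[:-2] + tuple(shape), dtype=k.dtype)
    src, dst = [], []
    for n_in, n_out in zip(k.shape[-2:], shape):
        n = min(n_in, n_out)                          # extent copied along this axis
        s_in, s_out = (n_in - n) // 2, (n_out - n) // 2
        src.append(slice(s_in, s_in + n))
        dst.append(slice(s_out, s_out + n))
    out[(..., *dst)] = k[(..., *src)]
    return out
```

For example, `center_crop_or_pad(normalize_per_coil(k10), (512, 246))` mirrors the last two processing lines of the pseudocode.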
3. Metadata Design and Harmonization
Each scan is annotated with a standardized set of fields, stored per-instance in matched CSV files:
- Center_id, subject_id, scan_id, date
- Vendor, model, field_strength_T, coils, sequence name, anatomical view, in-plane FOV, matrix_size, slice_thickness_mm, TR_ms, TE_ms, flip_angle_deg
- Clinical: age, sex, height, weight, BMI, ICD10_codes (semicolon-separated)
Vendor-specific DICOM tags are remapped to a unified ontology, achieving harmonization across institutional protocols. Metadata are leveraged for conditioning text-aware foundation models (e.g., CardioMM), and support subgroup analyses by scan type, field strength, population demographics, and pathology. The metadata schema facilitates complex queries such as extracting all 2D flow scans at 3 T from Siemens devices for hypertrophic cardiomyopathy in subjects over 50 years.
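As an illustration of the example query, the pandas sketch below concatenates per-scan CSVs and selects 2D flow scans at 3 T on Siemens systems for subjects over 50 with a hypertrophic cardiomyopathy code (ICD-10 I42.1 or I42.2). The column names `sequence`, `vendor`, `age`, and `scan_id`, the string values they hold, and the field-strength tolerance are assumptions; only `field_strength_T` and `ICD10_codes` follow the schema listed above verbatim.

```python
from pathlib import Path
import pandas as pd

HCM_CODES = {"I42.1", "I42.2"}  # ICD-10: obstructive and other hypertrophic cardiomyopathy

def load_metadata(root: str) -> pd.DataFrame:
    """Concatenate every per-scan CSV under `root` into one table."""
    return pd.concat((pd.read_csv(p) for p in sorted(Path(root).rglob("*.csv"))),
                     ignore_index=True)

def has_any_code(codes, targets) -> bool:
    """True if a semicolon-separated ICD-10 string contains any target code."""
    if not isinstance(codes, str):                    # missing values (NaN)
        return False
    return any(code.strip() in targets for code in codes.split(";"))

def select_flow_3t_siemens_hcm(meta: pd.DataFrame) -> pd.DataFrame:
    """2D flow scans at ~3 T on Siemens systems, HCM diagnosis, age > 50."""
    keep = (
        meta["sequence"].str.lower().eq("2d flow")
        & meta["field_strength_T"].between(2.5, 3.5)  # tolerate small deviations from nominal 3 T
        & meta["vendor"].str.lower().eq("siemens")
        & meta["age"].gt(50)
        & meta["ICD10_codes"].apply(lambda s: has_any_code(s, HCM_CODES))
    )
    return meta[keep]
```

Usage might look like `select_flow_3t_siemens_hcm(load_metadata("MMCMR-427K/train"))`.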
4. Quantitative Metrics and Dataset Statistics
Reconstruction and image quality are quantitatively assessed using several established metrics:
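- PSNR: $\mathrm{PSNR} = 10\log_{10}\left(\mathrm{MAX}^2 / \mathrm{MSE}\right)$, with $\mathrm{MAX}$ as the maximum pixel magnitude and $\mathrm{MSE}$ the mean squared error between reconstruction and reference.
- SSIM: as defined by Wang et al. (IEEE TIP 2004), $\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$.
- SNR: $\mathrm{SNR} = \mu_{\text{myo}} / \sigma_{\text{noise}}$, calculated as the mean myocardial magnitude divided by the background noise standard deviation.
- g-factor mapping for parallel imaging: $g_\rho = \sqrt{\left[(E^H \Psi^{-1} E)^{-1}\right]_{\rho\rho} \left[E^H \Psi^{-1} E\right]_{\rho\rho}}$, where $E$ is the encoding matrix and $\Psi$ the coil noise covariance.
- $\ell_2$ reconstruction error: $\lVert \hat{x} - x \rVert_2$, where $\hat{x}$ is the reconstruction and $x$ the fully-sampled reference.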
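The metric definitions translate directly into NumPy. The sketch below assumes complex or real image arrays, boolean region masks for myocardium and background, and an unnormalized $\ell_2$ error. Whether the dataset's benchmarks normalize this error is not stated here. The function names are hypothetical. SSIM is omitted; a library implementation such as `skimage.metrics.structural_similarity` is typically used for it.

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray) -> float:
    """PSNR in dB, with MAX taken as the maximum magnitude of the reference."""
    mse = np.mean(np.abs(ref - rec) ** 2)
    return float(10 * np.log10(np.abs(ref).max() ** 2 / mse))

def snr(img: np.ndarray, myo_mask: np.ndarray, bg_mask: np.ndarray) -> float:
    """Mean myocardial magnitude divided by the std of the background region."""
    return float(np.abs(img[myo_mask]).mean() / np.std(img[bg_mask]))

def l2_error(ref: np.ndarray, rec: np.ndarray) -> float:
    """Unnormalized l2 norm of the reconstruction error."""
    return float(np.linalg.norm((rec - ref).ravel()))

def sense_gfactor(E: np.ndarray, psi: np.ndarray) -> np.ndarray:
    """SENSE g-factor for the R pixels that alias onto one another.

    E: [coils, R] encoding (sensitivity) matrix; psi: [coils, coils] noise covariance.
    """
    M = E.conj().T @ np.linalg.solve(psi, E)          # E^H psi^{-1} E, shape [R, R]
    return np.sqrt(np.real(np.diag(np.linalg.inv(M)) * np.diag(M)))
```

The g-factor equals 1 when the aliased pixels' coil sensitivities are orthogonal (under white noise) and grows as they become more similar. Computing it per pixel group across the image yields a g-factor map.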
Dataset-wide summary statistics:
| Demographic/Parameter | Value |
|---|---|
| Age (mean ± SD, range) | 54 ± 16 years (5–85) |
| Sex distribution | 53% male, 47% female |
| Field strengths | 0.55 T (2%), 1.5 T (40%), 3 T (50%), 5 T (8%) |
| Acceleration factors | 4×–8× (30%), 8×–16× (40%), 12×–24× (30%) |
A plausible implication is that this scale and diversity establish an empirically robust substrate for large-scale generalization studies in CMR reconstruction.
5. Organization, Access, and Usage
The repository employs a clear directory tree, separating data for training, validation, internal test, and external zero-shot generalization test as follows:
```
MMCMR-427K/
  train/
    center01/
      sub-001_LGE_uniform_8x.mat
      sub-001_LGE_uniform_8x.csv
      ...
    center02/...
  val/
  test_internal/
  test_external/
```
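The file naming scheme is {centerID}_{subID}_{sequence}_{pattern}_{R}x.mat/.csv; in the example tree above, the center ID appears as the parent directory rather than as a filename prefix. Data splits consist of: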
| Subset | k-space files | Scans | Unique Participants |
|---|---|---|---|
| Training | 241,526 | 3,400 | 789 |
| Validation | 26,836 | — | — |
| Internal Test | 75,753 | 1,495 | 320 |
| External Test | 110,186 | 1,225 | 395 |
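The split counts can be cross-checked against the headline totals with simple arithmetic. Scans and participants sum to 6,120 and 1,504. The training, internal-test, and external-test k-space counts sum to exactly 427,465, whereas adding the validation row gives 454,301. This suggests, as an inference rather than a documented fact, that the validation files are counted within the training pool.

```python
# Split sizes copied from the table above; None marks entries the table leaves blank.
splits = {
    "train":         {"kspace": 241_526, "scans": 3_400, "participants": 789},
    "val":           {"kspace": 26_836,  "scans": None,  "participants": None},
    "test_internal": {"kspace": 75_753,  "scans": 1_495, "participants": 320},
    "test_external": {"kspace": 110_186, "scans": 1_225, "participants": 395},
}

def total(key, names):
    """Sum one column over the named splits."""
    return sum(splits[n][key] for n in names)

no_val = ("train", "test_internal", "test_external")
print(total("kspace", no_val))              # 427465, the headline total
print(total("kspace", no_val + ("val",)))   # 454301, exceeds the headline total
print(total("scans", no_val))               # 6120
print(total("participants", no_val))        # 1504
```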
The following example loads one sample in Python/PyTorch:

```python
import scipy.io
import pandas as pd
import torch

def load_sample(mat_path, csv_path):
    # scipy.io.loadmat reads MATLAB v4-v7.2 files; v7.3 (HDF5) files need an HDF5 reader such as h5py.
    m = scipy.io.loadmat(mat_path)
    kspace = m['kspace']                          # shape [coils, nx, ny], complex64
    meta = pd.read_csv(csv_path)                  # single-row DataFrame
    real = torch.from_numpy(kspace.real).float()
    imag = torch.from_numpy(kspace.imag).float()
    x = torch.stack([real, imag], dim=0)          # [2, coils, nx, ny]
    return x, meta

k, meta = load_sample(
    "MMCMR-427K/train/center01/sub-001_cine_random_16x.mat",
    "MMCMR-427K/train/center01/sub-001_cine_random_16x.csv"
)
```
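The loader above takes explicit paths. To select files by sequence, pattern, or acceleration factor, one option is to parse names against the stated scheme. The sketch below treats the center prefix as optional, since the example tree stores it as a directory. It also assumes that sequence and pattern names contain no underscores. The names `SampleName`, `parse_sample_name`, and `index_split` are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple, Optional

class SampleName(NamedTuple):
    center: Optional[str]   # None when the center is encoded only by the parent directory
    subject: str
    sequence: str
    pattern: str
    R: int

def parse_sample_name(path: str) -> SampleName:
    """Parse {centerID}_{subID}_{sequence}_{pattern}_{R}x.mat/.csv (center prefix optional)."""
    parts = Path(path).stem.split("_")
    if len(parts) not in (4, 5) or not parts[-1].endswith("x"):
        raise ValueError(f"unexpected file name: {path}")
    *ids, sequence, pattern, r = parts
    center = ids[0] if len(ids) == 2 else None
    return SampleName(center, ids[-1], sequence, pattern, int(r[:-1]))

def index_split(split_dir: str) -> list:
    """Pair each .mat file under split_dir with its parsed name."""
    return [(p, parse_sample_name(p.name)) for p in sorted(Path(split_dir).rglob("*.mat"))]
```

For instance, `[p for p, n in index_split("MMCMR-427K/train") if n.sequence == "cine" and n.R == 16]` would list the 16× cine files of the training split, provided the names follow the assumed pattern.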
6. Applications and Research Significance
MMCMR-427K serves as the primary substrate for developing and evaluating generalist CMR reconstruction and analysis models across heterogeneous and accelerated imaging scenarios. Notably, it has been used to train the CardioMM foundation model, which combines semantic contextualization with physics-informed data consistency for robust reconstruction across diverse protocols, scanners, and patient pathologies. Comprehensive benchmarking demonstrates state-of-the-art reconstruction performance and strong zero-shot adaptation to unseen clinical environments, even at high acceleration factors, while retaining clinically salient cardiac phenotypes and quantitative myocardial biomarkers. These properties position MMCMR-427K as a critical bridge toward scalable, high-throughput, and clinically accessible CMR diagnostics (Wang et al., 25 Dec 2025).