
MIMIC-IV-ECG Dataset Overview

Updated 27 February 2026
  • MIMIC-IV-ECG is a comprehensive ECG dataset comprising over 795,000 ten-second, 12-lead recordings linked with rich clinical metadata.
  • It integrates raw waveform data, high-resolution ECG plots, beat-level quantitative features, and AI-generated textual interpretations for multimodal analysis.
  • Researchers use this dataset to benchmark AI models in arrhythmia classification, interval estimation, survival analysis, and laboratory value prediction.

MIMIC-IV-ECG (“Medical Information Mart for Intensive Care, Version IV – Electrocardiogram”) is a large-scale, publicly available dataset comprising more than 795,000 ten-second, 12-lead ECG recordings acquired from over 159,000 adult patients in a single tertiary academic hospital. The dataset provides raw signal waveforms, standardized clinical measurements, extensive demographic metadata, and, through recent multimodal extensions, plotted images and machine- or expert-generated textual interpretations. Curated under stringent de-identification protocols, the resource is foundational for cardiovascular and multimodal AI research owing to its scale, heterogeneity, and precise linkage to clinical and mortality outcomes.

1. Dataset Structure, Clinical Scope, and Access

MIMIC-IV-ECG consists primarily of ten-second 12-lead ECG recordings obtained using clinical-grade hardware (Burdick/Spacelabs, Philips, GE) and digitized at 500 Hz with 12-bit resolution. Each recording is accompanied by machine-extracted fiducial measurements (intervals, axes), device metadata, and, for a large subset, expert-verified narrative reports. Data are stored in WFDB-compliant binary files with associated headers, indexed by unique study and patient identifiers. The recordings encompass a wide diagnostic spectrum, including arrhythmias (AF, flutter, ectopy), conduction blocks (AVB, BBB), ischemic and nonspecific findings, and inpatient/emergency as well as outpatient presentations (Zhang et al., 21 Jul 2025).

The dataset and derived resources such as MEETI and CardioLab maintain strict patient de-identification as mandated by the HIPAA Safe Harbor provision. Public access is managed via the PhysioNet repository (credentialed access under a data use agreement), with a file hierarchy organized as top-level patient folders containing study-specific subdirectories and auxiliary tables (e.g., record_list.csv).
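Given the file hierarchy described above (top-level patient folders containing study subdirectories), record access with the Python `wfdb` package can be sketched as follows. The `record_path` helper and its bucket scheme are illustrative assumptions; `record_list.csv` remains the authoritative path index.

```python
import os

def record_path(subject_id: int, study_id: int,
                root: str = "mimic-iv-ecg") -> str:
    """Build the on-disk path for one ECG study, following the
    patient-folder / study-subdirectory layout described above.
    Illustrative layout; consult record_list.csv for real paths."""
    bucket = f"p{subject_id // 10000:04d}"  # coarse patient bucket
    return os.path.join(root, "files", bucket,
                        f"p{subject_id}", f"s{study_id}", str(study_id))

# Loading the signal itself requires the (non-stdlib) wfdb package:
# import wfdb
# rec = wfdb.rdrecord(record_path(10001234, 40689238))
# rec.p_signal  # (5000, 12) array: 10 s x 500 Hz, 12 leads
print(record_path(10001234, 40689238))
```

The `rdrecord` call returns both the physical signal array and the header metadata (sampling frequency, lead names), which downstream preprocessing steps consume directly.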

2. Multimodal Extensions and Dataset Expansion

The MEETI (“MIMIC-IV-Ext ECG-Text-Image”) extension augments traditional ECG datasets by explicitly synchronizing four modalities:

  • Raw ECG Waveforms: Stored as WFDB (.dat/.hea), sampled at 500 Hz, preserving full dynamic range.
  • High-Resolution Images: PNG plots emulating standard paper ECG (25 mm/s, 10 mm/mV, grid), generated via ecg_plot, available at 300 dpi to capture low-amplitude features.
  • Quantitative Beat-Level Features: Extracted via FeatureDB (adaptive peak detection, multiscale DWT), providing structured HR, RR1/RR2, P/QRS/T amplitudes and durations, PR/QT intervals, rate-corrected QTc (Bazett), and morphology codes, per-lead and per-beat.
  • Textual Interpretations: Narrative LLM-based (GPT-4o) reports constructed from original MIMIC clinic reports plus parameter vectors, following role-specific, cardiology-focused prompts for parameter-grounded interpretation (Zhang et al., 21 Jul 2025).

Modality alignment is ensured by unique 8-digit study and patient IDs. All four data types are thus directly linked at the granularity of both patient and ECG recording, facilitating multimodal research across signals, images, structured features, and language.
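As a concrete reference for the rate-corrected QTc listed among the beat-level features, Bazett's correction divides the measured QT interval by the square root of the RR interval expressed in seconds:

```python
import math

def qtc_bazett(qt_ms: float, rr_ms: float) -> float:
    """Bazett rate correction: QTc = QT / sqrt(RR), RR in seconds.
    Inputs and output are in milliseconds."""
    return qt_ms / math.sqrt(rr_ms / 1000.0)

# At 60 bpm (RR = 1000 ms) the QT interval is already "corrected":
print(round(qtc_bazett(400, 1000)))  # 400
```

Faster rates shorten RR and therefore inflate QTc relative to raw QT, which is why FeatureDB reports both the raw interval and its corrected value.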

3. Analytical Workflows and Recent Benchmarking

MIMIC-IV-ECG is central to a diverse range of AI tasks, from interval estimation and arrhythmia classification to survival modeling and laboratory-value prediction. Recent studies leverage its scale and heterogeneity to benchmark advanced architectures:

  • Multimodal Fusion and Interpretation: MEETI enables transformer-based multimodal learning, grounding parametrized waveform analysis in correlated image and text representations. Example conceptual tasks include arrhythmia detection using all four modalities and automated generation of structured narrative interpretations tethered to fiducial measurements.
  • Beat-Level Analyses: Studies such as Cardioformer extract beatwise segments (centered on R-peaks, resampled to 250 Hz) to train hybrid transformer and ResNet models with multi-granularity patching and hierarchical cross-lead embedding. This approach achieves AUROC of 96.34% (±0.11) on MIMIC-IV-ECG for four-class rhythm classification and demonstrates strong cross-cohort generalization (Mobin et al., 8 May 2025).
  • Interval Estimation: Deep neural models trained solely on lead-I data (e.g., IKres, a ResNet-18 variant) externally validate on MIMIC-IV-ECG to yield MAEs of 6.4 ms (QRS), 10.8 ms (QT), and 12.0 ms (PR) for fiducial interval estimation, outperforming conventional toolkits by a margin of 2–5× (Alam et al., 2024).
  • Mortality Prediction: Large-scale survival modeling using deep networks (ResNet, InceptionTime) achieves C-indices of 0.77–0.78 and AUPRC far exceeding outcome prevalence (e.g., 0.45–0.53 vs. 14–28% event rates) for 1- to 10-year all-cause mortality, with explicit benchmarking of survival-specific and classifier–Cox pipelines (Lukyanenko et al., 2024).
  • Ensemble Learning and Explainability: Ensemble models (CardioForest—optimized Random Forest), trained via extensive cross-validation, yield 94.95% accuracy and 0.88 ROC-AUC for automatic detection of Wide QRS Complex Tachycardia, with SHAP analysis confirming clinical plausibility of feature ranking (e.g., QRS duration as dominant predictor) (Chakma et al., 30 Sep 2025).
  • Laboratory Value Estimation: The CardioLab framework fuses ECG S4-encoded representations with tabular demographic and vital sign features to predict contemporaneous and near-future laboratory abnormalities (up to 120 min), achieving AUROC >0.70 for 23–26 lab parameters across cardiac, renal, hematologic, metabolic, and immunologic domains (Alcaraz et al., 2024).
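The beat-level pipeline described for Cardioformer (windows centered on R-peaks, resampled to 250 Hz) can be sketched as below. The 0.4 s half-window and the naive stride-based decimation are illustrative simplifications, not the paper's exact settings; a production pipeline would low-pass filter before downsampling.

```python
import numpy as np

def beat_windows(signal: np.ndarray, r_peaks: list[int],
                 fs: int = 500, target_fs: int = 250,
                 half_window_s: float = 0.4) -> np.ndarray:
    """Cut fixed-length windows centered on R-peaks and decimate to
    target_fs. Window length is an illustrative choice."""
    half = int(half_window_s * fs)
    out = []
    for r in r_peaks:
        if r - half < 0 or r + half > len(signal):
            continue  # skip beats too close to the record edge
        seg = signal[r - half:r + half]
        out.append(seg[:: fs // target_fs])  # naive 500 -> 250 Hz
    return np.asarray(out)
```

Each resulting row is one beat-centered segment of uniform length, ready to be patched and embedded by a transformer or ResNet backbone.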

4. File Organization, Preprocessing, and Feature Extraction

Typical analytic workflows operate on per-record (ten-second, 12-lead) WFDB files. Preprocessing includes artifact rejection, amplitude normalization (as reported), lead or segment extraction, and, where required, synchronization to external modalities (lab, vital, mortality) by nearest timestamp within user-defined windows.
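The nearest-timestamp synchronization step can be expressed with `pandas.merge_asof`; the toy frames, dates, and 120-minute tolerance below are purely illustrative (MIMIC timestamps are date-shifted into the 2100s by de-identification).

```python
import pandas as pd

# Toy frames standing in for ECG studies and lab draws
ecg = pd.DataFrame({"subject_id": [1, 1],
                    "ecg_time": pd.to_datetime(["2130-01-01 10:00",
                                                "2130-01-01 14:00"])})
labs = pd.DataFrame({"subject_id": [1],
                     "lab_time": pd.to_datetime(["2130-01-01 10:45"]),
                     "potassium": [5.1]})

# Nearest lab value within a user-defined window, matched per patient
merged = pd.merge_asof(ecg.sort_values("ecg_time"),
                       labs.sort_values("lab_time"),
                       left_on="ecg_time", right_on="lab_time",
                       by="subject_id",
                       direction="nearest",
                       tolerance=pd.Timedelta("120min"))
print(merged[["ecg_time", "potassium"]])
```

The first ECG (45 min from the draw) receives the lab value; the second (195 min away) falls outside the tolerance and stays unmatched, which is exactly the windowed-linkage behavior described above.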

Feature extraction strategies depend on task. For interval estimation, device/dataset-provided fiducial points are used where available. Custom algorithms (e.g., FeatureDB, in MEETI) combine adaptive peak detection and multiscale discrete wavelet transforms for robust beat segmentation and quantification. For explainable AI models, engineered features such as HR variability, axis orientation, and explicit PQRS-T intervals are preferentially included.
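As a minimal stand-in for adaptive peak detection (not a reimplementation of FeatureDB's DWT-based approach), a threshold-and-refractory R-peak detector can be written with `scipy.signal.find_peaks`:

```python
import numpy as np
from scipy.signal import find_peaks

def detect_r_peaks(lead: np.ndarray, fs: int = 500) -> np.ndarray:
    """Simple R-peak detector: height threshold at half the maximum
    amplitude plus a 200 ms refractory period. Illustrative only;
    robust pipelines adapt both thresholds per signal."""
    peaks, _ = find_peaks(lead,
                          height=0.5 * np.max(lead),
                          distance=int(0.2 * fs))
    return peaks
```

The refractory `distance` constraint suppresses double-counting of tall T waves, but fixed thresholds like these are precisely what fail on the rare morphologies flagged in the limitations section below.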

All four MEETI modalities are organized under a consistent directory hierarchy, indexed by patient and study. MATLAB structs encapsulate all extracted parameters, LLM outputs, and linkages for programmatic access (Zhang et al., 21 Jul 2025). CardioLab’s multimodal fusion aligns ECG waveform windows, tabular features (demographics, vitals), and contemporaneous/lab window targets across patient-level splits (Alcaraz et al., 2024).

5. Applications, Benchmark Tasks, and Evaluation

MIMIC-IV-ECG and its extensions serve as benchmarks for a spectrum of clinical and methodological tasks, including rhythm and arrhythmia classification, fiducial interval estimation, survival and mortality modeling, laboratory-value prediction, and multimodal report generation.

Evaluation protocols employ subject-independent splits, cross-validation, and external holdouts (e.g., lead-I models validated on MIMIC-IV after training on external institutions) with rigorous metrics: AUROC, AUPRC, F1, balanced accuracy, mean absolute error (intervals), and, for survival, concordance index and time-dependent calibration.
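The subject-independent splits these protocols require can be enforced with scikit-learn's `GroupShuffleSplit`, grouping records by patient identifier (the data below is synthetic):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Subject-independent split: no patient appears in both train and test
rng = np.random.default_rng(0)
n_records = 100
subject_ids = rng.integers(0, 20, size=n_records)  # ~20 synthetic patients
X = rng.normal(size=(n_records, 8))                # stand-in features

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=subject_ids))

# Verify the patient-level separation the protocols above require
overlap = set(subject_ids[train_idx]) & set(subject_ids[test_idx])
print(len(overlap))  # 0
```

Because many patients contribute multiple ECGs, record-level random splits would leak patient-specific waveform signatures across the boundary and inflate every metric listed above.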

6. Limitations, Cohort Bias, and Data Quality

Noted limitations of MIMIC-IV-ECG include:

  • Representativeness: Source population skews toward adult, high-acuity ICU and emergency admissions. Pediatric and outpatient cohorts are underrepresented.
  • Annotation Quality: Only machine-extracted fiducials and clinical reporting are universally present; no pixel-level or manual beat-label annotations are provided apart from selected waveform-note links.
  • LLM Interpretations: LLM-generated narrative outputs (e.g., GPT-4o in MEETI) may encode model-specific biases and are not a substitute for clinical review.
  • Automated Features: Beat-segmentation and morphology extraction algorithms may fail on rare patterns, and no human adjudication of extracted features exists.
  • Cross-Institutional Shift: External validation studies report C-index or AUROC drops (Δ ≈ 0.08–0.15) when models are transferred between MIMIC-IV and other hospital ECG datasets, underscoring the need for site-specific fine-tuning (Zhang et al., 21 Jul 2025, Lukyanenko et al., 2024).

7. Future Directions and Research Utility

MIMIC-IV-ECG, particularly with extensions like MEETI, has established itself as a canonical resource for developing next-generation multimodal ECG AI models. Future work is anticipated to:

  • Extend the dataset with additional waveform modalities (Holter, stress tests) and pixel-level annotations for segmentation.
  • Incorporate dynamic continuous learning pipelines for improved generalization and temporal adaptation.
  • Systematically embed explainable AI outputs into clinical decision support to facilitate transparency and regulatory compliance.
  • Validate multimodal and fusion models in diverse, prospective, and pediatric settings.
  • Develop and report unified benchmarks for ECG-based diagnosis, prognosis, and laboratory value monitoring tasks across major academic centers (Zhang et al., 21 Jul 2025, Mobin et al., 8 May 2025, Alcaraz et al., 2024).

In summary, MIMIC-IV-ECG provides the research community with a rigorously curated, multimodal, and widely adopted foundation for clinically focused AI model development, enabling comprehensive benchmarking, exploration of under-explored inference and monitoring tasks, and robust external validation.
