ECG Foundation Models: Scalable Deep Learning

Updated 13 January 2026
  • ECG Foundation Models are scalable deep learning frameworks pre-trained on vast, heterogeneous ECG data to generate versatile representations for multiple clinical tasks.
  • They employ advanced architectures such as Transformer-based backbones, convolutional networks, and mixture-of-experts to capture temporal and spatial characteristics in ECG signals.
  • Efficient fine-tuning methods like LoRA, linear probing, and adapter-tuning allow these models to achieve significant diagnostic accuracy improvements with minimal additional resource overhead.

Electrocardiogram (ECG) foundation models are large-scale, pre-trained deep learning architectures designed to learn general-purpose representations from massive and heterogeneous ECG datasets. These models are not tied to a specific diagnostic task but instead provide a flexible backbone that can be adapted—often with minimal effort—for a variety of downstream clinical applications, including arrhythmia detection, risk factor prediction, demographic estimation, and real-time monitoring. By aggregating data from millions of unlabeled traces and employing advanced self-supervised, contrastive, or generative pretraining strategies, ECG foundation models address the limitations of narrow, task-specific learners and maximize clinical scalability, robustness, and efficiency.

1. Core Architectures and Design Strategies

ECG foundation models span multiple neural network and ensemble design families, optimized for time-series signal processing:

  • Transformer-based backbones: These utilize self-attention mechanisms, including hierarchical temporal blocks (TimesNet), encoder variants tailored for masked modeling (MOMENT), and prompt-based generative models (TEMPO), to capture long-range dependencies, inter-lead relationships, and event temporal context (Xu et al., 28 Nov 2025).
  • Convolutional networks: Architectures such as convolutional encoder–transformers (ECG-FM), RegNet-style CNNs (ECGFounder), and hybrid ConvNeXt backbones (TolerantECG) focus on spatial and temporal feature extraction, often augmented with attention and aggregation modules (Xu et al., 28 Nov 2025, McKeen et al., 2024, Dang et al., 14 Jul 2025, Li et al., 2024).
  • Mixture-of-Experts and ensemble designs: Recent ensemble frameworks, notably EnECG, integrate multiple specialized foundation models (TimesNet, DLinear, MOMENT, TEMPO, and ECG-FM), each pre-trained for time-series or clinical ECG tasks and coordinated via a low-rank adapted gating network; a minimal gating sketch follows this list (Xu et al., 28 Nov 2025).
  • Multi-modal and graph-aware systems: Models such as CSFM leverage Transformer encoders jointly across ECG, PPG, and textual domains, while FoundationalECGNet includes Graph Attention Networks and wavelet-augmented denoising for improved fidelity and interpretability in abnormality detection (Gu et al., 23 Jun 2025, Sk. et al., 10 Sep 2025).
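To make the mixture-of-experts pattern above concrete, the following is a minimal sketch, assuming a set of frozen expert encoders that each map an ECG segment to a fixed-size embedding; the class name, tensor shapes, and single classification head are illustrative assumptions, not EnECG's actual implementation.

```python
import torch
import torch.nn as nn

class GatedExpertFusion(nn.Module):
    """Illustrative mixture-of-experts fusion over frozen ECG encoders.

    Each expert maps an ECG segment (batch, leads, samples) to an embedding
    of size embed_dim; a small gating network produces per-expert weights,
    and the fused representation feeds a task-specific head.
    """

    def __init__(self, experts: list[nn.Module], embed_dim: int, num_classes: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        for p in self.experts.parameters():       # keep the pre-trained experts frozen
            p.requires_grad = False
        self.gate = nn.Linear(embed_dim * len(experts), len(experts))
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, ecg: torch.Tensor) -> torch.Tensor:
        # Embed the same ECG segment with every expert: (batch, n_experts, embed_dim).
        embs = torch.stack([expert(ecg) for expert in self.experts], dim=1)
        # Gating weights computed from the concatenated embeddings.
        weights = torch.softmax(self.gate(embs.flatten(1)), dim=-1)   # (batch, n_experts)
        fused = (weights.unsqueeze(-1) * embs).sum(dim=1)             # (batch, embed_dim)
        return self.head(fused)
```

Only the gate and head are trainable here, which mirrors the parameter-efficient adaptation strategy discussed in Section 3.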

2. Pretraining Objectives and Data Regimes

Foundation ECG models are consistently pre-trained on large, heterogeneous datasets, often exceeding one million recordings, using specialized self-supervised objectives such as masked signal modeling, contrastive learning, and generative reconstruction.

Diverse pretraining corpora include MIMIC-IV-ECG, Harvard-Emory ECG Database (HEEDB), PhysioNet, PTB-XL, CODE-15, Chapman-Shaoxing, and ambulatory/wearable collections, with preprocessing pipelines standardizing sampling rates (250–500 Hz), lead configurations, and segment durations (5–10 s typical; up to hours for ambulatory data) (Li et al., 2024, Wan et al., 2 Mar 2025, Xu et al., 28 Nov 2025, Dang et al., 14 Jul 2025, Lunelli et al., 12 Sep 2025).
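A typical preprocessing step implied by these pipelines is resampling to a common rate, per-lead normalization, and fixed-length segmentation. The sketch below uses NumPy and SciPy; the 500 Hz target rate and 10 s window are illustrative defaults within the ranges quoted above, and the function name is hypothetical.

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess_ecg(signal: np.ndarray, fs_in: int, fs_out: int = 500,
                   segment_s: float = 10.0) -> np.ndarray:
    """Resample a multi-lead ECG to a common rate and cut fixed-length segments.

    signal: array of shape (leads, samples) recorded at fs_in Hz.
    Returns an array of shape (num_segments, leads, fs_out * segment_s).
    """
    # Rational resampling to the target rate (e.g. 500 Hz).
    resampled = resample_poly(signal, fs_out, fs_in, axis=-1)
    # Per-lead z-score normalization, guarding against flat leads.
    mean = resampled.mean(axis=-1, keepdims=True)
    std = resampled.std(axis=-1, keepdims=True) + 1e-8
    normalized = (resampled - mean) / std
    # Split into non-overlapping windows of segment_s seconds.
    win = int(fs_out * segment_s)
    n = normalized.shape[-1] // win
    segments = normalized[..., : n * win].reshape(normalized.shape[0], n, win)
    return segments.transpose(1, 0, 2)
```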

3. Adaptation, Fine-tuning, and Efficient Transfer

Most ECG foundation models are designed for parameter-efficient adaptation to downstream tasks:

  • LoRA and adapter-tuning: Parameter-efficient LoRA (Low-Rank Adaptation) is often applied only to newly attached output layers or gating heads, freezing over 99% of backbone parameters, as in EnECG (Xu et al., 28 Nov 2025). This sharply reduces compute and memory demands: EnECG peaks below 10 GB of memory across five tasks, versus at least 12 GB for full fine-tuning of each backbone. A minimal LoRA sketch appears after this list.
  • Linear probing and lightweight heads: A frozen backbone with a trainable linear head can deliver strong classification and regression performance (e.g., ECG-FM reaches AUROC 0.930 and AUPRC 0.735 under a linear probe), confirming the generality of the learned features (McKeen et al., 2024, Xu et al., 28 Nov 2025).
  • Ensemble learning and MoE: Dynamic mixture-of-experts strategies outperform static or zero-shot ensembles; accuracy saturates at N = 5 experts, and smaller ensembles lose up to 15% F₁ (Xu et al., 28 Nov 2025).
  • Preview linear probing and stochastic depth: Post-training strategies introduce a brief linear-probing phase with the backbone frozen, together with stochastic-depth regularization, closing the gap between large pre-trained FMs and specialized models with gains of up to +3.3% AUROC and +20.9% AUPRC on PTB-XL (Zhou et al., 16 Sep 2025).
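The LoRA recipe referenced in the first bullet can be sketched as a trainable low-rank update added to a frozen linear layer. This is a generic sketch, not the specific EnECG implementation; the rank, scaling, and the wrapped 768-to-5 head are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # backbone weights stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init => no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original projection plus the low-rank correction; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage sketch: wrap a newly attached output head on top of a frozen backbone.
head = LoRALinear(nn.Linear(768, 5), r=8)
```

Because only A and B (and any new head) are trained, the adapted parameter count stays far below 1% of the backbone, consistent with the memory figures quoted above.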

4. Multi-task Learning and Evaluation Protocols

A defining trait of ECG foundation models is simultaneous optimization for diverse downstream tasks, often within a unified framework:

  • Typical multi-task suite (EnECG):
    1. RR-interval estimation (regression)
    2. Age estimation (regression)
    3. Sex classification (binary)
    4. Potassium abnormality detection (binary; rare, ~3% incidence)
    5. Arrhythmia detection (multiclass, e.g. 15-way)

The joint loss is a weighted sum of per-task losses:

L(\theta, A, B, \psi) = \sum_t \lambda_t L_t(y_t, \hat{y}_t)

where \theta denotes the frozen backbone parameters, A and B the low-rank adapter matrices, \psi the gating and head parameters, and \lambda_t the per-task weights.
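A direct reading of this objective for the five-task suite above is the weighted sum sketched below; the dictionary keys, loss choices per task, and weights are illustrative assumptions rather than the published EnECG configuration.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()                 # regression tasks (RR interval, age)
bce = nn.BCEWithLogitsLoss()       # binary tasks (sex, potassium abnormality)
ce = nn.CrossEntropyLoss()         # multiclass task (arrhythmia)

def multitask_loss(preds: dict, targets: dict, lambdas: dict) -> torch.Tensor:
    """Weighted sum of per-task losses: L = sum_t lambda_t * L_t(y_t, y_hat_t)."""
    losses = {
        "rr": mse(preds["rr"], targets["rr"]),
        "age": mse(preds["age"], targets["age"]),
        "sex": bce(preds["sex"], targets["sex"]),
        "potassium": bce(preds["potassium"], targets["potassium"]),
        "arrhythmia": ce(preds["arrhythmia"], targets["arrhythmia"]),
    }
    return sum(lambdas[t] * loss for t, loss in losses.items())
```

In practice the weights λ_t are hyperparameters, often tuned to balance gradient scales across tasks of very different magnitudes (e.g., RR intervals versus binary labels).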

5. Empirical Gains, Robustness, and Clinical Impact

ECG foundation models have demonstrated substantial performance improvements and practical gains:

  • Accuracy improvements: EnECG attains RR MAE 87.7 ±6.4 (vs. 141.5), age MAE 12.97 ±0.61 (vs. 13.41), sex F₁ 0.69, K⁺ F₁ 0.53 (vs. 0.50), and arrhythmia accuracy 0.76 (vs. 0.66), with improvements statistically significant at p < 0.05 across random seeds (Xu et al., 28 Nov 2025).
  • Resource and memory efficiency: EnECG achieves state-of-the-art accuracy with <0.1% of backbone parameters adapted, ≤5% increase in FLOPs/sample, and supports real-time (<0.1 s/patient) clinical workflows on commodity GPUs (Xu et al., 28 Nov 2025).
  • Robustness to missing data and noise: TolerantECG is robust to arbitrary lead subsets and realistic noise scenarios, outperforming baselines across PTB-XL and MIT-BIH test conditions (Dang et al., 14 Jul 2025). AnyECG maintains superior performance with only 1–4 leads and under strong noise and heterogeneity, driven by dedicated tokenization and denoising stages (Wang et al., 2024); a simple lead-masking sketch appears after this list.
  • Label efficiency and data scaling: Pretrained models (e.g., ECG-JEPA, ECG-CPC) achieve up to 9× label efficiency on structure–function tasks, and pretraining gains persist when the labeled set is subsampled to N ∈ {250, 1000}.
  • Multimodal and cross-domain generalizability: CSFM transfers robustly across ECG, PPG, and clinical text, maintaining high accuracy (e.g., SBP MAE 4.42 mmHg, macro-F₁ 0.328) under variable lead configurations and device types (Gu et al., 23 Jun 2025). EchoingECG models uncertainty for ECG→ECHO prediction, outperforming prior deterministic and multimodal baselines in zero- and few-shot regimes (Gao et al., 30 Sep 2025).
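The lead-subset robustness discussed above can be probed with a simple corruption routine at evaluation time, masking random leads and adding Gaussian noise. This is a generic evaluation sketch, not the protocol of TolerantECG or AnyECG; the kept-lead count and noise level are illustrative.

```python
import torch

def corrupt_ecg(ecg: torch.Tensor, keep_leads: int = 4, noise_std: float = 0.1) -> torch.Tensor:
    """Simulate missing leads and additive noise for robustness evaluation.

    ecg: tensor of shape (batch, leads, samples); leads outside a random
    subset of size `keep_leads` are zeroed out for each record.
    """
    batch, leads, _ = ecg.shape
    corrupted = ecg + noise_std * torch.randn_like(ecg)   # additive Gaussian noise
    for i in range(batch):
        kept = torch.randperm(leads)[:keep_leads]          # random lead subset per record
        mask = torch.zeros(leads, dtype=torch.bool)
        mask[kept] = True
        corrupted[i, ~mask] = 0.0                          # drop the remaining leads
    return corrupted
```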

6. Limitations and Future Directions

Despite rapid advances, current ECG foundation models remain limited by several factors:

  • Domain gaps and task coverage: Most models excel in adult ECG interpretation; gaps persist for cardiac structure/function prediction, high-dimensional clinical outcomes, and patient characterization (Al-Masud et al., 29 Sep 2025).
  • Pretraining data heterogeneity: Methodological differences in training corpora and preprocessing hinder direct, architecture-only comparisons (Lunelli et al., 12 Sep 2025, Li et al., 2024).
  • Model interpretability and trust: Transformer and deep CNN FMs are opaque; saliency map alignment to clinical landmarks has improved transparency, but regulatory-grade explainability awaits standardization (McKeen et al., 2024, Dang et al., 14 Jul 2025).
  • Scaling laws and efficiency: Data-scaling experiments show performance saturating at roughly 60–70% of the SSL pool size for BYOL/MAE objectives, whereas contrastive-only objectives (SimCLR) yield only marginal returns and demand larger datasets, raising resource constraints (Wan et al., 2 Mar 2025).
  • Multimodal, federated, and privacy-preserving expansion: Integrating ECG with other biosignals, demographics, and EHR at scale is a frontier; federated learning and privacy-preserving strategies remain early-stage (Han et al., 2024).

Prominent future extensions include hierarchical MoE with class-specific gating, unified joint pretraining on comprehensive ECG corpora, multi-modal late fusion (ECG, PPG, text, imaging), adaptive expert selection, and deeper generalization benchmarking (Xu et al., 28 Nov 2025, Gu et al., 23 Jun 2025, Wan et al., 2 Mar 2025, Al-Masud et al., 29 Sep 2025, Han et al., 2024).

7. Summary Table of Key Models and Innovations

| Model/System | Pretraining Regime | Innovation | Principal Gains or Findings | Reference |
|---|---|---|---|---|
| EnECG | Ensemble + LoRA/MoE | Efficient adapters, multi-expert fusion | +50% memory reduction, SOTA accuracy | (Xu et al., 28 Nov 2025) |
| ECG-FM | Contrastive + generative | Masked contrastive, saliency, open weights | AUROC 0.935 (LVEF < 40%), robust | (McKeen et al., 2024) |
| CSFM | Masked Transformer | Multimodal, channel-agnostic | Robust transfer, low memory | (Gu et al., 23 Jun 2025) |
| TolerantECG | ConvNeXt + duo-distill | Robust to missing/noisy leads | Best/2nd-best on PTB-XL, MIT-BIH | (Dang et al., 14 Jul 2025) |
| AnyECG | Tokenizer + CMA | Rhythm codebook, proxy-task synergy | +6% multi-task gain, SOTA anomaly/long | (Wang et al., 2024) |
| CLEF | ResNeXt + risk-weighted | Clinically guided contrastive loss | +2.6% AUROC, robust single-lead | (Shu et al., 1 Dec 2025) |
| ECGFounder | RegNet CNN, PU loss | Large-scale supervised backbone | 150 labels, expert-level AUC ≥ 0.95 | (Li et al., 2024) |
| xECG (BenchECG) | xLSTM + SimDINOv2 | Linear complexity, robust pretraining | SOTA BenchECG score 0.868, long-context | (Lunelli et al., 12 Sep 2025) |
| CardX (ExChanGeAI) | MoE (4 experts), router | Privacy-preserving, plugin platform | 6× fewer params, strong external F₁ | (Bickmann et al., 17 Mar 2025) |
| EchoingECG | Probabilistic CLIP | Uncertainty-aware ECG→ECHO | SOTA zero/few-shot echo prediction | (Gao et al., 30 Sep 2025) |

Foundation models for ECG analysis now enable high-accuracy, efficient, and generalizable cardiac diagnostics across large, diverse datasets, supporting robust multi-task frameworks, resource-efficient adaptation, and clinical deployment within standard hospital or edge hardware environments.
