
ECGFounder: Foundation Model for ECG Analysis

Updated 20 October 2025
  • ECGFounder is a large-scale foundation model for ECG analysis that leverages over 10 million expert-annotated recordings to enable multi-label cardiovascular disease diagnosis.
  • It employs advanced CNN architectures with innovative lead-augmentation and positive-unlabeled training strategies to overcome data noise, missing annotations, and class imbalances.
  • The model achieves state-of-the-art performance in both internal and external benchmarks and is adaptable for mobile, wearable, and diverse clinical applications.

ECGFounder is a large-scale, general-purpose foundation model for electrocardiogram (ECG) analysis, designed for comprehensive cardiovascular disease diagnosis and broad clinical adaptability. Developed using more than 10 million expert-annotated ECG recordings from the Harvard-Emory ECG Database (HEEDB), ECGFounder leverages convolutional neural network (CNN) architectures combined with advanced training strategies to address the diagnostic challenges posed by noisy, incomplete, and imbalanced real-world ECG datasets (Li et al., 5 Oct 2024). The model supports both multi-lead and single-lead ECG analysis and demonstrates strong performance on internal and external benchmarks. It is built for extensibility to mobile, wearable, and demographic applications and is released as an open-source resource to facilitate broad adoption and further research.

1. Data Scale, Annotation, and Label Framework

ECGFounder is trained on HEEDB, which comprises 10,771,552 clinical 12-lead, 10-second ECGs from over 1.8 million unique subjects. The dataset was annotated using outputs from the Marquette 12SL ECG Analysis Program, and these text-based annotations were further filtered and mapped by physician review to yield 150 clinically relevant diagnostic categories. These annotation categories capture a spectrum of phenomena, including rhythm abnormalities, conduction disorders, myocardial infarction, hypertrophy, and detailed waveform morphology labels. This label set enables the model to function as a robust multi-label classifier suited to the multifaceted nature of clinical ECG diagnosis.

A distinguishing challenge is that real-world clinical ECG datasets are “noisy,” containing missing, incomplete, or imprecise annotation. ECGFounder adopts a positive-unlabeled (PU) training framework to address class imbalance and partial labeling. The custom loss function attenuates the weight of negative samples near the decision boundary:

\mathcal{L} = -(\gamma - p)\, p^2,

where γ is a hyperparameter (set to 1.5) and p is the model's predicted probability for a given diagnosis. This loss formulation prevents the model from over-relying on absent labels and mitigates bias toward true negatives during optimization.
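The PU loss above can be sketched directly; a minimal NumPy version, assuming per-sample probabilities and the paper's γ = 1.5:

```python
import numpy as np

def pu_loss(p, gamma=1.5):
    """Positive-unlabeled loss L = -(gamma - p) * p^2 per sample.

    Low-probability predictions contribute little loss magnitude, so
    unlabeled (possibly positive) cases near the decision boundary are
    not pushed hard toward the negative class.
    """
    p = np.asarray(p, dtype=float)
    return -(gamma - p) * p ** 2

# A low-confidence prediction incurs far smaller loss magnitude
# than a confident one, attenuating pressure from absent labels.
low, high = pu_loss(0.1), pu_loss(0.9)
```

The attenuation is quadratic in p, so samples the model is unsure about barely move the gradient, which is the mechanism that keeps missing annotations from being treated as hard negatives.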

2. Model Architecture and Training Regimen

ECGFounder employs a RegNet architecture with a stage-wise design, supporting variable scaling of channels and blocks to capture both temporal (within-lead) and spatial (inter-lead) characteristics. The architecture processes a 12 × 5000 input tensor per record (12 leads, 10 s sampled at 500 Hz), corresponding to standard 12-lead clinical settings. The pipeline integrates sequential convolutional blocks and skip connections, facilitating robust learning of cardiac morphology and rhythm.
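As a toy illustration of the block-plus-skip pattern (not the actual RegNet implementation), a residual block over a 12 × 5000 input can be sketched in NumPy; the 1 × 1 channel-mixing map used in place of a real convolution is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, weight):
    """Toy residual block: ReLU of a channel-mixing map plus an
    identity skip connection, standing in for a RegNet conv block.

    x:      (channels, length) ECG tensor
    weight: (channels, channels) mixing matrix
    """
    return np.maximum(weight @ x, 0.0) + x  # ReLU(Wx) + identity skip

# A 12-lead, 10-second recording sampled at 500 Hz is a 12 x 5000 tensor.
ecg = rng.standard_normal((12, 5000))
w = 0.1 * rng.standard_normal((12, 12))
out = residual_block(ecg, w)  # shape preserved: (12, 5000)
```

The skip connection preserves the input shape through each block, which is what lets deep stacks of such blocks learn morphology refinements without losing the raw signal.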

Training used the AdamW optimizer with an initial learning rate of 0.001, decayed by a factor of 10 every 5 epochs over a 20-epoch maximum schedule; the pretraining batch size was 1024. A key innovation is the "lead-augmentation" method: for wearable and mobile contexts where only a single limb lead (often lead I) is available, synthetic signals for the other leads are generated via axis inversion, allowing the model to generalize to, and be validated in, both single-lead and multi-lead formats.
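The step schedule can be written down exactly from the figures above; the lead-augmentation helper is only an illustrative assumption, since the precise synthetic-lead derivation is not reproduced here:

```python
import numpy as np

def step_lr(epoch, base_lr=1e-3, decay=10.0, step=5):
    """Step schedule reported for pretraining: the learning rate is
    divided by 10 every 5 epochs over the 20-epoch maximum."""
    return base_lr / decay ** (epoch // step)

def augment_lead(lead):
    """Illustrative lead augmentation: an axis-inverted (sign-flipped)
    copy of a limb lead, standing in for the paper's synthetic-lead
    generation from lead I."""
    return -np.asarray(lead, dtype=float)

schedule = [step_lr(e) for e in range(20)]  # 1e-3 down to 1e-6
```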

3. Diagnostic Performance and Baseline Comparisons

In internal validation on held-out test data (834,926 ECGs), ECGFounder achieved an average area under the ROC curve (AUROC) of approximately 0.968 across twenty key diagnostic categories, with mean sensitivity near 0.971 and specificity near 0.937. AUROC exceeded 0.95 for 82 of the 150 diagnostic labels and 0.90 for 112 of them.

On external datasets (e.g., CODE-test, PTB-XL, PhysioNet Challenge 2017), the model outperformed contemporary baselines (such as S12L-ECG, CTN, and ECG-SE-ResNet) by 1–2 AUROC points on 12-lead diagnosis, with single-lead atrial fibrillation detection in the 0.957–0.975 AUROC range. Direct comparison with expert cardiologists on a committee-verified internal set showed that the model's F1 scores were similar to or higher than the cardiologists', and its sensitivity was often superior.

Fine-tuning on downstream tasks—demographics (age/sex), clinical event detection (chronic kidney disease, coronary heart disease), laboratory regression (NT-proBNP, left ventricular ejection fraction), and even cross-modality diagnostic targets (e.g., photoplethysmography-based atrial fibrillation)—led to substantial gains versus baseline models (e.g., ECG-SimCLR, ECG-ResNet), improving AUROC by 3–5 points in several tasks.

4. Adaptability: Mobile, Single-Lead, and Downstream Task Fine-Tuning

ECGFounder’s architecture allows for robust adaptability across device and application scenarios. The lead-augmentation training protocol ensures that features associated with axis deviations, conduction abnormality, and rhythm can be detected even from single-lead or lower-rank ECG inputs. This property enables ECGFounder to operate effectively in wearable systems, ambulatory monitors, or resource-constrained clinical environments where comprehensive lead sets may be unavailable.

The model supports fine-tuning for downstream and cross-modality tasks. In standard practice, downstream adaptation is implemented by replacing the classification head with a task-specific linear layer and training for approximately 30 epochs with a reduced learning rate (10⁻⁴). The optimizer (AdamW) and learning rate scheduler (ReduceLROnPlateau) are retained, preserving pre-trained weights while adapting to domain-specific targets. This transfer learning approach yields robust performance improvements across demographic, event, and biomarker regression tasks.
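A minimal sketch of this head-replacement recipe, with plain gradient descent on a logistic head standing in for AdamW + ReduceLROnPlateau; the embeddings, task, and dimensions below are hypothetical:

```python
import numpy as np

def finetune_head(features, labels, lr=1e-4, epochs=30):
    """Fit a fresh linear (logistic) head on frozen backbone features.

    Mirrors the head-replacement recipe: the pretrained encoder is
    untouched; only the new task-specific layer is trained, here with
    plain gradient descent in place of AdamW.
    """
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        logits = features @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
        grad = p - labels                  # dBCE/dlogits
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    return w, b

# Hypothetical frozen-backbone embeddings for a downstream binary task;
# a larger lr than the paper's 1e-4 is used so this toy demo converges.
rng = np.random.default_rng(1)
feats = rng.standard_normal((64, 8))
y = (feats[:, 0] > 0).astype(float)
w, b = finetune_head(feats, y, lr=0.5, epochs=500)
```

Because only `w` and `b` are updated, the pretrained representation is preserved exactly, which is what makes the approach sample-efficient on small downstream cohorts.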

5. Evaluation in Benchmarks, Limitations, and Comparative Analysis

Systematic external benchmarking, such as the BenchECG suite (Lunelli et al., 12 Sep 2025), indicates that ECGFounder’s CNN-based backbone provides competitive results for standard 12-lead, 10-second multi-label classification. However, the model’s feature aggregation and spatial focus can degrade performance in fine-grained temporal tasks (e.g., R-peak detection), where precise localization is required. In such settings, models with recurrent or xLSTM backbones (e.g., xECG) or state-space models (e.g., ECG-CPC) may exhibit better temporal fidelity and label efficiency.

In large-scale benchmarks covering 1,650 clinical targets across 12 datasets (Al-Masud et al., 29 Sep 2025), ECGFounder consistently ranks among the leading models for adult ECG interpretation (macro-AUROC superiority over strong S4 supervised baselines), but shows moderate relative gains in cardiac structure/function prediction and outcome modeling—areas where more compact SSM-based architectures (ECG-CPC) sometimes outperform.

A post-training strategy—comprising preview linear probing for initialization and subsequent stochastic depth regularized fine-tuning (Zhou et al., 16 Sep 2025)—enables further improvements in macro AUROC (by 1.2%–3.3%) and macro AUPRC (by 5.3%–20.9%), especially benefiting sample efficiency and stability in the face of data scarcity.

6. Applications, Public Availability, and Future Directions

ECGFounder serves as a robust foundation for a spectrum of applications:

  • Real-time and point-of-care diagnosis in both hospital and ambulatory settings
  • Wearable/mobile device integration (leveraging lead-augmentation and single-lead robustness)
  • Demographic classification and laboratory value regression from ECGs
  • Cross-modality and multi-modal clinical workflows (e.g., integration with PPG or imaging data)

Its public release (planned via platforms such as bdsp.io, Hugging Face, PhysioNet, and https://github.com/bdsp-core/ECGFounder) provides both the model weights and the corresponding curated dataset, allowing the research community to reproduce, fine-tune, and build upon the model for new diagnostic and predictive tasks.

Major avenues for further improvement include expansion to more ethnically and regionally diverse datasets, enhanced explainability via integration of XAI methods for clinical interpretability, augmentation with lifestyle or historical clinical covariates, and enriched natural language processing for annotation extraction.

7. Model Design, Formulas, and Figures

The training loss employed for positive-unlabeled multi-label classification is:

\mathcal{L} = -(\gamma - p)\, p^2,

with γ = 1.5, steering the model away from bias toward negatives under missing labels.

The model architecture (as detailed in Figure 1/Supplement S3.1) consists of sequential convolutional blocks with skip connections for robust time-series and spatial feature aggregation.

Evaluation metrics such as AUROC curves (Figure: eval) and detailed fine-tuning results (Figure: ft_results) are used to visually and quantitatively assess comparative performance across internal and external datasets.


In summary, ECGFounder exemplifies the transition from task-specific ECG classifiers to large-scale, general-purpose foundation models for cardiovascular diagnostics. With its extensive coverage, robust architecture, and well-curated release strategy, ECGFounder offers an extensible and high-performance backbone for modern ECG-based AI applications and forms a cornerstone for future advances in clinical cardiology AI research.
