Pulse-PPG Foundation Model
- Pulse-PPG is a large-scale model that extracts physiologically significant features from photoplethysmography signals using self-supervised and contrastive learning.
- It integrates convolutional, transformer, and vision architectures to enhance regression, classification, and multi-organ diagnostic tasks.
- The model achieves superior accuracy, generalization, fairness, and on-device efficiency compared to traditional engineered baselines in biomedical applications.
Pulse-PPG Foundation Model defines a class of large-scale, pre-trained neural networks that capture physiologically significant information from photoplethysmography (PPG) waveforms and serve as general-purpose feature extractors and initializations for a diverse set of downstream biomedical tasks. These models are pretrained on very large corpora of raw or minimally processed PPG—often synchronized with other biosignals such as ECG—using self-supervised or contrastive objectives to learn abstracted representations that transfer to regression, classification, diagnostic, and screening domains. Recent advances clarify the roles of convolutional, transformer, and vision architectures, enable robust and fair use in real-world settings, and unlock multi-organ health profiling, outperforming highly engineered baselines and prior models in accuracy, generalization, and scalability (Nie et al., 3 Nov 2025, Saha et al., 3 Feb 2025, Ni et al., 23 Sep 2025, Pillai et al., 2024).
1. Architectural Innovations and Pretraining Regimes
Pulse-PPG models employ varied architectures—1D convolutional networks (ResNet, Net1D), temporal transformers (GPT-PPG), dual encoders for multimodal fusion, and vision models for transformed inputs—each exploiting different signal inductive biases:
- Dual encoder for cross-modal alignment: AnyPPG implements a paired PPG/ECG encoder, each Net1D-derived (∼5.85 M params), producing 1024-dim latent vectors per 10 s, 125 Hz segment. Each encoder comprises stacked ConvBlocks and BasicStages with squeeze-and-excitation attention and dropout regularization (Nie et al., 3 Nov 2025).
- Self-supervised objectives: PaPaGei introduces morphology-aware and participant-aware contrastive SSL; on PaPaGei-S, pulses with similar sVRI, IPA, or SQI indices are positive pairs, with multitask loss (Pillai et al., 2024).
- Transformer-based patch prediction: PPG-GPT tokenizes 30 s@40 Hz segments into 1 s patches, trains causally (autoregressive next-patch prediction), RMSNorm, and rotary positional embeddings with scaling to 1B parameters (Kataria et al., 12 Feb 2025, Kataria et al., 16 Oct 2025).
- Image-based representations for vision FMs: Vision4PPG converts PPG to 2D inputs (STFT magnitude/phase or recurrence plots), then fine-tunes robust vision models (DINOv3, SigLIP-2) with LoRA, keeping main weights frozen (Kataria et al., 11 Oct 2025).
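As a concrete illustration of one of these input regimes, the patch tokenization described for transformer-style models such as PPG-GPT (30 s segments at 40 Hz split into 1 s patches) can be sketched as follows. This is a minimal hypothetical implementation; the published models apply additional preprocessing not shown here.

```python
import numpy as np

def patchify_ppg(segment: np.ndarray, fs: int = 40, patch_seconds: float = 1.0) -> np.ndarray:
    """Split a 1D PPG segment into non-overlapping fixed-length patches.

    Mirrors the patch tokenization described for PPG-GPT (30 s @ 40 Hz
    segments split into 1 s patches); the paper's exact preprocessing
    may differ.
    """
    patch_len = int(fs * patch_seconds)
    n_patches = len(segment) // patch_len
    # Drop any trailing samples that do not fill a complete patch.
    return segment[: n_patches * patch_len].reshape(n_patches, patch_len)

# A 30 s segment at 40 Hz yields 30 patches of 40 samples each.
segment = np.random.randn(30 * 40)
patches = patchify_ppg(segment)
print(patches.shape)  # (30, 40)
```

The resulting patch sequence is what an autoregressive transformer consumes for next-patch prediction.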
Pretraining typically leverages millions of hours and tens of millions of subject-segments from clinical monitors, wearables, and field deployments, with standardized filtering, normalization, and artifact curation or explicit modeling of signal quality for domain robustness (Nie et al., 3 Nov 2025, Saha et al., 3 Feb 2025, Ding et al., 2024).
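A minimal sketch of the filtering and normalization steps mentioned above, assuming a generic band-pass range and z-normalization (the cited pipelines do not specify a single standard, so the filter order and band here are illustrative choices):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_ppg(raw: np.ndarray, fs: float = 125.0,
                   band: tuple = (0.5, 8.0)) -> np.ndarray:
    """Band-pass filter and z-normalize a raw PPG segment.

    Generic preprocessing sketch: a 4th-order Butterworth band-pass
    (applied forward-backward for zero phase) followed by per-segment
    standardization. Filter band and order are assumptions, not values
    from the cited papers.
    """
    b, a = butter(4, [band[0], band[1]], btype="band", fs=fs)
    filtered = filtfilt(b, a, raw)
    return (filtered - filtered.mean()) / (filtered.std() + 1e-8)

# Example: a 10 s segment at 125 Hz, as used by AnyPPG-style encoders.
clean = preprocess_ppg(np.random.randn(1250))
```

Artifact curation and signal-quality modeling (as in SiamQuality) operate on top of such segment-level normalization.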
2. Representation Learning Objectives and Physiological Alignment
Objectives for representation learning are driven by physiological constraints and data availability:
- Cross-modal physiological alignment: AnyPPG aligns PPG and ECG embeddings in a shared space using a symmetric InfoNCE loss:

$$\mathcal{L} = \frac{1}{2N}\sum_{i=1}^{N}\left[-\log\frac{\exp(\mathrm{sim}(p_i, e_i)/\tau)}{\sum_{j=1}^{N}\exp(\mathrm{sim}(p_i, e_j)/\tau)} \;-\; \log\frac{\exp(\mathrm{sim}(e_i, p_i)/\tau)}{\sum_{j=1}^{N}\exp(\mathrm{sim}(e_i, p_j)/\tau)}\right]$$

where $p_i$ and $e_i$ are the paired PPG and ECG embeddings, $\mathrm{sim}(\cdot,\cdot)$ denotes cosine similarity, and τ is a learnable temperature (Nie et al., 3 Nov 2025).
- Morphology and rhythm modeling: PPG-Distill distills patch-wise local morphology and global rhythm by InfoNCE on patch-pair similarity and smoothL1 on inter-patch distances, ensuring compact student models preserve critical waveform information for arrhythmia and beat-to-beat variation (Ni et al., 23 Sep 2025).
- Motif-based retrieval: Pulse-PPG employs a cross-attention REBAR distance function to align segments by masked-motif reconstruction, followed by relative contrastive loss clustering representations by physiological similarity, not just binary class (Saha et al., 3 Feb 2025).
- Quality-pairing: SiamQuality uses SimSiam-style loss on pairs differing in artifact level but presumed similar physiology, enforcing invariance to real-world corruption (Ding et al., 2024).
- Participant-aware contrastive learning and temporal/objective regularization: Apple Heart Study models use participant-level positive pairs and a KoLeo regularizer for compactness and diversity in the learned embedding space (Abbaspourazad et al., 2023).
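The symmetric InfoNCE objective used for cross-modal PPG/ECG alignment can be sketched in NumPy as below. This is an illustrative implementation of the standard form of the loss, not the authors' code; batch construction and embedding normalization details are assumptions.

```python
import numpy as np

def symmetric_info_nce(z_ppg: np.ndarray, z_ecg: np.ndarray, tau: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired PPG/ECG embeddings.

    Row i of each matrix is the embedding of the i-th (PPG, ECG) pair;
    matched pairs sit on the diagonal of the similarity matrix.
    """
    # Cosine similarity via L2-normalized embeddings.
    z_ppg = z_ppg / np.linalg.norm(z_ppg, axis=1, keepdims=True)
    z_ecg = z_ecg / np.linalg.norm(z_ecg, axis=1, keepdims=True)
    logits = z_ppg @ z_ecg.T / tau
    n = logits.shape[0]

    def xent(l: np.ndarray) -> float:
        # Cross-entropy with the matched pair (diagonal) as the target.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(n), np.arange(n)].mean()

    # Average the PPG->ECG and ECG->PPG retrieval directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned, well-separated embeddings the loss approaches zero; mismatched pairs drive it up, pulling each PPG embedding toward its synchronized ECG counterpart.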
3. Downstream Applications and Evaluation
Pulse-PPG foundation models generalize across a spectrum of applications, typically using frozen backbone feature extraction or full fine-tuning:
- Vital signs and regression: Heart rate, systolic/diastolic blood pressure, respiration rate, oxygen saturation, and activity quantification (regression metrics: MAE, R², MAPE), with consistent performance improvements—e.g., AnyPPG HR MAE 9.28 bpm vs. 13.78 for prior best (Nie et al., 3 Nov 2025).
- Classification and detection: Signal quality, stress and affect, atrial fibrillation (AUC, F1, accuracy); e.g., AF detection AUC up to 0.90 with AnyPPG vs. 0.81 for PaPaGei (Nie et al., 3 Nov 2025).
- Multi-organ and disease profiling: AnyPPG and Pulse-PPG are tested for multi-label ICD-10 disease code prediction (AUC >0.8 for major cardiac and several non-cardiac diseases), indicating encoded representations capture systemic physiological and pathological cues (Nie et al., 3 Nov 2025).
- Fairness and transferability: FairTune demonstrates that naive fine-tuning may exacerbate fairness gaps; inverse frequency weighting (IF) and GroupDRO mitigate gender bias without sacrificing accuracy—e.g., reducing group MAE gap by 13%, confirmed by silhouette and MMD analysis (Panchumarthi et al., 20 Sep 2025).
- On-device and wearables: PPG-Distill provides real-time (<1M param) models for smartwatches, with up to 21.8% F1 improvement and 19× compression over teacher (Ni et al., 23 Sep 2025).
- Cardiac event prediction: FEAN applies PPG-GPT embeddings to cardiac arrest prediction using 1–24 h histories, achieving up to 0.82 AUROC for next-hour IHCA and revealing interpretable patient health trajectory in latent space (Kataria et al., 12 Feb 2025).
- Spectral-spatial fusion: FusionPPG yields high-fidelity waveforms from imaging using spectral and spatial priors, supporting arrhythmia analysis (Amelard et al., 2016).
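The frozen-backbone evaluation protocol mentioned above attaches a lightweight head to fixed embeddings. A closed-form ridge-regression probe is one minimal sketch of this protocol (the head choice and regularization here are assumptions; individual papers use their own probes):

```python
import numpy as np

def linear_probe(train_emb: np.ndarray, train_y: np.ndarray,
                 test_emb: np.ndarray, l2: float = 1.0) -> np.ndarray:
    """Ridge-regression probe on frozen foundation-model embeddings.

    Fits w = (X^T X + l2*I)^{-1} X^T y on training embeddings and
    returns predictions for the test embeddings. Sketches the
    'frozen backbone + lightweight head' evaluation used for tasks
    such as HR or blood-pressure regression.
    """
    d = train_emb.shape[1]
    A = train_emb.T @ train_emb + l2 * np.eye(d)
    w = np.linalg.solve(A, train_emb.T @ train_y)
    return test_emb @ w
```

Regression metrics such as MAE are then computed between `test_emb @ w` and ground-truth labels; classification heads follow the same pattern with a logistic or softmax probe.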
4. Quantitative Benchmarks and Comparative Analysis
Comparative evaluations show Pulse-PPG models outperform prior baselines and other time-series FMs:
| Model | Architecture | Pretraining Scale | HR MAE (bpm) | AF AUC (or noted metric) | Field Generalization |
|---|---|---|---|---|---|
| AnyPPG | Dual Net1D | 109,909 h | 9.28 | 0.90 | Yes |
| PaPaGei-S | ResNet-1D | 57,000 h | 10.12 | 0.67 | Robust to skin tone |
| Pulse-PPG | ResNet-26 | 21,000 h (field) | 8.94 | 0.54 (stress F1) | Yes |
| PPG-GPT (specialist) | Transformer | 200M | 7.83 | Varied | High in-domain, poorer OOD |
| PPG-Distill | Small transformer | 10k+ | ≤8.34 | 0.77 | Efficient, real-time |
| Vision4PPG | ViT | 10B+ (image) | 8.10 | – | SOTA on BP, vital labs |
Notably, field-trained Pulse-PPG outperforms clinical data pretraining on both field and clinical tasks, challenging conventional lab-to-field paradigms (Saha et al., 3 Feb 2025). Specialist models (e.g., PPG-GPT) yield superior win scores and regression metrics for classic PPG tasks but may lack transferability; generalists excel in cross-domain generalization (Kataria et al., 16 Oct 2025).
5. Robustness, Fairness, and Deployment Considerations
Robustness to noise, artifacts, and domain shifts is achieved through architectural design and training regimes:
- Robust to artifacts: SiamQuality and Pulse-PPG models exhibit flat artifact-tolerance curves, retaining performance across the full corruption spectrum through quality-pairing and curriculum learning (Ding et al., 2024, Saha et al., 3 Feb 2025).
- Skin tone and demographic bias: PaPaGei introduces explicit benchmarking and finds persistent but reduced bias across Fitzpatrick skin tone scales (Pillai et al., 2024).
- Fairness-aware fine-tuning: Bias-aware adaptation with IF, GroupDRO, and adversarial debiasing is shown essential—unmitigated fine-tuning may increase subgroup disparity (Panchumarthi et al., 20 Sep 2025).
- On-device deployment: PPG-Distill achieves sub-1M parameter models with pruning, quantization, and efficient patch-level distillation, allowing deployment on smartwatches and wearables (Ni et al., 23 Sep 2025).
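The inverse-frequency (IF) reweighting idea behind bias-aware adaptation can be sketched as follows. This is a minimal illustrative version; FairTune's exact weighting and its GroupDRO variant differ in detail.

```python
import numpy as np

def inverse_frequency_weights(group_ids: np.ndarray) -> np.ndarray:
    """Per-sample loss weights inversely proportional to subgroup frequency.

    Underrepresented demographic groups (e.g., a minority gender group)
    receive proportionally larger weights, so fine-tuning does not
    optimize predominantly for the majority group.
    """
    groups, counts = np.unique(group_ids, return_counts=True)
    freq = dict(zip(groups, counts / len(group_ids)))
    w = np.array([1.0 / freq[g] for g in group_ids])
    return w / w.mean()  # normalize so the average weight is 1
```

During fine-tuning these weights multiply the per-sample loss; GroupDRO instead adaptively upweights the worst-performing group at each step.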
6. Extensions, Future Directions, and Limitations
Current trends and open challenges include:
- Multimodal fusion: Dual-encoder models (AnyPPG) and multimodal plug-in approaches enable fusion of PPG, ECG, accelerometry, and clinical text for richer bio-profiling (Nie et al., 3 Nov 2025, Pillai et al., 2024).
- Vision FMs: Repurposing large vision transformers for imagified PPG representations achieves state-of-the-art blood pressure and vital lab estimation, expanding the modality and improving parameter efficiency (Kataria et al., 11 Oct 2025).
- Generalist-specialist hybrid strategies: Two-stage pretraining—generalist for broad representations, followed by domain adaptation—may optimize transferability and domain performance (Kataria et al., 16 Oct 2025).
- Representation visualization: Latent manifold tracking (PaCMAP) as in continuous cardiac prediction reveals interpretable patient risk trajectories (Kataria et al., 12 Feb 2025).
- Limitations: Most models rely on self-reported or silver-standard labels in field settings, and demographic/sleep/longitudinal diversity remains underrepresented; application to imaging or tissue-specific PPG (camera-PPG) is early-stage (Saha et al., 3 Feb 2025, Amelard et al., 2016).
A plausible implication is that physiologically-aligned, large-scale self-supervised modeling of PPG—with explicit consideration of real-world artifacts, fairness, and multimodal fusion—will fundamentally broaden the clinical utility and real-world reliability of noninvasive biosignal monitors.
7. Resources and Reproducibility
Open-source implementations of several Pulse-PPG foundation models are provided for reproducible research:
- Pulse-PPG ResNet encoder and code: https://github.com/pulse-ppg (Saha et al., 3 Feb 2025)
- PaPaGei foundation models: https://github.com/nokia-bell-labs/papagei-foundation-model (Pillai et al., 2024)
- SiamQuality model and demo: https://github.com/chengding0713/SiamQuality (Ding et al., 2024)
- PPG-Distill methods: full recipe and ablations in paper (Ni et al., 23 Sep 2025)
Practitioners are advised to deploy bias-mitigation strategies, carefully validate on-device inference speed and memory constraints, and benchmark against demographic and fairness metrics to ensure equitable, robust real-world performance.