
PPG Foundation Models

Updated 25 January 2026
  • PPG foundation models are large-scale, self-supervised neural networks that learn transferable optical pulse waveform representations for diverse diagnostic and clinical tasks.
  • They integrate multi-modal data and domain-specific constraints using 1D CNNs, transformers, and masked-reconstruction objectives to improve estimation of key vital signs.
  • They enable efficient edge deployment via model compression, distillation, and quantization while addressing challenges like fairness and domain shift.

Photoplethysmography (PPG) foundation models are large-scale, self-supervised or contrastively pretrained neural networks that ingest unlabeled or weakly labeled optical pulse waveform data to learn general-purpose signal representations. These embeddings are explicitly designed to transfer across a wide array of physiological, diagnostic, and clinical monitoring tasks. In contrast to task-specific models, which are limited by narrow data regimes and restricted scope, PPG foundation models synthesize diverse morphological, temporal, and hemodynamic features, often exploiting multi-domain, multi-modal, or physically motivated constraints to achieve scalability and robustness across both wearable and clinical settings (Saha et al., 3 Feb 2025, Pillai et al., 2024, Abbaspourazad et al., 2023, Nie et al., 3 Nov 2025).

1. Model Families, Architectures, and Training Objectives

PPG foundation models have diversified rapidly, incorporating a range of neural backbones, self-supervised objectives, and specialized constraints. The primary architectural families include 1D convolutional neural networks (CNNs) (Pillai et al., 2024, Ding et al., 2024, Saha et al., 3 Feb 2025), transformer-based autoregressive decoders (Chen et al., 11 Mar 2025, Kataria et al., 12 Feb 2025, Ni et al., 23 Sep 2025, Kataria et al., 16 Oct 2025), multi-modal encoders for cross-physiological alignment (Nie et al., 3 Nov 2025, Tóth et al., 10 Feb 2025), and vision-based models utilizing 2D signal transformations (Kataria et al., 11 Oct 2025, Thukral et al., 18 Jan 2026).

| Model | Backbone | Pretext Task | Parameter Range |
|---|---|---|---|
| PaPaGei | 1D ResNet | Subject/morphology contrast, regression | 5M–35M |
| GPT-PPG/PPG-GPT | Causal transformer | Next-patch prediction, masked modeling | 19M–1B |
| SiamQuality | ResNet-50/101/152 | Artifact-invariant SimSiam | 25M–60M |
| Pulse-PPG | 1D ResNet-26 | Masked motif reconstruction, contrastive | 28.5M |
| AnyPPG | Multi-encoder (Net1D) | InfoNCE alignment to ECG | 5.85M/modality |
| CEReBrO | Transformer encoder | Masked-patch on EEG; fine-tuned to PPG | 3.6M–85M |
| Vision4PPG | ViT (DINOv3, SigLIP) | CLIP/iBOT pretraining, imagified PPG | 86M |

Self-supervised objectives span contrastive instance or subject discrimination (Abbaspourazad et al., 2023, Pillai et al., 2024, Saha et al., 3 Feb 2025), morphology- or artifact-aware contrast (Pillai et al., 2024, Ding et al., 2024), masked region or motif reconstruction (Saha et al., 3 Feb 2025, Thukral et al., 18 Jan 2026, Kataria et al., 11 Oct 2025), autoregressive patch prediction via mixture distributions (Chen et al., 11 Mar 2025, Kataria et al., 12 Feb 2025), and multi-modal physiological alignment (Nie et al., 3 Nov 2025). Pretraining datasets scale from tens of millions to >100 million waveform segments, sourced from heterogeneous settings (ICU monitors, field wearables, sleep labs) and device types.
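To make the masked-modeling family of objectives concrete, the following is a minimal sketch of masked-patch reconstruction over 1D PPG segments; the patch length, mask ratio, and transformer configuration are illustrative placeholders rather than settings from any cited model.

```python
import torch
import torch.nn as nn

class MaskedPatchPretext(nn.Module):
    """Illustrative masked-patch reconstruction objective for 1D PPG.

    Patch length, mask ratio, and encoder width are placeholder values,
    not taken from any of the cited foundation models.
    """
    def __init__(self, patch_len=50, d_model=128, mask_ratio=0.5, max_patches=512):
        super().__init__()
        self.patch_len = patch_len
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_len, d_model)
        self.pos = nn.Parameter(torch.randn(1, max_patches, d_model) * 0.02)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=4,
        )
        self.decode = nn.Linear(d_model, patch_len)

    def forward(self, x):
        # x: (batch, signal_len); signal_len is assumed to be a multiple of patch_len
        b, n = x.shape
        patches = x.view(b, n // self.patch_len, self.patch_len)
        tokens = self.embed(patches)                          # (b, num_patches, d_model)

        # randomly replace a fraction of patch tokens with a learned mask token
        mask = torch.rand(b, tokens.size(1), device=x.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        tokens = tokens + self.pos[:, : tokens.size(1)]       # positional information

        recon = self.decode(self.encoder(tokens))             # reconstruct every patch
        # loss is computed only on masked positions, as in masked-modeling pretraining
        return ((recon - patches) ** 2)[mask].mean()

# usage: loss = MaskedPatchPretext()(torch.randn(8, 1000))  # eight 1000-sample segments
```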

2. Representation Learning: Domain Knowledge and Signal-Informed Objectives

A defining trend is the incorporation of domain knowledge via morphologically informed contrastive losses, motif-aware reconstruction, artifact-invariance, and physics-guided constraints. PaPaGei integrates both patient and morphology indices (sVRI, IPA, SQI) into its loss (Pillai et al., 2024). SiamQuality explicitly aligns "good" and artifacted waveforms from the same subject, yielding representations invariant to quality but sensitive to underlying physiology (Ding et al., 2024). Pulse-PPG leverages motif-based distance metrics and masked motif reconstruction to capture physiologically meaningful substructures robust to field noise (Saha et al., 3 Feb 2025). The MMR framework applies masked time-frequency wavelet coefficient reconstruction to enforce scale-aware feature fusion (Thukral et al., 18 Jan 2026).
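A minimal sketch of the artifact-invariant pairing idea, in the spirit of SiamQuality's SimSiam setup: a clean segment and an artifact-corrupted segment from the same subject are driven to the same embedding, with a stop-gradient on the target branch. The encoder argument, feature dimensions, and class name are illustrative assumptions rather than the published architecture.

```python
import torch.nn.functional as F
from torch import nn

class ArtifactInvariantSiamese(nn.Module):
    """Sketch of a SimSiam-style objective pairing clean and artifacted PPG
    segments from the same subject. Dimensions are placeholders."""
    def __init__(self, encoder, feat_dim=2048, proj_dim=512):
        super().__init__()
        self.encoder = encoder                       # e.g. a 1D ResNet backbone returning feat_dim features
        self.predictor = nn.Sequential(
            nn.Linear(feat_dim, proj_dim), nn.ReLU(inplace=True),
            nn.Linear(proj_dim, feat_dim),
        )

    @staticmethod
    def _neg_cosine(p, z):
        # stop-gradient on the target branch, as in SimSiam
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

    def forward(self, clean_seg, artifact_seg):
        z_clean, z_art = self.encoder(clean_seg), self.encoder(artifact_seg)
        p_clean, p_art = self.predictor(z_clean), self.predictor(z_art)
        # symmetric loss pulls clean and artifacted views to the same embedding
        return 0.5 * (self._neg_cosine(p_clean, z_art) +
                      self._neg_cosine(p_art, z_clean))
```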

Transformers such as GPT-PPG/PPG-GPT employ patchification and logit-Laplace or MSE objectives tailored to continuous-valued, locally correlated signals (Chen et al., 11 Mar 2025, Kataria et al., 12 Feb 2025, Ni et al., 23 Sep 2025). Multi-modal models (AnyPPG, CEReBrO) align PPG with synchronized ECG (or EEG) in a shared latent space, using symmetric InfoNCE or similar cross-modal objectives, thereby integrating electrophysiological priors and enhancing discriminability (Nie et al., 3 Nov 2025, Tóth et al., 10 Feb 2025).
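The cross-modal alignment used by models such as AnyPPG can be sketched as a generic symmetric InfoNCE over a batch of time-synchronized PPG/ECG embeddings; the temperature and normalization details below are illustrative assumptions, not values from the cited papers.

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(ppg_emb, ecg_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE for paired PPG/ECG embeddings of shape
    (batch, dim). Matched pairs sit on the diagonal of the similarity matrix."""
    ppg = F.normalize(ppg_emb, dim=-1)
    ecg = F.normalize(ecg_emb, dim=-1)
    logits = ppg @ ecg.t() / temperature                   # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # cross-entropy in both directions: PPG -> ECG and ECG -> PPG
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Minimizing this loss pulls embeddings of simultaneously recorded PPG and ECG together while pushing apart mismatched pairs within the batch, which is how electrophysiological structure is injected into the PPG representation.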

3. Downstream Transfer: Task Coverage, Evaluation, and Fine-Tuning Gains

PPG foundation models enable strong linear-probe and fine-tuning performance across a vast spectrum of tasks, including vital sign regression (heart rate, SpOâ‚‚, blood pressure), arrhythmia (AF) detection, stress and activity classification, sleep staging, risk prediction (e.g., cardiac arrest, hypertension), and even multi-organ disease profiling (Nie et al., 3 Nov 2025, Chen et al., 11 Mar 2025, Abbaspourazad et al., 2023, Panchumarthi et al., 20 Sep 2025, Kataria et al., 12 Feb 2025, Pillai et al., 2024).
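A typical linear-probe evaluation simply fits a linear head on frozen foundation-model embeddings and reports AUROC or MAE; the helper below is a generic sketch with default scikit-learn hyperparameters, an assumption rather than any cited benchmark's exact protocol.

```python
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import roc_auc_score, mean_absolute_error

def linear_probe(train_emb, train_y, test_emb, test_y, task="classification"):
    """Fit a linear head on frozen embeddings (arrays of shape (n, dim)) and
    report AUROC for binary classification or MAE for regression."""
    if task == "classification":
        clf = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
        return roc_auc_score(test_y, clf.predict_proba(test_emb)[:, 1])
    head = Ridge().fit(train_emb, train_y)
    return mean_absolute_error(test_y, head.predict(test_emb))
```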

Key benchmark results:

  • PaPaGei improves AUROC by 6.3% and MAE by 2.9% over prior open models on 14 of 20 tasks across 10 datasets, with strong robustness to skin-tone variation (no large systematic bias found) (Pillai et al., 2024).
  • Pulse-PPG pretraining on field data consistently outperforms clean clinical pretraining on 10/11 tasks, demonstrating improved robustness to domain shift (Saha et al., 3 Feb 2025).
  • AnyPPG yields a mean 12.8% reduction in MAE (regression) and 9.1% gain in AUC (classification), with multi-organ ICD-10 AUCs >0.8 for 13 categories, including non-cardiovascular conditions (Nie et al., 3 Nov 2025).
  • GPT-PPG (85M–1B): meets or exceeds prior SOTA on heart rate (WESAD MAE 4.98–5.42 BPM), AF (F1 up to 0.847), and blood pressure estimation suites (Chen et al., 11 Mar 2025). Fine-tuned 345M–1B models further improve cardiac event prediction (AUROC up to 0.82, 1 h pre-event) (Kataria et al., 12 Feb 2025).
  • SiamQuality (ResNet152): establishes new SOTA on respiration rate from PPG (BIDMC, MAE 0.89) and high-coverage AF detection (Stanford, F1 0.71) via artifact-tolerant representations (Ding et al., 2024).
  • Wavelet-based MMR: yields best overall AUROC (66.7%) and regression MAE (9.67) among all recent SSL baselines and open-source PPG foundation models (Thukral et al., 18 Jan 2026).
  • Vision4PPG: vision-transformer FMs, when paired with time–frequency "imaged" representations, achieve SOTA blood pressure regression with LoRA fine-tuning (e.g., Aurora-Osc, MAE 5.32/18.57 mmHg with SigLIP-2) and close the gap to dedicated 1D FMs on other vital-sign and laboratory measures (Kataria et al., 11 Oct 2025).

Robust downstream transfer is associated with model scale, pretraining data diversity, and integration of signal-aware or domain-knowledge constraints. Specialist PPG FMs typically outperform generalist models on regression (e.g., vital signs), while generalists (MOMENT) excel in cross-domain classification and "emergent" phenotype tasks but lag for fine-grained quantification (Kataria et al., 16 Oct 2025).

4. Efficiency, Compression, and Edge Deployment

Transferring large PPG FMs to resource-constrained environments requires model compression and adaptation techniques:

  • PPG-Distill demonstrates that distilling a 19M-parameter GPT-PPG into a 1M-parameter student yields 17.3% lower MAE with 19× fewer parameters and 7× faster inference (4.7 ms per batch), making real-time on-device monitoring feasible (Ni et al., 23 Sep 2025).
  • Dynamic INT8 quantization of CEReBrO reduces the model footprint by 3.5–4× with <0.1% accuracy loss, enabling real-time (<10 ms per 10 s window) edge inference in under 4 MB of memory (a minimal quantization sketch follows this list) (Tóth et al., 10 Feb 2025).
  • Vision4PPG’s PEFT (LoRA) adapters add only 3% of original ViT parameter count yet reach or surpass full fine-tuning performance (Kataria et al., 11 Oct 2025).
  • Scaling model size beyond ~5–35M parameters shows little benefit for specialist PPG models (PaPaGei, SiamQuality), indicating that optimal edge-appropriate architectures are compact yet domain-specialized.
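As a concrete illustration of the dynamic INT8 quantization mentioned above, the sketch below applies PyTorch's post-training dynamic quantization to a toy stand-in encoder; the layer sizes and the 10 s / 125 Hz window assumption are illustrative, not the CEReBrO configuration.

```python
import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic

# Tiny stand-in for a pretrained PPG encoder (not a released checkpoint):
# it flattens a 10 s window sampled at 125 Hz into a 128-dim embedding.
encoder = nn.Sequential(
    nn.Linear(1250, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
)

# Post-training dynamic quantization converts the linear layers, which
# dominate parameter count, to INT8 weights with dynamic activation scaling.
quantized = quantize_dynamic(encoder, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    embedding = quantized(torch.randn(1, 1250))   # same forward interface as the float model
print(embedding.shape)                            # torch.Size([1, 128])
```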

5. Fairness, Domain Shift, and Bias Mitigation

Demographic and device-related fairness remains an unresolved challenge:

  • FairTune analysis of PPG-GPT reveals that naive fine-tuning substantially reduces error (e.g., 80% MAE reduction) but can exacerbate gender gaps under domain shift, especially for larger models and across clinical vs. consumer device transitions (Panchumarthi et al., 20 Sep 2025).
  • Inverse-frequency (IF) weighting and GroupDRO reduce demographic MAE gaps by 30–60% with minimal accuracy loss, outperforming adversarial debiasing (a minimal reweighting sketch follows this list) (Panchumarthi et al., 20 Sep 2025).
  • Artifact-tolerant self-supervision (SiamQuality) and diversity of pretraining data (Pulse-PPG) further support generalization to noisy and demographically heterogeneous populations (Ding et al., 2024, Saha et al., 3 Feb 2025).
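The inverse-frequency reweighting idea can be expressed as per-sample loss weights inversely proportional to demographic-group frequency; the helper below is a generic sketch, and the group labels and normalization are assumptions rather than the FairTune implementation.

```python
import numpy as np

def inverse_frequency_weights(group_labels):
    """Return per-sample weights inversely proportional to group frequency,
    normalized so the average weight is 1."""
    groups, counts = np.unique(group_labels, return_counts=True)
    freq = dict(zip(groups, counts / counts.sum()))
    w = np.array([1.0 / freq[g] for g in group_labels])
    return w / w.mean()

# During fine-tuning, these weights scale the per-sample loss, e.g.
#   loss = (weights * per_sample_mae).mean()
# so under-represented demographic groups contribute proportionally more.
```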

Skin-tone fairness benchmarks (PaPaGei) demonstrate generally small bias, with marginally higher error on darker tones, highlighting the need for broader training corpora (Pillai et al., 2024).

6. Emerging Trends and Open Questions

Several important trends and open questions define the cutting edge of PPG foundation models:

  • Multiscale and multi-modal integration: Cross-modal physiological alignment (PPG–ECG/EEG) injects electrophysiological constraints, improving transfer to non-cardiac phenotypes (Nie et al., 3 Nov 2025, Tóth et al., 10 Feb 2025). Time-frequency representations (wavelet or STFT transforms) drive further gains, particularly for models with ViT or cross-scale attention (a simple STFT "imagification" sketch follows this list) (Thukral et al., 18 Jan 2026, Kataria et al., 11 Oct 2025).
  • Data-centric robustness: Exposure to field data and real-world noise during pretraining (Pulse-PPG, Apple/AHMS) outperforms clinical-only data on both clinical and wearable tasks (Saha et al., 3 Feb 2025, Abbaspourazad et al., 2023).
  • Task and deployment specialization: Full-model fine-tuning gives maximal gain for specialist regression tasks; generalist models and PEFT suffice for classification and emergent phenotypes (Kataria et al., 16 Oct 2025).
  • Interpretability and physiological grounding: Despite strong empirical results, explicit mapping from foundation model embeddings to mechanistic physiological variables remains an open problem, except for models embedding physics or morphological constraints (e.g., PaPaGei, MMR, PPG-GPT with physics losses) (Pillai et al., 2024, Thukral et al., 18 Jan 2026, Farhadloo et al., 20 Feb 2025).
  • Resource constraints: Compression, quantization, and efficient student-teacher distillation are active areas of research for scaling deployment to wearables and edge devices (Ni et al., 23 Sep 2025, Tóth et al., 10 Feb 2025).
  • Open challenges: There is a need for standardized benchmarks (cross-domain tasks, demographic subpopulations, device settings), systematic studies on model fairness, explainability, and the integration of multimodal sensor streams (PPG+ECG+motion), as well as dynamic retrieval-augmented foundation model frameworks (Farhadloo et al., 20 Feb 2025).
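As a simple illustration of the time-frequency "imagification" used by vision-backbone models, the sketch below converts a 1D PPG segment into a 3-channel log-magnitude spectrogram via STFT; the FFT size, hop length, and channel replication are assumptions, not the Vision4PPG pipeline.

```python
import torch

def imagify_ppg(ppg, n_fft=128, hop=16):
    """Convert a 1D PPG segment into a 2D time-frequency 'image' suitable
    for a vision backbone. Parameters are illustrative placeholders."""
    spec = torch.stft(ppg, n_fft=n_fft, hop_length=hop,
                      window=torch.hann_window(n_fft), return_complex=True)
    mag = spec.abs().log1p()                                  # log-magnitude spectrogram
    mag = (mag - mag.min()) / (mag.max() - mag.min() + 1e-8)  # scale to [0, 1]
    return mag.unsqueeze(0).repeat(3, 1, 1)                   # replicate to 3 channels for a ViT

img = imagify_ppg(torch.randn(1250))   # (3, freq_bins, time_frames)
print(img.shape)
```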

PPG foundation models thus represent a maturing paradigm, unifying advances in self-supervised learning, physiological signal processing, and device-centric AI to create generalizable, robust, and increasingly interpretable representations for digital health (Thukral et al., 18 Jan 2026, Pillai et al., 2024, Chen et al., 11 Mar 2025, Abbaspourazad et al., 2023, Saha et al., 3 Feb 2025, Ni et al., 23 Sep 2025, Tóth et al., 10 Feb 2025, Ding et al., 2024, Nie et al., 3 Nov 2025, Kataria et al., 11 Oct 2025, Panchumarthi et al., 20 Sep 2025, Kataria et al., 12 Feb 2025, Kataria et al., 16 Oct 2025, Farhadloo et al., 20 Feb 2025).
