AnyPPG: Universal PPG Analysis Framework
- AnyPPG is a comprehensive framework that employs a dual-modal foundation model and sensor-agnostic pipelines for robust, real-time PPG signal analysis.
- It integrates contrastive self-supervised learning, topological data analysis, and CycleGAN approaches to ensure high signal quality and effective artifact removal.
- AnyPPG enables accurate vital sign estimation and multi-organ disease diagnostics, demonstrating broad device generalizability and clinical relevance.
AnyPPG is a framework, methodology, and family of models centered on universal and robust photoplethysmography (PPG) signal analysis. Explicitly, AnyPPG denotes (1) a large-scale foundation model pretrained on synchronized PPG–electrocardiogram (ECG) data for comprehensive health profiling, and (2) a broader class of sensor-agnostic pipelines and models targeting cross-device signal quality assurance, artifact removal, and extensible physiological analytics. AnyPPG integrates advances in deep contrastive learning, topological data analysis, generative adversarial methods, and vision foundation models to enable accurate end-to-end inference—from vital sign estimation to multi-organ disease diagnostics—across diverse hardware, user populations, and clinical/ambulatory scenarios (Nie et al., 3 Nov 2025, Shao et al., 15 Sep 2025, Zargari et al., 2021, Kataria et al., 11 Oct 2025).
1. Foundation Model Architecture and Physiology-Guided Pretraining
At the core of AnyPPG is a dual-modal foundation model for PPG, pre-trained on 109,909 hours of synchronized PPG–ECG from six diverse clinical and sleep datasets encompassing 58,796 subjects. The architecture comprises paired Net1D-based 1D ResNet encoders (≈5.85 M parameters per encoder, 7 convolutional stages, global mean pooling, 1024-dimensional outputs) for PPG and ECG signals (input: 10 s windows of 1,250 samples at 125 Hz). Each encoder output is mapped through an MLP projector: Linear(1024→512)→GELU→Linear(512→256), yielding ℓ₂-normalized 256-dimensional embeddings (Nie et al., 3 Nov 2025).
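As a concrete illustration, the projector head described above can be sketched in NumPy; the weight initialization, batch size, and tanh GELU approximation below are illustrative choices, not the released implementation:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (illustrative; exact erf form also works)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def project(h, W1, b1, W2, b2):
    """Projector head: Linear(1024->512) -> GELU -> Linear(512->256),
    followed by l2 normalization of the 256-d embedding."""
    z = gelu(h @ W1 + b1) @ W2 + b2
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Illustrative shapes: encoder output (batch, 1024) -> embedding (batch, 256)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.02, size=(1024, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.02, size=(512, 256)), np.zeros(256)
emb = project(rng.normal(size=(4, 1024)), W1, b1, W2, b2)
```

The final normalization means the downstream contrastive loss operates purely on direction, so cosine similarity reduces to a dot product.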
Contrastive pretraining employs a symmetric InfoNCE loss,

$$\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[\log\frac{\exp\big(\mathrm{sim}(p_i, e_i)/\tau\big)}{\sum_{j=1}^{N}\exp\big(\mathrm{sim}(p_i, e_j)/\tau\big)} + \log\frac{\exp\big(\mathrm{sim}(e_i, p_i)/\tau\big)}{\sum_{j=1}^{N}\exp\big(\mathrm{sim}(e_j, p_i)/\tau\big)}\right],$$

where $p_i$ and $e_i$ are the projected embeddings of the $i$-th synchronized PPG and ECG windows, sim(·,·) denotes cosine similarity, and τ is a learnable temperature (initialized at 0.07). This loss enforces physiological alignment: embeddings of true PPG–ECG pairs from synchronized windows are drawn close, enabling PPG encodings to inherit informative cardiac, hemodynamic, and systemic structure from ECG (Nie et al., 3 Nov 2025). Pretraining uses AdamW (lr = 5×10⁻⁴, weight decay = 1×10⁻², batch size 6144, cosine schedule, gradient clipping at 1.0).
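The symmetric loss can be sketched numerically as follows; the function name and batch construction are illustrative, and embeddings are re-normalized so the dot product equals cosine similarity:

```python
import numpy as np

def symmetric_info_nce(ppg_emb, ecg_emb, tau=0.07):
    """Symmetric InfoNCE over a batch of embeddings.

    ppg_emb, ecg_emb: (N, d) arrays; row i of each comes from the same
    synchronized window, so the diagonal entries are the positive pairs.
    """
    # l2-normalize so dot products equal cosine similarities
    p = ppg_emb / np.linalg.norm(ppg_emb, axis=1, keepdims=True)
    e = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    logits = p @ e.T / tau  # (N, N) pairwise similarity matrix

    def log_softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)  # numerical stability
        return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

    # rows: PPG -> ECG direction; columns: ECG -> PPG direction
    loss_p2e = -np.mean(np.diag(log_softmax(logits, axis=1)))
    loss_e2p = -np.mean(np.diag(log_softmax(logits, axis=0)))
    return 0.5 * (loss_p2e + loss_e2p)
```

With perfectly aligned, mutually distinct embeddings the loss approaches zero; shuffling the pairing inflates it, which is what drives the encoders toward synchronized-window agreement.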
2. Cross-Device Signal Quality Assessment and Motion Artifact Mitigation
AnyPPG incorporates a robust, fully unsupervised signal quality assessment (SQA) pipeline leveraging contrastive self-supervised learning and persistent homology. A 1D ResNet-18 encoder is pre-trained by NT-Xent contrastive loss on heterogeneous PPG (different LED wavelengths, sampling rates 25–128 Hz, multiple device classes) using strong augmentations mimicking motion, perfusion, and optical artifacts. The resulting 512-dimensional embeddings are mapped to a 4-dimensional topological signature via persistent homology (number of H₁ classes, sum of H₁ lifetimes, max and mean H₀ lifetimes). HDBSCAN is used for density-based clustering; the largest cluster is designated as "clean" (SQI=1), all others as poor (SQI=0). This approach yields Silhouette=0.72, Davies–Bouldin=0.34, Calinski–Harabasz=6173 on a stratified sample of 10,000 windows and generalizes across new hardware modalities without device-specific tuning (Shao et al., 15 Sep 2025).
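Given H₀/H₁ persistence diagrams (computed upstream by a persistent-homology library such as ripser, not shown here), the 4-dimensional signature reduces to simple lifetime statistics. A minimal sketch, assuming each diagram is an array of (birth, death) rows with infinite bars already dropped or capped:

```python
import numpy as np

def topo_signature(h0_diagram, h1_diagram):
    """4-d topological signature from H0/H1 persistence diagrams.

    h0_diagram, h1_diagram: (n, 2) arrays of (birth, death) pairs.
    Returns [num H1 classes, sum of H1 lifetimes,
             max H0 lifetime, mean H0 lifetime].
    """
    h0_life = h0_diagram[:, 1] - h0_diagram[:, 0]
    h1_life = h1_diagram[:, 1] - h1_diagram[:, 0]
    return np.array([
        len(h1_diagram),                           # number of H1 classes
        h1_life.sum(),                             # total H1 lifetime
        h0_life.max() if len(h0_life) else 0.0,    # max H0 lifetime
        h0_life.mean() if len(h0_life) else 0.0,   # mean H0 lifetime
    ])
```

These four scalars then feed the HDBSCAN clustering step, whose largest-density cluster is labeled clean (SQI=1).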
For noise and artifact removal, AnyPPG-class pipelines deploy CycleGANs trained to map noisy PPG directly to clean PPG in the absence of synchronized accelerometer signals. The model is trained on unpaired 256×256 grayscale images derived from 1D PPG segments (8 s windows sampled at 32 Hz), with generators and discriminators adapted from Johnson et al. and PatchGAN, respectively. The objective combines adversarial, cycle-consistency, and optional identity losses (λ_cyc = 10, λ_id = 0–5). CycleGAN reconstruction achieves an averaged RMSE_G of 2.18 BPM and PPE_G of 0.95 BPM, a ≈9.5× improvement over prior art for motion-artifact removal (Zargari et al., 2021).
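One simple way to render a 256-sample segment (8 s at 32 Hz) as a 256×256 grayscale image is a column-wise trace plot, with one pixel per sample; this is an illustrative encoding, and the exact 1D-to-2D transform used in the CycleGAN pipeline may differ:

```python
import numpy as np

def ppg_to_image(segment, size=256):
    """Render a 1D PPG segment of `size` samples as a size x size
    grayscale trace image (one white pixel per column)."""
    x = np.asarray(segment, dtype=float)
    assert len(x) == size
    # map amplitude into row indices 0..size-1 (row 0 = maximum amplitude)
    lo, hi = x.min(), x.max()
    rows = ((hi - x) / (hi - lo + 1e-12) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    img[rows, np.arange(size)] = 255  # white trace on black background
    return img
```

The inverse mapping (image back to 1D signal) just reads off the row index of the trace in each column, which is what makes an image-domain CycleGAN usable for signal cleaning.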
3. Downstream Physiological Analysis and Multi-Organ Diagnostics
The AnyPPG foundation model exhibits state-of-the-art generalizability across both canonical physiological and extended diagnostic tasks. On eleven linear-probe benchmarks spanning six datasets (PPG-DaLiA, UCI-BP, BUT PPG, Gyro-Acc-PPG, WESAD, DeepBeat), it achieves an average regression MAE reduction of 12.8% (e.g., HR MAE 13.8→9.28 bpm, R² 0.33→0.61) and an average classification AUC improvement of 9.1% over strong baselines. Notable results include a stress-recognition AUC of 0.90 (F1 +16 pt, accuracy +7 pt) and an AF-detection AUC of 0.90 (F1 0.77, accuracy 0.94) (Nie et al., 3 Nov 2025).
For multi-organ disease diagnosis, full-model fine-tuning on 719 ICD-10 codes (MC-MED) yields 13 conditions with AUC > 0.80 and 137 with AUC > 0.70, spanning cardiovascular (heart failure, valvular disorders, hypertension, AUC 0.74), neurodegenerative (Parkinson’s AUC 0.78, Alzheimer’s 0.77), renal (CKD AUC 0.74), metabolic (T2DM AUC 0.73), ocular (cataract 0.76, glaucoma 0.74), and musculoskeletal domains (osteoporosis, arthritis AUC ≥ 0.73) (Nie et al., 3 Nov 2025).
Physiological analysis reveals that model embeddings encode beat-to-beat interval, waveform morphology, and rhythm, as well as hemodynamic features (vascular stiffness, peripheral resistance) and multi-organ signatures (CKM, autonomic dysregulation). These facilitate deployment for wearable-based monitoring (HR/BP/arrhythmia), early cross-system risk screening, and personalized health profiling (Nie et al., 3 Nov 2025).
4. Device, Modality, and Representation Generalization
AnyPPG methods are designed for broad compatibility, including wrist, finger, and ring-form factor wearables, mobile phone PPG, and clinical monitors. CycleGAN-based artifact removal, fully unsupervised SQA, and the core foundation model pipelines are sensor-agnostic: they require only baseline calibration and minimal transfer learning for new device characteristics (e.g., wavelength, sensor layout) (Zargari et al., 2021, Shao et al., 15 Sep 2025).
Recent expansion includes the use of vision foundation models (VFMs). Vision4PPG demonstrates that DINOv3 ViT and SigLIP-2, fine-tuned via LoRA adapters on PPG-derived 2D representations (STFT, STFT+phase, recurrence plots), match or surpass time-series foundation models on blood pressure (e.g., CAS-BP DBP MAE 8.31 mmHg, SBP 13.04 mmHg), heart/respiratory rate, oxygen saturation, and blood lab measures. These vision-based models support parameter-efficient tuning (∼0.5 M trainable parameters, <20 GFLOPs inference) and real-time (<12 ms latency) operation on edge hardware (Kataria et al., 11 Oct 2025). Generalization across 2D input types and tasks is robust, and future work aims to incorporate hybrid and domain-specific vision FMs.
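A log-magnitude STFT image of a PPG window, the kind of 2D representation such vision models consume after resizing and normalization, can be sketched as follows (window and hop lengths are illustrative choices):

```python
import numpy as np

def ppg_stft_image(x, win=128, hop=32):
    """Log-magnitude STFT of a 1D PPG segment as a (freq, time) array.

    win, hop are in samples; a Hann window tapers each frame before the
    real-input FFT, and log1p compresses the dynamic range.
    """
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (win//2+1, n_frames)
    return np.log1p(spec)
```

For a 10 s window at 125 Hz, a cardiac component near 1.5 Hz (≈90 bpm) shows up as a bright horizontal band in the low-frequency bins, which is the structure the vision backbone learns to read.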
5. Implementation Considerations and Deployment
AnyPPG pipelines are optimized for scalability and computational efficiency at every stage:
- Encoder inference (ResNet-18, 1D): 1–2 ms on Cortex-M7 microcontroller-class hardware.
- Persistent homology: <5 ms per window for 512 points.
- Pre-trained generator models (CycleGAN) can be pruned and quantized to ≤4 MB for low-power wearable deployment; window-level inference time is ~10 ms per second of PPG, with energy overhead ≈100 mW·s.
- VFMs (DINOv3, SigLIP-2) process PPG “images” at up to 90 samples/s (30 s window: ~11 ms) on desktop-class GPUs; PEFT LoRA adapters enable efficient personalization on consumer/mobile compute (Zargari et al., 2021, Kataria et al., 11 Oct 2025, Shao et al., 15 Sep 2025).
For clinical and consumer deployment, guidelines include calibration on a small device-specific clean dataset, online gating (SQA, artifact detection), windowed overlapping inference with smoothing, and direct integration into HR/HRV/BP/arrhythmia/endocrine analytics workflows. For smartphone-based PPG (e.g., blood glucose estimation), the complete pipeline—frame extraction, ALS baseline correction, feature extraction, principal component regression—runs in real time, achieving clinically acceptable accuracy (SEP = 18.31 mg/dL) (Chowdhury et al., 2019).
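The windowed overlapping inference with smoothing recommended above can be sketched generically; the window, hop, and smoothing settings below are illustrative, and `estimator` stands in for any per-window model (SQA-gated HR/BP/arrhythmia heads in practice):

```python
import numpy as np

def windowed_estimates(signal, fs, estimator, win_s=10.0, hop_s=2.0, smooth=5):
    """Run a per-window estimator over overlapping windows, then apply a
    moving-average filter to the per-window outputs.

    signal: 1D PPG array; fs: sampling rate in Hz; estimator maps a
    window to a scalar estimate (e.g., heart rate).
    """
    win, hop = int(win_s * fs), int(hop_s * fs)
    preds = np.array([estimator(signal[i:i + win])
                      for i in range(0, len(signal) - win + 1, hop)], dtype=float)
    kernel = np.ones(smooth) / smooth
    return np.convolve(preds, kernel, mode="same")  # smoothed estimates
```

Overlap plus smoothing trades a small latency (a few hops) for markedly more stable output streams, which matters when estimates feed alerting thresholds.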
6. Current Limitations and Prospective Directions
Present limitations include:
- Pretraining relies almost exclusively on clinically acquired PPG/ECG, with limited representation of real-world ambulatory wearables and mobile PPG (artifact types, skin tones, form-factor diversity) (Nie et al., 3 Nov 2025).
- External validation of multi-organ disease diagnostic capacity is limited to MC-MED; multicenter and global cohort studies are pending.
- Disease labeling in emergency-department datasets exhibits label noise and heterogeneity, possibly affecting individual AUCs.
- Model interpretability (physiological mechanism attribution) and longitudinal prediction capabilities are not yet addressed.
- Current blood glucose and metabolic analyses are small-cohort studies, requiring further validation and extension (Chowdhury et al., 2019).
Future work aims to:
- Expand pretraining and fine-tuning to wearable-derived data (including wrist/ring/smartphone PPG) with field artifacts.
- Undertake large-scale, diverse, population-based external validations.
- Develop joint foundation models (PPG + accelerometer/SpO₂/temperature).
- Incorporate explicit temporal/longitudinal architectures for disease onset/progression modeling.
- Advance mechanistic explainability for regulatory/scientific transparency.
- Explore richer multi-modal, multi-representation pipelines including VFMs, topological, and generative models (Nie et al., 3 Nov 2025, Shao et al., 15 Sep 2025, Kataria et al., 11 Oct 2025).
7. Related Methodologies and Impact
AnyPPG stands at the intersection of self-supervised representation learning, contrastive alignment, generative modeling, and topological data analysis for biosignals. It advances beyond heuristic, device-specific analytics by establishing robust cross-device, cross-demographic generalization, state-of-the-art physiological and diagnostic performance, and a clear pathway toward universal, low-cost, continuous health profiling.
The foundation laid by AnyPPG informs the broader field’s shift toward foundation models for biosignal analytics, mirroring trends in computer vision and natural language processing. Its sensor-agnostic infrastructure and demonstrated efficacy in multi-organ health profiling have positioned it as a basis for next-generation PPG-based digital health, clinical decision support, and consumer-grade diagnostics (Nie et al., 3 Nov 2025, Kataria et al., 11 Oct 2025, Shao et al., 15 Sep 2025, Zargari et al., 2021).