
Phase-shifted remote photoplethysmography for estimating heart rate and blood pressure from facial video

Published 9 Jan 2024 in cs.CV | (2401.04560v4)

Abstract: Human health can be critically affected by cardiovascular diseases, such as hypertension, arrhythmias, and stroke. Heart rate and blood pressure are important biometric indicators for monitoring the cardiovascular system and for the early diagnosis of cardiovascular diseases. Existing methods for estimating the heart rate are based on electrocardiography and photoplethysmography, which require a sensor in contact with the skin surface. Moreover, catheter and cuff-based methods for measuring blood pressure cause inconvenience and have limited applicability. Therefore, in this thesis, we propose a vision-based method for estimating the heart rate and blood pressure. This thesis proposes a 2-stage deep learning framework consisting of a dual remote photoplethysmography network (DRP-Net) and bounded blood pressure network (BBP-Net). In the first stage, DRP-Net infers remote photoplethysmography (rPPG) signals for the acral and facial regions, and these phase-shifted rPPG signals are utilized to estimate the heart rate. In the second stage, BBP-Net integrates temporal features and analyzes the phase discrepancy between the acral and facial rPPG signals to estimate systolic blood pressure (SBP) and diastolic blood pressure (DBP) values. To improve the accuracy of estimating the heart rate, we employed a data augmentation method based on a frame interpolation model. Moreover, we designed BBP-Net to infer blood pressure within a predefined range by incorporating a scaled sigmoid function. Our method estimated the heart rate with a mean absolute error (MAE) of 1.78 BPM on the MMSE-HR dataset, reducing the MAE by 34.31% compared to the recent method. The MAEs for estimating SBP and DBP were 10.19 mmHg and 7.09 mmHg, respectively. On the V4V dataset, the MAEs for the heart rate, SBP, and DBP were 3.83 BPM, 13.64 mmHg, and 9.4 mmHg, respectively.


Summary

  • The paper presents a dual-site, phase-shifted rPPG framework that significantly enhances the accuracy of heart rate and blood pressure estimation from facial videos.
  • It employs a two-stage deep learning pipeline with DRP-Net and BBP-Net, using spatial-temporal attention and signal bounding for robust feature extraction.
  • Empirical evaluations on MMSE-HR and V4V datasets show marked improvements in MAE and RMSE, underscoring its potential for unobtrusive clinical and telehealth applications.

Phase-shifted Remote Photoplethysmography for Estimating Heart Rate and Blood Pressure from Facial Video

Introduction and Motivation

"Phase-shifted remote photoplethysmography for estimating heart rate and blood pressure from facial video" (2401.04560) addresses contactless, simultaneous estimation of heart rate (HR) and blood pressure (BP) from facial video. Traditional ECG, PPG, and oscillometric cuff-based methods, while clinically dominant, require physical contact, which limits their deployment in continuous and ambulatory settings. The work proposes a framework that directly models the temporal and phase shifts between remote PPG (rPPG) signals extracted from facial regions and classic acral (e.g., finger) PPG, introducing a dual-site, phase-aware approach to non-contact physiological monitoring.

Methodology

Pipeline Overview

The proposed method is a two-stage deep learning pipeline comprising:

  1. DRP-Net (Dual Remote Photoplethysmography Network): Extracts phase-shifted rPPG signals from facial video sequences at both acral-like (projected) and facial anatomical locations.
  2. BBP-Net (Bounded Blood Pressure Network): Consumes outputs from DRP-Net, leveraging both the rPPG signals and their phase discrepancy to predict SBP/DBP, using strong temporal modeling and explicit bounding of outputs via a scaled sigmoid constraint.

    Figure 1: An end-to-end pipeline for extracting physiological parameters, with blue and red denoting ground-truth and predicted signals.

Data Augmentation and Preprocessing

To mitigate class imbalance and enrich the training distribution, the framework adopts temporal upsampling and downsampling via frame interpolation (FILM-Net), generating augmented bradycardia and tachycardia cases from limited video data. Face ROIs are extracted using MTCNN and normalized. Pseudo-PPG reference signals are derived from the synchronous ABP recordings after detrending and strict bandpass filtering.

Figure 2: Strategy for increasing bradycardia and tachycardia diversity through frame interpolation and sampling.
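The effect of temporal resampling on the apparent heart rate can be illustrated with a minimal sketch. It assumes plain linear blending between neighboring frames as a stand-in for the learned FILM interpolator, and it assumes the resampled clip is replayed at the original frame rate. The names `resample_clip` and `augmented_hr` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Frame = List[float]  # a flattened frame; real frames would be H x W x 3 arrays


def resample_clip(frames: Sequence[Frame], speed: float) -> List[Frame]:
    """Resample a clip in time by `speed` using linear blending.

    speed < 1 inserts in-between frames (slower pulse when replayed at the
    original frame rate); speed > 1 skips ahead (faster pulse).
    Linear blending is only a stand-in for a learned interpolator (assumption).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if len(frames) < 2:
        return [list(f) for f in frames]
    n_out = int((len(frames) - 1) / speed) + 1
    out: List[Frame] = []
    for k in range(n_out):
        t = k * speed                      # position on the original time axis
        i = min(int(t), len(frames) - 2)   # index of the left neighbor
        w = t - i                          # blend weight toward the right neighbor
        out.append([(1 - w) * a + w * b for a, b in zip(frames[i], frames[i + 1])])
    return out


def augmented_hr(hr_bpm: float, speed: float) -> float:
    """Heart-rate label after resampling, assuming playback at the original fps."""
    return hr_bpm * speed
```

Under these assumptions, a clip labeled 60 BPM and resampled with `speed=0.5` would be labeled 30 BPM, which is how slow-pulse (bradycardia-like) examples could be synthesized from ordinary recordings.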

DRP-Net Architecture

DRP-Net is a 3D CNN incorporating atrous convolutions for expansive spatiotemporal receptive fields, feeding into Siamese-structured heads that independently predict "facial" and "acral" rPPG signals. Twin spatial and temporal attention mechanisms bias the feature aggregation toward skin regions with strong physiological signals and temporally consistent (artifact-minimized) periods, respectively.

Figure 3: DRP-Net structure with parallel spatial/temporal attention and dual prediction heads producing phase-aligned signals.

Losses for optimizing DRP-Net incorporate both frequency-domain (PSD-based) and time-domain objectives, including a peak-valley matching term ($L_{pv}$) to enforce physiological plausibility.
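The frequency-domain view behind a PSD-based objective can be sketched by reading the heart rate off the dominant spectral peak of a pulse signal. This is not the paper's loss function. It is a dependency-free illustration, and the 42 to 180 BPM search band, the 0.5 BPM grid, and the function names are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def band_power_spectrum(signal: Sequence[float], fs: float,
                        bpm_lo: float = 42.0, bpm_hi: float = 180.0,
                        step_bpm: float = 0.5) -> List[Tuple[float, float]]:
    """Return (bpm, power) pairs from a direct DFT evaluated on a BPM grid."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]   # remove the DC component
    steps = int(round((bpm_hi - bpm_lo) / step_bpm))
    spectrum: List[Tuple[float, float]] = []
    for k in range(steps + 1):
        bpm = bpm_lo + k * step_bpm
        omega = 2.0 * math.pi * (bpm / 60.0) / fs   # radians per sample
        re = sum(x * math.cos(omega * t) for t, x in enumerate(centered))
        im = sum(x * math.sin(omega * t) for t, x in enumerate(centered))
        spectrum.append((bpm, (re * re + im * im) / n))
    return spectrum


def estimate_hr_bpm(signal: Sequence[float], fs: float) -> float:
    """Heart rate as the BPM value with the largest power in the search band."""
    spectrum = band_power_spectrum(signal, fs)
    return max(spectrum, key=lambda pair: pair[1])[0]
```

A frequency-domain loss would compare spectra like the one returned by `band_power_spectrum` for predicted and reference signals, while the time-domain terms act directly on the waveform samples.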

BBP-Net Architecture

BBP-Net employs multi-scale feature fusion (MSF) blocks with depthwise separable convolutions, Hardswish activations, and BAM for attention. Both raw and derivative (VPG, APG) physiological signals (from both predicted sites) are used as input. The network constrains outputs with a temperature-controlled scaled sigmoid, ensuring physiologically realistic ranges for SBP/DBP. Training objectives utilize Huber loss and explicit time-domain alignment via reconstructed ABP signals.

Figure 4: BBP-Net architecture: stacks of MSF blocks extract multi-temporal dependencies from phase-shifted multi-modal signals.
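The output bounding and the Huber objective mentioned above can be sketched as follows. The exact functional form used in BBP-Net is not given in this summary, so the sketch assumes the common form `lo + (hi - lo) * sigmoid(z / T)`. The bounds in the usage note are illustrative values, not the paper's.

```python
import math


def scaled_sigmoid(z: float, lo: float, hi: float, temperature: float = 1.0) -> float:
    """Map an unbounded output z into the range [lo, hi] (assumed form)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    s = z / temperature
    # Numerically stable logistic function for both signs of s.
    if s >= 0:
        sig = 1.0 / (1.0 + math.exp(-s))
    else:
        e = math.exp(s)
        sig = e / (1.0 + e)
    return lo + (hi - lo) * sig


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)
```

With hypothetical SBP bounds of 80 and 200 mmHg, `scaled_sigmoid(0.0, 80.0, 200.0)` returns the midpoint of the range, and no input can push the prediction outside it. A larger temperature flattens the curve, so the output saturates more slowly.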

Experimental Results

Evaluations utilize the MMSE-HR and V4V public datasets, which provide paired facial video and ABP data. Heart rate estimation is measured via MAE, RMSE, and the Pearson correlation coefficient $r$, while BP estimation is assessed with MAE and RMSE for both SBP and DBP.
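For reference, the three evaluation metrics can be computed as in the short sketch below. It uses their standard definitions. The function names are chosen for this example.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def pearson_r(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and references."""
    n = len(pred)
    mp = sum(pred) / n
    mr = sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_r = sum((r - mr) ** 2 for r in ref)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are usually reported together.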

Heart Rate Estimation:

DRP-Net achieves substantial improvement over prior methods. On MMSE-HR, for example, facial rPPG yields MAE = 1.78 BPM, RMSE = 4.27 BPM, and $r = 0.95$, surpassing the best baseline, PhysFormer++ (MAE = 2.71 BPM, RMSE = 5.15 BPM, $r = 0.93$). Similar trends are observed on V4V. Pearson correlation plots (Figure 5) show tight coupling between predicted and ground-truth HR.

Figure 5: High linearity in Pearson correlation between estimated and actual heart rates on MMSE-HR.

Signal waveform analysis demonstrates phase shifts and high morphological fidelity to reference PPG signals.

Figure 6: Overlaid predicted facial, acral, and true pseudo-PPG, highlighting phase and amplitude matching.
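One simple way to quantify a phase shift between two pulse signals is the lag that maximizes their cross-correlation. The sketch below is only an illustration of that idea. BBP-Net learns from the signal pair directly, and nothing here is taken from the paper's implementation. The names `best_lag` and `lag_to_ms` are hypothetical.

```python
import math
from typing import Sequence


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which b best matches a shifted copy of a.

    A positive result means b trails a by that many samples.
    """
    n = min(len(a), len(b))
    mean_a = sum(a[:n]) / n
    mean_b = sum(b[:n]) / n
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score, count = 0.0, 0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                score += (a[t] - mean_a) * (b[u] - mean_b)
                count += 1
        # Normalize by the overlap length so long and short overlaps compare fairly.
        if count and score / count > best_score:
            best, best_score = lag, score / count
    return best


def lag_to_ms(lag: int, fs: float) -> float:
    """Convert a lag in samples to milliseconds at sampling rate fs (Hz)."""
    return 1000.0 * lag / fs
```

`max_lag` should stay below half a pulse period. Otherwise a periodic signal matches itself equally well one full beat later and the estimate becomes ambiguous.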

Blood Pressure Estimation:

BBP-Net yields MAE/RMSE of 10.19/13.01 mmHg (SBP) and 7.09/8.86 mmHg (DBP) on MMSE-HR, significant reductions over prior camera-based and hybrid methods. Bland-Altman plots (Figure 7) indicate tight error bounds and minimal bias.

Figure 7: Bland-Altman analysis of SBP (left) and DBP (right) reveals high agreement between predictions and references.
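The quantities plotted in a Bland-Altman analysis are the mean difference (bias) and the 95% limits of agreement. A minimal sketch using the standard definition, bias ± 1.96 standard deviations of the differences, is shown below. The function name is chosen for this example.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(pred: Sequence[float], ref: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are taken as prediction minus reference, and the limits are
    bias +/- 1.96 sample standard deviations of those differences.
    """
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)   # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A bias near zero indicates no systematic over- or under-estimation, and narrower limits indicate closer agreement between predicted and reference pressures.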

On V4V, BBP-Net consistently outperforms AlexNet, ResNet, LSTM, and recent neural combination models, even when all baselines utilize strong learning architectures. Notably, using both facial and acral rPPG (phase-shifted pair) as BBP-Net input dominates single-location approaches, with DBP error improvements exceeding 2 mmHg.

Ablation Studies

Multiple ablations confirm:

  • Data Augmentation: Frame-interpolation-based HR variants directly benefit model generalization, lowering MAE by over 1 BPM.
  • Loss Functions: Including $L_{pv}$ improves HR estimation by about 2 BPM.
  • Window Length: Longer input sequences enhance HR accuracy, though with increased inference latency.
  • Phase-shifted Input: Dual-site input to BBP-Net is strictly superior for BP estimation versus either facial or acral alone (DBP MAE: 7.09 mmHg vs 8.67/9.50 mmHg).
  • Output Bounding: The scaled sigmoid in BBP-Net is critical: removing it degrades SBP MAE by ~9 mmHg.

Implications and Future Directions

This dual-site, phase-aware method offers a paradigm shift for contactless vital sign monitoring. Practically, the approach is conducive to deployment in real-world clinical and telehealth applications, particularly for populations where continuous, unobtrusive monitoring is essential. Theoretically, the explicit modeling of phase shifts in pulse wave propagation expands the utility of facial video as a physiological surrogate, supporting more robust biomarker extraction and multi-task architectures.

Challenges remain, including broader clinical validation, real-world deployment with diverse demographic/lighting confounds, and dataset limitations (e.g., absence of facial-site PPG for hard phase verification). The framework can serve as a basis for extending non-contact monitoring to further cardiovascular indices, arrhythmia detection, and even high-fidelity waveform reconstruction.

Conclusion

The work systematically introduces a phase-shifted, dual-site rPPG modeling paradigm with a novel deep learning pipeline, demonstrating strong advancements in camera-based HR and BP estimation. Empirical evidence shows state-of-the-art performance across established datasets, robust ablation support, and methodical compliance with international BP validation standards. Future directions include dataset expansion for facial PPG and adaptation to varied acquisition conditions, marking critical steps toward practical, ubiquitous remote vital sign monitoring systems.
