- The paper introduces PhysNet, an end-to-end spatio-temporal network that precisely measures rPPG signals from facial videos.
- It evaluates both 3DCNN-based and RNN-based spatio-temporal architectures, trained with a novel negative Pearson correlation loss, and outperforms existing HR and HRV estimation methods.
- The empirical evaluation on benchmark datasets demonstrates its utility for AF detection, emotion recognition, and remote health monitoring.
Overview of "Remote Photoplethysmograph Signal Measurement from Facial Videos Using Spatio-Temporal Networks"
The paper, "Remote Photoplethysmograph Signal Measurement from Facial Videos Using Spatio-Temporal Networks," presents a novel approach for measuring precise remote photoplethysmography (rPPG) signals from facial videos. This is achieved through the application of deep spatio-temporal networks, a first in this field. The work addresses the limitations of prior methods that predominantly relied on estimating average heart rates (HR) from face videos, which do not suffice for medical applications requiring heart rate variability (HRV) analysis, such as atrial fibrillation (AF) detection.
Contributions and Methodology
The paper introduces an end-to-end spatio-temporal network, named PhysNet, which reconstructs the rPPG signal directly from raw facial videos with high precision. By modeling the temporal dynamics of the face sequence, it improves the accuracy of pulse peak detection. Unlike prior works, PhysNet does not rely on pre-defined facial regions of interest; it learns which facial regions carry the strongest pulse signal.
Key aspects of the methodology include:
- Network Architecture: The paper explores both 3D convolutional neural networks (3DCNN) and recurrent neural networks (RNN) as spatio-temporal backbones. PhysNet treats the facial video as a temporal sequence of frames and maps it to the target rPPG signal space; a minimal sketch of the 3DCNN encoder-decoder variant is given after this list.
- Loss Function: A novel loss based on the negative Pearson correlation between the predicted and ground-truth waveforms enforces trend consistency, optimizing for accurate recovery of the pulse peaks; a short implementation sketch also follows this list.
- Comparative Evaluation: The paper evaluates different spatio-temporal architectures and losses, establishing that 3DCNN with an encoder-decoder approach yields superior results in retrieving HR and HRV metrics.
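As a rough illustration of the 3DCNN encoder-decoder idea, the PyTorch sketch below maps a facial video clip to a one-dimensional rPPG waveform. The class name, channel widths, and layer counts are illustrative assumptions and do not reproduce the published PhysNet configuration.

```python
import torch
import torch.nn as nn

class TinyPhysNet3D(nn.Module):
    """Toy 3DCNN encoder-decoder: facial video clip -> 1-D rPPG waveform.

    Input  x: (batch, 3, T, H, W) cropped face frames.
    Output  : (batch, T) predicted rPPG signal, one value per frame.
    Layer sizes are illustrative, not the published PhysNet configuration.
    """
    def __init__(self, channels: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=(1, 5, 5), padding=(0, 2, 2)),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # downsample space, keep time
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Collapse the remaining spatial grid while preserving the temporal axis,
        # then project the channels down to a single value per frame.
        self.spatial_pool = nn.AdaptiveAvgPool3d((None, 1, 1))
        self.head = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(x)              # (B, C, T, H', W')
        feat = self.spatial_pool(feat)      # (B, C, T, 1, 1)
        signal = self.head(feat)            # (B, 1, T, 1, 1)
        return signal.flatten(start_dim=1)  # (B, T)

# Example: a 128-frame, 64x64 clip yields a 128-sample waveform.
# clip = torch.randn(2, 3, 128, 64, 64); TinyPhysNet3D()(clip).shape == (2, 128)
```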
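The negative Pearson correlation loss can likewise be written in a few lines; the function name and the (batch, T) waveform layout below are assumptions for illustration rather than the authors' released code.

```python
import torch

def neg_pearson_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """1 - Pearson correlation between predicted and ground-truth waveforms.

    pred, target: (batch, T) rPPG / PPG signals. Perfectly correlated signals
    give a loss of 0; anti-correlated signals give a loss of 2. Because the
    correlation is scale- and offset-invariant, training focuses on matching
    the signal's trend and peak locations rather than its amplitude.
    """
    pred = pred - pred.mean(dim=1, keepdim=True)
    target = target - target.mean(dim=1, keepdim=True)
    cov = (pred * target).sum(dim=1)
    denom = torch.sqrt((pred ** 2).sum(dim=1) * (target ** 2).sum(dim=1) + 1e-8)
    return (1.0 - cov / denom).mean()
```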
Empirical Validation
The method is validated on two benchmark datasets, OBF and MAHNOB-HCI:
- HR and HRV Measurement: PhysNet recovers both average HR and HRV features with high accuracy, surpassing state-of-the-art methods, including traditional approaches such as CHROM and POS as well as more recent data-driven techniques (a common way of deriving HR from the recovered waveform is sketched after this list).
- Application in AF Detection and Emotion Recognition: The reconstructed rPPG signals enable effective AF detection and show promising results for emotion recognition, demonstrating the versatility and clinical relevance of PhysNet.
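For context on how a recovered waveform is turned into an HR estimate, the snippet below shows one common post-processing approach: picking the dominant spectral peak inside the plausible heart-rate band. This is a generic sketch, not necessarily the exact procedure used in the paper; the function name and band limits are assumptions.

```python
import numpy as np
from scipy.signal import welch

def estimate_hr_bpm(rppg: np.ndarray, fs: float,
                    lo_hz: float = 0.7, hi_hz: float = 4.0) -> float:
    """Estimate average heart rate (BPM) from an rPPG waveform.

    rppg: 1-D predicted signal; fs: video frame rate in Hz. The band
    0.7-4.0 Hz corresponds to roughly 42-240 beats per minute.
    """
    freqs, psd = welch(rppg, fs=fs, nperseg=min(len(rppg), 256))
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    peak_hz = freqs[band][np.argmax(psd[band])]
    return float(peak_hz * 60.0)
```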
Implications and Future Directions
This work significantly advances the rPPG measurement field by facilitating non-contact, accurate HRV analysis using facial video data. The implications for remote health monitoring are substantial, providing a means to assess cardiovascular and emotional states without physical proximity or sensor contact. This approach could be integrated into telemedicine platforms, enhancing diagnostic capabilities in settings lacking traditional monitoring equipment.
Future research may focus on improving generalization capabilities across varied environmental conditions and demographic populations. Moreover, integrating multi-modal data, such as audio or thermal, could further enhance the robustness of remote physiological monitoring systems. Additionally, with the growing interest in personalized health analytics, tailoring PhysNet’s framework to adapt to individual physiological baselines could be another avenue for exploration.
In conclusion, this paper lays a foundation for advanced applications in remote health diagnostics and paves the way for leveraging spatio-temporal modeling in similar domains.