
FacePhys: State of the Heart Learning

Published 6 Dec 2025 in cs.CV | (2512.06275v1)

Abstract: Vital sign measurement using cameras presents opportunities for comfortable, ubiquitous health monitoring. Remote photoplethysmography (rPPG), a foundational technology, enables cardiac measurement through minute changes in light reflected from the skin. However, practical deployment is limited by the computational constraints of performing analysis on front-end devices and the accuracy degradation of transmitting data through compressive channels that reduce signal quality. We propose a memory efficient rPPG algorithm, FacePhys, built on temporal-spatial state space duality, which resolves the trilemma of model scalability, cross-dataset generalization, and real-time operation. Leveraging a transferable heart state, FacePhys captures subtle periodic variations across video frames while maintaining a minimal computational overhead, enabling training on extended video sequences and supporting low-latency inference. FacePhys establishes a new state-of-the-art, with a substantial 49% reduction in error. Our solution enables real-time inference with a memory footprint of 3.6 MB and per-frame latency of 9.46 ms, surpassing existing methods by 83% to 99%. These results translate into reliable real-time performance in practical deployments, and a live demo is available at https://www.facephys.com/.

Summary

  • The paper presents a novel framework using neural CDEs to model cardiac periodicity and improve remote heart rate estimation.
  • It introduces temporal-spatial state space duality with a complex state transition matrix to achieve efficient, low-latency inference.
  • Experimental results demonstrate a 49% error reduction and up to 83% latency improvement across diverse datasets.

FacePhys: Efficient State Space Modeling for Real-Time Remote Physiological Measurement

Introduction and Motivation

FacePhys proposes a novel framework for remote photoplethysmography (rPPG) — the non-contact estimation of physiological signals such as heart rate from video. The paper addresses major limitations of deep learning-based rPPG, including computational inefficiency and loss of long-range temporal dependencies, particularly in practical deployment scenarios where on-device, real-time, and private inference are paramount. FacePhys leverages a neural Controlled Differential Equation (CDE)-based state space model (SSM) and introduces temporal-spatial state space duality (TSD), enabling both efficient training on arbitrarily long videos and extremely low-latency, memory-efficient online inference.

Methodology

FacePhys frames the cardiac signal as a latent dynamical system, modeled via a neural CDE, which captures the physiological periodicity of the heart rather than exploiting generic temporal dependencies as in RNNs (Figure 1).

Figure 1: The discretized state space form enables efficient computation, in contrast to the ideal but computationally inefficient continuous-time CDE formulation of heart state evolution.

The model is discretized using the Zero-Order Hold (ZOH) method, preserving equivalence between continuous and discrete state transitions, which is critical for practical video data. The key mathematical point is that the hidden-state recurrence obtained under ZOH has linear complexity in sequence length, and that its dual, attention-like form enables memory-efficient modeling of long sequences (Figure 2).

Figure 2: The FacePhys framework combines the SSM dual as a discretization solver for the heart state CDE and as a linear attention mechanism, with temporal normalization for stability and a complex state transition matrix facilitating periodic attention.
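The ZOH step described above can be illustrated for a single diagonal mode. This is a minimal sketch using the standard ZOH formulas for a linear system h'(t) = a·h(t) + b·x(t); the function names, and the reduction to a scalar, time-invariant mode, are assumptions made for illustration and are not taken from the paper.

```python
import cmath

def zoh_discretize(a, b, dt):
    """Zero-order-hold discretization of h'(t) = a*h(t) + b*x(t).

    Assumes x is held constant over each step of length dt.
    Returns (a_bar, b_bar) for the recurrence h_k = a_bar*h_{k-1} + b_bar*x_k.
    """
    a_bar = cmath.exp(a * dt)
    # Limit of (exp(a*dt) - 1) / a as a -> 0 is dt.
    b_bar = (a_bar - 1.0) / a * b if a != 0 else dt * b
    return a_bar, b_bar

def run_recurrence(a_bar, b_bar, xs, h0=0.0):
    """Linear-time scan over the inputs; memory is one state value."""
    h = h0
    states = []
    for x in xs:
        h = a_bar * h + b_bar * x
        states.append(h)
    return states

# Illustrative values only: a decaying mode driven by a constant input.
a_bar, b_bar = zoh_discretize(a=-1.0, b=1.0, dt=0.1)
states = run_recurrence(a_bar, b_bar, [1.0] * 5)
```

For a piecewise-constant input, the discrete states coincide with the continuous solution sampled at the step boundaries, which is the sense in which ZOH preserves equivalence between the continuous and discrete transitions.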

Temporal features are stabilized by a dedicated Temporal Normalization (TN) module, which eliminates trends via detrending (least squares in training, recursive moving average in inference) and standardization. This ensures numerical stability and enables constant complexity inference, even over extended video streams.
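The two normalization modes can be sketched as follows. This is a hedged illustration, not the paper's TN module: it assumes a linear least-squares trend for the offline case and an exponential moving average as the recursive estimator for the streaming case. The names `detrend_and_standardize` and `StreamingNormalizer`, and the smoothing factor `alpha`, are hypothetical.

```python
import math

def detrend_and_standardize(xs, eps=1e-8):
    """Offline mode: remove a least-squares linear trend, then scale to unit variance."""
    n = len(xs)
    t_mean = (n - 1) / 2.0
    x_mean = sum(xs) / n
    # Closed-form least-squares slope for x ~ x_mean + slope * (t - t_mean).
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_tx = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(xs))
    slope = s_tx / s_tt if s_tt > 0 else 0.0
    resid = [x - (x_mean + slope * (t - t_mean)) for t, x in enumerate(xs)]
    std = math.sqrt(sum(r * r for r in resid) / n)
    return [r / (std + eps) for r in resid]

class StreamingNormalizer:
    """Online mode: running mean and variance with O(1) state per channel."""

    def __init__(self, alpha=0.05, eps=1e-8):
        self.alpha = alpha  # hypothetical smoothing factor
        self.eps = eps
        self.mean = None
        self.var = 0.0

    def update(self, x):
        if self.mean is None:
            self.mean = x  # initialize the running mean with the first sample
            return 0.0
        self.mean += self.alpha * (x - self.mean)
        dev = x - self.mean
        self.var += self.alpha * (dev * dev - self.var)
        return dev / (math.sqrt(self.var) + self.eps)
```

The streaming variant keeps only a running mean and variance, so its cost per frame does not grow with the length of the stream.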

Spatial and temporal modeling are decoupled: spatial features are extracted using 2D convolutions, while temporal dependencies are modeled via SSM duality, allowing for training/inference over long sequences without quadratic memory or computation costs, a critical advantage over transformer-based approaches.
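The duality between the recurrent and attention-like views can be checked numerically on a toy diagonal SSM. The sketch below assumes a time-invariant, diagonal transition and a scalar input channel, which is simpler than the model in the paper; both function names are hypothetical.

```python
def ssm_recurrent(a_bar, b, c, xs):
    """O(T) scan: h_t = a_bar * h_{t-1} + b * x_t, y_t = sum_n c[n] * h_t[n]."""
    h = [0.0] * len(a_bar)
    ys = []
    for x in xs:
        h = [a * hn + bn * x for a, hn, bn in zip(a_bar, h, b)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys

def ssm_attention_form(a_bar, b, c, xs):
    """O(T^2) dual: y_t = sum_{s<=t} M[t][s] * x_s,
    with causal weights M[t][s] = sum_n c[n] * a_bar[n]**(t - s) * b[n]."""
    ys = []
    for t in range(len(xs)):
        acc = 0.0
        for s in range(t + 1):
            weight = sum(cn * (a ** (t - s)) * bn
                         for a, bn, cn in zip(a_bar, b, c))
            acc += weight * xs[s]
        ys.append(acc)
    return ys

# Illustrative parameters: one real mode and one complex mode.
a_bar = [0.9, complex(0.8, 0.3)]
b = [1.0, 0.5]
c = [0.7, complex(0.2, -0.1)]
xs = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
y_scan = ssm_recurrent(a_bar, b, c, xs)
y_attn = ssm_attention_form(a_bar, b, c, xs)
```

Both functions produce the same outputs up to floating-point error. The scan keeps a fixed-size state and suits streaming inference, while the matrix form exposes the causal, attention-like weights over past frames.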

Periodicity via Complex State Transition Matrix

To encode physiological cardiac periodicity, FacePhys parameterizes the diagonal elements of the state transition matrix A using trainable complex numbers. This yields hidden-state evolution with oscillatory components, aligning the inductive bias of the network directly with the periodic nature of cardiac signals (Figure 3).

Figure 3: Introducing trainable complex numbers in the diagonal state transition matrix A generates oscillatory terms in the solution, functionally corresponding to periodic attention.

Mathematically, each eigenvalue of the complex-diagonal A contributes both an exponential decay (from its real part) and a sinusoidal oscillation (from its imaginary part). This is dual to periodic attention in the SSM's convolutional expansion, allowing long-range temporal dependencies to be modeled with the periodic structure that accurate physiological signal recovery requires.
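A single complex mode makes the decay-plus-oscillation behavior concrete. In this sketch the decay rate, the 1.2 Hz frequency (72 beats per minute), and the 30 fps frame rate are illustrative assumptions rather than values from the paper, and the helper names are hypothetical.

```python
import cmath
import math

def complex_mode(decay_rate, freq_hz, fps):
    """Discrete transition for eigenvalue lam = -decay_rate + i*2*pi*freq_hz,
    sampled at one frame interval (exp(lam * dt), as in ZOH)."""
    dt = 1.0 / fps
    lam = complex(-decay_rate, 2.0 * math.pi * freq_hz)
    return cmath.exp(lam * dt)

def impulse_response(a_bar, n_steps):
    """Real part of a_bar**k, equal to exp(-decay*k*dt) * cos(omega*k*dt)."""
    return [(a_bar ** k).real for k in range(n_steps)]

a_bar = complex_mode(decay_rate=0.5, freq_hz=1.2, fps=30.0)
response = impulse_response(a_bar, 51)
period_frames = 30.0 / 1.2  # 25 frames per cardiac cycle under these assumptions
```

The magnitude of `a_bar` sets how quickly past frames are forgotten, and its phase sets the oscillation frequency, so the weight assigned to a frame k steps in the past peaks again after each full period.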

Experimental Evidence and Results

Comprehensive experiments across five large-scale, diverse datasets (MMPD, PURE, UBFC, RLAP, VitalVideo) showcase FacePhys's superior intra- and cross-dataset generalization, with substantial improvements in accuracy, inference latency, and memory efficiency relative to state-of-the-art baselines. The framework achieves a 49% reduction in heart rate estimation error and, as stated in the abstract, improves on existing methods' memory footprint and per-frame latency by 83% to 99%, supported by extensive ablation and chunk-length studies (Figure 4).

Figure 4: FacePhys achieves markedly better model accuracy and latency compared to existing approaches, validating the superiority of heart state space modeling.

FacePhys attains the lowest memory footprint (3.6 MB), supporting real-time streaming inference with per-frame latency of 9.46 ms, outperforming all compared methods, including transformer and advanced SSM architectures (e.g., Mamba, PhysFormer, RhythmMamba). The model supports training on full-length video sequences — a regime previously inaccessible due to memory explosion in other frameworks — and demonstrates stable performance across varied real-world (e.g., mobile, low-bandwidth) settings.
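The reported latency can be translated into a frame-rate budget with simple arithmetic. The sketch below assumes frames are processed one at a time and ignores capture and preprocessing overhead; the 30 fps camera rate is an assumption for illustration, not a figure from the paper.

```python
def max_throughput_fps(latency_ms):
    """Frames per second sustainable if each frame takes latency_ms."""
    return 1000.0 / latency_ms

def budget_fraction(latency_ms, camera_fps):
    """Share of the per-frame time budget consumed at a given camera rate."""
    return latency_ms / (1000.0 / camera_fps)

throughput = max_throughput_fps(9.46)   # about 105.7 frames per second
fraction = budget_fraction(9.46, 30.0)  # about 0.28 of a 33.3 ms frame interval
```

Under these assumptions, a 9.46 ms per-frame latency leaves roughly 72% of each 30 fps frame interval free for capture, face cropping, and display.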

Ablation studies confirm that each architectural module contributes: removing TN, the SSM duality, or the complex oscillatory state transition matrix A sharply degrades performance.

Implications and Future Directions

Practically, FacePhys’s computational and memory efficiency opens viable deployment on resource-constrained edge and mobile devices, enabling instant, privacy-preserving, and real-time cardiac monitoring — a critical step for ubiquitous health sensing and telemedicine.

Theoretically, the work demonstrates the efficacy of embedding domain-specific priors by aligning the model’s inductive bias (periodicity, physiological dynamics) with latent state space structure, thereby improving generalization and robustness under distribution shift and cross-dataset transfer. The approach outperforms both CNNs (limited receptive field) and transformers/vanilla SSMs (high computational burden, lack of physiological periodicity).

The explicit periodic structure induced by complex diagonal SSMs establishes a powerful blueprint for modeling other quasi-periodic physiological time series.

Limitations remain with respect to utilization in clinical/hospital environments and in cardiovascular disease populations, where future clinical validation is required. The authors indicate future work in extending FacePhys to multimodal sensor fusion (e.g., thermal, IMU), blood oxygen and blood pressure estimation, and further optimization for higher frame-rate real-time operation.

Conclusion

FacePhys establishes a new standard for efficiency and accuracy in remote camera-based physiological measurement, uniting neural CDE modeling with structured SSM duality and domain-specific periodicity constraints. This results in robust long-range temporal modeling, extremely low latency, and suitability for on-device and streaming physiological monitoring. By introducing structured modeling aligned with the underlying biomedical phenomenon, FacePhys lays the groundwork for advanced, generalizable, and practical health sensing AI systems.

Reference: "FacePhys: State of the Heart Learning" (2512.06275)
