RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement (2404.06483v2)
Abstract: Remote photoplethysmography (rPPG) is a method for non-contact measurement of physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: understanding the periodic pattern of rPPG across long contexts and addressing the large spatiotemporal redundancy in video segments. These represent a trade-off between computational complexity and the ability to capture long-range dependencies. In this paper, we introduce RhythmMamba, a state space model-based method that captures long-range dependencies while maintaining linear complexity. By viewing rPPG as a time series task through the proposed frame stem, the periodic variations in pulse waves are modeled as state transitions. Additionally, we design a multi-temporal constraint and a frequency domain feed-forward, both aligned with the characteristics of rPPG time series, to improve the learning capacity of Mamba for rPPG signals. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with 319% throughput and 23% peak GPU memory. The code is available at https://github.com/zizheng-guo/RhythmMamba.
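To make the idea of modeling periodic variation as state transitions concrete, the following is a minimal sketch of a discrete linear state space recurrence, not the RhythmMamba implementation. The function names `ssm_scan` and `rotation_ssm`, the 2-state rotation matrix, and the 72 bpm / 30 fps values are all illustrative assumptions. Mamba additionally makes its parameters input-dependent (selective), which this sketch omits.

```python
import math

def ssm_scan(A, B, C, xs):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a 1-D input sequence.

    Each step does a fixed amount of work, so the total cost grows
    linearly with the sequence length.
    """
    n = len(B)
    h = [0.0] * n
    ys = []
    for x in xs:
        # State transition followed by input injection.
        h = [sum(A[i][j] * h[j] for j in range(n)) + B[i] * x for i in range(n)]
        # Read out a scalar observation from the hidden state.
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys

def rotation_ssm(bpm, fps):
    """Hypothetical 2-state model whose transition rotates the state
    by one pulse-cycle fraction per video frame."""
    theta = 2.0 * math.pi * (bpm / 60.0) / fps
    A = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    B = [1.0, 0.0]
    C = [1.0, 0.0]
    return A, B, C

# Assumed example values: a 72 bpm pulse sampled at 30 fps has a 25-frame period.
A, B, C = rotation_ssm(bpm=72, fps=30)
impulse = [1.0] + [0.0] * 49
ys = ssm_scan(A, B, C, impulse)
```

Feeding a single impulse into this toy model yields an output that repeats every 25 frames, which shows how a fixed transition matrix can encode a rhythm while the scan itself touches each frame only once.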