
RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement (2404.06483v2)

Published 9 Apr 2024 in cs.CV

Abstract: Remote photoplethysmography (rPPG) is a method for non-contact measurement of physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: understanding the periodic pattern of rPPG over long contexts and handling the large spatiotemporal redundancy in video segments. Together, these issues create a trade-off between computational complexity and the ability to capture long-range dependencies. In this paper, we introduce RhythmMamba, a state space model-based method that captures long-range dependencies while maintaining linear complexity. By viewing rPPG as a time series task through the proposed frame stem, the periodic variations in pulse waves are modeled as state transitions. Additionally, we design a multi-temporal constraint and a frequency domain feed-forward, both aligned with the characteristics of rPPG time series, to improve the learning capacity of Mamba for rPPG signals. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with 319% throughput and 23% peak GPU memory. The code is available at https://github.com/zizheng-guo/RhythmMamba.
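
Illustrative sketch (not from the paper or its repository): the abstract's idea of modeling periodic pulse variations as state transitions at linear cost can be pictured with a toy linear state space model. The function name, the 2-dimensional state, the fixed rotation matrix, and the 1.2 Hz / 30 fps values below are all assumptions chosen for illustration. Mamba's selective, input-dependent parameters and the RhythmMamba architecture are not reproduced here.

```python
import math


def simulate_periodic_ssm(freq_hz, fps, num_frames):
    """Toy linear state space model with a rotation as state transition.

    x[t+1] = A x[t],  y[t] = C x[t], where A rotates the 2-D state by a
    fixed angle per frame and C = [1, 0] reads out the first component.
    Each frame needs one constant-time update, so the total cost grows
    linearly with num_frames.
    """
    theta = 2.0 * math.pi * freq_hz / fps  # phase advance per video frame
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    x0, x1 = 1.0, 0.0  # initial state (assumed)
    outputs = []
    for _ in range(num_frames):
        outputs.append(x0)  # y[t] = C x[t]
        # State transition: rotate (x0, x1) by theta.
        x0, x1 = cos_t * x0 - sin_t * x1, sin_t * x0 + cos_t * x1
    return outputs


# Hypothetical example: a 1.2 Hz (72 beats per minute) pulse at 30 fps.
signal = simulate_periodic_ssm(freq_hz=1.2, fps=30, num_frames=150)
```

With these assumed values the output repeats every 25 frames (30 / 1.2), so the periodic pattern is carried entirely by the state transition rather than by comparing every pair of frames.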
