Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (2006.03790v2)

Published 6 Jun 2020 in eess.SP, cs.CV, and eess.IV

Abstract: Telehealth and remote health monitoring have become increasingly important during the SARS-CoV-2 pandemic and it is widely expected that this will have a lasting impact on healthcare practices. These tools can help reduce the risk of exposing patients and medical staff to infection, make healthcare services more accessible, and allow providers to see more patients. However, objective measurement of vital signs is challenging without direct contact with a patient. We present a video-based and on-device optical cardiopulmonary vital sign measurement approach. It leverages a novel multi-task temporal shift convolutional attention network (MTTS-CAN) and enables real-time cardiovascular and respiratory measurements on mobile platforms. We evaluate our system on an Advanced RISC Machine (ARM) CPU and achieve state-of-the-art accuracy while running at over 150 frames per second which enables real-time applications. Systematic experimentation on large benchmark datasets reveals that our approach leads to substantial (20%-50%) reductions in error and generalizes well across datasets.

Authors (4)

Xin Liu
Josh Fromm
Shwetak Patel
Daniel McDuff

Citations (224)


Summary

An Analysis of Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement

The paper "Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement" introduces a video-based approach to optical cardiopulmonary vital sign measurement on mobile platforms. The authors describe a multi-task temporal shift convolutional attention network (MTTS-CAN) designed for on-device execution, which allows real-time cardiovascular and respiratory measurement from video input. The work is timely given the increased reliance on remote health monitoring during the COVID-19 pandemic, when the need for efficient telehealth tools grew sharply.

Technical Contributions

MTTS-CAN combines several machine learning techniques tailored to contactless physiological measurement. Temporal modeling is handled by temporal shift modules (TSM), which exchange information across neighboring frames without a significant computational burden. An attention mechanism complements the temporal shift by improving separation of the physiological signal from the rest of the video content.
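The temporal shift idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `temporal_shift`, the `fold_div` parameter, and the choice to zero-fill the edges of the window are assumptions made for illustration, and spatial dimensions are omitted so that each frame is just a list of channel values. One group of channels takes its values from the next frame, a second group from the previous frame, and the remaining channels stay in place, so information moves across time without any multiplications.

```python
def temporal_shift(frames, fold_div=3):
    """Shift channel groups along the time axis.

    frames: list of T frames, each a list of C channel values
            (spatial dimensions are omitted in this sketch).
    fold_div: hypothetical parameter; C // fold_div channels are shifted
              in each direction, the rest are left unchanged.
    """
    t = len(frames)
    c = len(frames[0])
    fold = c // fold_div
    # Start from zeros so positions with no source frame stay zero-filled.
    out = [[0.0] * c for _ in range(t)]
    for i in range(t):
        for ch in range(c):
            if ch < fold:
                src = i + 1      # first group: read from the next frame
            elif ch < 2 * fold:
                src = i - 1      # second group: read from the previous frame
            else:
                src = i          # remaining channels: no shift
            if 0 <= src < t:
                out[i][ch] = frames[src][ch]
    return out


if __name__ == "__main__":
    clip = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 frames, 3 channels
    for frame in temporal_shift(clip):
        print(frame)
```

Because the shift only moves values around in memory, it adds no learned parameters, which is consistent with the summary's point that frames can exchange information at low computational cost.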

A central feature of MTTS-CAN is multi-task learning, which lets the network estimate cardiovascular and respiratory signals simultaneously. The paper highlights the benefit of sharing intermediate representations between the two tasks, which reduces computational demand while maintaining accuracy. The model also runs in real time on an ARM CPU, a significant step toward accessible telehealth on mobile hardware.
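The sharing of intermediate representations can be sketched with a toy example. Everything here is hypothetical and much simpler than the network in the paper: `shared_features` stands in for the shared layers, `head` is a single linear output, and the loss weights `alpha` and `beta` are assumed values. The point is only that the shared features are computed once and reused by both the pulse and respiration outputs.

```python
def shared_features(x):
    """Toy shared trunk: a ReLU applied to each input value."""
    return [max(0.0, v) for v in x]


def head(features, weights):
    """Toy task-specific head: a weighted sum of the shared features."""
    return sum(f * w for f, w in zip(features, weights))


def multi_task_forward(x, pulse_weights, resp_weights):
    """Compute the shared features once, then apply both task heads."""
    feats = shared_features(x)
    return head(feats, pulse_weights), head(feats, resp_weights)


def multi_task_loss(pulse_pred, pulse_true, resp_pred, resp_true,
                    alpha=1.0, beta=1.0):
    """Weighted sum of per-task squared errors (weights are assumptions)."""
    pulse_err = (pulse_pred - pulse_true) ** 2
    resp_err = (resp_pred - resp_true) ** 2
    return alpha * pulse_err + beta * resp_err


if __name__ == "__main__":
    pulse, resp = multi_task_forward([1.0, -2.0, 3.0],
                                     [0.5, 1.0, 0.5],
                                     [1.0, 1.0, -1.0])
    print(pulse, resp, multi_task_loss(pulse, 1.0, resp, -1.0))
```

In a design like this, the cost of adding the second task is limited to its own head, since the trunk output is reused rather than recomputed.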

Evaluation and Results

The paper backs its claims with a comprehensive evaluation. On an ARM CPU, the model runs at over 150 frames per second without compromising accuracy. Performance is benchmarked on datasets such as AFRL and MMSE-HR, where MTTS-CAN shows substantial improvements in mean absolute error (MAE) and root mean square error (RMSE), with a 20%-50% reduction in error over previous methods. These results indicate both the precision and the portability of the system.
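For readers unfamiliar with the metrics, the sketch below shows how MAE, RMSE, and a relative error reduction are computed. The heart-rate values in the example are made up for illustration and are not results from the paper; the function names are likewise hypothetical.

```python
import math


def mae(pred, ref):
    """Mean absolute error between predictions and reference values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def rmse(pred, ref):
    """Root mean square error between predictions and reference values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def relative_reduction(baseline_error, new_error):
    """Fractional reduction in error relative to a baseline method."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Made-up heart-rate estimates and references, in beats per minute.
    estimated = [60, 72, 80]
    reference = [62, 70, 80]
    print(mae(estimated, reference), rmse(estimated, reference))
    # A baseline error of 2.0 reduced to 1.5 is a 25% reduction.
    print(relative_reduction(2.0, 1.5))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two metrics are usually reported together.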

Implications and Future Developments

MTTS-CAN holds significant implications for telehealth and broader applications in passive health monitoring, particularly concerning privacy and accessibility. The capability to function on-device mitigates privacy concerns inherent to cloud-based solutions, as it avoids data transmission that could lead to potential breaches.

The work may also matter beyond telehealth. Robust vital sign measurement at high frame rates could support other monitoring applications that depend on fast video capture. Future iterations could see these models integrated with non-invasive diagnostics, potentially aiding in the recognition of conditions like atrial fibrillation or even monitoring for signs of COVID-19 via respiratory patterns.

Conclusion

This work presents a technically proficient and well-evaluated model for contactless vital sign monitoring, underscoring the practical potential of deploying complex machine learning models on constrained devices. MTTS-CAN reflects a broader movement toward decentralizing healthcare solutions, making quality health monitoring accessible in a wide range of settings, including those with limited resources. As telehealth solutions continue to evolve, approaches like MTTS-CAN will play a vital role in shaping future standards of care delivery. While the current research shows promising results, continuous exploration of neural architectures that balance performance, efficiency, and generalization remains a fruitful avenue for ongoing and future research.
