Beyond-Voice: Towards Continuous 3D Hand Pose Tracking on Commercial Home Assistant Devices (2306.17477v2)
Abstract: The surging popularity of home assistants and their voice user interfaces (VUIs) has made them an ideal central control hub for smart home devices. However, current form factors rely heavily on the VUI, which poses accessibility and usability issues; some of the latest models add cameras and displays, which are costly and raise privacy concerns. These concerns jointly motivate Beyond-Voice, a novel high-fidelity acoustic sensing system that allows commodity home assistant devices to track and reconstruct hand poses continuously. It turns the home assistant into an active sonar system using its existing onboard microphones and speakers. We feed a high-resolution range profile to a deep learning model that analyzes the motion of multiple body parts and predicts the 3D positions of 21 finger joints, extending acoustic hand tracking to the granularity of individual finger joints. The system operates across different environments and users without requiring personalized training data. A user study with 11 participants in 3 different environments shows that Beyond-Voice tracks joints with an average mean absolute error (MAE) of 16.47 mm without any training data from the test subject.
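The abstract describes turning the device into an active sonar and feeding a range profile to the model. As a minimal sketch of how such a range profile can be obtained, the code below cross-correlates the received audio with the transmitted probe (a matched filter) and maps each lag to a distance. It assumes a linear chirp probe, a co-located speaker and microphone, and a single simulated reflector. The chirp band, duration, sample rate, and the helper names `linear_chirp` and `range_profile` are hypothetical choices for illustration, not parameters or code from Beyond-Voice.

```python
import numpy as np

FS = 48_000             # assumed sample rate in Hz (common for commodity audio)
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def linear_chirp(f0: float, f1: float, duration: float, fs: int = FS) -> np.ndarray:
    """Linear frequency sweep from f0 to f1 Hz (hypothetical probe signal)."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = (f1 - f0) / duration  # sweep rate in Hz/s
    # Phase is the integral of the instantaneous frequency f0 + rate * t.
    return np.sin(2 * np.pi * (f0 * t + 0.5 * rate * t**2))


def range_profile(tx, rx, fs=FS, c=SPEED_OF_SOUND):
    """Matched-filter range profile for non-negative lags (hypothetical helper).

    Cross-correlates rx with tx via zero-padded FFTs. Assumes a co-located
    speaker and microphone, so a lag of d samples is a round trip of c*d/fs
    and a one-way range of c*d/(2*fs).
    """
    n = len(rx) + len(tx) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n avoids wrap-around
    spectrum = np.fft.rfft(rx, nfft) * np.conj(np.fft.rfft(tx, nfft))
    corr = np.fft.irfft(spectrum, nfft)[: len(rx)]  # lags 0 .. len(rx)-1
    lags = np.arange(len(rx))
    return c * lags / (2 * fs), np.abs(corr)


# Simulate one reflector 0.5 m away (illustrative values only).
tx = linear_chirp(18_000, 22_000, 0.01)  # 10 ms sweep
true_range = 0.5
delay = int(round(2 * true_range / SPEED_OF_SOUND * FS))  # round-trip delay in samples
rx = np.zeros(len(tx) + 400)
rx[delay:delay + len(tx)] += 0.3 * tx  # attenuated echo
rx += 0.01 * np.random.default_rng(0).standard_normal(len(rx))  # sensor noise

ranges, mag = range_profile(tx, rx)
print(f"estimated range: {ranges[np.argmax(mag)]:.3f} m")  # prints "estimated range: 0.500 m"
```

At 48 kHz each lag step corresponds to about 3.6 mm of one-way range (c / 2f_s). A real device would also have to cope with the strong direct speaker-to-microphone path and static reflections from the room; suppressing them, for example by differencing consecutive profiles, is one common option, assumed here rather than taken from the paper.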