ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Waves Around the Body (2404.13924v3)
Abstract: We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body. It requires only a pair of miniature speakers and microphones mounted on each hinge of the eyeglasses to emit ultrasonic waves, creating an acoustic aura around the body. The acoustic signals are reflected based on the position and motion of various body parts, captured by the microphones, and analyzed by a customized self-supervised deep learning framework to infer the performed activities on a remote device such as a mobile phone or cloud server. ActSonic was evaluated in user studies with 19 participants across 19 households to assess its efficacy in everyday activity recognition. Without requiring any training data from new users (leave-one-participant-out evaluation), ActSonic detected the 27 activities with an average F1-score of 86.6% in fully unconstrained scenarios and 93.4% in prompted settings at participants' homes.
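To make the evaluation protocol concrete, the sketch below shows a generic leave-one-participant-out loop that trains on all but one participant, tests on the held-out participant, and averages the macro F1-score across folds. This is not the authors' pipeline: the feature arrays, labels, participant IDs, and the placeholder RandomForestClassifier are assumptions standing in for ActSonic's acoustic features and self-supervised deep learning model.

```python
# Minimal sketch (not ActSonic's code) of leave-one-participant-out
# evaluation with per-fold macro F1, as described in the abstract.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.ensemble import RandomForestClassifier  # placeholder classifier
from sklearn.metrics import f1_score


def evaluate_lopo(features: np.ndarray, labels: np.ndarray, groups: np.ndarray) -> float:
    """Train on all participants but one, test on the held-out participant,
    and return the macro F1-score averaged across all folds.

    `features`, `labels`, and `groups` are hypothetical stand-ins for the
    acoustic feature frames, the 27 activity labels, and per-sample
    participant identifiers."""
    fold_scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(features, labels, groups):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(features[train_idx], labels[train_idx])
        preds = clf.predict(features[test_idx])
        # Macro F1 weights each of the 27 activity classes equally.
        fold_scores.append(f1_score(labels[test_idx], preds, average="macro"))
    return float(np.mean(fold_scores))
```

Averaging macro F1 per held-out participant, rather than pooling predictions, mirrors the "no training data from new users" claim: each fold measures how well a model generalizes to a person it has never seen.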