CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing (2403.10796v1)
Abstract: Acoustic sensing shows great potential in applications such as health monitoring, gesture interfaces, and imaging by leveraging the speakers and microphones on smart devices. However, ongoing research and development in acoustic sensing often overlooks one problem: the same speaker, when used concurrently for sensing and for traditional applications (such as playing music), can cause interference in both, making it impractical to use in the real world. Strong ultrasonic sensing signals mixed with music overload the speaker's mixer. Current solutions to this overload are clipping or down-scaling, both of which degrade music playback quality as well as sensing range and accuracy. To address this challenge, we propose CoPlay, a deep-learning-based optimization algorithm that cognitively adapts the sensing signal. It can 1) maximize the sensing-signal magnitude within the bandwidth left available by the concurrent music, to optimize sensing range and accuracy, and 2) minimize any consequent frequency distortion that would affect music playback. In this work, we design a deep learning model and test it on common types of sensing signals (sine wave or frequency-modulated continuous wave, FMCW) as inputs, paired with arbitrary concurrent music and speech. We first evaluate the model to show the quality of the generated signals, then conduct field studies of downstream acoustic sensing tasks in the real world. A study with 12 users shows that respiration monitoring and gesture recognition using our adapted signal achieve accuracy similar to no-concurrent-music scenarios, whereas clipping or down-scaling yields worse accuracy. A qualitative study also shows that music playback quality is not degraded, unlike with traditional clipping or down-scaling methods.
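The overload problem and the two baselines the abstract contrasts against can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: the sample rate, signal frequencies, and amplitudes are assumed values chosen only to show that summing a music waveform with an ultrasonic sensing tone can exceed the mixer's valid range, and how clipping versus down-scaling each resolve it at a cost.

```python
import numpy as np

fs = 48_000                    # assumed speaker sample rate
t = np.arange(fs // 100) / fs  # one 10 ms frame

# Hypothetical concurrent signals: an audible tone standing in for music,
# plus a 20 kHz ultrasonic tone standing in for the sensing signal.
music = 0.7 * np.sin(2 * np.pi * 440 * t)
sensing = 0.8 * np.sin(2 * np.pi * 20_000 * t)

# Naive mixing: the peak exceeds the mixer's [-1, 1] range (overload).
mixed = music + sensing

# Baseline 1: clipping. Keeps loudness but flattens peaks, which
# distorts both the music and the sensing waveform.
clipped = np.clip(mixed, -1.0, 1.0)

# Baseline 2: down-scaling. Preserves the waveform shape but shrinks
# the sensing-signal magnitude, hurting sensing range and accuracy.
scaled = mixed / np.max(np.abs(mixed))

print("overloaded:", np.max(np.abs(mixed)) > 1.0)
print("sensing amplitude after scaling:",
      0.8 / np.max(np.abs(mixed)))  # < 0.8, i.e. reduced sensing power
```

CoPlay's stated goal is to avoid both costs: reshape the sensing signal so the sum stays within range while keeping the sensing magnitude as large as the leftover headroom allows and the music spectrum undistorted.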