Audio Tagging on an Embedded Hardware Platform (2306.09106v1)
Abstract: Convolutional neural networks (CNNs) have exhibited state-of-the-art performance in various audio classification tasks. However, their real-time deployment remains a challenge on resource-constrained devices such as embedded systems. In this paper, we analyze how the performance of large-scale pretrained audio neural networks designed for audio pattern recognition changes when they are deployed on hardware such as the Raspberry Pi. We empirically study the effect of CPU temperature, microphone quality, and audio signal volume on performance. Our experiments reveal that continuous CPU usage raises the temperature, which can trigger an automated slowdown mechanism in the Raspberry Pi, impacting inference latency. Microphone quality, particularly with affordable devices like the Google AIY Voice Kit, and audio signal volume both affect system performance. In the course of our investigation, we encounter substantial complications linked to library compatibility and the unique processor architecture requirements of the Raspberry Pi, making deployment less straightforward than on conventional computers (PCs). Our observations, while presenting challenges, pave the way for future researchers to develop more compact machine learning models, design heat-dissipative hardware, and select appropriate microphones when AI models are deployed for real-time applications on edge devices. All related assets and an interactive demo can be found on GitHub.
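The link between CPU temperature and inference latency described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' measurement code. It assumes the SoC temperature is exposed in millidegrees Celsius at `/sys/class/thermal/thermal_zone0/temp` (as is typical on Raspberry Pi OS). The `dummy_infer` function is a hypothetical stand-in for a model forward pass, and the 80 °C warning threshold is an illustrative assumption rather than a value taken from the paper.

```python
import time
from pathlib import Path
from statistics import mean

# Assumption: sysfs location of the SoC temperature, in millidegrees Celsius.
THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")
# Assumption: illustrative warning threshold, not a figure from the paper.
WARN_TEMP_C = 80.0


def parse_millidegrees(text):
    """Convert a sysfs reading such as '48312' to degrees Celsius."""
    return int(text.strip()) / 1000.0


def read_cpu_temp_c(path=THERMAL_PATH):
    """Return the CPU temperature in Celsius, or None if it cannot be read."""
    try:
        return parse_millidegrees(Path(path).read_text())
    except (OSError, ValueError):
        return None


def measure(infer, runs=5):
    """Call `infer` `runs` times; return (mean latency in s, temperature afterwards)."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)
    return mean(latencies), read_cpu_temp_c()


def dummy_infer():
    """Hypothetical stand-in for a model forward pass."""
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    latency, temp = measure(dummy_infer)
    note = ""
    if temp is not None and temp >= WARN_TEMP_C:
        note = " (at or above the illustrative warning threshold)"
    print(f"mean latency {latency * 1e3:.2f} ms, CPU temp {temp}{note}")
```

Logging both values together over a long run is one way to see whether latency rises as the board heats up. On machines without the sysfs file, `read_cpu_temp_c` returns `None` and only latency is reported.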