A Data-Centric Framework for Machine Listening Projects: Addressing Large-Scale Data Acquisition and Labeling through Active Learning (2405.18153v3)

Published 28 May 2024 in cs.SD, cs.LG, and eess.AS

Abstract: Machine Listening focuses on developing technologies to extract relevant information from audio signals. A critical aspect of these projects is the acquisition and labeling of contextualized data, which is inherently complex and requires specific resources and strategies. Despite the availability of some audio datasets, many are unsuitable for commercial applications. The paper emphasizes the importance of Active Learning (AL) using expert labelers over crowdsourcing, which often lacks detailed insights into dataset structures. AL is an iterative process combining human labelers and AI models to optimize the labeling budget by intelligently selecting samples for human review. This approach addresses the challenge of handling large, constantly growing datasets that exceed available computational resources and memory. The paper presents a comprehensive data-centric framework for Machine Listening projects, detailing the configuration of recording nodes, database structure, and labeling budget optimization in resource-constrained scenarios. Applied to an industrial port in Valencia, Spain, the framework successfully labeled 6540 ten-second audio samples over five months with a small team, demonstrating its effectiveness and adaptability to various resource availability situations.

Acknowledgments: The participation of Javier Naranjo-Alcazar, Jordi Grau-Haro and Pedro Zuccarello in this research was funded by the Valencian Institute for Business Competitiveness (IVACE) and the FEDER funds by means of project Soroll-IA2 (IMDEEA/2023/91). The research carried out for this publication has been partially funded by the project STARRING-NEURO (PID2022-137048OA-C44) funded by the Ministry of Science, Innovation and Universities of Spain and the European Union.
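The abstract describes AL as an iterative loop: a model scores the unlabeled pool, the most informative clips are routed to expert labelers, the model is retrained, and the process repeats until the labeling budget is spent. The paper's specific selection criterion is not given in the abstract, so the following is only a minimal Python sketch of one common variant, entropy-based uncertainty sampling; the function names (uncertainty_scores, select_batch_for_review), the class count, and the budget value are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def uncertainty_scores(probs: np.ndarray) -> np.ndarray:
        """Shannon entropy of each clip's predicted class distribution (higher = more uncertain)."""
        eps = 1e-12
        return -np.sum(probs * np.log(probs + eps), axis=1)

    def select_batch_for_review(probs: np.ndarray, budget: int) -> np.ndarray:
        """Indices of the `budget` most uncertain unlabeled clips to send for human review."""
        return np.argsort(uncertainty_scores(probs))[::-1][:budget]

    # One selection round over a toy pool of 1000 ten-second clips and 5 hypothetical classes.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, 5))                                  # stand-in for model outputs
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # softmax
    to_label = select_batch_for_review(probs, budget=50)                 # clips the expert labels next
    print(to_label[:10])

In a full deployment, the newly labeled clips would be added to the training set, the model retrained, and the remaining pool re-scored, repeating until the budget is exhausted or the target performance is reached.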

