Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection (2403.18638v1)
Abstract: Detecting the presence of animal vocalisations in nature is essential for studying animal populations and their behaviors. A recent development in the field is the introduction of few-shot bioacoustic sound event detection, a task that aims to train a versatile animal sound detector from only a small set of audio samples. Previous efforts in this area have employed various architectures and data augmentation techniques to improve model performance. However, these approaches have not fully bridged the domain gap between source and target distributions, limiting their applicability in real-world scenarios. In this work, we introduce a new dataset designed to augment the diversity and breadth of classes available for few-shot bioacoustic event detection, building on the foundations of our previous datasets. To establish a robust baseline system tailored for the DCASE 2024 Task 5 challenge, we investigate an array of acoustic features and adopt negative hard sampling as our primary domain adaptation strategy. Because the challenge's guidelines require each audio file to be treated independently, this approach avoids transductive learning, ensuring compliance while improving the system's adaptability to domain shifts. Our experiments show that the proposed baseline system outperforms the vanilla prototypical network. The findings also confirm the effectiveness of each domain adaptation method through ablations of different components within the networks. This highlights the potential to improve few-shot bioacoustic sound event detection by further reducing the impact of domain shift.
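The two core ideas the abstract names can be illustrated together: a prototypical network classifies query segments by distance to class prototypes (support-set means), and negative hard sampling biases the negative set toward the segments most easily confused with the positive class. The sketch below is a minimal NumPy illustration of both mechanisms; the embedding dimensionality, the Euclidean metric, and all arrays are illustrative assumptions, not the paper's actual system.

```python
import numpy as np

def prototype(support_embeddings):
    # Class prototype = mean of the support embeddings, shape (k, d) -> (d,)
    return support_embeddings.mean(axis=0)

def classify(query, prototypes):
    # Assign the query to the nearest prototype (Euclidean distance)
    dists = {label: np.linalg.norm(query - p) for label, p in prototypes.items()}
    return min(dists, key=dists.get)

def hard_negatives(neg_embeddings, pos_prototype, n):
    # Hard negative sampling: keep the n negative segments whose
    # embeddings lie closest to the positive prototype, i.e. the
    # ones the model is most likely to confuse with the target class.
    d = np.linalg.norm(neg_embeddings - pos_prototype, axis=1)
    return neg_embeddings[np.argsort(d)[:n]]

# Toy 2-D embeddings (hypothetical values for illustration only)
pos_support = np.array([[1.0, 1.0], [1.2, 0.8]])
neg_support = np.array([[-1.0, -1.0], [-1.0, -1.2]])
protos = {"pos": prototype(pos_support), "neg": prototype(neg_support)}

query = np.array([0.9, 1.1])
label = classify(query, protos)           # nearest prototype wins

neg_pool = np.array([[0.9, 0.9], [-2.0, -2.0], [0.5, 0.5]])
hard = hard_negatives(neg_pool, protos["pos"], n=2)
```

In a real system the embeddings would come from a trained encoder over mel-spectrogram segments; the point here is only the episodic prototype/query logic and the distance-based negative selection.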