
Self-Supervised Learning for Few-Shot Bird Sound Classification (2312.15824v4)

Published 25 Dec 2023 in cs.SD, cs.LG, and eess.AS

Abstract: Self-supervised learning (SSL) in audio holds significant potential across various domains, particularly in situations where abundant, unlabeled data is readily available at no cost. This is pertinent in bioacoustics, where biologists routinely collect extensive sound datasets from the natural environment. In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations. Our experiments showcase that these learned representations exhibit the capacity to generalize to new bird species in few-shot learning (FSL) scenarios. Additionally, we show that selecting windows with high bird activation for self-supervised learning, using a pretrained audio neural network, significantly enhances the quality of the learned representations.
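The two ideas in the abstract, selecting high-bird-activation windows with a pretrained audio network before self-supervised pretraining and then evaluating the frozen representations in a few-shot setting, can be illustrated with a minimal sketch. The function names (`score_bird_activation`, `ssl_embed`), the window and hop lengths, and the prototype-based classifier are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Hypothetical sketch: pick high-activation windows from a long field recording,
# then classify new species few-shot with class prototypes built from frozen
# SSL embeddings. `score_bird_activation` stands in for a pretrained audio
# tagger (e.g. a PANNs-style model); `ssl_embed` stands in for the SSL encoder.
import numpy as np

def frame_windows(waveform: np.ndarray, sr: int, win_s: float = 5.0, hop_s: float = 2.5):
    """Split a 1-D waveform into overlapping fixed-length windows."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    if len(waveform) < win:
        return np.empty((0, win))
    starts = range(0, len(waveform) - win + 1, hop)
    return np.stack([waveform[s:s + win] for s in starts])

def select_active_windows(windows: np.ndarray, score_bird_activation, top_k: int = 8):
    """Keep the top-k windows by the pretrained model's bird activation score."""
    scores = np.array([score_bird_activation(w) for w in windows])  # higher = more bird-like
    keep = np.argsort(scores)[::-1][:top_k]
    return windows[keep]

def prototype_classify(support_x, support_y, query_x, ssl_embed):
    """Few-shot evaluation: nearest class-mean (prototype) in the frozen SSL embedding space."""
    emb_s = np.stack([ssl_embed(x) for x in support_x])
    emb_q = np.stack([ssl_embed(x) for x in query_x])
    classes = sorted(set(support_y))
    protos = np.stack([emb_s[np.array(support_y) == c].mean(axis=0) for c in classes])
    # cosine similarity between each query embedding and each class prototype
    emb_q = emb_q / np.linalg.norm(emb_q, axis=1, keepdims=True)
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return [classes[i] for i in (emb_q @ protos.T).argmax(axis=1)]
```

In this reading, only windows that the pretrained tagger scores as bird-like feed the SSL objective, and few-shot generalization is measured without fine-tuning by comparing query clips to per-species prototypes in the learned embedding space.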

