BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics
Abstract: The ability of a machine learning model to cope with differences between training and deployment conditions (e.g., distribution shift or generalization to entirely new classes) is crucial for real-world use cases. However, most empirical work in this area has focused on the image domain, using artificial benchmarks constructed to measure individual aspects of generalization. We present BIRB, a complex benchmark centered on the retrieval of bird vocalizations from passively recorded datasets given focal recordings from a large citizen science corpus available for training. We propose a baseline system for this collection of tasks using representation learning and a nearest-centroid search. Our thorough empirical evaluation and analysis surface open research directions, suggesting that BIRB fills the need for a more realistic and complex benchmark to drive progress on robustness to distribution shifts and generalization of ML models.
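To make the "nearest-centroid search" mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each recording has already been mapped to a fixed-size embedding by some learned model, that the query is the mean of a few exemplar embeddings, and that candidates are ranked by cosine similarity; the function name `nearest_centroid_rank`, the toy 2-D vectors, and the choice of cosine similarity are all illustrative assumptions.

```python
import numpy as np


def nearest_centroid_rank(exemplars, candidates):
    """Rank candidate embeddings by cosine similarity to the exemplar centroid.

    Hypothetical helper for illustration only; the similarity measure and
    normalization are assumptions, not details taken from the paper.
    """
    # Centroid of the exemplar embeddings (the "query" for retrieval).
    centroid = exemplars.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)

    # L2-normalize each candidate so the dot product is a cosine similarity.
    normed = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = normed @ centroid

    # Indices of candidates sorted from most to least similar.
    order = np.argsort(-scores)
    return order, scores


# Toy embeddings: two exemplars of one class and three search-corpus candidates.
exemplars = np.array([[1.0, 0.0], [0.8, 0.2]])
candidates = np.array([[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])

order, scores = nearest_centroid_rank(exemplars, candidates)
print(order)
```

In this toy setup the second candidate points in the same direction as the centroid, so it is ranked first. In a retrieval benchmark the resulting ranking would then be scored against ground-truth labels with a ranking metric.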