BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics (2312.07439v2)
Abstract: The ability of a machine learning model to cope with differences between training and deployment conditions (e.g., under distribution shift or when generalizing to entirely new classes) is crucial for real-world use cases. However, most empirical work in this area has focused on the image domain, with artificial benchmarks constructed to measure individual aspects of generalization. We present BIRB, a complex benchmark centered on the retrieval of bird vocalizations from passively-recorded datasets given focal recordings from a large citizen science corpus available for training. We propose a baseline system for this collection of tasks using representation learning and a nearest-centroid search. Our thorough empirical evaluation and analysis surface open research directions, suggesting that BIRB fills the need for a more realistic and complex benchmark to drive progress on robustness to distribution shifts and generalization of ML models.
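To make the idea of a nearest-centroid search concrete, the following is a minimal sketch, not the authors' implementation. It assumes that embeddings have already been produced by some learned representation, that the query is the mean of a few exemplar embeddings, and that candidates are ranked by cosine similarity; the abstract does not specify the similarity measure, so that choice and the function names `l2_normalize` and `nearest_centroid_search` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm (eps avoids division by zero)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def nearest_centroid_search(exemplar_embeddings, candidate_embeddings):
    """Rank candidates by cosine similarity to the centroid of the exemplars.

    Assumption: cosine similarity is used here for illustration only;
    the source abstract does not state which similarity the baseline uses.
    """
    # Centroid = mean of the exemplar embeddings for one query class.
    centroid = l2_normalize(exemplar_embeddings.mean(axis=0))
    # Cosine similarity of each candidate to the centroid.
    scores = l2_normalize(candidate_embeddings) @ centroid
    # Indices of candidates, most similar first.
    ranking = np.argsort(-scores)
    return ranking, scores


# Toy data: two exemplars of one class and three candidate windows.
exemplars = np.array([[1.0, 0.0], [0.8, 0.2]])
candidates = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])

ranking, scores = nearest_centroid_search(exemplars, candidates)
print(ranking)  # candidate indices ordered from most to least similar
```

In this toy setup the second candidate points in nearly the same direction as the exemplar centroid, so it is ranked first, while the candidate pointing in the opposite direction is ranked last.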