BEANS: The Benchmark of Animal Sounds (2210.12300v1)

Published 21 Oct 2022 in cs.SD and eess.AS

Abstract: The use of ML-based techniques has become increasingly popular in the field of bioacoustics in recent years. Fundamental requirements for the successful application of ML-based techniques are curated, agreed-upon, high-quality datasets and benchmark tasks to be learned on a given dataset. However, the field of bioacoustics so far lacks such public benchmarks that cover multiple tasks and species, measure the performance of ML techniques in a controlled and standardized way, and allow newly proposed techniques to be benchmarked against existing ones. Here, we propose BEANS (the BEnchmark of ANimal Sounds), a collection of bioacoustics tasks and public datasets specifically designed to measure the performance of machine learning algorithms in the field of bioacoustics. The benchmark proposed here consists of two common tasks in bioacoustics: classification and detection. It includes 12 datasets covering various species, including birds, land and marine mammals, anurans, and insects. In addition to the datasets, we also present the performance of a set of standard ML methods as baselines for task performance. The benchmark and baseline code are made publicly available at https://github.com/earthspecies/beans in the hope of establishing a new standard dataset for ML-based bioacoustic research.

Evaluation and Implications of the BEANS Benchmark in Bioacoustics

The paper entitled "BEANS: The Benchmark of Animal Sounds" presents a much-needed advancement in the field of bioacoustics by establishing a standardized benchmark for evaluating ML techniques. The primary objective of BEANS is to facilitate the assessment of ML algorithms across a shared set of tasks, enabling comparisons that are both objective and reproducible. This benchmark integrates 12 datasets focused on classification and detection tasks, encompassing a diverse array of species such as birds, mammals, anurans, and insects. By making the benchmark and baseline implementations publicly accessible, the authors aim to foster further developments in bioacoustic machine learning methodologies.

Datasets and Tasks

The authors present a well-curated selection of datasets organized into classification and detection tasks. The classification datasets involve identifying species from short animal sound recordings, while the detection tasks center on recognizing and localizing vocalization segments within continuous audio streams. The datasets were chosen based on criteria including availability, difficulty, size, and diversity, ensuring the benchmark is both representative and challenging.
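To make the classification setup concrete, here is a minimal PyTorch-style loading sketch. The CSV manifest with `path` and `label` columns is a hypothetical layout assumed for illustration; the benchmark's actual data loaders live in the repository linked above.

```python
# A minimal sketch of a clip-level classification dataset. The manifest CSV
# with "path" and "label" columns is a hypothetical layout, not the actual
# BEANS loader format.
import pandas as pd
import torchaudio
from torch.utils.data import Dataset

class ClipClassificationDataset(Dataset):
    def __init__(self, manifest_csv: str, target_sr: int = 16000):
        self.df = pd.read_csv(manifest_csv)
        self.target_sr = target_sr
        labels = sorted(self.df["label"].unique())
        self.label_to_idx = {l: i for i, l in enumerate(labels)}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        waveform, sr = torchaudio.load(row["path"])
        # Datasets come at heterogeneous sample rates; resample to a common one.
        if sr != self.target_sr:
            waveform = torchaudio.functional.resample(waveform, sr, self.target_sr)
        waveform = waveform.mean(dim=0)  # collapse channels to mono
        return waveform, self.label_to_idx[row["label"]]
```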

Moreover, auxiliary datasets from the domains of environmental audio and human speech are included to encourage the development of ML models with cross-domain applicability. Their inclusion acknowledges that bioacoustics is part of the broader field of audio processing, where models must generalize across varied auditory contexts.

Baseline Performance and Technical Analysis

The authors conducted extensive experiments using both non-deep learning and deep learning models. The baseline results demonstrate that, except for certain tasks, there is significant room for improvement in current methodologies. Among the algorithms tested, VGGish and pretrained ResNet variants achieved superior performance, particularly in classification tasks. However, traditional methods such as support vector machines (SVM) also showed competitive results in specific contexts.
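As a sketch of what such a classical baseline looks like in practice, the snippet below summarizes each clip with time-averaged log-mel statistics and fits an SVM. The feature settings (16 kHz audio, 64 mel bands) and the RBF kernel are illustrative assumptions, not the paper's exact configuration.

```python
# A hedged sketch of a classic non-deep baseline: fixed-length log-mel
# summary features per clip, then an SVM classifier.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def clip_features(path: str, sr: int = 16000, n_mels: int = 64) -> np.ndarray:
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    # Summarize the time axis so every clip maps to one fixed-length vector.
    return np.concatenate([logmel.mean(axis=1), logmel.std(axis=1)])

def fit_svm_baseline(paths, labels):
    X = np.stack([clip_features(p) for p in paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)
    return clf
```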

The task of detection was notably more challenging, with several datasets showing lower performance metrics. This can be attributed to the sparsity of vocalization events and fewer annotations per class in detection datasets. The differentiation between detection and classification tasks offers a clear path for methodological innovations, particularly in developing robust methods that handle sparse data and varied acoustic profiles.
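A common way to reduce detection to supervised classification is to slice a long recording into fixed-length frames and mark a frame positive when it overlaps an annotated vocalization; the window, hop, and overlap thresholds below are assumed values for illustration, not the paper's settings.

```python
# Illustrative reduction of detection to windowed binary classification.
import numpy as np

def window_labels(duration_s, events, win_s=2.0, hop_s=1.0, min_overlap_s=0.1):
    """events: list of (onset_s, offset_s) vocalization annotations."""
    starts = np.arange(0.0, max(duration_s - win_s, 0.0) + 1e-9, hop_s)
    labels = np.zeros(len(starts), dtype=np.int64)
    for i, s in enumerate(starts):
        e = s + win_s
        for onset, offset in events:
            # A window is positive if it overlaps any event long enough.
            if min(e, offset) - max(s, onset) >= min_overlap_s:
                labels[i] = 1
                break
    return starts, labels
```

Because vocalizations are sparse, most windows end up negative, which makes the class imbalance noted above explicit in the training data.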

Practical and Theoretical Implications

Practically, BEANS provides a well-defined framework to drive the development of generalized models capable of handling diverse and heterogeneous bioacoustic data. The design considerations, such as addressing different sample rates and overlapping vocalizations, highlight significant challenges in the field. The authors recommend methodological innovations such as adaptive sampling methods and sound separation models to tackle these challenges.

Theoretically, the creation of a benchmark like BEANS facilitates structured progress assessment within bioacoustics. It serves as a diagnostic tool that lets researchers evaluate their models beyond scenarios tailored to a single dataset, so that reported advances reflect genuine generality rather than incremental overfitting. By deliberately not aggregating performance into a single metric across all tasks, the benchmark encourages a nuanced understanding of where a model succeeds and where it fails.
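A minimal sketch of such per-task reporting follows; pairing accuracy for classification with average precision for detection is a common convention and an assumption here, not a claim about the paper's exact metric definitions.

```python
# Report each task separately rather than collapsing to one aggregate score.
from sklearn.metrics import accuracy_score, average_precision_score

def report(results):
    """results: dict mapping task name -> (kind, y_true, y_pred_or_scores)."""
    for task, (kind, y_true, y_hat) in results.items():
        if kind == "classification":
            print(f"{task}: accuracy = {accuracy_score(y_true, y_hat):.3f}")
        else:  # detection: y_hat holds per-window scores
            print(f"{task}: average precision = "
                  f"{average_precision_score(y_true, y_hat):.3f}")
```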

Future Directions

Several future developments are suggested by the work on BEANS. The exploration of advanced regularization techniques and their effect on model robustness in detection tasks could yield impactful results. Additionally, addressing biases inherent in dataset representation will be crucial for building truly generalizable models. The integration of newer architectures such as transformers or advancements in few-shot learning for underrepresented classes might further bridge existing performance gaps.

In conclusion, "BEANS: The Benchmark of Animal Sounds" marks a methodologically rigorous step towards enhancing the robustness and reproducibility of ML research in bioacoustics. It sets a precedent for future benchmarks to undertake comprehensive, standardized assessments and encourages a community-driven approach to addressing the many challenges unique to this domain.

Authors (6)
  1. Masato Hagiwara (15 papers)
  2. Benjamin Hoffman (16 papers)
  3. Jen-Yu Liu (14 papers)
  4. Maddie Cusimano (6 papers)
  5. Felix Effenberger (8 papers)
  6. Katie Zacarian (1 paper)
Citations (19)