Computational bioacoustics with deep learning: a review and roadmap (2112.06725v1)

Published 13 Dec 2021 in cs.SD, eess.AS, and q-bio.QM

Abstract: Animal vocalisations and natural soundscapes are fascinating objects of study, and contain valuable evidence about animal behaviours, populations and ecosystems. They are studied in bioacoustics and ecoacoustics, with signal processing and analysis an important component. Computational bioacoustics has accelerated in recent decades due to the growth of affordable digital sound recording devices, and to huge progress in informatics such as big data, signal processing and machine learning. Methods are inherited from the wider field of deep learning, including speech and image processing. However, the tasks, demands and data characteristics are often different from those addressed in speech or music analysis. There remain unsolved problems, and tasks for which evidence is surely present in many acoustic signals, but not yet realised. In this paper I perform a review of the state of the art in deep learning for computational bioacoustics, aiming to clarify key concepts and identify and analyse knowledge gaps. Based on this, I offer a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should aim to address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.

Overview of Deep Learning in Computational Bioacoustics

The paper “Computational Bioacoustics with Deep Learning: a Review and Roadmap” provides a detailed examination of how deep learning (DL) methods are employed in the field of computational bioacoustics. It elucidates the current state of the art, analyzes areas with existing knowledge gaps, and suggests a path for future exploration and development. The author, Dan Stowell, systematically reviews research developments, establishes a taxonomy of bioacoustic tasks being tackled using DL, and provides a strategic roadmap for ongoing and future research.

The advent of affordable digital recording devices, coupled with advances in machine learning, has propelled the adoption of DL in computational bioacoustics. This paper dissects how bioacoustics benefits from DL, with emphasis on species classification, acoustic signal processing, and sound event detection. The review methodically covers literature published from 2016 onwards, highlighting the rapid integration and the recent dominance of DL in this domain.

Current State and Techniques

The paper outlines a “standard recipe” that many researchers follow when applying DL to bioacoustics: take an off-the-shelf convolutional neural network (CNN) such as ResNet, VGGish, or Inception; feed it spectrograms as input; and use data augmentation to artificially expand the training set. Many studies also rely on transfer learning, starting from networks pretrained on large datasets such as AudioSet.
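The augmentation step of this recipe can be illustrated with a minimal sketch. It is not taken from the paper: the function names `rms`, `mix_at_snr` and `time_shift` are hypothetical, plain Python lists stand in for audio arrays, and background-noise mixing and time shifting are chosen only as two commonly used augmentations.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal, scaled so the mixture has the requested SNR in dB.

    Assumes both inputs have the same length and that the noise is not silent.
    """
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(signal, noise)]


def time_shift(signal, shift):
    """Circularly shift a clip by `shift` samples (wraps around the end)."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


# Usage: one clean call, two augmented variants for training.
call = [math.sin(0.1 * i) for i in range(1000)]
background = [math.cos(0.37 * i) for i in range(1000)]
noisy_call = mix_at_snr(call, background, snr_db=10.0)
shifted_call = time_shift(call, 250)
```

In practice the augmented clips would be converted to spectrograms before being passed to the network; that step is omitted here to keep the sketch dependency-free.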

The author highlights CNNs as the primary architecture in use, with some studies exploring recurrent neural networks (RNNs) and temporal convolutional networks (TCNs) for handling sequential audio data. Attention is also given to the choice of acoustic features, including raw waveforms, spectrogram variants such as mel or log-frequency, and preprocessing options like per-channel energy normalization (PCEN).
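As an illustration of the PCEN preprocessing mentioned above, the following is a minimal sketch of the commonly cited formulation: a first-order smoother M(t, f) = (1 - s) M(t - 1, f) + s E(t, f), followed by PCEN(t, f) = (E / (eps + M)^alpha + delta)^r - delta^r. The parameter defaults and the choice to initialise the smoother with the first frame are assumptions made for this example, not values taken from the paper.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a spectrogram.

    `energy` is a non-empty list of frames, each a list of non-negative
    energies, one per frequency band (indexed as energy[time][band]).
    """
    smoothed = list(energy[0])  # assumption: start the smoother at the first frame
    output = []
    for frame in energy:
        # Low-pass filter each band over time to track its background level.
        smoothed = [(1 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Divide by the background level (gain control), then compress.
        output.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return output


# Usage: a steady quiet band and a steady loud band end up at similar values,
# which is the gain-normalizing behaviour PCEN is meant to provide.
quiet = pcen([[1.0]] * 50)
loud = pcen([[100.0]] * 50)
```

The per-band division is what distinguishes PCEN from a plain logarithm: a band with constant background energy is flattened regardless of its absolute level, while sudden onsets stand out.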

Challenges and Future Research Directions

Despite advancements, the paper identifies several challenges that remain unresolved. The author calls for enhanced methodologies in domains such as few-shot and transfer learning, active learning, and sim-to-real techniques. Addressing these could alleviate the domain’s central issue of small and unbalanced datasets.
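One of these directions, active learning, can be sketched in a few lines. The example below uses entropy-based uncertainty sampling, a common baseline chosen here for illustration rather than a method prescribed by the paper; the function names and the clip identifiers are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labelling(predictions, k):
    """Return the ids of the k unlabelled clips the model is least sure about.

    `predictions` maps a clip id to the model's class probabilities for it.
    """
    ranked = sorted(predictions, key=lambda cid: entropy(predictions[cid]),
                    reverse=True)
    return ranked[:k]


# Usage: the annotator is asked to label the two most ambiguous clips first.
unlabelled = {
    "clip_a": [0.98, 0.01, 0.01],  # confident prediction
    "clip_b": [0.34, 0.33, 0.33],  # nearly uniform, very uncertain
    "clip_c": [0.60, 0.30, 0.10],  # moderately uncertain
}
to_label = select_for_labelling(unlabelled, k=2)
```

The idea is that expert labelling time, which is scarce in bioacoustics, is spent on the recordings most likely to change the model.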

Research on individual identification (ID) is singled out as a high-priority area. Four speculated futures are proposed to refine DL’s application across tasks such as segmentation, object detection within spectrograms, and spatial acoustics. Implicitly, the author invites exploration into how DL can better model nuanced vocal interactions reflective of complex animal behaviors.

The discussion on equal representation stresses the necessity for dataset diversity, taxonomically and geographically, to avoid biases prevalent in existing data. Furthermore, the significance of context and auxiliary information for improving DL’s reliability in varying environmental conditions is highlighted.

Integration and Application

In practice, DL is expanding into user interfaces and into devices that run inference locally using smaller, more efficient networks. Such deployments can streamline workflows, allowing conservationists and ecologists to make real-time, data-driven decisions without heavy computational overhead.

The paper also makes a pertinent point about the need to recalibrate DL outputs to fit conservation objectives, which often require more than binary decisions. This calls for probabilistic frameworks and for reconciling DL outputs with established ecological models.
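To make the idea of recalibration concrete, here is a minimal sketch of temperature scaling, one widely used post-hoc calibration technique. It is offered as an illustration only, not as the approach the paper recommends; the function names, the grid of candidate temperatures, and the toy validation data are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the temperature that minimises validation NLL (simple grid search)."""
    if candidates is None:
        candidates = [0.5 + 0.1 * i for i in range(46)]  # 0.5 to 5.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Usage: a toy detector that is very confident but wrong on one clip in four.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 0, 1, 1]
temperature = fit_temperature(val_logits, val_labels)
calibrated = softmax([4.0, 0.0], temperature)
```

A fitted temperature above 1 softens overconfident probabilities, so that downstream ecological models receive scores closer to the observed error rate rather than near-certain detections.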

Conclusion

The paper provides a comprehensive review of the use of DL in computational bioacoustics, covering both the potential and the drawbacks of current methods. By setting out current practices, open challenges, and a forward-looking roadmap, Stowell’s review clarifies where DL already works well and identifies the areas that need further research and development. The roadmap points to a wider application of DL across diverse bioacoustic tasks and positions it as a valuable tool for ecological and conservation research.

Authors (1)

Dan Stowell (51 papers)

Citations (201)
