Explain why bird-trained models transfer better than general-audio models in bioacoustics

Identify the causal factors behind the superior cross-domain transfer of networks pretrained on large bird bioacoustic datasets (e.g., BirdNET, Perch) relative to networks pretrained on general audio datasets (e.g., YAMNet, VGGish). In particular, determine whether the advantage arises from properties shared by signals across bioacoustic domains or from the higher acoustic complexity and variety of bird vocalizations.

Background

Empirical results in the paper show that bird-pretrained networks outperform general-audio-pretrained networks on coral reef bioacoustic tasks, echoing the findings of Ghani et al. (2023).

The authors explicitly state that the underlying reasons for this consistent advantage are not yet established, offering hypotheses such as cross-domain commonalities among bioacoustic signals or the intrinsic diversity and complexity of bird vocalizations.

References

The performance of existing pretrained networks supports similar work in Ghani et al. (2023) which reported networks pretrained on large and diverse bird bioacoustic datasets generalize better to other bioacoustic domains than those pretrained on more general audio. The reasons behind this remain an open research question, this could be due to common properties between signals across bioacoustic domains, or, the high innate acoustic complexity and variety of bird vocalizations compared to AudioSet.

— Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics (2404.16436 - Williams et al., 25 Apr 2024) in Discussion
