Distilling Spectrograms into Tokens: Fast and Lightweight Bioacoustic Classification for BirdCLEF+ 2025 (2507.08236v1)

Published 11 Jul 2025 in cs.SD and eess.AS

Abstract: The BirdCLEF+ 2025 challenge requires classifying 206 species, including birds, mammals, insects, and amphibians, from soundscape recordings under a strict 90-minute CPU-only inference deadline, making many state-of-the-art deep learning approaches impractical. To address this constraint, the DS@GT BirdCLEF team explored two strategies. First, we establish competitive baselines by optimizing pre-trained models from the Bioacoustics Model Zoo for CPU inference. Using TFLite, we achieved a nearly 10x inference speedup for the Perch model, enabling it to run in approximately 16 minutes and achieve a final ROC-AUC score of 0.729 on the public leaderboard post-competition and 0.711 on the private leaderboard. The best model from the zoo was BirdSetEfficientNetB1, with a public score of 0.810 and a private score of 0.778. Second, we introduce a novel, lightweight pipeline named Spectrogram Token Skip-Gram (STSG) that treats bioacoustics as a sequence modeling task. This method converts audio into discrete "spectrogram tokens" by clustering Mel-spectrograms using Faiss K-means and then learns high-quality contextual embeddings for these tokens in an unsupervised manner with a Word2Vec skip-gram model. For classification, embeddings within a 5-second window are averaged and passed to a linear model. With a projected inference time of 6 minutes for a 700-minute test set, the STSG approach achieved a final ROC-AUC public score of 0.559 and a private score of 0.520, demonstrating the viability of fast tokenization approaches with static embeddings for bioacoustic classification. Supporting code for this paper can be found at https://github.com/dsgt-arc/birdclef-2025.

Summary

We haven't generated a summary for this paper yet.

Summarize Now

GitHub

GitHub - dsgt-arc/birdclef-2025 (2 stars)

Tweets

https://twitter.com/ArxivSound/status/1944623136244428804

Distilling Spectrograms into Tokens: Fast and Lightweight Bioacoustic Classification for BirdCLEF+ 2025 (2507.08236v1)

Summary

Related Papers

GitHub

Tweets