
An empirical study of weakly supervised audio tagging embeddings for general audio representations (2209.15167v1)

Published 30 Sep 2022 in cs.SD and eess.AS

Abstract: We study the usability of pre-trained weakly supervised audio tagging (AT) models as feature extractors for general audio representations. We mainly analyze the feasibility of transferring those embeddings to other tasks within the speech and sound domains. Specifically, we benchmark weakly supervised pre-trained models (MobileNetV2 and EfficientNet-B0) against modern self-supervised learning methods (BYOL-A) as feature extractors. Fourteen downstream tasks are used for evaluation, ranging from music instrument classification to language classification. Our results indicate that AT pre-trained models are an excellent transfer learning choice for music, event, and emotion recognition tasks. Further, fine-tuning AT models can also benefit speech-related tasks such as keyword spotting and intent classification.
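The evaluation protocol described in the abstract amounts to standard transfer learning: freeze a pre-trained audio tagging backbone, extract clip-level embeddings, and train a lightweight classifier per downstream task. Below is a minimal sketch of that pipeline in PyTorch, not the authors' implementation: the single-channel MobileNetV2 adaptation, log-mel front-end settings, checkpoint path, and dummy dataset are all illustrative assumptions.

```python
# Minimal sketch: a weakly supervised AT backbone as a frozen feature
# extractor, followed by a linear probe on a downstream task.
# Checkpoint path, front-end settings, and dataset are hypothetical.
import torch
import torchaudio
import torchvision
from sklearn.linear_model import LogisticRegression

# Log-mel front end (64 mels at 16 kHz, a common AT configuration;
# the paper's exact settings may differ).
melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

# MobileNetV2 backbone adapted to single-channel spectrogram input.
backbone = torchvision.models.mobilenet_v2(weights=None).features
backbone[0][0] = torch.nn.Conv2d(1, 32, kernel_size=3, stride=2,
                                 padding=1, bias=False)
# backbone.load_state_dict(torch.load("at_pretrained.pt"))  # hypothetical checkpoint
backbone.eval()

@torch.no_grad()
def embed(waveform: torch.Tensor) -> torch.Tensor:
    """Map a mono waveform of shape (1, samples) to a fixed-size embedding."""
    x = to_db(melspec(waveform)).unsqueeze(0)   # (1, 1, mels, frames)
    feats = backbone(x)                         # (1, C, h, w)
    return feats.mean(dim=(2, 3)).squeeze(0)    # global average pool -> (C,)

# Linear probe on frozen embeddings, as in standard transfer evaluation.
# Dummy 1-second clips and binary labels stand in for any downstream task.
train_clips = [torch.randn(1, 16000) for _ in range(8)]
train_labels = [i % 2 for i in range(8)]
X = torch.stack([embed(w) for w in train_clips]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, train_labels)
```

Fine-tuning, which the abstract reports helps speech tasks such as keyword spotting and intent classification, would correspond to unfreezing the backbone and training it jointly with the classifier head instead of fitting a frozen-feature probe.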

Authors (5)
  1. Heinrich Dinkel
  2. Zhiyong Yan
  3. Yongqing Wang
  4. Junbo Zhang
  5. Yujun Wang
Citations (1)
