How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging (1911.04824v3)

Published 12 Nov 2019 in cs.IR, cs.SD, and eess.AS

Abstract: Automatic tagging of music is an important research topic in Music Information Retrieval, and audio analysis algorithms proposed for this task have improved with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram representations and evaluate the model performance that can be achieved when the input size is reduced in terms of both fewer frequency bands and larger frame rates. We use the MagnaTagaTune dataset for comprehensive performance comparisons and then compare selected configurations on the larger Million Song Dataset. The results of this study can help researchers and practitioners in their trade-off decisions between model accuracy, data storage size, and training and inference times.
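
To make the input-size trade-off in the abstract concrete, the following is a minimal sketch of how the number of mel bands and the hop size between frames determine the size of a mel-spectrogram input. The specific values (30-second clips, 16 kHz sample rate, 128 vs. 48 bands, hop sizes of 256 vs. 1024 samples, 4-byte floats) are illustrative assumptions, not configurations taken from the paper, and the function names are hypothetical. The frame count assumes a centered short-time Fourier transform, which yields one frame per hop plus one.

```python
def mel_input_shape(duration_s, sample_rate, hop_length, n_mels):
    """Return (n_mels, n_frames) for a centered STFT-based mel-spectrogram.

    Assumption: frames are centered and padded, so the frame count is
    1 + n_samples // hop_length.
    """
    n_samples = int(duration_s * sample_rate)
    n_frames = 1 + n_samples // hop_length
    return n_mels, n_frames


def storage_bytes(shape, bytes_per_value=4):
    """Bytes needed to store one spectrogram (default: 4-byte floats)."""
    n_mels, n_frames = shape
    return n_mels * n_frames * bytes_per_value


# Illustrative (assumed) settings, not the paper's actual configurations.
baseline = mel_input_shape(30.0, 16000, hop_length=256, n_mels=128)
reduced = mel_input_shape(30.0, 16000, hop_length=1024, n_mels=48)

print("baseline shape:", baseline, "bytes:", storage_bytes(baseline))
print("reduced shape: ", reduced, "bytes:", storage_bytes(reduced))
print("size ratio: %.1fx" % (storage_bytes(baseline) / storage_bytes(reduced)))
```

Under these assumed settings the reduced representation is roughly a tenth the size of the baseline per clip. The sketch says nothing about how tagging accuracy changes with the smaller input.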

Authors (5)
  1. Andres Ferraro (17 papers)
  2. Dmitry Bogdanov (18 papers)
  3. Xavier Serra (82 papers)
  4. Jay Ho Jeon (1 paper)
  5. Jason Yoon (2 papers)
Citations (3)
