Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling (1607.02857v1)

Published 11 Jul 2016 in cs.NE, cs.LG, cs.MM, and cs.SD

Abstract: We trained a deep all-convolutional neural network with masked global pooling to perform single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest. Our network achieved an average accuracy of 84.5% on the four-fold cross-validation for acoustic scene recognition, compared to the provided baseline of 72.5%, and an average equal error rate of 0.17 for domestic audio tagging, compared to the baseline of 0.21. The network therefore improves on the baselines by a relative 17% and 19%, respectively. The network consists only of convolutional layers, which extract features from the short-time Fourier transform, and one global pooling layer, which combines those features. In particular, apart from the fully-connected output layer, it contains no fully-connected layers and no dropout layers.
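The key idea behind handling variable-length audio is that clips of different durations are zero-padded to a common length inside a batch, and the global pooling layer must then ignore the padded frames. Below is a minimal sketch of masked global pooling in plain Python. The function name `masked_global_pool`, the `time x channels` list layout, and the choice between mean and max pooling are illustrative assumptions, not the authors' implementation; the abstract does not say which pooling operator the network uses. The sketch also assumes each length is already expressed in frames of the last convolutional layer's output.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one clip's feature map: time x channels


def masked_global_pool(
    batch: Sequence[Matrix],
    lengths: Sequence[int],
    mode: str = "mean",
) -> List[List[float]]:
    """Pool each zero-padded feature map over its valid time frames only.

    Hypothetical helper: batch[b][t][c] is the activation of channel c at
    frame t for clip b; frames t >= lengths[b] are padding and are ignored.
    """
    if mode not in ("mean", "max"):
        raise ValueError(f"unknown mode: {mode!r}")
    pooled = []
    for frames, n in zip(batch, lengths):
        if not 0 < n <= len(frames):
            raise ValueError("each length must be in 1..number of frames")
        valid = frames[:n]            # apply the mask: keep only real frames
        channels = list(zip(*valid))  # transpose to channels x time
        if mode == "mean":
            pooled.append([sum(ch) / n for ch in channels])
        else:
            pooled.append([max(ch) for ch in channels])
    return pooled


# Two clips padded to 3 frames with 2 channels; the second clip has 2 real frames.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0]],  # last frame is padding
]
lengths = [3, 2]
print(masked_global_pool(batch, lengths, "mean"))  # [[3.0, 4.0], [2.0, 1.0]]
print(masked_global_pool(batch, lengths, "max"))   # [[5.0, 6.0], [3.0, 2.0]]
```

Without the mask, the padded zero frame would pull the second clip's mean down to about 1.33 and 0.67, and if all real activations in a channel were negative it would also become that channel's max. Masking keeps the pooled vector independent of how much padding a batch happens to need, so every clip yields a fixed-size input for the output layer regardless of its duration.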

