Sound Event Localization and Detection Based on CRNN Using Rectangular Filters and Channel Rotation Data Augmentation (2010.06422v1)

Published 13 Oct 2020 in eess.AS and cs.SD

Abstract: Sound Event Localization and Detection refers to the problem of identifying the presence of independent or temporally overlapping sound sources, identifying the sound class to which each belongs, and estimating their spatial directions while they are active. In recent years, neural networks have become the prevailing method for the Sound Event Localization and Detection task, with convolutional recurrent neural networks being among the most used systems. This paper presents a system submitted to the Detection and Classification of Acoustic Scenes and Events 2020 Challenge Task 3. The algorithm consists of a convolutional recurrent neural network using rectangular filters, specialized in recognizing significant spectral features related to the task. To further improve the score and to generalize the system performance to unseen data, the training dataset size has been increased using data augmentation. The technique is based on channel rotations and reflection on the xy plane in the First Order Ambisonic domain, which allows improving Direction of Arrival labels while keeping the physical relationships between channels. Evaluation results on the development dataset show that the proposed system outperforms the baseline results, considerably improving Error Rate and F-score for location-aware detection.
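The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an idealized First Order Ambisonic encoding with channel order (W, Y, Z, X) and unnormalized gains, and it shows only two example transforms: a 90-degree rotation about the z axis and a reflection on the xy plane. The function names `encode_foa`, `rotate_z_90`, and `reflect_xy` are hypothetical. In both cases the channels are only swapped or sign-flipped, and the Direction of Arrival label is updated to match.

```python
import numpy as np

def wrap_azimuth(azi_deg):
    """Wrap an azimuth in degrees to the range [-180, 180)."""
    return (azi_deg + 180) % 360 - 180

def encode_foa(signal, azi_deg, ele_deg):
    """Encode a mono signal as idealized FOA.

    Assumption: channel order (W, Y, Z, X) and unnormalized gains.
    """
    azi, ele = np.deg2rad(azi_deg), np.deg2rad(ele_deg)
    w = signal
    y = signal * np.sin(azi) * np.cos(ele)
    z = signal * np.sin(ele)
    x = signal * np.cos(azi) * np.cos(ele)
    return np.stack([w, y, z, x])

def rotate_z_90(foa, azi_deg, ele_deg):
    """Rotate the sound field by +90 degrees about the z axis.

    With azimuth shifted by 90 degrees, the new Y equals the old X
    and the new X equals minus the old Y; W and Z are unchanged.
    """
    w, y, z, x = foa
    return np.stack([w, x, z, -y]), wrap_azimuth(azi_deg + 90), ele_deg

def reflect_xy(foa, azi_deg, ele_deg):
    """Reflect the sound field on the xy plane (elevation changes sign).

    Only the Z channel flips sign; the azimuth label is unchanged.
    """
    w, y, z, x = foa
    return np.stack([w, y, -z, x]), azi_deg, -ele_deg

# Usage: augment one synthetic source located at azimuth 30, elevation 20.
rng = np.random.default_rng(0)
mono = rng.standard_normal(16)
foa = encode_foa(mono, 30, 20)
rotated, rot_azi, rot_ele = rotate_z_90(foa, 30, 20)
reflected, ref_azi, ref_ele = reflect_xy(foa, 30, 20)
```

Because each transform is a channel permutation with sign changes, the augmented recording is identical to what the idealized encoder would have produced for a source at the new direction. This is one way to read the abstract's claim that the physical relationships between channels are kept.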
