
Onset and offset weighted loss function for sound event detection (2403.13254v1)

Published 20 Mar 2024 in cs.SD and eess.AS

Abstract: In a typical sound event detection (SED) system, the existence of a sound event is detected at a frame level, and consecutive frames with the same event detected are combined into one sound event. A median filter is applied as a post-processing step to remove as many detection errors as possible. However, detection errors occurring around the onset and offset of a sound event are beyond the capacity of the median filter. To address this issue, an onset and offset weighted binary cross-entropy (OWBCE) loss function is proposed in this paper, which trains the DNN model to be more robust on frames around onsets and offsets. Experiments are carried out in the context of DCASE 2022 task 4. Results show that OWBCE outperforms BCE when different models are considered. For a basic CRNN, relative improvements of 6.43% in event-F1, 1.96% in PSDS1, and 2.43% in PSDS2 can be achieved by OWBCE.

