Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge (2405.10018v2)

Published 16 May 2024 in eess.AS and cs.SD

Abstract: This article describes the Data-Efficient Low-Complexity Acoustic Scene Classification Task in the DCASE 2024 Challenge and the corresponding baseline system. The task setup is a continuation of previous editions (2022 and 2023), which focused on recording-device mismatches and low-complexity constraints. This year's edition introduces an additional real-world problem: participants must develop data-efficient systems for five scenarios that progressively limit the available training data. The provided baseline system is based on an efficient, factorized CNN architecture constructed from inverted residual blocks and uses Freq-MixStyle to tackle the device-mismatch problem. The task received 37 submissions from 17 teams, with a large majority of systems outperforming the baseline. The top-ranked system's accuracy ranges from 54.3% on the smallest to 61.8% on the largest subset, corresponding to relative improvements of approximately 23% and 9% over the baseline system on the evaluation set.
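Freq-MixStyle, mentioned in the abstract as the baseline's device-mismatch countermeasure, adapts the MixStyle idea to spectrograms: frequency-wise feature statistics (mean and standard deviation over the channel and time axes) are mixed between randomly paired samples in a batch, so the network becomes less sensitive to device-dependent frequency responses. The following is a minimal NumPy sketch of that idea; the function name, default hyperparameters (`alpha`, `p`), and tensor layout are illustrative assumptions, not the baseline's exact implementation.

```python
import numpy as np

def freq_mixstyle(x, alpha=0.3, p=0.7, eps=1e-6, rng=None):
    """Illustrative sketch of Freq-MixStyle (not the official baseline code).

    x: batch of log-mel spectrograms, shape (batch, channels, freq, time).
    alpha: Beta-distribution parameter for the mixing coefficient (assumed value).
    p: probability of applying the augmentation to a batch (assumed value).
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() > p:           # apply stochastically, as in MixStyle
        return x

    # Frequency-wise statistics: reduce over channel and time axes,
    # keeping one (mean, std) pair per sample and frequency bin.
    mu = x.mean(axis=(1, 3), keepdims=True)          # (B, 1, F, 1)
    sigma = x.std(axis=(1, 3), keepdims=True) + eps  # (B, 1, F, 1)
    x_norm = (x - mu) / sigma

    # Mix each sample's statistics with those of a randomly paired sample.
    perm = rng.permutation(x.shape[0])
    lam = rng.beta(alpha, alpha)
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1.0 - lam) * sigma[perm]

    # Re-style the normalized features with the mixed statistics.
    return x_norm * sigma_mix + mu_mix
```

Because only the frequency-wise statistics are swapped while the normalized content is kept, the augmentation perturbs exactly the kind of spectral coloration that differs between recording devices.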
