
Deep Learning Frameworks Applied For Audio-Visual Scene Classification (2106.06840v1)

Published 12 Jun 2021 in cs.SD and eess.AS

Abstract: In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and show how individual visual and audio features, as well as their combination, affect SC performance. Our extensive experiments, conducted on the DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve best classification accuracies of 82.2%, 91.1%, and 93.9% with audio-only input, visual-only input, and combined audio-visual input, respectively. The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, is an improvement of 16.5% over the DCASE baseline.
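
The abstract attributes the best result to an ensemble of audio-based and visual-based frameworks but does not state how their outputs are combined. As a rough illustration only, the sketch below assumes a simple late-fusion rule that averages per-class probabilities from the two models; the function names (`fuse_mean`, `predict`) and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_mean(audio_probs: Sequence[float], visual_probs: Sequence[float]) -> List[float]:
    """Average per-class probabilities from two models (assumed fusion rule)."""
    if len(audio_probs) != len(visual_probs):
        raise ValueError("both models must score the same set of classes")
    return [(a + v) / 2.0 for a, v in zip(audio_probs, visual_probs)]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical scores for three scene classes (illustrative values only).
audio = [0.5, 0.3, 0.2]   # the audio model favors class 0
visual = [0.1, 0.7, 0.2]  # the visual model favors class 1

fused = fuse_mean(audio, visual)
label = predict(fused)
```

Averaging keeps the fused scores a valid probability distribution when both inputs are. Other late-fusion rules, such as multiplying the probabilities or taking a per-class maximum, would fit the same interface.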

Authors (6)
  1. Lam Pham (49 papers)
  2. Alexander Schindler (33 papers)
  3. Mina Schütz (2 papers)
  4. Jasmin Lampert (5 papers)
  5. Sven Schlarb (2 papers)
  6. Ross King (7 papers)
Citations (9)
