
Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions (2005.07097v2)

Published 14 May 2020 in cs.CV

Abstract: Visual crowd counting has recently been studied as a way to count people in crowd scenes from images. Although successful, vision-based crowd counting approaches can fail to capture informative features in extreme conditions, e.g., imaging at night and occlusion. In this work, we introduce a novel task of audiovisual crowd counting, in which visual and auditory information are integrated for counting purposes. We collect a large-scale benchmark, named the auDiovISual Crowd cOunting (DISCO) dataset, consisting of 1,935 images with their corresponding audio clips and 170,270 annotated instances. To fuse the two modalities, we use a linear feature-wise fusion module that carries out an affine transformation on visual and auditory features. Finally, we conduct extensive experiments using the proposed dataset and approach. Experimental results show that introducing auditory information can benefit crowd counting under different illumination, noise, and occlusion conditions. Code and data have been made available.
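
The abstract gives only a high-level description of the fusion module, so the following is a minimal sketch of one common reading of "feature-wise affine transformation", not the authors' implementation. It assumes that an audio embedding is mapped by two linear layers to a per-channel scale (`gamma`) and shift (`beta`), which are then applied to each channel of the visual feature map. All names, shapes, and weights here (`fuse`, `linear`, `w_gamma`, `w_beta`, the toy inputs) are hypothetical and chosen for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]      # rows x cols
FeatureMap = List[Matrix]       # channels x height x width


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weights @ x + bias for a small dense layer."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse(visual: FeatureMap, audio: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> FeatureMap:
    """Scale and shift each visual channel with audio-derived coefficients.

    Assumption: one (gamma, beta) pair per visual channel, both predicted
    linearly from the audio embedding.
    """
    gamma = linear(w_gamma, b_gamma, audio)   # per-channel scale
    beta = linear(w_beta, b_beta, audio)      # per-channel shift
    if not (len(gamma) == len(beta) == len(visual)):
        raise ValueError("need one (gamma, beta) pair per visual channel")
    return [[[g * value + b for value in row] for row in channel]
            for channel, g, b in zip(visual, gamma, beta)]


# Toy input: 2 visual channels of size 2x2 and a 3-dimensional audio embedding.
visual = [[[1.0, 2.0], [3.0, 4.0]],
          [[0.5, 0.5], [0.5, 0.5]]]
audio = [1.0, 0.0, 2.0]
w_gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
b_gamma = [0.0, 0.0]
w_beta = [[0.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
b_beta = [0.0, -1.0]

fused = fuse(visual, audio, w_gamma, b_gamma, w_beta, b_beta)
print(fused)
```

In this toy setup the audio embedding yields `gamma = [1.0, 2.0]` and `beta = [1.0, -1.0]`, so the first channel is shifted up by one and the second is doubled and then shifted down by one. In a trained model the weights would be learned jointly with the counting network, and the fused features would feed a density-map regressor.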

Authors (7)
  1. Di Hu (88 papers)
  2. Lichao Mou (50 papers)
  3. Qingzhong Wang (26 papers)
  4. Junyu Gao (63 papers)
  5. Yuansheng Hua (16 papers)
  6. Dejing Dou (112 papers)
  7. Xiao Xiang Zhu (201 papers)
Citations (30)
