Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions (2005.07097v2)

Published 14 May 2020 in cs.CV

Abstract: Visual crowd counting has recently been studied as a way to count people in crowd scenes from images. Although successful, vision-based crowd counting approaches can fail to capture informative features in extreme conditions, e.g., imaging at night or under occlusion. In this work, we introduce a novel task of audiovisual crowd counting, in which visual and auditory information are integrated for counting purposes. We collect a large-scale benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of 1,935 images and the corresponding audio clips, and 170,270 annotated instances. To fuse the two modalities, we use a linear feature-wise fusion module that carries out an affine transformation on visual and auditory features. Finally, we conduct extensive experiments using the proposed dataset and approach. Experimental results show that introducing auditory information can benefit crowd counting under different illumination, noise, and occlusion conditions. Code and data have been made available.
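
The abstract describes the fusion step only as a linear feature-wise module applying an affine transformation to visual and auditory features. The following is a minimal sketch of one common way such a module can be built, assuming the audio feature vector is mapped by two linear layers to a per-channel scale and shift that are then applied to the visual feature map. All names (`linear`, `feature_wise_affine`, `fuse`, `w_gamma`, `w_beta`) and the toy numbers are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Plain linear layer: computes weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def feature_wise_affine(visual: Matrix, gamma: Vector, beta: Vector) -> Matrix:
    """Scale and shift each visual channel c: gamma[c] * visual[c][i] + beta[c]."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(visual, gamma, beta)]


def fuse(visual: Matrix, audio: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Assumed design: audio features predict the per-channel scale and shift."""
    gamma = linear(w_gamma, b_gamma, audio)
    beta = linear(w_beta, b_beta, audio)
    return feature_wise_affine(visual, gamma, beta)


# Toy inputs (made up): 2 visual channels with 2 spatial positions each,
# and a 2-dimensional audio feature vector.
visual = [[1.0, 2.0], [3.0, 4.0]]
audio = [1.0, 2.0]
w_gamma, b_gamma = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w_beta, b_beta = [[0.5, 0.0], [0.0, 0.5]], [0.0, 1.0]

fused = fuse(visual, audio, w_gamma, b_gamma, w_beta, b_beta)
print(fused)
```

In this sketch the audio signal never replaces visual features; it only rescales and offsets them channel by channel, which is one plausible reading of "feature-wise" fusion.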

Authors (7)

Di Hu (88 papers)
Lichao Mou (50 papers)
Qingzhong Wang (26 papers)
Junyu Gao (63 papers)
Yuansheng Hua (16 papers)
Dejing Dou (112 papers)
Xiao Xiang Zhu (201 papers)

Citations (30)
