Revisiting Perspective Information for Efficient Crowd Counting (1807.01989v3)

Published 5 Jul 2018 in cs.CV

Abstract: Crowd counting is the task of estimating people numbers in crowd images. Modern crowd counting methods employ deep neural networks to estimate crowd counts via crowd density regressions. A major challenge of this task lies in the perspective distortion, which results in drastic person scale change in an image. Density regression on the small person area is in general very hard. In this work, we propose a perspective-aware convolutional neural network (PACNN) for efficient crowd counting, which integrates the perspective information into density regression to provide additional knowledge of the person scale change in an image. Ground truth perspective maps are firstly generated for training; PACNN is then specifically designed to predict multi-scale perspective maps, and encode them as perspective-aware weighting layers in the network to adaptively combine the outputs of multi-scale density maps. The weights are learned at every pixel of the maps such that the final density combination is robust to the perspective distortion. We conduct extensive experiments on the ShanghaiTech, WorldExpo'10, UCF_CC_50, and UCSD datasets, and demonstrate the effectiveness and efficiency of PACNN over the state-of-the-art.

Authors (4)

Miaojing Shi (53 papers)
Zhaohui Yang (193 papers)
Chao Xu (283 papers)
Qijun Chen (49 papers)

Citations (222)

Summary

The paper "Revisiting Perspective Information for Efficient Crowd Counting" addresses perspective distortion in crowd counting, the task of estimating the number of people in a crowd image. The task is difficult because perspective causes large variations in person scale within a single image. Modern methods use deep neural networks to estimate the count indirectly: they regress a crowd density map whose values sum to the count.
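To make the link between density estimation and counting concrete, here is a minimal sketch of how a density map is commonly built from head annotations and then summed to recover the count. The function names, the fixed Gaussian width `sigma`, and the toy annotations are illustrative assumptions, not details taken from the paper.

```python
import math

def density_map(heads, height, width, sigma=2.0):
    """Place one unit-mass Gaussian per annotated head (row, col).

    Each kernel is normalised over the image, so the map sums to the
    number of heads. The fixed sigma is an illustrative assumption.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for hy, hx in heads:
        # Unnormalised Gaussian evaluated at every pixel of the image.
        kernel = [
            [math.exp(-((y - hy) ** 2 + (x - hx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        mass = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / mass
    return dmap

def count_from_density(dmap):
    """The estimated count is the sum of all density values."""
    return sum(map(sum, dmap))

# Toy example: three hypothetical head annotations in a 16x16 image.
heads = [(2, 3), (10, 12), (11, 13)]
dmap = density_map(heads, 16, 16)
estimated = count_from_density(dmap)
```

A counting network is trained to regress a map like `dmap` from the image; at test time the predicted map is summed in the same way.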

Problem Statement and Approach

A fundamental issue encountered by these methods is perspective distortion, which leads to substantial changes in the apparent scale of individuals across the image. The paper introduces a perspective-aware convolutional neural network (PACNN) that integrates perspective information directly into the density regression process. The perspective information tells the network how person scale varies across the image, which is intended to ease density regression for small, distant individuals.

The proposed PACNN predicts perspective maps and encodes them as perspective-aware weighting layers that combine density maps estimated at multiple scales. Because a weight is learned at every pixel location, the combined density can favor a different scale in each part of the image, which makes it more robust to perspective distortion.
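The pixel-wise combination can be sketched as follows. This is a minimal illustration that assumes only two density maps and a convex blend driven by a sigmoid of the perspective value; the parameters `alpha` and `beta`, the function names, and the choice of which map receives the weight are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def combine_densities(d_a, d_b, persp, alpha=1.0, beta=0.0):
    """Blend two density maps pixel by pixel.

    w = sigmoid(alpha * p + beta) is computed from the perspective value p
    at each pixel; the output is w * d_a + (1 - w) * d_b. In a trained
    network alpha and beta would be learned (assumption for this sketch).
    """
    out = []
    for row_a, row_b, row_p in zip(d_a, d_b, persp):
        out_row = []
        for a, b, p in zip(row_a, row_b, row_p):
            w = sigmoid(alpha * p + beta)
            out_row.append(w * a + (1.0 - w) * b)
        out.append(out_row)
    return out

# Toy 1x2 maps: a perspective value of 0 gives equal weights,
# a large perspective value pushes the weight towards d_a.
d_a = [[2.0, 2.0]]
d_b = [[4.0, 4.0]]
persp = [[0.0, 50.0]]
blended = combine_densities(d_a, d_b, persp)
```

The point of the design is that the blend varies across the image: pixels with different perspective values receive different mixtures of the two scales.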

Methodology

PACNN has three main components:

Ground Truth Perspective Map Generation: The authors employ a nonlinear fitting approach to generate ground truth perspective maps. The maps are informed by both the known average height of individuals and locally sampled values concerning apparent person heights in the image.
Multi-scale Density and Perspective Regression: The network regresses multiple density maps from various layers of the network. The layers correspond to different receptive fields, capturing details at varying resolutions to manage the density estimation in parts of the image reflecting different person sizes.
Perspective-aware Weighting: The network applies sigmoid functions to the predicted perspective maps to obtain pixel-wise weights, then uses these weights to combine the multi-scale density maps. This lets PACNN adapt the combination to the scale changes caused by perspective distortion.
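The ground truth perspective map generation in the first item above can be illustrated with a simplified sketch. It assumes the perspective value at a location is the apparent person height in pixels divided by an average real height (here 1.75 m, an assumed value), and it swaps in an ordinary linear least-squares fit over image rows in place of the nonlinear fit the paper describes. All names and sample values are hypothetical.

```python
AVG_HEIGHT_M = 1.75  # assumed average person height in metres

def fit_row_perspective(samples, avg_height_m=AVG_HEIGHT_M):
    """Fit p(y) = a * y + b to sampled (row, pixel_height) pairs.

    A linear least-squares stand-in for the paper's nonlinear fit.
    """
    ys = [y for y, _ in samples]
    ps = [h / avg_height_m for _, h in samples]  # pixels per metre
    n = len(samples)
    mean_y = sum(ys) / n
    mean_p = sum(ps) / n
    var_y = sum((y - mean_y) ** 2 for y in ys)
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(ys, ps))
    a = cov / var_y
    b = mean_p - a * mean_y
    return a, b

def perspective_map(a, b, height, width):
    """Every pixel in a row shares that row's fitted perspective value."""
    return [[a * y + b] * width for y in range(height)]

# Hypothetical samples: people lower in the image (larger row) appear taller.
samples = [(100, 35.0), (200, 70.0), (300, 105.0)]
a, b = fit_row_perspective(samples)
pmap = perspective_map(a, b, 400, 4)
```

A row-only model like this assumes a camera without strong roll; it is meant only to show how a few sampled heights can be extended to a dense map.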

Results and Evaluation

The authors conduct extensive experiments on four established datasets: ShanghaiTech, WorldExpo'10, UCF_CC_50, and UCSD. According to the paper, PACNN achieves lower mean absolute error (MAE) and mean squared error (MSE) than the state-of-the-art methods it is compared against.
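For reference, the two metrics can be computed from per-image counts as in the sketch below. Note that much of the crowd counting literature reports the square root of the mean squared count error under the name MSE; whether the paper follows that convention is an assumption here, so the function is named `rmse` to be explicit. The counts are made-up toy values.

```python
import math

def mae(pred, true):
    """Mean absolute error between predicted and true per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root of the mean squared count error (often reported as 'MSE')."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

# Toy per-image counts, not results from the paper.
pred_counts = [105.0, 48.0, 310.0]
true_counts = [100.0, 50.0, 300.0]
mae_value = mae(pred_counts, true_counts)
rmse_value = rmse(pred_counts, true_counts)
```

MAE reflects average accuracy, while the squared-error metric penalizes large misses on individual images more heavily.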

The experimental results support PACNN's use of perspective information to improve counting accuracy. Integrating perspective maps helps stabilize predictions when person scale varies widely within an image, a setting in which approaches that ignore perspective may falter.

Implications and Future Directions

PACNN suggests that encoding scene geometry explicitly in a network architecture can benefit crowd counting. The work offers a reference point for future investigations into how similar environmental cues can improve model performance on other vision tasks, particularly those affected by perspective or depth distortion.

Further work could refine the accuracy of the perspective maps or extend the model with additional environmental cues, such as depth or occlusion maps. Improving the efficiency of the framework could also address real-time needs in dynamic environments, and transfer learning might reduce the reliance on large-scale datasets for initial training.

In conclusion, "Revisiting Perspective Information for Efficient Crowd Counting" couples multi-scale density estimation with explicit perspective modeling and contributes a perspective-aware approach to crowd analysis.