Fully Convolutional Crowd Counting On Highly Congested Scenes (1612.00220v2)

Published 1 Dec 2016 in cs.CV

Abstract: In this paper we advance the state-of-the-art for crowd counting in high density scenes by further exploring the idea of a fully convolutional crowd counting model introduced by (Zhang et al., 2016). Producing an accurate and robust crowd count estimator using computer vision techniques has attracted significant research interest in recent years. Applications for crowd counting systems exist in many diverse areas including city planning, retail, and of course general public safety. Developing a highly generalised counting model that can be deployed in any surveillance scenario with any camera perspective is the key objective for research in this area. Techniques developed in the past have generally performed poorly in highly congested scenes with several thousands of people in frame (Rodriguez et al., 2011). Our approach, influenced by the work of (Zhang et al., 2016), consists of the following contributions: (1) A training set augmentation scheme that minimises redundancy among training samples to improve model generalisation and overall counting performance; (2) a deep, single column, fully convolutional network (FCN) architecture; (3) a multi-scale averaging step during inference. The developed technique can analyse images of any resolution or aspect ratio and achieves state-of-the-art counting performance on the Shanghaitech Part_B and UCF_CC_50 datasets as well as competitive performance on Shanghaitech Part_A.


Fully Convolutional Crowd Counting on Highly Congested Scenes

This paper advances crowd counting with computer vision by detailing a deep learning approach based on fully convolutional networks (FCNs). It addresses the difficulty of accurately estimating crowd sizes in highly congested scenes, where traditional methods falter because of heavy occlusion and large variations in scale, perspective, and scene content.

Key Contributions

The authors build on the fully convolutional crowd counting model of Zhang et al. (2016) and make three main contributions:

Training Set Augmentation: An augmentation strategy that minimizes redundancy among training samples, improving model generalization and overall counting accuracy. Applying it reduced both the Mean Absolute Error (MAE) and the Mean Squared Error (MSE) on the Shanghaitech Part_B validation set.
Network Architecture: A deep, single column FCN that generates dense crowd count heatmaps. Its depth gives it more capacity to learn complex, abstract relationships in the data, and on the Shanghaitech Part_B dataset it outperformed the multi-column alternative.
Multi-Scale Averaging: To cope with variations in scale and perspective, the model is applied at inference time to several rescaled versions of each input image and the resulting predictions are averaged, which makes the crowd estimate more robust and accurate.
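
The summary does not spell out how the augmentation scheme avoids redundancy. One plausible reading, assumed here rather than taken from the paper, is to tile each training image and its ground-truth density map into non-overlapping patches instead of sampling many overlapping random crops, so that no pixel appears in more than one original patch. The function name `tile_non_overlapping`, the patch size, and the horizontal flip are all hypothetical illustration choices.

```python
import numpy as np


def tile_non_overlapping(image, density, patch_size):
    """Split an image and its density map into non-overlapping square patches.

    Hypothetical sketch: `image` is (H, W) or (H, W, C), `density` is (H, W) and
    sums to the annotated head count. Border pixels that do not fill a whole
    patch are discarded. Each pixel lands in exactly one unflipped patch.
    """
    h, w = density.shape
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            img_patch = image[top:top + patch_size, left:left + patch_size]
            den_patch = density[top:top + patch_size, left:left + patch_size]
            patches.append((img_patch, den_patch))
            # Assumed extra step: a horizontal flip adds a sample without overlap.
            patches.append((img_patch[:, ::-1], den_patch[:, ::-1]))
    return patches


rng = np.random.default_rng(0)
image = rng.random((256, 384, 3))
density = rng.random((256, 384))
density *= 100 / density.sum()  # pretend the image contains 100 people
patches = tile_non_overlapping(image, density, 128)
print(len(patches))  # 12: a 2 x 3 grid of tiles, each with a flipped copy
```

Because the unflipped tiles partition the image exactly, their density sums add back up to the full-image count, so per-patch labels stay consistent with the original annotation.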
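
The two inference-time ideas can be sketched together: a network made only of convolution and pooling layers accepts inputs of any size and returns a density map whose sum is the count, and multi-scale averaging runs that network on rescaled copies of the image and averages the counts. The toy model below (`toy_fcn`, random 3x3 kernels, single-channel input), the nearest-neighbour resizing, and the scale set `(0.8, 1.0, 1.2)` are hypothetical stand-ins, not the paper's layer configuration or scales. The sketch also assumes the model is meant to predict the scene's total count at every scale.

```python
import numpy as np


def conv3x3_same(x, kernel):
    """Single-channel 3x3 cross-correlation with zero padding (output keeps x's shape)."""
    padded = np.pad(x, 1)
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out


def max_pool2x2(x):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def toy_fcn(image, kernels):
    """Hypothetical single-column FCN: (conv -> ReLU -> pool) blocks, no dense layers.

    Every layer is a convolution or pooling, so any H x W input works and the
    output density map is about H/2^k x W/2^k for k pooling blocks.
    """
    x = image.astype(float)
    for k in kernels[:-1]:
        x = max_pool2x2(np.maximum(conv3x3_same(x, k), 0.0))
    # Final conv + ReLU gives a non-negative density map; its sum is the count.
    return np.maximum(conv3x3_same(x, kernels[-1]), 0.0)


def resize_nearest(image, scale):
    """Nearest-neighbour rescaling of a 2-D image by `scale`."""
    h, w = image.shape
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows[:, None], cols[None, :]]


def multi_scale_count(image, model, scales=(0.8, 1.0, 1.2)):
    """Average the predicted count (sum of the density map) over rescaled inputs."""
    counts = [model(resize_nearest(image, s)).sum() for s in scales]
    return float(np.mean(counts))


rng = np.random.default_rng(0)
kernels = [rng.normal(size=(3, 3)) for _ in range(3)]
image = rng.random((96, 160))  # arbitrary resolution and aspect ratio
print(toy_fcn(image, kernels).shape)  # (24, 40): two pooling blocks halve each side twice
print(multi_scale_count(image, lambda im: toy_fcn(im, kernels)))
```

Averaging over scales trades extra inference passes for reduced sensitivity to how large people appear in a particular frame.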

Experimental Performance and Evaluation

The approach is evaluated on the Shanghaitech Part_A and Part_B datasets and on UCF_CC_50. It achieves state-of-the-art results on Shanghaitech Part_B, with an MAE of 23.76 and an MSE of 33.12, and on UCF_CC_50, and competitive results on Shanghaitech Part_A. These benchmarks indicate that the model remains robust in scenes containing several thousand people. The paper also finds that cross-dataset performance depends strongly on the crowd density of the training and target datasets.
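
The reported figures are per-image count errors aggregated over a test set. As a sketch, assuming the paper follows the convention of Zhang et al. (2016) that is common in crowd counting, the "MSE" is the square root of the mean squared count error rather than the plain mean squared error. The helper name `counting_errors` is hypothetical and the counts in the example are illustrative, not results from the paper.

```python
import numpy as np


def counting_errors(predicted, ground_truth):
    """Return (MAE, MSE) over a set of test images.

    Assumption: 'MSE' follows the crowd-counting convention of Zhang et al. (2016),
    i.e. the square root of the mean squared count error.
    """
    err = np.asarray(predicted, dtype=float) - np.asarray(ground_truth, dtype=float)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.sqrt(np.mean(err ** 2)))
    return mae, mse


# Illustrative counts only (not results from the paper).
mae, mse = counting_errors([110, 95, 300], [100, 100, 290])
print(round(mae, 3), round(mse, 3))  # 8.333 8.66
```

Because the squared term weights large misses more heavily, a gap between MAE and MSE signals that a few images (often the most congested ones) carry large errors.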

Broader Implications and Future Directions

The implications of this research are multifaceted, offering practical advantages in urban planning, public safety monitoring, and retail analytics. More broadly, the findings encourage further exploration of FCNs for pixel-wise tasks beyond crowd counting, such as crowd segmentation or anomaly detection in densely populated areas.

Future research may benefit from extending these approaches to adaptive systems capable of transferring learned models across varying contexts without retraining, bolstering model utility in real-world scenarios with diverse camera perspectives and crowd dynamics. Additionally, further investigation into optimizing computational efficiency without compromising accuracy could enrich real-time applications, particularly in surveillance contexts requiring rapid processing of large-scale video data streams.

In summary, the paper advances crowd counting in high-density scenes and shows how fully convolutional networks can address long-standing challenges in computer vision crowd analytics.


Authors (4)

Mark Marsden
Kevin McGuinness
Suzanne Little
Noel E. O'Connor

Citations (176)

