Iterative Crowd Counting (1807.09959v1)

Published 26 Jul 2018 in cs.CV

Abstract: In this work, we tackle the problem of crowd counting in images. We present a Convolutional Neural Network (CNN) based density estimation approach to solve this problem. Predicting a high resolution density map in one go is a challenging task. Hence, we present a two branch CNN architecture for generating high resolution density maps, where the first branch generates a low resolution density map, and the second branch incorporates the low resolution prediction and feature maps from the first branch to generate a high resolution density map. We also propose a multi-stage extension of our approach where each stage in the pipeline utilizes the predictions from all the previous stages. Empirical comparison with the previous state-of-the-art crowd counting methods shows that our method achieves the lowest mean absolute error on three challenging crowd counting benchmarks: Shanghaitech, WorldExpo'10, and UCF datasets.

Authors (3)

Viresh Ranjan
Hieu Le
Minh Hoai

Citations (266)

Summary

Iterative Crowd Counting: ic-CNN and Its Performance Analysis

The paper "Iterative Crowd Counting" introduces a convolutional neural network (CNN) based method for estimating crowd density maps from images. The authors, Viresh Ranjan, Hieu Le, and Minh Hoai from Stony Brook University, propose a two-branch network architecture named ic-CNN, together with a multi-stage extension of it. The method reports the lowest mean absolute error among the compared methods on three benchmarks: Shanghaitech, WorldExpo'10, and UCF.

The paper is motivated by the practical need for automatic and accurate crowd counting, since humans struggle to estimate the size of dense crowds from images. Most previous CNN-based methods either followed a detection-then-counting strategy or estimated density in a single pass, without multiple stages. In contrast, this paper presents an iterative mechanism that refines the density estimate over successive stages, with the aim of improving the robustness and accuracy of crowd counts.

Key Contributions and Methodology

The authors propose ic-CNN, a two-branch architecture aimed at generating accurate crowd density maps:

Low Resolution Branch (LR-CNN): This branch produces an initial density estimate at a quarter of the original image resolution. Its low-resolution map serves both as an initial estimate and as an input feature for the subsequent higher-resolution prediction.
High Resolution Branch (HR-CNN): Building on the prediction and feature maps from LR-CNN, HR-CNN computes a high-resolution density map at the original size of the input image. Because the feature maps and the low-resolution prediction are passed from one branch to the other, the high-resolution map refines a coarse estimate rather than being predicted in one go.
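The data flow between the two branches can be sketched at the level of array shapes. The snippet below is only an illustration under stated assumptions: `lr_branch` and `hr_branch` are hypothetical stand-ins that use pooling and averaging instead of learned convolutional layers, and "a quarter of the resolution" is assumed to mean a factor of 4 along each side. Only the shapes and the wiring are meaningful, not the arithmetic.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a 2-D array over non-overlapping k x k blocks."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return np.kron(x, np.ones((k, k)))

def lr_branch(image):
    """Hypothetical stand-in for LR-CNN: features and density at 1/4 size."""
    feats = avg_pool(image, 4)            # placeholder for convolutional features
    density = np.maximum(feats, 0.0)      # placeholder low-resolution density map
    return feats, density

def hr_branch(image, lr_feats, lr_density):
    """Hypothetical stand-in for HR-CNN: fuses the image with LR outputs."""
    stacked = np.stack([
        image,                            # full-resolution input
        upsample(lr_feats, 4),            # LR feature maps brought to full size
        upsample(lr_density, 4),          # LR prediction brought to full size
    ])
    return np.maximum(stacked.mean(axis=0), 0.0)  # placeholder for learned layers

image = np.random.default_rng(0).random((64, 96))  # toy single-channel image
lr_feats, lr_density = lr_branch(image)
hr_density = hr_branch(image, lr_feats, lr_density)
print(lr_density.shape, hr_density.shape)  # (16, 24) (64, 96)
```

In a real network the placeholders would be replaced by trained convolutional layers; the point here is that the high-resolution branch sees both the coarse prediction and the features that produced it.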

Additionally, the authors introduce a multi-stage pipeline that chains multiple ic-CNNs. Each stage takes the density predictions from all previous stages as additional input and refines the crowd count estimate iteratively. The intent is that later stages can correct earlier errors, which helps when density varies within a single image or across datasets.
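The multi-stage idea, where each stage receives the predictions of all earlier stages, can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `stage` is a hypothetical placeholder that averages its inputs instead of running a trained ic-CNN, and the function names are assumptions introduced here.

```python
import numpy as np

def stage(image, previous_maps):
    """Hypothetical stand-in for one ic-CNN stage (not the paper's network)."""
    # Input channels: the image plus every earlier density prediction.
    channels = np.stack([image] + previous_maps)
    density = np.maximum(channels.mean(axis=0), 0.0)  # placeholder for learned layers
    return density, channels.shape[0]

def run_pipeline(image, num_stages):
    """Run the stages in order, feeding each one all earlier predictions."""
    predictions, channel_counts = [], []
    for _ in range(num_stages):
        density, n_channels = stage(image, predictions)
        predictions.append(density)
        channel_counts.append(n_channels)
    return predictions, channel_counts

image = np.random.default_rng(0).random((32, 32))  # toy single-channel image
maps, channel_counts = run_pipeline(image, num_stages=3)
print(channel_counts)   # [1, 2, 3]
print(maps[-1].shape)   # (32, 32)
```

The growing channel count shows the design choice: stage k sees k - 1 earlier predictions in addition to the image, so its input grows with depth.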

Empirical Evaluation and Results

The authors validate the ic-CNN framework across three crowd counting datasets:

Shanghaitech Dataset: Achieves state-of-the-art performance, with a 48.3% reduction in mean absolute error (MAE) on Part B using the one-stage ic-CNN. On Part A, a multi-stage version improves results further, albeit with diminishing returns beyond the second stage.
WorldExpo'10 Dataset: Performs competitively on three of the five test scenes, which suggests the approach adapts to environments with different density profiles.
UCF Dataset: Obtains a lower MAE than previous state-of-the-art methods, indicating that ic-CNN remains effective in high-density crowd scenarios.
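For readers unfamiliar with the metric, MAE is the average absolute difference between predicted and true counts per image, and a relative reduction compares two MAE values. The sketch below uses made-up counts and MAE values purely for illustration; none of the numbers are taken from the paper, and the function names are introduced here.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true crowd counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def relative_reduction(baseline_mae, new_mae):
    """Percentage by which new_mae improves on baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae

# Illustrative per-image counts (not from the paper).
predicted_counts = [105.0, 48.0, 310.0]
true_counts = [100.0, 50.0, 300.0]

mae = mean_absolute_error(predicted_counts, true_counts)
print(round(mae, 2))                   # 5.67
print(relative_reduction(20.0, 10.0))  # 50.0
```

A percentage such as the one quoted for Shanghaitech Part B is a relative reduction of this kind, computed between the MAE of a reference method and the MAE of the new method.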

Implications and Future Directions

From a practical standpoint, ic-CNN is relevant to robust, potentially real-time crowd counting systems in sectors such as urban planning, safety management, and event monitoring. The multi-stage iterative approach is an adaptable framework that can be re-trained or fine-tuned on new datasets or domains.

Future research could extend the ic-CNN framework to multi-modal or temporal settings. For example, incorporating video or other spatio-temporal data could improve crowd counting in dynamic environments. Moreover, architecture search and hyperparameter optimization with automated machine learning (AutoML) tools could yield incremental performance gains without substantial overhead.

Overall, the proposed iterative approach is a useful contribution to the crowd counting literature: it improves density estimation by predicting a coarse map first and refining it at higher resolution and over multiple stages.