CNN-based Density Estimation and Crowd Counting: A Survey (2003.12783v1)

Published 28 Mar 2020 in cs.CV

Abstract: Accurately estimating the number of objects in a single image is a challenging yet meaningful task, with applications such as urban planning and public safety. Among object counting tasks, crowd counting is particularly prominent because of its significance to social security and development. Techniques developed for crowd counting can also be generalized to related fields such as vehicle counting and environment survey, provided the specific characteristics of those fields are set aside. As a result, many researchers are devoting themselves to crowd counting, and many excellent works have emerged. These works have clearly helped the development of crowd counting; the question we should consider, however, is why they are effective for this task. Limited by time and energy, we cannot analyze every algorithm. In this paper, we survey over 220 works to comprehensively and systematically study crowd counting models, mainly CNN-based density map estimation methods. According to the evaluation metrics, we then select the top three performers on the crowd counting datasets and analyze their merits and drawbacks. Through this analysis, we expect to make reasonable inferences and predictions about the future development of crowd counting, and to provide feasible solutions for object counting in other fields. We provide the density maps and prediction results of some mainstream algorithms on the validation set of the NWPU dataset for comparison and testing. Density map generation and evaluation tools are also provided. All code and evaluation results are publicly available at https://github.com/gaoguangshuai/survey-for-crowd-counting.

Authors (5)

Guangshuai Gao
Junyu Gao
Qingjie Liu
Qi Wang
Yunhong Wang

Summary

An Overview of CNN-based Density Estimation and Crowd Counting: A Survey

The paper "CNN-based Density Estimation and Crowd Counting: A Survey" by Guangshuai Gao et al. offers a comprehensive survey of over 220 published works in the field of object counting, with an emphasis on CNN-based crowd counting models. It explores the intricacies of density map estimation techniques as they pertain to crowd counting, recognizing the myriad applications such techniques have, be it in urban planning, public safety, or other domains such as vehicle counting and environmental surveys.
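To make the idea of a density map concrete, here is a minimal sketch of one common way to build a ground-truth map from point annotations: place a normalized 2D Gaussian at each annotated head so that the map sums to the number of people. This is an illustration under stated assumptions, not the generation tool from the authors' repository; the function names, the `(row, col)` annotation format, and the fixed `sigma` are all hypothetical choices.

```python
import math


def gaussian_density_map(points, height, width, sigma=2.0):
    """Place one normalized 2D Gaussian per annotated point.

    points: iterable of integer (row, col) positions (assumed annotation format).
    Each Gaussian is renormalized over the in-image pixels of its window,
    so every in-bounds point contributes exactly 1 to the sum of the map.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at roughly 3 sigma
    for r0, c0 in points:
        weights = {}
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                weights[(r, c)] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        for (r, c), w in weights.items():
            density[r][c] += w / total
    return density


def count_from_density(density):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


# Usage: three annotated heads on a 16x32 image.
density = gaussian_density_map([(5, 5), (0, 0), (10, 20)], height=16, width=32)
print(round(count_from_density(density), 6))
```

A density-estimation network is trained to regress a map like this from the image, and its predicted count is obtained by summing the predicted map in the same way.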

Summary and Analysis

The authors categorize the surveyed methods along several axes: network architecture (basic CNN, multi-column, and single-column), learning paradigm (single-task vs. multi-task), inference manner (patch-based vs. whole image-based), supervision form (fully-supervised vs. un/semi/weakly/self-supervised), and domain adaptation capability.

Network Architectures:
- Basic CNN Models: These models, the earliest in this domain, are simple but notably outperformed by more advanced architectures due to their limited capacity to handle scale variance and complex features.
- Multi-column Architectures: These have been prominent because of their ability to capture contextual features at different scales, using different branches of neural networks. However, they suffer from increased redundancy and computational complexity.
- Single-column Architectures: Increasingly favored for their simplicity and efficiency, these architectures use deeper networks to capture detailed features with less computational overhead.
Learning Paradigms:
- The survey highlights a shift from single-task learning, focusing solely on density maps, to multi-task learning, which integrates density estimation with auxiliary tasks (e.g., detection, classification) to improve performance.
Inference Manner:
- Many earlier methods adopt patch-based inference, in which the image is divided into smaller patches and a count is estimated for each. More recent methods increasingly operate on the whole image, which avoids losing information at patch boundaries and reduces computational load.
Supervision Forms:
- Fully-supervised approaches dominate but require extensive labeled data, while semi- and weakly-supervised methods show promise because they reduce this dependence by making better use of unlabeled data.
Domain Adaptation:
- Domain adaptation is crucial if models are to be applied to diverse scenes unseen during training. Few models generalize effectively to other object counting domains or adapt to changes in environment or context.
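The difference between the patch-based and whole-image inference manners listed above can be sketched as follows. This is a toy illustration, not a method from the survey: `toy_counter` is a hypothetical stand-in for a trained CNN and simply sums pixel values, and the helper names are assumptions made for this example.

```python
def split_into_patches(image, patch_h, patch_w):
    """Yield non-overlapping patches covering the image; edge patches may be smaller."""
    height, width = len(image), len(image[0])
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            yield [row[left:left + patch_w] for row in image[top:top + patch_h]]


def toy_counter(region):
    """Hypothetical stand-in for a trained counting network: sums the region's values."""
    return float(sum(sum(row) for row in region))


def patch_based_count(image, counter, patch_h, patch_w):
    """Patch-based inference: run the counter on each patch and add up the counts."""
    return sum(counter(p) for p in split_into_patches(image, patch_h, patch_w))


def whole_image_count(image, counter):
    """Whole-image inference: a single pass over the full image."""
    return counter(image)


# Usage: a 5x7 toy "image" whose values already behave like per-pixel counts.
image = [[(r + c) % 2 for c in range(7)] for r in range(5)]
print(patch_based_count(image, toy_counter, 2, 3), whole_image_count(image, toy_counter))
```

With this additive toy counter the two results agree. A real network sees no context beyond each patch border, which is one reason the two inference manners can diverge in practice.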

Implications and Future Directions

Numerical results reported in the paper show steady improvements in model accuracy over time, particularly for newer architectures such as single-column CNNs complemented by context-aware features. Despite these advances, models still struggle with complex factors such as occlusion, scale variation, and perspective distortion in crowds.
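Accuracy in crowd counting is commonly reported as the mean absolute error (MAE) and the root mean squared error between predicted and ground-truth counts per image (the latter is often labelled "MSE" in this literature). The sketch below shows how such per-dataset scores could be computed; it is not the evaluation tool from the authors' repository, and the function names and sample counts are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between per-image predicted and ground-truth counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large per-image errors more than MAE."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Usage with made-up counts for three test images.
predicted_counts = [10.0, 20.0, 30.0]
actual_counts = [12.0, 20.0, 26.0]
print(mae(predicted_counts, actual_counts), rmse(predicted_counts, actual_counts))
```

MAE reflects average accuracy, while the squared term makes the second metric more sensitive to occasional large misses, so the two are usually read together.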

The survey suggests several areas for future exploration:

- Development of more robust models that perform well across various domains without requiring exhaustive retraining.
- Lightweight architectures allowing for real-time processing and deployment in resource-constrained environments.
- Adoption of multi-task learning frameworks that incorporate other computer vision tasks beyond counting, such as localization and tracking.

In conclusion, the survey by Gao et al. is a useful resource for researchers: it summarizes current methodologies and outlines directions for future research on CNN-based crowd counting.

GitHub

GitHub - gaoguangshuai/survey-for-crowd-counting: a survey for CNN-based crowd counting and density estimation (66 stars)