NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization (2001.03360v4)

Published 10 Jan 2020 in cs.CV

Abstract: In the last decade, crowd counting and localization have attracted much attention from researchers due to their widespread applications, including crowd monitoring, public safety, space design, etc. Many Convolutional Neural Networks (CNN) have been designed to tackle this task. However, currently released datasets are too small to meet the needs of supervised CNN-based algorithms. To remedy this problem, we construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images with a total of 2,133,375 heads annotated with points and boxes. Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0 to 20,033). In addition, a benchmark website is developed for impartially evaluating different methods, which allows researchers to submit results on the test set. Based on the proposed dataset, we further describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data. Moreover, the benchmark is deployed at https://www.crowdbenchmark.com/, and the dataset/code/models/results are available at https://gjy3035.github.io/NWPU-Crowd-Sample-Code/.

Authors (4)

Qi Wang
Junyu Gao
Wei Lin
Xuelong Li

Summary

Overview of "NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization"

The paper introduces NWPU-Crowd, a large-scale dataset for crowd counting and localization. It comprises 5,109 images with a total of 2,133,375 annotated heads, making it one of the largest resources for this task. The authors identify shortcomings of existing datasets, particularly in scale, variety, and annotation completeness, and design NWPU-Crowd to address them.

Dataset Highlights and Methodology

NWPU-Crowd stands out due to several characteristics:

Scale and Diversity: This dataset surpasses previous benchmarks in the number of images and annotated heads. It covers a wide density range (0 to 20,033 heads per image) and includes challenging scenes with diverse illumination conditions.
Negative Sample Integration: A distinctive feature of the dataset is its 351 negative samples: images that contain no people but show crowd-like patterns. These samples test a network's robustness against non-crowd scenes that visually resemble crowds.
High-Resolution Imagery: The images are high resolution, with the largest reaching 4028 × 19044 pixels. This preserves fine details that are critical for distinguishing individuals in densely packed crowds.
Comprehensive Annotations: Annotation is performed at both point and box levels, enabling precise evaluation of both counting and localization tasks. Additionally, a benchmark website is established to streamline and standardize performance evaluation across submissions.
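CNN-based counters are typically trained on density maps derived from point annotations, where each map integrates to the head count. The following is a minimal sketch under stated assumptions. It uses a fixed 15 × 15 Gaussian kernel with σ = 4, which is an illustrative choice because this summary does not give the paper's settings. The helpers `gaussian_kernel` and `points_to_density` are hypothetical and are not the official NWPU-Crowd code. Only NumPy is required.

```python
import numpy as np


def gaussian_kernel(size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Square 2-D Gaussian kernel normalized to sum to 1 (size should be odd)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def points_to_density(points, height: int, width: int,
                      size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Turn head points (x, y) into a density map summing (up to float error) to len(points).

    The fixed-size, fixed-sigma kernel is an assumption for illustration only.
    """
    density = np.zeros((height, width), dtype=np.float64)
    kernel = gaussian_kernel(size, sigma)
    r = size // 2
    for x, y in points:
        # Snap the point to the nearest pixel inside the image.
        cx = min(max(int(round(x)), 0), width - 1)
        cy = min(max(int(round(y)), 0), height - 1)
        # Window of the image covered by the kernel, clipped at the borders.
        y0, y1 = max(cy - r, 0), min(cy + r + 1, height)
        x0, x1 = max(cx - r, 0), min(cx + r + 1, width)
        # Matching sub-window of the kernel.
        patch = kernel[y0 - (cy - r):y1 - (cy - r), x0 - (cx - r):x1 - (cx - r)]
        # Renormalize so a head near the border still contributes exactly 1.
        density[y0:y1, x0:x1] += patch / patch.sum()
    return density


if __name__ == "__main__":
    heads = [(10.0, 12.0), (0.0, 0.0), (63.0, 31.0)]
    dmap = points_to_density(heads, height=32, width=64)
    print(round(float(dmap.sum()), 6))  # 3.0
```

A negative sample (no points) yields an all-zero map, so its target count is 0. Renormalizing each clipped kernel keeps heads near the image border from losing mass, which would otherwise bias the count downward.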

Experimental Evaluation

The paper conducts extensive experiments with ten state-of-the-art crowd counting models, including MCNN, SANet, CSRNet, and SFCN†. It reports counting metrics (MAE, MSE, NAE) and density-map quality metrics (PSNR, SSIM). SCAR, which models spatial- and channel-wise attention, achieved the lowest MAE on the validation set.
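The counting metrics can be computed from per-image ground-truth and predicted counts. The following is a minimal sketch under two assumptions. First, it follows the crowd-counting convention in which "MSE" denotes the root of the mean squared error. Second, because NAE divides each error by the true count, it skips zero-count (negative) images; how the official benchmark treats them is an assumption here. The function `counting_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def counting_metrics(gt: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE (root of the mean squared error, per crowd-counting convention) and NAE."""
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and of equal length")
    errors = [p - g for g, p in zip(gt, pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # NAE normalizes by the true count; zero-count images are skipped (assumption).
    rel = [abs(p - g) / g for g, p in zip(gt, pred) if g > 0]
    nae = sum(rel) / len(rel) if rel else float("nan")
    return {"MAE": mae, "MSE": mse, "NAE": nae}


if __name__ == "__main__":
    print(counting_metrics([100, 0, 400], [90, 5, 440]))
```

In this usage example the negative image contributes to MAE and MSE but not to NAE. As a result, a model that hallucinates people on negative samples is penalized only by the first two metrics.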

Insights and Implications

The results expose two specific challenges: negative samples and extremely dense crowds. On negative samples, models often mistake dense non-human objects for people, revealing a gap in the robustness of current methods. Performance also degrades on extremely dense samples, underscoring the need for feature extraction methods that can handle high-density scenes.
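One way to expose both failure modes is to break the counting error down by ground-truth density. The following is a sketch that assumes per-image true and predicted counts are available. The bucket boundaries in `BUCKETS` are illustrative assumptions and not necessarily the benchmark's official density levels. The function `mae_by_density` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical density buckets: (label, lower bound exclusive, upper bound inclusive).
# Boundaries are illustrative assumptions, not taken from the paper.
BUCKETS: List[Tuple[str, float, float]] = [
    ("negative (0)", -1, 0),
    ("(0, 100]", 0, 100),
    ("(100, 500]", 100, 500),
    ("(500, 5000]", 500, 5000),
    (">5000", 5000, float("inf")),
]


def mae_by_density(gt: Sequence[float], pred: Sequence[float],
                   buckets=BUCKETS) -> Dict[str, float]:
    """Average the absolute count error separately within each ground-truth density bucket."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for g, p in zip(gt, pred):
        for label, lo, hi in buckets:
            if lo < g <= hi:
                groups[label].append(abs(p - g))
                break
    # Only buckets that actually contain images appear in the result.
    return {label: sum(errs) / len(errs) for label, errs in groups.items()}


if __name__ == "__main__":
    print(mae_by_density([0, 0, 80, 6000], [12, 0, 70, 4500]))
    # {'negative (0)': 6.0, '(0, 100]': 10.0, '>5000': 1500.0}
```

A single overall MAE would blend these groups together. Reporting them separately shows whether errors concentrate on negative samples (false positives) or on the densest scenes (typically undercounting).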

Future Directions

The research has implications for real-world applications such as public safety and urban planning. The scale and diversity of NWPU-Crowd provide a basis for developing more robust, adaptable, and generalizable models. Future work could explore hybrid methods and scale-aware architectures to make models less sensitive to variation in scene attributes such as density, scale, and illumination.

Conclusion

NWPU-Crowd is a valuable resource for building more reliable crowd counting and localization systems. Its scale and diversity open new research directions focused on model adaptability and robustness, especially in challenging real-world scenarios. Such advances could benefit any application that relies on crowd analysis, and they reflect the paper's contribution to computer vision and machine learning.
