Learning from Synthetic Data for Crowd Counting in the Wild (1903.03303v1)

Published 8 Mar 2019 in cs.CV

Abstract: Counting the number of people in crowd scenes has recently become a hot topic because of its widespread applications (e.g. video surveillance, public security). It is a difficult task in the wild: changeable environments and a large range in the number of people prevent current methods from working well. In addition, because data is scarce, many methods suffer from over-fitting to varying degrees. To remedy these two problems, we first develop a data collector and labeler, which can generate synthetic crowd scenes and simultaneously annotate them without any manpower. Based on it, we build a large-scale, diverse synthetic dataset. Second, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then finetune it using the real data, which significantly improves the model's performance on real data; 2) propose a crowd counting method via domain adaptation, which can free humans from heavy data annotation. Extensive experiments show that the first method achieves state-of-the-art performance on four real datasets, and the second outperforms our baselines. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/.

Authors (4)

Qi Wang
Junyu Gao
Wei Lin
Yuan Yuan

Summary

The paper introduces a large-scale, accurately labeled synthetic dataset (GCC from GTA5) to tackle diverse crowd counting challenges.
The first scheme pre-trains a crowd counter on synthetic data and fine-tunes it on real data, achieving state-of-the-art results on benchmarks such as UCF-QNRF.
The second scheme uses an SSIM Embedding (SE) Cycle GAN to translate synthetic images toward a realistic style, narrowing the domain gap so that a counter can be trained without labeled real-world data.

Learning from Synthetic Data for Crowd Counting in the Wild

The paper "Learning from Synthetic Data for Crowd Counting in the Wild" addresses crowd counting in unconstrained environments, where changing scene conditions, widely varying crowd sizes, and scarce annotated data limit current methods. The authors build a synthetic dataset and propose two ways of using it to improve performance on real crowd scenes.

Key Contributions

Synthetic Dataset Creation: The paper introduces a data collector and labeler that generates synthetic crowd scenes and annotates them automatically. Built on the game Grand Theft Auto V (GTA5), the resulting "GTA5 Crowd Counting" (GCC) dataset is large-scale, diverse, and more accurately annotated than existing real-world datasets. GCC also covers varied environmental conditions, such as different times of day and weather, which gives models a broader range of scenes to train on.
Methodologies for Model Improvement: Two strategies are proposed to leverage synthetic data for crowd counting:
- Pre-training and Fine-tuning: A crowd counting model is pre-trained on synthetic data and then fine-tuned on real data. This significantly improves results on real-world datasets because it provides a better initialization than training from scratch or pre-training on general image classification tasks.
- Domain Adaptation (DA): An SSIM Embedding (SE) Cycle GAN transforms synthetic images into a more realistic style, which helps bridge the domain gap between synthetic and real datasets. This method does not require labeled real data, making it an alternative to labor-intensive data annotation.
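
To make the "SSIM" part of the SE Cycle GAN concrete, the following is a minimal sketch of the Structural Similarity Index itself, written in plain Python. It is not the authors' implementation. Several things here are assumptions made for illustration: the function names `global_ssim` and `ssim_loss`, the use of a single window covering the whole image (the usual SSIM averages over local windows), and the idea of turning the score into a loss as `1 - SSIM` between an image and its reconstruction. How exactly the SSIM term enters the GAN objective is not specified in this summary.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """SSIM computed over one window that spans the whole image.

    x and y are equally sized grayscale images given as flat lists of
    pixel values. This is a simplification: the usual SSIM averages the
    same expression over many local windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of equal size")

    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y):
    """Hypothetical loss form: 0 for identical images, larger as structure diverges."""
    return 1.0 - global_ssim(x, y)


# Hypothetical 2x2 grayscale patches, flattened row by row.
original = [10.0, 50.0, 90.0, 130.0]
reconstruction_good = [10.0, 50.0, 90.0, 130.0]
reconstruction_bad = [130.0, 90.0, 50.0, 10.0]  # same pixels, structure reversed

loss_good = ssim_loss(original, reconstruction_good)
loss_bad = ssim_loss(original, reconstruction_bad)
```

The two reconstructions have identical mean and variance, so a pixel-statistics comparison could not tell them apart; the covariance term is what penalizes the reversed structure. That sensitivity to local structure is the usual motivation for adding an SSIM term when translated images must keep fine detail.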

Experimental Results

The experiments show that the pre-train-then-fine-tune strategy achieves state-of-the-art performance on multiple real datasets, including UCF-QNRF, ShanghaiTech A/B, and UCF_CC_50. The domain adaptation approach also yields competitive results without relying on real-world annotations.

On the UCF-QNRF dataset, the proposed method reaches a Mean Absolute Error (MAE) of 102.0, a substantial improvement over prior models.
In-domain evaluation on the synthetic GCC dataset indicates that models trained on it handle extreme scenarios effectively, suggesting that the dataset can help broaden model robustness.
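
For readers unfamiliar with the metric quoted above, MAE in crowd counting is the average absolute difference between the predicted and the ground-truth head count per image. The sketch below shows the computation in plain Python. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(predicted_counts, true_counts):
    """Average of |predicted - true| over all evaluated images."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("both lists must have one entry per image")
    if not true_counts:
        raise ValueError("at least one image is required")
    total_error = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total_error / len(true_counts)


# Hypothetical per-image counts for four test images.
predicted = [105.0, 480.0, 1210.0, 52.0]
actual = [100, 500, 1200, 50]

mae = mean_absolute_error(predicted, actual)  # (5 + 20 + 10 + 2) / 4 = 9.25
```

Because the metric is expressed in people per image, an MAE of 102.0 means the predicted count is off by about 102 people per image on average, which has to be read against how many people the images in a dataset typically contain.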

Implications and Future Directions

This research matters most for crowd counting scenarios in which real-world training data is scarce. The methods and the synthetic dataset can serve as a foundation for future work in crowd analysis, offering a cost-effective and scalable way to improve models for applications such as video surveillance and public event monitoring. Future research might refine domain adaptation techniques to further close the gap between synthetic and real-world data, potentially drawing on advances in generative models and self-supervised learning for better feature extraction and domain invariance.
