Bayesian Loss for Crowd Count Estimation with Point Supervision (1908.03684v1)

Published 10 Aug 2019 in cs.CV

Abstract: In crowd counting datasets, each person is annotated by a point, which is usually the center of the head, and the task is to estimate the total count in a crowd scene. Most state-of-the-art methods are based on density map estimation: they convert the sparse point annotations into a "ground-truth" density map through a Gaussian kernel and then use it as the learning target to train a density map estimator. However, such a "ground-truth" density map is imperfect due to occlusions, perspective effects, variations in object shapes, etc. In contrast, we propose Bayesian loss, a novel loss function which constructs a density contribution probability model from the point annotations. Instead of constraining the value at every pixel in the density map, the proposed training loss adopts a more reliable supervision on the count expectation at each annotated point. Without bells and whistles, the loss function makes substantial improvements over the baseline loss on all tested datasets. Moreover, our proposed loss function equipped with a standard backbone network, without using any external detectors or multi-scale architectures, performs favourably against the state of the art. Our method outperforms previous best approaches by a large margin on the latest and largest UCF-QNRF dataset. The source code is available at https://github.com/ZhihengCV/Baysian-Crowd-Counting.

Authors (4)
  1. Zhiheng Ma (21 papers)
  2. Xing Wei (88 papers)
  3. Xiaopeng Hong (59 papers)
  4. Yihong Gong (38 papers)
Citations (448)

Summary

Bayesian Loss for Crowd Count Estimation

The paper "Bayesian Loss for Crowd Count Estimation with Point Supervision" proposes a new training loss for density-map-based crowd counting. Instead of converting point annotations into a Gaussian-smoothed "ground-truth" density map and regressing it pixel by pixel, the Bayesian loss uses the point annotations directly to supervise the expected count associated with each annotated person.

Key Contributions

The main contribution is the Bayesian loss, which replaces strict pixel-level supervision with a probabilistic model: each pixel's predicted density is softly assigned to the annotated points, and the network is trained so that the expected count at each annotated point matches the single person it represents. The motivation is that hand-crafted density maps are imperfect under occlusion, perspective variation, and irregular object shapes, so forcing a network to reproduce them exactly supervises it with noisy targets.
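Written out, the two objectives look roughly as follows. This is a reconstruction of the formulation described above, not a verbatim copy of the paper's equations: the notation (x_m for a pixel location, z_n for an annotated head position, D^est for the predicted density, σ for the Gaussian bandwidth) is chosen here, equal priors over annotations are assumed, and the penalty F is taken to be the absolute difference.

```latex
% Baseline: Gaussian-smoothed target regressed pixel by pixel
D^{gt}(\mathbf{x}_m) = \sum_{n=1}^{N} \mathcal{N}(\mathbf{x}_m; \mathbf{z}_n, \sigma^2 \mathbf{I}),
\qquad
\mathcal{L}^{base} = \sum_{m=1}^{M} \big( D^{gt}(\mathbf{x}_m) - D^{est}(\mathbf{x}_m) \big)^2

% Bayesian loss: posterior that pixel x_m belongs to annotation n (equal priors)
p(y_n \mid \mathbf{x}_m) =
  \frac{\mathcal{N}(\mathbf{x}_m; \mathbf{z}_n, \sigma^2 \mathbf{I})}
       {\sum_{n'=1}^{N} \mathcal{N}(\mathbf{x}_m; \mathbf{z}_{n'}, \sigma^2 \mathbf{I})}

% Expected count of person n, supervised toward exactly one
E[c_n] = \sum_{m=1}^{M} p(y_n \mid \mathbf{x}_m)\, D^{est}(\mathbf{x}_m),
\qquad
\mathcal{L}^{Bayes} = \sum_{n=1}^{N} \big| 1 - E[c_n] \big|
```

The baseline constrains all M pixel values, while the Bayesian loss imposes only N constraints, one per annotated person.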

Methodology

  • Model Construction: The Bayesian loss builds a density contribution probability model from the point annotations. A Gaussian likelihood centred on each annotation, combined with Bayes' rule, gives the posterior probability that a pixel's density belongs to each annotated person. The expected count for a person is the sum, over all pixels, of that posterior multiplied by the estimated density, and the loss penalizes the deviation of each expected count from 1.
  • Background Pixel Modeling: To handle non-crowd areas, an extended version of the loss adds a background label. Each pixel is given a dummy background point placed relative to its nearest annotation, and the expected background count is pushed toward 0, so density predicted far from every annotated head is penalized instead of being attributed to the nearest person.
  • Density Map Estimation: The paper contrasts the Bayesian loss with the traditional approach, which turns sparse annotations into a density map with a Gaussian kernel and regresses it pixel by pixel. The Bayesian loss constrains only the expected count at each annotation, leaving the network free to decide how density is distributed within each person's region.
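The computation in the list above can be sketched in pure Python on a toy density map. This is a minimal illustration under stated assumptions, not the authors' PyTorch code from the linked repository: the function names `posteriors` and `bayesian_loss` and the parameter `background_d` are hypothetical, the penalty is assumed to be the absolute difference, Gaussians are left unnormalized (the shared constant cancels in the posterior), and the dummy background point is assumed to lie on the ray from the nearest head through the pixel, `background_d` away from that head.

```python
import math


def posteriors(pixel, points, sigma, background_d=None):
    """Posterior p(y_n | x_m) of each annotation (plus the background, if enabled)
    for one pixel, assuming equal priors and isotropic Gaussian likelihoods."""
    r, c = pixel
    sq_dists = [(r - pr) ** 2 + (c - pc) ** 2 for pr, pc in points]
    # Log-likelihoods up to a shared constant; the Gaussian normalizer cancels.
    logits = [-s / (2.0 * sigma ** 2) for s in sq_dists]
    if background_d is not None:
        # Hypothetical dummy background point: on the ray from the nearest head through
        # the pixel, background_d away from that head, so its distance to the pixel is |d - r|.
        nearest = math.sqrt(min(sq_dists))
        logits.append(-((background_d - nearest) ** 2) / (2.0 * sigma ** 2))
    top = max(logits)  # subtract the max before exp() to avoid underflow
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bayesian_loss(density, points, sigma, background_d=None):
    """Return (loss, expected_counts) for a 2-D predicted density map.

    expected_counts[n] is E[c_n] for annotation n; with a background term the
    last entry is the expected background count. Assumes at least one point.
    """
    n = len(points)
    expected = [0.0] * (n + (0 if background_d is None else 1))
    for r, row in enumerate(density):
        for c, d_est in enumerate(row):
            for k, p in enumerate(posteriors((r, c), points, sigma, background_d)):
                expected[k] += p * d_est
    # Each annotated person should receive an expected count of exactly 1 ...
    loss = sum(abs(1.0 - e) for e in expected[:n])
    if background_d is not None:
        # ... and the background should receive an expected count of 0.
        loss += abs(0.0 - expected[n])
    return loss, expected


if __name__ == "__main__":
    # Toy 8x8 map with one unit of density on each of two annotated heads.
    density = [[0.0] * 8 for _ in range(8)]
    density[2][2] = 1.0
    density[5][6] = 1.0
    print(bayesian_loss(density, [(2, 2), (5, 6)], sigma=1.0))

    # Density far from the only head: attributed to that head without a
    # background term, penalized once the background term is enabled.
    far = [[0.0] * 8 for _ in range(8)]
    far[7][7] = 1.0
    print(bayesian_loss(far, [(0, 0)], sigma=1.0))
    print(bayesian_loss(far, [(0, 0)], sigma=1.0, background_d=2.0))
```

On the toy inputs, the first call yields a loss close to zero. The second reports zero loss even though the density sits in the far corner, because without a background label all density must be assigned to the only person. The third reports a loss near 2, since the stray unit is assigned to the background, leaving the person with an expected count near 0. Because the posteriors at each pixel sum to 1, the expected counts always add up to the total predicted density.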

Results

When combined with a standard backbone (VGG-19) and no external detectors or multi-scale architectures, the Bayesian loss improves over the baseline density-map loss on all tested datasets, including UCF-QNRF and ShanghaiTech. It performs favourably against prior state-of-the-art methods and, according to the authors, outperforms the previous best approaches by a large margin on UCF-QNRF, the latest and largest of these datasets.

Implications

The approach may help crowd counting models cope with real-world difficulties such as perspective distortion and dense occlusion, which are common in applications like traffic monitoring and public event management.

Future Directions

Possible extensions include incorporating additional data modalities, exploiting spatio-temporal information, integrating domain-specific priors, or adapting the model to datasets that lack annotations. Because the loss is defined on the output density map rather than on a particular network, it could in principle be paired with backbones other than the one used in the paper.

In conclusion, the paper offers a simple alternative to traditional density-map supervision: by supervising count expectations rather than rigid per-pixel targets, it improves counting accuracy through principled probabilistic modeling.