Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting (2005.05776v2)

Published 12 May 2020 in cs.CV

Abstract: The crowd counting task aims at estimating the number of people located in an image or a frame from videos. Existing methods widely adopt density maps as the training targets to optimize the point-to-point loss. While in testing phase, we only focus on the differences between the crowd numbers and the global summation of density maps, which indicate the inconsistency between the training targets and the evaluation criteria. To solve this problem, we introduce a new target, named local counting map (LCM), to obtain more accurate results than density map based approaches. Moreover, we also propose an adaptive mixture regression framework with three modules in a coarse-to-fine manner to further improve the precision of the crowd estimation: scale-aware module (SAM), mixture regression module (MRM) and adaptive soft interval module (ASIM). Specifically, SAM fully utilizes the context and multi-scale information from different convolutional features; MRM and ASIM perform more precise counting regression on local patches of images. Compared with current methods, the proposed method reports better performances on the typical datasets. The source code is available at https://github.com/xiyang1012/Local-Crowd-Counting.



This paper introduces an approach to crowd counting that pairs a new learning target, the Local Counting Map (LCM), with an adaptive mixture regression framework. Crowd counting is the computer vision task of estimating the number of individuals in an image or video frame. It is traditionally approached through density maps, whose training objective (a point-to-point loss on the map) does not match the evaluation metric (the error in the total count).

Methodological Innovations

The principal contribution of this work is the Local Counting Map (LCM), which addresses the mismatch between training targets (density maps) and evaluation criteria (crowd counts obtained by summing density maps). Each LCM value is the count of individuals in a local patch, whereas each density-map value is a per-pixel density. The LCM is derived by summing the density map patch by patch, which aligns the training target more closely with the evaluation metric and, according to the authors, yields more accurate results than density-map-based approaches.
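The per-patch summation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `local_counting_map`, the 64-pixel patch size, the non-overlapping windows, and the zero-padding at the borders are all assumptions made for the example.

```python
import numpy as np

def local_counting_map(density, patch=64):
    """Sum a 2-D density map over non-overlapping patch x patch windows.

    Hypothetical helper: the patch size and the padding strategy are assumptions.
    """
    h, w = density.shape
    # Zero-pad so both sides are multiples of the patch size; zeros add no count.
    pad_h, pad_w = (-h) % patch, (-w) % patch
    padded = np.pad(density, ((0, pad_h), (0, pad_w)))
    rows, cols = padded.shape[0] // patch, padded.shape[1] // patch
    # View the map as (rows, patch, cols, patch) blocks and sum inside each block.
    return padded.reshape(rows, patch, cols, patch).sum(axis=(1, 3))

rng = np.random.default_rng(0)
density = rng.random((100, 150)) * 1e-3  # toy density map, not real data
lcm = local_counting_map(density, patch=64)
```

Because each pixel falls in exactly one patch, the sum of the resulting map equals the sum of the density map, so the global count used at evaluation time is preserved.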

To further enhance performance, an Adaptive Mixture Regression Network framework was constructed. This framework comprises three principal modules:

Scale-Aware Module (SAM): This module enhances feature maps by incorporating multi-scale information from different convolutional layer outputs, which is crucial for handling the nuanced variability across different crowd densities and scales.
Mixture Regression Module (MRM): This module works coarse to fine, refining the crowd count estimate progressively with a mixture model that divides the counting range into a series of progressively finer intervals.
Adaptive Soft Interval Module (ASIM): This module lets the intervals within the regression mixture shift and scale, which makes the regression results more flexible and smoother.
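
To make the coarse-to-fine idea concrete, the following sketch computes a patch count from per-stage interval probabilities, with optional per-stage shift and scale factors. It is an illustration under stated assumptions and not the paper's exact formulation: the function `mixture_count`, the parameters `shifts` and `scales`, and the rule that each stage splits the previous interval evenly are all hypothetical.

```python
import numpy as np

def mixture_count(probs, c_max, shifts=None, scales=None):
    """Coarse-to-fine expected count for one patch (illustrative only).

    probs  : list of 1-D arrays, one per stage, each summing to 1.
    c_max  : assumed maximum count of a patch.
    shifts : hypothetical per-stage offsets added to the interval indices.
    scales : hypothetical per-stage multipliers on the interval width.
    """
    n = len(probs)
    shifts = np.zeros(n) if shifts is None else np.asarray(shifts, dtype=float)
    scales = np.ones(n) if scales is None else np.asarray(scales, dtype=float)
    count, width = 0.0, float(c_max)
    for p, beta, gamma in zip(probs, shifts, scales):
        p = np.asarray(p, dtype=float)
        width /= len(p)                  # each stage splits the previous interval
        idx = np.arange(len(p)) + beta   # interval indices, optionally shifted
        count += gamma * width * float(p @ idx)  # expected offset at this stage
    return count

# Two stages of four intervals each over a range of 16 (toy numbers).
probs = [np.array([0.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 0.0])]
hard = mixture_count(probs, c_max=16.0)
soft = mixture_count(probs, c_max=16.0, shifts=[0.0, 0.5], scales=[1.0, 1.0])
```

With fixed intervals the first stage contributes 4 (interval 1 of width 4) and the second contributes 2 (interval 2 of width 1). Shifting the second-stage indices by 0.5 moves the estimate without changing the probabilities, which is the kind of flexibility the soft intervals are meant to provide.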

Empirical Validation and Significance

The proposed method is benchmarked on four widely used datasets, namely ShanghaiTech Part A and B, UCF-QNRF, and UCF-CC-50, where it outperforms existing crowd counting approaches. In particular, it reports reductions in both MAE and MSE across these datasets. For instance, on the ShanghaiTech Part B dataset, the method achieves an MAE of 7.02, better than prior leading methods such as CSRNet and SANet.
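
For reference, the two metrics can be computed as below. The sketch assumes the usual crowd counting convention in which "MSE" denotes the root of the mean squared count error; the function name `mae_mse` and the sample counts are made up for illustration and are not results from the paper.

```python
import numpy as np

def mae_mse(pred_counts, true_counts):
    """Return (MAE, MSE) over per-image counts.

    Assumption: "MSE" follows the crowd counting convention, i.e. the
    square root of the mean squared error.
    """
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    err = pred - true
    mae = np.abs(err).mean()
    mse = np.sqrt((err ** 2).mean())
    return mae, mse

# Toy predicted and ground-truth counts for three images.
mae, mse = mae_mse([10.0, 20.0, 33.0], [12.0, 20.0, 30.0])
```

Both metrics depend only on the total count per image, which is why a target defined in terms of counts sits closer to the evaluation criteria than a per-pixel loss on a density map.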

Implications and Future Directions

The implications of this research are two-fold. Practically, it advances the state of the art in crowd counting by offering a more accurate and computationally feasible method that applies to both sparse and dense scenes. Theoretically, it makes a case for rethinking traditional training targets in computer vision, showing how LCM narrows the gap between the training objective and the evaluation protocol.

Moving forward, further work on exploiting context and multi-scale information from different convolutional features appears promising. The LCM target could also be adapted to counting tasks beyond crowds, in fields such as wildlife monitoring or urban planning.

In summary, this method stands out in the crowd counting literature not simply for improving count accuracy but for making explicit the relationship between training targets and evaluation criteria.


Authors (3)

Xiyang Liu (23 papers)
Jie Yang (516 papers)
Wenrui Ding (13 papers)

Citations (105)


GitHub

GitHub - xiyang1012/Local-Crowd-Counting: Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting (ECCV2020) (75 stars)