Repulsion Loss: Detecting Pedestrians in a Crowd (1711.07752v2)

Published 21 Nov 2017 in cs.CV

Abstract: Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by target, and the repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects thus leading to more crowd-robust localization. Our detector trained by repulsion loss outperforms all the state-of-the-art methods with a significant improvement in occlusion cases.

Summary

The paper introduces Repulsion Loss, a bounding box regression loss that addresses crowd occlusion by penalizing predicted boxes that overlap neighboring non-target pedestrians.
It integrates an attraction term with RepGT and RepBox losses to enhance regression accuracy and reduce false positives.
Empirical results on the CityPersons and Caltech-USA datasets demonstrate improved detection performance under occluded conditions.

Repulsion Loss: Detecting Pedestrians in a Crowd

In the paper titled "Repulsion Loss: Detecting Pedestrians in a Crowd," the authors address the complex challenge of pedestrian detection within crowded scenes, where occlusion significantly impairs the effectiveness of conventional detectors. They introduce a novel approach termed "Repulsion Loss" to enhance bounding box regression, thereby improving detection robustness in occlusion-heavy environments.

Problem Analysis

The authors first analyze empirically how crowd occlusion harms existing pedestrian detectors. Using the CityPersons dataset, they find that a substantial portion of missed detections and false positives stems from overlaps within pedestrian crowds. They divide occlusion into inter-class occlusion, where a pedestrian is occluded by objects of other categories, and intra-class occlusion, where pedestrians occlude one another. The latter, also called crowd occlusion, is the primary focus of the paper.

Methodology: Repulsion Loss

The central contribution of the paper is the introduction of the Repulsion Loss function, which builds upon the conventional bounding box regression loss. It comprises three components:

Attraction Term: This uses a $\mathrm{Smooth}_{L1}$ distance to close the gap between the predicted bounding box and its designated ground-truth target.
RepGT Loss: This term penalizes the predicted box for overlapping nearby ground-truth objects other than its designated target, which reduces the likelihood of the box shifting towards adjacent non-target pedestrians.
RepBox Loss: This term decreases overlap among predicted boxes that have differing designated targets, aiming to make predictions less sensitive to the non-maximum suppression (NMS) threshold.
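
The two repulsion terms above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: boxes are `(x1, y1, x2, y2)` tuples, all function names are hypothetical, and the repulsion target is picked by IoU with the predicted box for brevity. The sketch assumes the overlap penalty is a smoothed `-ln(1 - x)` function with a smoothness parameter `sigma` in `[0, 1)`, that RepGT measures overlap as intersection over ground truth (IoG), and that RepBox measures it as IoU between predictions with different targets.

```python
import math

def area(b):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a, b):
    """Area of the overlap region of two boxes (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def iou(a, b):
    """Intersection over union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def iog(pred, gt):
    """Intersection over the ground-truth area; bounded in [0, 1]."""
    g = area(gt)
    return intersection(pred, gt) / g if g > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """-ln(1 - x) up to sigma, continued linearly above it (sigma < 1)."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def rep_gt_loss(preds, targets, gts, sigma=0.5):
    """preds[i] is a predicted box; targets[i] indexes its target in gts."""
    total = 0.0
    for box, t in zip(preds, targets):
        others = [g for k, g in enumerate(gts) if k != t]
        if not others:
            continue
        # Assumption: repel from the non-target ground truth with largest IoU.
        rep = max(others, key=lambda g: iou(box, g))
        total += smooth_ln(iog(box, rep), sigma)
    return total / max(len(preds), 1)

def rep_box_loss(preds, targets, sigma=0.5, eps=1e-6):
    """Penalize IoU between predictions assigned to different targets."""
    total, count = 0.0, 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            if targets[i] == targets[j]:
                continue  # same target: no repulsion between these boxes
            overlap = iou(preds[i], preds[j])
            if overlap > 0:
                total += smooth_ln(overlap, sigma)
                count += 1
    return total / (count + eps)
```

A full training loss would add these to the attraction term with two balancing weights, for example `attraction + alpha * rep_gt + beta * rep_box`, where `alpha` and `beta` are hyperparameters (hypothetical names here).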

Results and Implications

Empirical evaluations on the CityPersons and Caltech-USA benchmarks demonstrate the effectiveness of the proposed method. Detectors trained with Repulsion Loss outperform the state-of-the-art methods they are compared against, with a significant improvement in occluded cases.

On CityPersons, Repulsion Loss lowered $\mathrm{MR}^{-2}$ (the log-average miss rate, where lower is better) from 14.6% to 13.2% on the reasonable validation subset.
The impact on occluded cases was particularly prominent, indicating that the approach effectively mitigates detection errors due to crowd overlaps.
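
For readers unfamiliar with the metric, $\mathrm{MR}^{-2}$ is commonly computed as the log-average miss rate: the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space over $[10^{-2}, 10^{0}]$. The sketch below shows one way to compute it from a miss-rate curve. The function name, the input layout (FPPI sorted ascending), and the fallback of 1.0 when the curve does not reach a reference point are assumptions for illustration, not details taken from the paper.

```python
import math
from bisect import bisect_right

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates at log-spaced FPPI reference points.

    fppi must be sorted in ascending order; miss_rate[i] is the miss rate
    of the detector at the operating point with fppi[i].
    """
    # Reference points 10^-2 ... 10^0, evenly spaced in log space.
    refs = [10 ** (-2 + 2 * k / (num_points - 1)) for k in range(num_points)]
    samples = []
    for r in refs:
        # Last operating point whose FPPI does not exceed the reference.
        idx = bisect_right(fppi, r) - 1
        mr = miss_rate[idx] if idx >= 0 else 1.0  # assumed fallback
        samples.append(max(mr, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(m) for m in samples) / len(samples))
```

Because the metric averages miss rates, a drop from 14.6% to 13.2% means fewer pedestrians are missed at the same false-positive budget.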

Additionally, the paper extends the application of Repulsion Loss to generic object detection on the PASCAL VOC dataset, further validating its efficacy.

Theoretical and Practical Implications

Theoretically, the Repulsion Loss provides a compelling approach to address occlusion in object detection tasks, challenging the conventional use of attraction-based losses alone. Practically, the demonstrated robustness to varying NMS thresholds implies broader applicability and enhanced reliability in real-world scenarios, such as video surveillance and autonomous driving.
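
The sensitivity to the NMS threshold can be illustrated with a minimal greedy NMS sketch. The boxes, scores, and function names below are made up for illustration and do not come from the paper. Boxes 0 and 1 stand for two different, partly overlapping pedestrians, and box 2 is a duplicate detection of the first one.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, w) * max(0.0, h)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, thresh):
    """Keep boxes in score order, dropping any with IoU > thresh to a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 20), (4, 0, 14, 20), (1, 0, 11, 20)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, 0.3))  # [0]: the second pedestrian is suppressed
print(greedy_nms(boxes, scores, 0.5))  # [0, 1]: both pedestrians, duplicate removed
print(greedy_nms(boxes, scores, 0.9))  # [0, 1, 2]: the duplicate survives
```

A low threshold merges neighboring pedestrians into one detection, while a high threshold lets duplicates through. Predictions that overlap less with boxes assigned to other targets leave a wider range of thresholds that give the correct result.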

Future Directions

Potential future work could explore integrating Repulsion Loss into more diverse object detection frameworks and assessing its applicability across varied datasets featuring different crowd densities and occlusion complexities. Furthermore, refining the loss components to optimize computational efficiency without compromising accuracy could facilitate wider adoption.

In conclusion, Repulsion Loss improves pedestrian detection in crowded and occluded settings, and it motivates further research into loss functions that combine attraction and repulsion terms to make object detection more reliable in complex visual environments.
