IterDet: Iterative Scheme for Object Detection in Crowded Environments (2005.05708v2)

Published 12 May 2020 in cs.CV

Abstract: Deep learning-based detectors usually produce a redundant set of object bounding boxes including many duplicate detections of the same object. These boxes are then filtered using non-maximum suppression (NMS) in order to select exactly one bounding box per object of interest. This greedy scheme is simple and provides sufficient accuracy for isolated objects but often fails in crowded environments, since one needs to both preserve boxes for different objects and suppress duplicate detections. In this work we develop an alternative iterative scheme, where a new subset of objects is detected at each iteration. Detected boxes from the previous iterations are passed to the network at the following iterations to ensure that the same object would not be detected twice. This iterative scheme can be applied to both one-stage and two-stage object detectors with just minor modifications of the training and inference procedures. We perform extensive experiments with two different baseline detectors on four datasets and show significant improvement over the baseline, leading to state-of-the-art performance on CrowdHuman and WiderPerson datasets. The source code and the trained models are available at https://github.com/saic-vul/iterdet.

Citations (35)

Summary

The paper introduces an iterative detection scheme to overcome occlusion and duplicate detection challenges in crowded scenes.
It utilizes a history-aware approach by integrating previous detections to improve contextual awareness and prevent rediscovery.
Experimental results demonstrate significant AP and recall improvements on datasets like CrowdHuman and WiderPerson.

IterDet: Iterative Scheme for Object Detection in Crowded Environments

The paper "IterDet: Iterative Scheme for Object Detection in Crowded Environments" addresses a long-standing problem in object detection: scenes in which many objects overlap. Standard detectors work well on isolated objects but struggle in crowds because they rely on non-maximum suppression (NMS). A strict overlap threshold suppresses boxes that belong to different, heavily occluded objects, while a loose one lets duplicate detections of the same object through. IterDet proposes an iterative alternative that can be applied to both one-stage and two-stage deep learning-based detectors.
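For context, the greedy NMS step mentioned above can be sketched as follows. This is the standard textbook procedure, not code from the paper; the function names, boxes, scores, and thresholds are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Illustrative data: two different people standing close together
# (boxes 0 and 1) and a duplicate detection of the first person (box 2).
boxes = [(0, 0, 10, 20), (3, 0, 13, 20), (0.5, 0, 10.5, 20)]
scores = [0.9, 0.8, 0.7]

strict = greedy_nms(boxes, scores, iou_threshold=0.5)  # drops the second person
loose = greedy_nms(boxes, scores, iou_threshold=0.6)   # keeps both people
```

In this toy case the two people overlap with an IoU of about 0.54, so a threshold of 0.5 removes the second person together with the duplicate, while 0.6 keeps both people. In a real crowd no single threshold separates every pair of neighbours from every duplicate, which is the failure mode the paper targets.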

Technical Contributions

The key technical contribution is an iterative detection scheme termed IterDet. Instead of detecting all objects in a single pass, the detector is run several times. At each iteration it identifies a new subset of objects, and the boxes found so far are passed back to the network at the next iteration so that the same object is not detected twice. Objects missed in earlier iterations because of heavy occlusion or close proximity to similar objects can therefore still be picked up in later iterations.
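The inference loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `detect` is a hypothetical callable standing in for a history-aware detector, `toy_detect` is a made-up stand-in that only "sees" an occluded box once its occluder is in the history, and the default of two iterations is an assumption.

```python
def iterative_detect(image, detect, max_iterations=2):
    """Run a history-aware detector repeatedly and accumulate its boxes.

    `detect(image, history)` is a hypothetical callable that returns boxes
    for objects not already present in `history`.
    """
    history = []
    for _ in range(max_iterations):
        new_boxes = detect(image, history)
        if not new_boxes:
            break  # nothing new was found, so further iterations are pointless
        history.extend(new_boxes)
    return history


def toy_detect(scene, history):
    """Made-up stand-in detector. `scene` is a list of (box, occluder) pairs;
    a box is found only if it is unoccluded or its occluder is already known."""
    found = []
    for box, occluder in scene:
        if box in history:
            continue  # never report an object twice
        if occluder is None or occluder in history:
            found.append(box)
    return found


front = (0, 0, 10, 20)
behind = (3, 0, 13, 20)   # partly hidden by `front`
far = (30, 0, 40, 20)
scene = [(front, None), (behind, front), (far, None)]

one_pass = iterative_detect(scene, toy_detect, max_iterations=1)
two_pass = iterative_detect(scene, toy_detect, max_iterations=2)
```

With one iteration the toy detector returns only the two unoccluded boxes; the second iteration adds the occluded one. A real detector does not know occlusion relations in advance, so this only illustrates the control flow of the loop.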

A history-aware detector is central to this scheme. A history map is constructed from the previously detected bounding boxes and fed to the detector together with the image. The map tells the network which regions are already explained by earlier detections and how strongly they overlap, which helps it distinguish a new, partially hidden instance from one that has already been found.
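One way to picture a history map is as an image-sized grid in which each pixel counts how many previously detected boxes cover it. The sketch below assumes that per-pixel count encoding, integer pixel coordinates, and half-open boxes (x1, y1, x2, y2); the function name is hypothetical, and how the map is combined with the image inside the network is not shown.

```python
def build_history_map(height, width, boxes):
    """Return a height x width grid of per-pixel box counts.

    Assumes integer boxes (x1, y1, x2, y2) with x2 and y2 exclusive;
    parts of a box outside the image are clipped.
    """
    history = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                history[y][x] += 1  # one more detected box covers this pixel
    return history


# Two overlapping boxes on a tiny 4 x 6 "image".
history = build_history_map(4, 6, [(0, 0, 3, 2), (2, 1, 5, 4)])
for row in history:
    print(row)
```

Pixels covered by both boxes hold the value 2, so regions where detected objects already overlap stand out from regions covered by a single box or by none.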

Experimental Evaluation

IterDet was evaluated on several datasets of crowded scenes, including CrowdHuman and WiderPerson as well as the synthetic AdaptIS datasets. The results show higher average precision (AP) and recall than the baseline detectors. On CrowdHuman, for instance, AP rises by about 3.1 percentage points and recall by more than 5.5 points relative to the baseline. Similar gains are reported on the other datasets.

Notably, IterDet reports state-of-the-art results on the widely used CrowdHuman and WiderPerson datasets. The iterative process yields a substantial gain in recall without sacrificing precision.

Implications and Future Directions

The work has both practical and theoretical implications. Practically, IterDet is relevant to applications such as surveillance, autonomous driving, and dense crowd analysis, where accurate detection of tightly clustered objects is crucial. Because the scheme requires only minor modifications to the training and inference procedures of an existing detector, it can be applied across a range of detection settings.

Theoretically, IterDet questions the assumption that all objects should be detected in a single pass. Its iterative formulation could inspire similar designs in other deep learning tasks where conditioning on earlier outputs helps. Future research may explore further optimizations of iterative schemes, including extension to other detection tasks, integration with attention mechanisms to improve the separation of overlapping objects, and reduction of the computational overhead of repeated passes through more efficient history mapping strategies.

In conclusion, the authors propose a well-structured and empirically validated scheme for object detection in crowded environments. IterDet is both a methodological contribution and a practical improvement over NMS-only pipelines in dense scenes.

Authors (5)

GitHub

GitHub - SamsungLabs/iterdet: [S+SSPR2020] IterDet: Iterative Scheme for Object Detection in Crowded Environments (209 stars)