Generalizable Pedestrian Detection: The Elephant In The Room (2003.08799v7)

Published 19 Mar 2020 in cs.CV

Abstract: Pedestrian detection is used in many vision-based applications ranging from video surveillance to autonomous driving. Despite achieving high performance, it is still largely unknown how well existing detectors generalize to unseen data. This is important because a practical detector should be ready to use in various scenarios in applications. To this end, we conduct a comprehensive study in this paper, using a general principle of direct cross-dataset evaluation. Through this study, we find that existing state-of-the-art pedestrian detectors, although they perform quite well when trained and tested on the same dataset, generalize poorly in cross-dataset evaluation. We demonstrate that there are two reasons for this trend. Firstly, their designs (e.g. anchor settings) may be biased towards popular benchmarks in the traditional single-dataset training and test pipeline, which largely limits their generalization capability. Secondly, the training source is generally not dense in pedestrians and diverse in scenarios. Under direct cross-dataset evaluation, surprisingly, we find that a general purpose object detector, without pedestrian-tailored adaptation in design, generalizes much better compared to existing state-of-the-art pedestrian detectors. Furthermore, we illustrate that diverse and dense datasets, collected by crawling the web, serve as an efficient source of pre-training for pedestrian detection. Accordingly, we propose a progressive training pipeline and find that it works well for autonomous-driving oriented pedestrian detection. Consequently, the study conducted in this paper suggests that more emphasis should be put on cross-dataset evaluation for the future design of generalizable pedestrian detectors. Code and models can be accessed at https://github.com/hasanirtiza/Pedestron.

Authors (5)

Irtiza Hasan
Shengcai Liao
Jinpeng Li
Saad Ullah Akram
Ling Shao

Summary

The paper "Generalizable Pedestrian Detection: The Elephant In The Room" examines how well existing pedestrian detectors generalize to unseen data. Although current detectors achieve high performance on the datasets they are trained on, their behavior on unseen data has remained largely unknown. The paper investigates this question systematically through direct cross-dataset evaluation and exposes the limitations of popular state-of-the-art pedestrian detectors.

Pedestrian detection plays a pivotal role in various real-world applications, including autonomous driving, video surveillance, and action recognition. Traditionally, the efficacy of pedestrian detectors has been predominantly assessed using within-dataset evaluation, where both training and testing occur on the same dataset. The authors argue that this practice leads to overfitting and limits the real-world applicability of these models.

A key finding of the paper is that existing detectors perform well when trained and tested on the same benchmark but generalize poorly in cross-dataset evaluation. The authors identify two main reasons: detector designs (for example, anchor settings) may be biased towards popular benchmarks, and the training data is generally not dense in pedestrians or diverse in scenarios. Surprisingly, a general-purpose object detector without any pedestrian-specific adaptation generalizes much better across datasets than specialized pedestrian detectors.
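To make the anchor-bias argument concrete, the following is a minimal sketch and not the paper's code. It compares a single narrow anchor shape against a small set of general shapes by measuring the best IoU each set can reach on two boxes. The aspect ratios and box sizes are illustrative assumptions chosen for this example, and the helper names are hypothetical.

```python
def centered_iou(w1, h1, w2, h2):
    """IoU of two axis-aligned boxes that share the same center."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def best_anchor_iou(box_w, box_h, ratios):
    """Best IoU between a box and same-area anchors with the given width/height ratios."""
    area = box_w * box_h  # match the area so only the aspect ratio differs
    best = 0.0
    for ratio in ratios:
        anchor_h = (area / ratio) ** 0.5  # from w = ratio * h and w * h = area
        anchor_w = ratio * anchor_h
        best = max(best, centered_iou(box_w, box_h, anchor_w, anchor_h))
    return best


# Illustrative (assumed) anchor designs: one narrow shape vs. a general set.
TAILORED_RATIOS = (0.41,)
GENERAL_RATIOS = (0.5, 1.0, 2.0)

# An upright pedestrian-like box and a wider box (e.g. a seated person).
tailored_upright = best_anchor_iou(41, 100, TAILORED_RATIOS)
general_upright = best_anchor_iou(41, 100, GENERAL_RATIOS)
tailored_wide = best_anchor_iou(80, 100, TAILORED_RATIOS)
general_wide = best_anchor_iou(80, 100, GENERAL_RATIOS)

print(f"upright box: tailored={tailored_upright:.2f}, general={general_upright:.2f}")
print(f"wide box:    tailored={tailored_wide:.2f}, general={general_wide:.2f}")
```

In this toy setting the narrow anchor fits the upright box best, while the general set fits the wider box better. It only illustrates how a benchmark-tuned design choice can trade away coverage of shapes that are rare in that benchmark; it does not reproduce any result from the paper.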

The paper further explores how training on diverse and densely populated datasets, collected by crawling the web, can improve generalization. Datasets such as CrowdHuman and Wider Pedestrian are shown to be an effective source of pre-training for pedestrian detectors. The authors propose a progressive training pipeline that fine-tunes a model in stages, starting from a broad, general source and moving towards the target domain. With this strategy they report significant improvements in pedestrian detection, particularly for autonomous driving scenarios.
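The staged fine-tuning idea can be sketched as a simple loop over datasets ordered from most general to most target-specific. This is a hedged illustration only: `Detector`, `fine_tune`, and the `target_driving_dataset` name are hypothetical placeholders, and the stub records the training order instead of running a real optimizer.

```python
from dataclasses import dataclass, field


@dataclass
class Detector:
    """Stand-in for a detector; it only records the datasets it was trained on."""
    history: list = field(default_factory=list)


def fine_tune(model, dataset_name):
    """Hypothetical training step. A real version would load the dataset and
    update the model weights, starting from the weights of the previous stage."""
    return Detector(history=model.history + [dataset_name])


def progressive_training(stages):
    """Fine-tune through the stages in order, general source first, target last."""
    model = Detector()
    for dataset_name in stages:
        model = fine_tune(model, dataset_name)
    return model


# Assumed ordering: a web-crawled, dense dataset first, then the target domain.
stages = ["CrowdHuman", "target_driving_dataset"]
model = progressive_training(stages)
print(" -> ".join(model.history))
```

The key design point is that each stage starts from the weights produced by the previous one, so the order of `stages` matters.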

Experimental results underscore the importance of evaluating models on more diverse datasets. A comparison across several models (BGCNet, CSP, PRNet, ALFNet, and Cascade R-CNN) reveals that general detectors often outperform pedestrian-specific models in cross-dataset evaluation. Notably, while within-dataset evaluation favors pedestrian-specialized designs, cross-dataset evaluation shows that general detectors adapt better to unseen data.
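The difference between the two evaluation protocols can be written down as a grid of (train, test) pairs. The sketch below only enumerates the protocol and produces no scores; the function name and the placeholder dataset names are assumptions made for illustration.

```python
def evaluation_protocol(datasets):
    """List every (train, test) pair and label it as within- or cross-dataset."""
    protocol = []
    for train_set in datasets:
        for test_set in datasets:
            kind = "within-dataset" if train_set == test_set else "cross-dataset"
            protocol.append({"train": train_set, "test": test_set, "kind": kind})
    return protocol


# Placeholder names; substitute the benchmarks being compared.
protocol = evaluation_protocol(["dataset_A", "dataset_B", "dataset_C"])
cross_pairs = [p for p in protocol if p["kind"] == "cross-dataset"]

for entry in cross_pairs:
    print(f"train on {entry['train']}, test on {entry['test']}")
```

Only the diagonal of this grid (train and test on the same dataset) is covered by traditional within-dataset evaluation. The off-diagonal pairs are the ones the paper argues deserve more emphasis.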

The implications of this paper are substantial for future advancements in AI-based detection systems. Emphasizing cross-dataset evaluation provides a more realistic assessment of model robustness, essential for practical deployment in dynamic and unpredictable environments. The findings suggest that developers should pivot towards creating detectors with broader applicability, rather than exclusively fine-tuning for individual benchmarks.

Looking ahead, the research invites further investigation into creating even more generalized and adaptable models, potentially leveraging unsupervised or few-shot learning techniques to tackle domain shifts more effectively. As pedestrian detection technology advances, such strategic shifts in evaluation and development practices could lead to safer and more reliable applications, particularly in safety-critical domains like autonomous vehicles.
