Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond (2201.03176v2)

Published 10 Jan 2022 in cs.CV

Abstract: Pedestrian detection is the cornerstone of many vision-based applications, from object tracking and video surveillance to, more recently, autonomous driving. With the rapid development of deep learning in object detection, pedestrian detection has achieved very good performance in the traditional single-dataset training and evaluation setting. However, in this study on generalizable pedestrian detectors, we show that current pedestrian detectors handle even small domain shifts poorly in cross-dataset evaluation. We attribute the limited generalization to two main factors: the method and the current sources of data. Regarding the method, we illustrate that the bias present in the design choices (e.g., anchor settings) of current pedestrian detectors is the main contributing factor to the limited generalization. Most modern pedestrian detectors are tailored towards a target dataset, where they do achieve high performance in the traditional single-dataset training and testing pipeline, but suffer a degradation in performance when evaluated across datasets. Consequently, a general object detector performs better in cross-dataset evaluation than state-of-the-art pedestrian detectors, due to its generic design. As for the data, we show that the autonomous driving benchmarks are monotonous in nature; that is, they are neither diverse in scenarios nor dense in pedestrians. Therefore, benchmarks curated by crawling the web (which contain diverse and dense scenarios) are an efficient source of pre-training for providing a more robust representation. Accordingly, we propose a progressive fine-tuning strategy which improves generalization. Code and models can be accessed at https://github.com/hasanirtiza/Pedestron.

Authors

Irtiza Hasan
Shengcai Liao
Jinpeng Li
Saad Ullah Akram
Ling Shao

Summary

Overview

The paper "Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond" examines domain generalization in pedestrian detection. It shows that current pedestrian detectors excel when trained and tested on a single dataset but falter in cross-dataset evaluation, where a detector trained on one dataset is tested on another, even when the domain shift is modest.
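The difference between the two evaluation settings can be made concrete with a small sketch. The dataset names below are placeholders and the function `evaluation_plan` is hypothetical; it only enumerates which train/test pairings count as within-dataset and which as cross-dataset, and does not run any detector.

```python
from typing import Dict, List


def evaluation_plan(datasets: List[str]) -> List[Dict[str, str]]:
    """List every (train, test) pairing and label its evaluation setting."""
    plan = []
    for train in datasets:
        for test in datasets:
            # Same dataset for training and testing is the traditional setting;
            # any other pairing exposes the detector to a domain shift.
            setting = "within-dataset" if train == test else "cross-dataset"
            plan.append({"train": train, "test": test, "setting": setting})
    return plan


# Placeholder dataset names (assumptions, not taken from the paper).
datasets = ["dataset_A", "dataset_B", "dataset_C"]
plan = evaluation_plan(datasets)
cross = [p for p in plan if p["setting"] == "cross-dataset"]

for p in cross:
    # In the cross-dataset setting the model is not fine-tuned on the test domain.
    print(f"train on {p['train']} -> test on {p['test']}")
```

With three datasets this yields three within-dataset pairings and six cross-dataset pairings; the summary's claims about generalization concern the second group.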

Key Findings

The core findings emphasize the biases in current pedestrian detection methodologies and data sources:

Methodological Bias: The paper attributes the lack of generalization primarily to design biases within current pedestrian detectors. These detectors are often tailored to a specific target dataset through core design choices such as anchor settings. Consequently, they perform less effectively on a new dataset unless they are trained further on it.
Data Limitations: The standard datasets used for training pedestrian detectors, particularly the autonomous driving benchmarks, lack diversity in scenarios and density of pedestrians. Because they are drawn from monotonous, similar environments, they do not represent the range of scenes a detector meets in practice.
General Object Detectors: A general object detector performs better than state-of-the-art pedestrian detectors in cross-dataset evaluation, which the paper attributes to its generic design. This suggests that general object detection frameworks are a stronger starting point for pedestrian detection when generalization matters.
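To illustrate how an anchor setting can encode a dataset-specific bias, the toy sketch below compares a narrowly tuned anchor set with a generic one. All numbers (anchor heights, aspect ratios, and the two ground-truth boxes) are illustrative assumptions and are not taken from the paper or from any particular detector; the helper names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height), all boxes share the same center


def make_anchors(heights: List[float], ratios: List[float]) -> List[Box]:
    """Build anchors from heights and width/height aspect ratios."""
    return [(h * r, h) for h in heights for r in ratios]


def centered_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes that share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def best_iou(gt: Box, anchors: List[Box]) -> float:
    """Highest IoU between a ground-truth box and any anchor."""
    return max(centered_iou(gt, a) for a in anchors)


# Assumed settings: one narrow aspect ratio tuned to upright pedestrians
# versus a generic spread of ratios and sizes.
tailored = make_anchors(heights=[50, 100, 200], ratios=[0.41])
generic = make_anchors(heights=[32, 64, 128, 256], ratios=[0.5, 1.0, 2.0])

# Assumed ground-truth boxes: an upright pedestrian that matches the tailored
# design, and a wider box such as might appear in a different dataset.
gt_upright: Box = (41.0, 100.0)
gt_wide: Box = (80.0, 100.0)

for name, anchors in [("tailored", tailored), ("generic", generic)]:
    print(name,
          "upright:", round(best_iou(gt_upright, anchors), 3),
          "wide:", round(best_iou(gt_wide, anchors), 3))
```

In this toy setup the tailored anchors fit the upright box almost perfectly but fit the wider box worse than the generic anchors do. That is the kind of trade-off the summary describes: a design tuned to one dataset's box statistics loses ground once those statistics change.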

Proposed Solutions

The authors propose a training strategy and compare network architectures:

Progressive Fine-Tuning Strategy: By first pre-training on diverse, dense datasets such as those collected from the web, and then fine-tuning on datasets closer to the target domain, detectors generalize better. The summary reports that this strategy improved several pedestrian detection architectures.
Network Architecture Analysis: The paper conducts a comparative analysis of Convolutional Neural Networks (CNNs) and Transformer networks, suggesting that while Transformers show strong results in direct evaluations, CNNs exhibit superior domain generalization, particularly with large-scale data pre-training.
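The staged schedule behind progressive fine-tuning can be sketched as a loop in which each stage starts from the checkpoint produced by the previous one. The stage names, learning rates, and epoch counts below are placeholders, and `fake_train` is a stand-in that only records what would run; none of this reproduces the authors' actual training code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str      # dataset used in this stage (placeholder name)
    lr: float      # learning rate for this stage (assumed value)
    epochs: int    # number of epochs for this stage (assumed value)


@dataclass
class Checkpoint:
    history: List[str] = field(default_factory=list)  # stages seen so far


def progressive_finetune(
    stages: List[Stage],
    train_one_stage: Callable[[Checkpoint, Stage], Checkpoint],
    init: Optional[Checkpoint] = None,
) -> Checkpoint:
    """Run the stages in order, initializing each from the previous result."""
    ckpt = Checkpoint() if init is None else init
    for stage in stages:
        ckpt = train_one_stage(ckpt, stage)
    return ckpt


def fake_train(ckpt: Checkpoint, stage: Stage) -> Checkpoint:
    """Stand-in for a real training loop: records the stage, trains nothing."""
    return Checkpoint(history=ckpt.history + [stage.name])


# Order follows the summary: diverse, dense web data first, then data closer
# to the target domain. Names and hyperparameters are assumptions.
stages = [
    Stage(name="web_crawled_dense", lr=1e-2, epochs=20),
    Stage(name="closer_to_target", lr=1e-3, epochs=10),
    Stage(name="target_domain", lr=1e-4, epochs=5),
]

final = progressive_finetune(stages, fake_train)
print(" -> ".join(final.history))
```

In a real pipeline `train_one_stage` would load the previous weights into the detector and train on that stage's dataset. The ordering of the stages is the part that corresponds to the strategy described above.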

Implications and Future Directions

The findings indicate that a shift in focus is necessary for the advancement of pedestrian detection technologies. Next-generation detectors could benefit from increased emphasis on cross-dataset evaluation and the employment of more generalized object detection frameworks. The inclusion of diverse and dense datasets in training pipelines shows promise in enhancing robustness across domains.

The paper's insights imply several potential future developments in AI:

Enhanced Real-World Application: Improved generalization could lead to more reliable pedestrian detection in real-world applications such as autonomous driving and surveillance, where domain shifts are routine.
Adoption of Hybrid Models: Combining the strengths of CNNs and Transformers could potentially offer a robust solution for domain generalization, merging the representation capacities of Transformers with the generalization prowess of CNNs.

In conclusion, the paper argues that pedestrian detection generalizes better when detector design is less tied to a single dataset and when training draws on more diverse, dense data, and it makes the case for cross-dataset evaluation as a test of that generalization.

GitHub

GitHub - hasanirtiza/Pedestron: [Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021 (687 stars)
