
Illumination-aware Faster R-CNN for Robust Multispectral Pedestrian Detection (1803.05347v2)

Published 14 Mar 2018 in cs.CV

Abstract: Multispectral images of color-thermal pairs have shown more effective than a single color channel for pedestrian detection, especially under challenging illumination conditions. However, there is still a lack of studies on how to fuse the two modalities effectively. In this paper, we deeply compare six different convolutional network fusion architectures and analyse their adaptations, enabling a vanilla architecture to obtain detection performances comparable to the state-of-the-art results. Further, we discover that pedestrian detection confidences from color or thermal images are correlated with illumination conditions. With this in mind, we propose an Illumination-aware Faster R-CNN (IAF RCNN). Specifically, an Illumination-aware Network is introduced to give an illumination measure of the input image. Then we adaptively merge color and thermal sub-networks via a gate function defined over the illumination value. The experimental results on KAIST Multispectral Pedestrian Benchmark validate the effectiveness of the proposed IAF R-CNN.

Citations (334)

Summary

  • The paper introduces an illumination-aware fusion that dynamically adjusts weights for color and thermal inputs based on lighting conditions.
  • It evaluates six fusion architectures, with Halfway Fusion and Score Fusion I demonstrating superior performance under varied illumination.
  • The approach achieves state-of-the-art results on the KAIST benchmark while offering efficient and robust multispectral pedestrian detection.

Illumination-Aware Faster R-CNN for Enhanced Multispectral Pedestrian Detection

This paper presents a compelling approach to multispectral pedestrian detection, built around an architecture dubbed Illumination-aware Faster R-CNN (IAF R-CNN). Its novelty lies in integrating illumination awareness into the multispectral detection framework, which improves detection robustness under varied lighting conditions.

Background and Innovation

Pedestrian detection is a well-explored domain within computer vision, with convolutional neural networks (CNNs), and Faster R-CNN in particular, pivotal in advancing detection performance. However, typical models rely on the color modality alone and falter under poor illumination. Multispectral imaging, which pairs color with long-wave infrared (thermal) imagery, has garnered attention for its potential to maintain performance across diverse lighting conditions. This paper identifies the effective fusion of the two modalities as the crucial open challenge.

Methodology and Contributions

The authors analyze six convolutional network fusion architectures for the multispectral fusion problem: Input Fusion, Early Fusion, Halfway Fusion, Late Fusion, Score Fusion I, and Score Fusion II. Empirical evaluation shows that Halfway Fusion and Score Fusion I achieve the best detection metrics, confirming their suitability for the task.
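To make the fusion taxonomy concrete, the following is a minimal sketch of halfway (mid-level feature) fusion: two modality-specific convolutional stems are concatenated and reduced by a 1x1 convolution before shared layers. The layer widths and module names are illustrative assumptions, not the paper's exact VGG-16 configuration.

```python
import torch
import torch.nn as nn

class HalfwayFusionBackbone(nn.Module):
    """Sketch of halfway (mid-level) feature fusion for a two-stream network.

    Layer widths are illustrative assumptions, not the paper's exact setup.
    """
    def __init__(self):
        super().__init__()
        # Separate early convolutional stages, one per modality.
        self.color_stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.thermal_stem = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2))
        # 1x1 convolution reduces the concatenated channels back down.
        self.fuse = nn.Conv2d(128, 64, kernel_size=1)
        # Shared layers after fusion feed the region proposal / detection head.
        self.shared = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, color, thermal):
        # Concatenate mid-level features along the channel dimension, then fuse.
        feat = torch.cat([self.color_stem(color), self.thermal_stem(thermal)], dim=1)
        return self.shared(self.fuse(feat))
```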

Key to their advancement is the introduction of an Illumination-aware Network (IAN), which estimates the illumination condition of the input image. This estimate drives a gated fusion mechanism that dynamically adjusts how the color and thermal modalities are combined as illumination varies: the gate assigns a weight to each modality based on the illumination value learned by the network. This dynamic adaptability matters because the authors' empirical analysis shows that color and thermal images play complementary roles under good lighting, while under poor illumination the detector should lean on thermal imagery.
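As a rough illustration of the gating idea (a sketch, not the authors' implementation), the snippet below predicts an illumination weight w in [0, 1] from the color image with a small assumed CNN and linearly blends the per-image detection confidences of the two sub-networks; the gate form and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class IlluminationGate(nn.Module):
    """Sketch: predict an illumination weight w from the color image and use it
    to blend color/thermal detection scores. The tiny CNN and the linear gate
    are illustrative assumptions, not the authors' exact design."""
    def __init__(self):
        super().__init__()
        self.ian = nn.Sequential(                 # small illumination-aware network
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1), nn.Sigmoid())       # w near 1 for day, near 0 for night

    def forward(self, color_img, score_color, score_thermal):
        # score_color / score_thermal: per-image detection confidences, shape (N, 1).
        w = self.ian(color_img)                   # illumination weight, shape (N, 1)
        # Weighted fusion: trust the color stream more in good illumination,
        # the thermal stream more in darkness.
        return w * score_color + (1.0 - w) * score_thermal
```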

The combined framework is trained and evaluated on the KAIST Multispectral Pedestrian Benchmark with promising results: IAF R-CNN outperforms previous methods, achieving the lowest miss rate under the benchmark's reasonable setting. The architecture is also competitive in efficiency, with favorable processing times compared to existing solutions.

Results, Implications, and Speculations

The paper reports new state-of-the-art performance on the KAIST Benchmark, highlighting IAF R-CNN's competitive edge. By intelligently weighting the contribution of each modality per illumination context, the proposed approach effectively addresses the limitations of uniform fusion strategies in prior models.

The implications of these findings extend beyond pedestrian detection. They suggest potential avenues for adapting similar methodology in tasks where robust performance across varying environmental conditions is critical, such as in autonomous driving or security surveillance systems.

Future research may benefit from exploring the integration of additional sensory modalities, such as lidar, with multispectral images. Another prospective area is refining IAN with richer illumination datasets to potentially enhance its predictive capability.

Conclusion

This paper contributes a significant advancement in multispectral pedestrian detection by presenting an approach that combines the strengths of the two modalities through illumination awareness. While further work is needed to broaden its applicability, IAF R-CNN sets a promising precedent for further innovation in multispectral object detection.