SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection (1804.00433v2)

Published 2 Apr 2018 in cs.CV

Abstract: Vision-based vehicle detection approaches achieve incredible success in recent years with the development of deep convolutional neural network (CNN). However, existing CNN based algorithms suffer from the problem that the convolutional features are scale-sensitive in object detection task but it is common that traffic images and videos contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity, and reveal two key issues: 1) existing RoI pooling destroys the structure of small scale objects, 2) the large intra-class distance for a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detecting vehicles with a large variance of scales. First, we present a context-aware RoI pooling to maintain the contextual information and original structure of small scale objects. Second, we present a multi-branch decision network to minimize the intra-class distance of features. These lightweight techniques bring zero extra time complexity but prominent detection accuracy improvement. The proposed techniques can be equipped with any deep network architectures and keep them trained end-to-end. Our SINet achieves state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on the KITTI benchmark and a new highway dataset, which contains a large variance of scales and extremely small objects.

Summary

The paper introduces SINet, a CNN that uses context-aware RoI pooling and a multi-branch decision network to overcome scale sensitivity in vehicle detection.
SINet achieves state-of-the-art accuracy at speeds of up to 37 FPS on the KITTI benchmark and on a new highway dataset with a large variance of scales.
The approach is practical for real-time use in intelligent transportation systems and could be extended to other detection domains where object scale varies widely.

Overview

The paper introduces SINet, a scale-insensitive convolutional neural network for fast vehicle detection in scenes where vehicle sizes vary widely. A major obstacle in CNN-based vehicle detection is that convolutional features are scale-sensitive, so a single detector struggles to perform equally well on objects of very different scales. This is particularly problematic in traffic surveillance, where cars, vans, and buses of different sizes appear at different distances from the camera.

Key Contributions

The paper contributes two techniques: context-aware RoI pooling and a multi-branch decision network. Existing RoI pooling tends to distort small objects. Context-aware RoI pooling instead enlarges small proposals by deconvolution with bilinear kernels, which preserves the original structure and contextual information of small objects. According to the authors, this adds no extra time complexity.
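The pooling rule described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it works on a single-channel 2D patch, it uses ordinary bilinear interpolation as a stand-in for deconvolution with a bilinear kernel, and the names `context_aware_roi_pool`, `bilinear_resize`, and `max_pool` as well as the 7x7 default output size are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def bilinear_resize(patch, out_h, out_w):
    """Enlarge a 2D patch by corner-aligned bilinear interpolation.

    Assumption: stands in for the bilinear-kernel deconvolution in the paper.
    """
    in_h, in_w = len(patch), len(patch[0])
    out = []
    for i in range(out_h):
        # Map output row i to a fractional source row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = patch[y0][x0] * (1 - wx) + patch[y0][x1] * wx
            bottom = patch[y1][x0] * (1 - wx) + patch[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def max_pool(patch, out_h, out_w):
    """Conventional RoI max pooling: take the maximum over each output bin."""
    in_h, in_w = len(patch), len(patch[0])
    out = []
    for i in range(out_h):
        r0, r1 = (i * in_h) // out_h, ceil_div((i + 1) * in_h, out_h)
        row = []
        for j in range(out_w):
            c0, c1 = (j * in_w) // out_w, ceil_div((j + 1) * in_w, out_w)
            row.append(max(patch[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        out.append(row)
    return out


def context_aware_roi_pool(patch, out_h=7, out_w=7):
    """Enlarge patches smaller than the output size, max-pool larger ones."""
    in_h, in_w = len(patch), len(patch[0])
    up_h, up_w = max(in_h, out_h), max(in_w, out_w)
    if (up_h, up_w) != (in_h, in_w):
        # Small proposal: interpolate up instead of repeating or dropping values.
        patch = bilinear_resize(patch, up_h, up_w)
    return max_pool(patch, out_h, out_w)
```

For example, `context_aware_roi_pool([[0, 2], [4, 6]], 3, 3)` interpolates the 2x2 patch up to 3x3, while a 4x4 patch pooled to 2x2 goes through ordinary max pooling. Handling each axis separately (enlarge first, then pool) is a design choice of this sketch so that proposals that are small along only one dimension are still covered.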

The multi-branch decision network reduces intra-class feature distance by routing object proposals to separate branches according to their scale. Each branch handles a specific size range, so it can learn discriminative features for that range instead of sharing one set of features across every scale.
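The routing idea can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the summary does not state how many branches SINet uses or how scale is measured, so the sketch uses two branches, box height as the scale measure, and a caller-supplied `size_threshold`. The names `route_by_scale` and `multi_branch_predict` and the per-branch `heads` callables are hypothetical.

```python
def route_by_scale(proposals, size_threshold):
    """Group proposal indices into 'small' and 'large' branches.

    Each proposal is a box (x1, y1, x2, y2). Assumption: scale is measured
    by box height and compared with a single threshold.
    """
    branches = {"small": [], "large": []}
    for idx, (x1, y1, x2, y2) in enumerate(proposals):
        height = y2 - y1
        name = "small" if height < size_threshold else "large"
        branches[name].append(idx)
    return branches


def multi_branch_predict(proposals, heads, size_threshold):
    """Run each proposal through the head of its own scale branch.

    `heads` maps a branch name to a callable standing in for that branch's
    sub-network. Results come back in the original proposal order.
    """
    routed = route_by_scale(proposals, size_threshold)
    outputs = [None] * len(proposals)
    for name, indices in routed.items():
        for idx in indices:
            outputs[idx] = heads[name](proposals[idx])
    return outputs


# Toy usage: each "head" only reports which branch handled the box.
toy_heads = {
    "small": lambda box: ("small", box[3] - box[1]),
    "large": lambda box: ("large", box[3] - box[1]),
}
toy_proposals = [(0, 0, 10, 10), (0, 0, 100, 80), (5, 5, 20, 45)]
toy_outputs = multi_branch_predict(toy_proposals, toy_heads, size_threshold=40)
```

Keeping the original indices while routing lets the per-branch results be merged back into a single list, so the rest of a detection pipeline does not need to know that the proposals were split.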

Experimental Results

SINet achieves state-of-the-art performance on the KITTI benchmark and on a newly constructed large scale variance highway dataset (LSVH). It runs at up to 37 FPS without compromising accuracy. On the KITTI benchmark, SINet outperforms competing methods at the "moderate" difficulty level while maintaining robust detection rates at the "easy" and "hard" levels.

Implications and Future Work

The techniques have practical implications for intelligent transportation systems, where autonomous driving and traffic management depend on vehicle detection that is both fast and accurate. Because the proposed methods are lightweight, they remain compatible with real-time operation in applications that need immediate analysis and feedback.

The research opens avenues for further investigation, particularly in extending SINet to other domains where object sizes vary considerably, such as pedestrian detection in crowded environments. Future work might improve accuracy in highly cluttered scenes through feature fusion or by integrating sensor data beyond visual input.

Conclusion

SINet is a significant step forward in vehicle detection. It addresses scale sensitivity with techniques that preserve computational efficiency, which makes it well suited to deployment in intelligent transportation systems. Because the techniques can be added to a variety of CNN architectures, they may also help other machine vision systems handle objects of widely varying scale.