- The paper presents CRF-Net, a sensor fusion technique that integrates radar and camera data to improve detection accuracy under challenging conditions.
- The novel BlackIn training strategy, applied within a RetinaNet-based fusion network, pushes the model to exploit the sparse radar data; with ground-truth-filtered radar input, CRF-Net gains 12.96 percentage points of mAP on the nuScenes dataset.
- Experimental results on nuScenes and a custom dataset demonstrate more reliable object detection by leveraging radar's resilience to poor lighting and adverse weather.
A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection
The paper "A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection" presents CameraRadarFusionNet (CRF-Net), a novel architecture designed to improve object detection in autonomous vehicles under challenging conditions. The research addresses the limitations of using camera sensors alone, which can be hindered by adverse weather conditions, and proposes a fusion of camera and radar data to enhance detection accuracy.
Technical Summary
The proposed CRF-Net architecture processes camera and radar data jointly within a single neural network to perform object detection. Training uses a strategy called BlackIn which, analogous to Dropout, withholds the camera input for a portion of the training samples so that learning is temporarily focused on the sparse radar data. This discourages the network from defaulting to the camera alone and promotes a genuinely learned fusion of the two sensor types, an incremental yet meaningful advance over earlier fusion architectures.
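The paper does not publish training code in this summary, but BlackIn can be pictured as input-level dropout applied to whole sensor streams. Below is a minimal sketch under that assumption; the function name `blackin_batch`, the channel layout, and the 0.2 probability are illustrative choices, not values taken from the authors' implementation.

```python
import numpy as np

def blackin_batch(batch, drop_prob=0.2, num_camera_channels=3, rng=None):
    """Zero out the camera channels of random samples in a training batch.

    `batch` is assumed to be shaped (N, H, W, C), where the first
    `num_camera_channels` channels hold the RGB image and the remaining
    channels hold projected radar data.
    """
    rng = rng or np.random.default_rng()
    augmented = batch.copy()
    for i in range(augmented.shape[0]):
        if rng.random() < drop_prob:
            # Blacking out the camera input forces the network to extract
            # whatever signal it can from the sparse radar channels.
            augmented[i, :, :, :num_camera_channels] = 0.0
    return augmented
```

Applied once per batch before the forward pass, this has the same practical effect as Dropout at the sensor level: the network cannot rely on any single modality being present.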
CRF-Net builds on the established RetinaNet framework with a VGG backbone for feature extraction. A distinctive aspect of the architecture is that the radar data is fed into the network at several layers, so that training itself determines at which depth fusing camera and radar features is most effective. The design deliberately exploits radar's resilience to poor lighting and adverse weather, conditions that often degrade camera performance.
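The following Keras sketch shows one way such multi-level fusion can be wired, assuming the radar detections have already been projected into the image plane as extra channels. The layer counts, input shapes, and names (`build_fusion_trunk`, `fusion_block_*`) are illustrative assumptions and not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_fusion_trunk(img_shape=(360, 640, 3), radar_channels=2):
    """VGG-style trunk that re-injects down-scaled radar channels at every
    block, so training can weight how useful each fusion depth is."""
    image = layers.Input(shape=img_shape, name="camera")
    radar = layers.Input(shape=img_shape[:2] + (radar_channels,), name="radar")

    x = layers.Concatenate()([image, radar])   # fusion at the input level
    r = radar
    for block, filters in enumerate([64, 128, 256, 512, 512]):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
        # Pool the sparse radar map to the current resolution and concatenate
        # it again; the following convolutions learn how strongly to use it.
        r = layers.MaxPooling2D()(r)
        x = layers.Concatenate(name=f"fusion_block_{block}")([x, r])
    return tf.keras.Model(inputs=[image, radar], outputs=x)
```

In the full CRF-Net, feature maps from a trunk like this would feed the RetinaNet feature pyramid and detection heads.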
Evaluation and Results
The CRF-Net was evaluated on both the nuScenes dataset and a bespoke dataset gathered by the authors. In both cases it outperformed the image-only baseline network. For instance, training with ground-truth-filtered radar data increased mean Average Precision (mAP) on the nuScenes dataset by 12.96 percentage points compared to the image-only approach.
Furthermore, despite the sparsity and noise of the radar input, the fusion network still improved detection. Incorporating radar meta-data such as distance and radar cross-section as additional input channels raised accuracy further, underscoring how much useful information these channels carry for sensor fusion and how radar can compensate in situations where image data alone falls short.
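As a rough illustration of how such meta-data can enter the network, the sketch below rasterizes radar detections into a two-channel image of distance and radar cross-section using a pinhole camera model. It is a simplified assumption-laden example: `points` is taken to be an (N, 4) array of camera-frame coordinates plus RCS, and the vertical extension of radar returns used in the paper is ignored.

```python
import numpy as np

def radar_to_image_channels(points, intrinsics, image_shape=(360, 640)):
    """Project radar detections into the image plane and rasterize their
    distance and radar cross-section as extra input channels."""
    h, w = image_shape
    channels = np.zeros((h, w, 2), dtype=np.float32)  # [distance, rcs]
    for x, y, z, rcs in points:
        if z <= 0:                       # detection behind the camera
            continue
        u, v, _ = intrinsics @ np.array([x, y, z])
        u, v = int(u / z), int(v / z)
        if 0 <= u < w and 0 <= v < h:
            channels[v, u, 0] = np.linalg.norm([x, y, z])  # radial distance
            channels[v, u, 1] = rcs                        # radar cross-section
    return channels
```

Channels produced this way can be stacked with the RGB image to form the radar input assumed in the fusion sketch above.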
Implications and Future Directions
The integration of radar data into neural networks for object detection represents a significant step toward robust autonomous driving systems that can operate effectively in varied environmental conditions. This approach opens avenues for addressing the practical challenges of sensor fusion in both 2D and 3D spaces.
Future research could explore optimized architectures for multi-sensor fusion, potentially incorporating additional modalities such as lidar. Non-ground-truth-based radar filtering, improved noise handling, and adaptation to high-resolution radar systems could further refine object detection capabilities.
Such advances would strengthen the perceptual performance of autonomous driving systems, paving the way for more reliable and safer navigation in environments where conventional sensor setups fail. The robust integration of disparate sensor types also offers a promising path toward real-time object detection in challenging, dynamic driving environments.
Overall, this paper contributes to the broader quest for highly reliable sensor fusion techniques in the domain of autonomous vehicle technology. Its insights and methodologies offer substantial groundwork for future explorations in multi-modal perception systems.