
RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization (2102.05150v1)

Published 9 Feb 2021 in cs.CV and eess.SP

Abstract: Various autonomous or assisted driving strategies have been facilitated through the accurate and reliable perception of the environment around a vehicle. Among the commonly used sensors, radar has usually been considered a robust and cost-effective solution even in adverse driving scenarios, e.g., weak/strong lighting or bad weather. Instead of fusing potentially unreliable information from all available sensors, perception from pure radar data is a valuable alternative worth exploring. In this paper, we propose a deep radar object detection network, named RODNet, which is cross-supervised by a camera-radar fused algorithm without laborious annotation efforts, to effectively detect objects from radio frequency (RF) images in real time. First, the raw signals captured by millimeter-wave radars are transformed into RF images in range-azimuth coordinates. Second, our proposed RODNet takes a sequence of RF images as input to predict the likelihood of objects in the radar field of view (FoV). Two customized modules are also added to handle multi-chirp information and object relative motion. Instead of using human-labeled ground truth for training, the proposed RODNet is cross-supervised by a novel 3D localization of detected objects using a camera-radar fusion (CRF) strategy in the training stage. Finally, we propose a method to evaluate the object detection performance of the RODNet. Because no public dataset exists for this task, we create a new dataset, named CRUW, which contains synchronized RGB and RF image sequences in various driving scenarios. Through extensive experiments, our proposed cross-supervised RODNet achieves 86% average precision and 88% average recall, demonstrating robustness to noisy scenes in various driving conditions.

Citations (139)

Summary

  • The paper introduces RODNet, a network that leverages cross-modal supervision with camera-based 3D localization to improve radar object detection.
  • It transforms raw radar signals into range-azimuth RF images to effectively integrate multi-chirp and temporal dynamics.
  • The method achieves robust performance, with 86% average precision and 88% average recall, supporting applications in autonomous driving and ADAS.

An Overview of RODNet: A Real-Time Radar Object Detection Network

Radar sensors have long been valued for their robustness and cost-effectiveness in adverse environments, such as poor lighting and inclement weather, which can significantly degrade other sensing modalities like cameras and LiDAR. The paper presents RODNet, a real-time network that detects objects from radar data alone, trained with camera-radar fused supervision rather than manual labels.

Key Contributions

  1. Radar Data Utilization: Unlike standard RGB images from cameras, radar data is complex-valued and lacks semantic richness. RODNet addresses this by transforming raw radar signals into RF images in range-azimuth coordinates, harnessing the multi-chirp information and temporal dynamics inherent in radar signals (a minimal sketch of this FFT-based transform follows this list).
  2. Cross-Modal Supervision: A cross-modal learning framework leverages camera-radar fusion (CRF) for training: camera-based 3D localization is combined with radar detections to generate training annotations without manual labeling. Fusing the two modalities yields more reliable labels than either sensor alone, reducing the bias introduced by any single modality (see the label-rasterization sketch after this list).
  3. Architectural Innovation: The RODNet architecture introduces two specialized modules, M-Net and Temporal Deformable Convolution (TDC). M-Net merges chirp-level data into a single feature map, while TDC adapts receptive fields to object motion across frames. Together, these components enable more accurate feature extraction and improved detection performance (a simplified chirp-merging module is sketched below this list).
  4. Comprehensive Dataset: The creation of the CRUW dataset fills a gap by providing synchronized RGB and radar (RF) image data across varied driving conditions. This dataset supports cross-modal research by supplying robust annotations and extensive data for model training and evaluation.
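
To make the preprocessing in contribution 1 concrete, here is a minimal sketch of the generic FFT pipeline that turns one chirp of raw FMCW ADC data into a range-azimuth RF image. The paper's exact windowing, calibration, and normalization for CRUW are not reproduced here, so treat the array shapes and log scaling as illustrative assumptions:

```python
import numpy as np

def radar_chirp_to_rf_image(adc_samples: np.ndarray,
                            n_range_bins: int = 128,
                            n_angle_bins: int = 128) -> np.ndarray:
    """Convert one chirp of raw FMCW radar ADC data to a range-azimuth
    RF image via the standard two-stage FFT pipeline.

    adc_samples: complex array of shape (n_samples, n_rx_antennas),
                 fast-time samples per receive antenna for one chirp.
    Returns a real-valued (n_range_bins, n_angle_bins) magnitude image.
    """
    # Range FFT over fast-time samples (axis 0): each bin ~ a distance.
    range_fft = np.fft.fft(adc_samples, n=n_range_bins, axis=0)

    # Angle FFT over the receive-antenna array (axis 1): each bin ~ an
    # azimuth. Zero-padding to n_angle_bins gives a denser azimuth grid.
    angle_fft = np.fft.fftshift(
        np.fft.fft(range_fft, n=n_angle_bins, axis=1), axes=1)

    # Magnitude spectrum as the RF "image"; log scale tames the dynamics.
    return np.log1p(np.abs(angle_fft))
```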
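
For contribution 2, the CRF stage produces sparse object locations that must become dense training targets: RODNet is trained against Gaussian confidence maps over the range-azimuth grid. The toy rasterizer below assumes a fixed kernel spread, whereas the paper scales it with object class and distance:

```python
import numpy as np

def build_confmap(objects, n_classes: int, h: int, w: int,
                  sigma_scale: float = 0.05) -> np.ndarray:
    """Rasterize CRF-localized objects into per-class Gaussian confidence
    maps over an (h, w) range-azimuth grid, used as training targets.

    objects: iterable of (class_id, range_bin, azimuth_bin) triples
             produced by the camera-radar fusion stage.
    Returns a (n_classes, h, w) array with values in [0, 1].
    """
    confmap = np.zeros((n_classes, h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for cls, rng, azm in objects:
        # Assumed fixed spread; the paper scales the kernel with object
        # class and distance from the sensor.
        sigma = max(1.0, sigma_scale * h)
        g = np.exp(-((ys - rng) ** 2 + (xs - azm) ** 2) / (2 * sigma ** 2))
        # Keep the per-pixel maximum when objects overlap.
        confmap[cls] = np.maximum(confmap[cls], g)
    return confmap
```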
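
And for contribution 3, a simplified PyTorch module in the spirit of M-Net: fuse the chirp dimension of the complex RF input into a single feature map before the temporal backbone. Channel counts and kernel sizes are illustrative rather than the paper's configuration, and the TDC module is omitted for brevity:

```python
import torch
import torch.nn as nn

class ChirpMergeNet(nn.Module):
    """Sketch of an M-Net-style module that fuses the per-chirp dimension
    of an RF tensor into one feature map. Layer sizes are illustrative."""

    def __init__(self, in_channels: int = 2, out_channels: int = 32):
        super().__init__()
        # 3D conv over (chirp, range, azimuth); the two input channels
        # hold the real/imaginary components of the RF data.
        self.conv = nn.Conv3d(in_channels, out_channels,
                              kernel_size=(3, 3, 3), padding=(0, 1, 1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_channels, n_chirps, H, W)
        x = self.relu(self.conv(x))
        # Max-pool away the remaining chirp dimension -> (batch, C, H, W).
        return x.max(dim=2).values
```

With an input of shape (batch, 2, n_chirps, 128, 128), the output is (batch, 32, 128, 128), ready for a frame-level temporal network.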

Numerical Results

The research demonstrates RODNet's efficacy with strong numerical results: 86% average precision and 88% average recall. This performance highlights RODNet's reliability across different driving conditions, positioning it as a strong contender in real-time applications where traditional vision-based methods might falter.
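
Because radar detections are point targets rather than bounding boxes, the paper matches detections to ground truth by location instead of IoU. The sketch below assumes a Gaussian object location similarity (OLS), analogous to the keypoint OKS metric; the tolerance constant `kappa` is a hypothetical per-class value:

```python
import numpy as np

def object_location_similarity(det_xy, gt_xy, gt_distance, kappa=0.1):
    """Location-based similarity between a detection and a ground-truth
    object in range-azimuth space, in the spirit of the paper's
    evaluation. The Gaussian form and `kappa` are illustrative."""
    d2 = np.sum((np.asarray(det_xy) - np.asarray(gt_xy)) ** 2)
    # Tolerance grows with object distance: far objects span fewer bins.
    scale = gt_distance * kappa
    return float(np.exp(-d2 / (2 * scale ** 2 + 1e-9)))
```

A detection then counts as a true positive when its similarity to an unmatched ground-truth object exceeds a threshold, with AP and AR averaged over a range of thresholds in COCO style.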

Implications and Future Research

RODNet's implications are twofold:

  • Practical Application: For autonomous vehicles and advanced driver-assistance systems (ADAS), integrating RODNet can enhance system reliability by maintaining object detection capabilities in challenging conditions where conventional sensors might fail.
  • Theoretical Development: The work opens avenues for further exploration in cross-modal learning frameworks, pushing the boundaries of sensor fusion techniques and real-time neural network architectures that can process inherently complex data types like radar signals.

Future developments could refine detection models by increasing the sophistication of radar signal processing and extending cross-modal frameworks to encompass additional sensor types beyond camera-radar configurations.

In conclusion, RODNet exemplifies advanced radar utilization for object detection, underscoring the importance of cross-modal supervision in addressing the challenges of semantic extraction from radar signals. It signifies a meaningful step forward in sensor fusion techniques and real-time detection capabilities in the field of autonomous driving technology.
