CRUW Dataset: Radar-Only Detection Benchmark

Updated 27 February 2026

The CRUW dataset is a large-scale resource of synchronized FMCW radar and stereo RGB camera data for autonomous driving.
The dataset provides per-frame annotations for vehicles, pedestrians, and cyclists, enabling radar-only detection evaluation.
It supports deep learning model development with a 30 Hz frame rate, 128×128 radar range–azimuth heatmaps, and an established preprocessing pipeline.

The CRUW dataset (Camera-Radar of the University of Washington) is a large-scale, publicly available resource designed to advance radar-only object detection (ROD) in autonomous driving. It consists of synchronized and calibrated data from automotive-grade frequency-modulated continuous-wave (FMCW) radar and stereo RGB cameras, with systematic per-frame annotations for vehicles, pedestrians, and cyclists. CRUW supports the development and evaluation of deep learning models that operate solely on radio frequency (RF) sensor data, a key requirement for robust perception in adverse weather and poor visibility.

1. Sensor Configuration and Data Modalities

CRUW uses dual 77 GHz FMCW automotive radar antennas operating at 30 Hz. Each frame contains 256 chirps, of which four representative chirps (indices 0, 64, 128, and 192) are typically used. The radar produces complex-valued Range–Azimuth (RA) heatmaps of size 128×128, stored as separate real and imaginary channels, with 0.23 m range resolution and 15° azimuth (angular) resolution. Synchronized stereo FLIR RGB cameras capture 1440×1080 px images with a 93.6° horizontal field of view and a 0.35 m baseline, which supports accurate extrinsic calibration and cross-modal annotation transfer (Wang et al., 2021).

<table> <tr> <th>Modality</th> <th>Resolution / Frame rate</th> <th>Role</th> </tr> <tr> <td>Radar (FMCW, dual 77 GHz)</td> <td>128×128 RA, 30 Hz</td> <td>Primary: detection and localization (RA maps)</td> </tr> <tr> <td>RGB Cameras (FLIR stereo)</td> <td>1440×1080, 30 Hz</td> <td>Annotation propagation, 3D scene geometry</td> </tr> </table>
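As an illustration of the chirp selection and real/imaginary split described above, the following NumPy sketch turns one complex radar frame into a network input. The in-memory layout (chirps × range × azimuth, `complex64`) and the helper name `frame_to_input` are assumptions for illustration, not the official CRUW storage format or devkit API.

```python
import numpy as np

# ASSUMED layout for one radar frame: (num_chirps, range_bins, azimuth_bins), complex.
NUM_CHIRPS = 256
SELECTED_CHIRPS = [0, 64, 128, 192]  # representative chirp indices named in the text


def frame_to_input(frame: np.ndarray, chirps=SELECTED_CHIRPS) -> np.ndarray:
    """Select chirps and split complex RA maps into real/imaginary channels.

    Returns shape (2, len(chirps), range_bins, azimuth_bins): channel 0 holds
    real parts, channel 1 holds imaginary parts.
    """
    if not np.iscomplexobj(frame):
        raise ValueError("expected a complex-valued RA frame")
    selected = frame[chirps]  # fancy indexing -> (C, H, W), still complex
    return np.stack([selected.real, selected.imag], axis=0).astype(np.float32)


# Usage on synthetic data shaped like one CRUW frame.
rng = np.random.default_rng(0)
frame = (rng.standard_normal((NUM_CHIRPS, 128, 128))
         + 1j * rng.standard_normal((NUM_CHIRPS, 128, 128))).astype(np.complex64)
x = frame_to_input(frame)
print(x.shape)  # (2, 4, 128, 128)
```

Keeping real and imaginary parts as separate channels preserves phase information that a magnitude-only map would discard.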

The radar sensors are co-mounted and strictly hardware-synchronized with cameras, allowing deterministic frame alignment and robust multi-modal fusion during annotation.

2. Data Collection Scenarios and Dataset Composition

CRUW consists of approximately 400,000 time-aligned frame pairs, encompassing ∼3.5 hours of diverse driving data across 464 sequences. The scenarios include parking lots, campus roads, city streets, highways, and intersections. CRUW emphasizes breadth over rare event coverage: public splits focus on clear-weather, daytime conditions without heavy precipitation or fog. Roughly 260,000 object instances (Car, Pedestrian, Cyclist) are annotated within the radar field of view (0–25 m range, ±90° azimuth), with about 92% in training sequences and 8% reserved for testing (Wang et al., 2021, Wang et al., 2020).

<table> <tr> <th>Scenario</th> <th>#Sequences</th> <th>#Frames</th> <th>Vision-hard frames (%)</th> </tr> <tr> <td>Parking Lot</td> <td>124</td> <td>106,000</td> <td>15</td> </tr> <tr> <td>Campus Road</td> <td>112</td> <td>94,000</td> <td>11</td> </tr> <tr> <td>City Street</td> <td>216</td> <td>175,000</td> <td>6</td> </tr> <tr> <td>Highway</td> <td>12</td> <td>20,000</td> <td>0</td> </tr> </table>

In the original release, the average number of annotated objects per frame is approximately 0.65; later work (e.g., the RadarFormer evaluation) reports up to ≈5 objects per frame, depending on the detection protocol (Dalbah et al., 2023, Wang et al., 2020).
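The scenario table can be cross-checked against the headline figures with a few lines of arithmetic. This sketch uses only numbers quoted in this section: the four scenarios sum to 464 sequences and 395,000 frames, which at 30 Hz is about 3.7 hours (the same order as the quoted ∼3.5 hours), and 0.65 objects per frame implies roughly 257,000 instances, in line with the ∼260,000 figure.

```python
# Scenario table from this section: (sequences, frames) per scenario.
SCENARIOS = {
    "Parking Lot": (124, 106_000),
    "Campus Road": (112, 94_000),
    "City Street": (216, 175_000),
    "Highway": (12, 20_000),
}
FRAME_RATE_HZ = 30
OBJECTS_PER_FRAME = 0.65  # original-release average quoted in the text

total_sequences = sum(seq for seq, _ in SCENARIOS.values())
total_frames = sum(frames for _, frames in SCENARIOS.values())
hours = total_frames / FRAME_RATE_HZ / 3600
implied_instances = OBJECTS_PER_FRAME * total_frames

print(total_sequences)           # 464
print(total_frames)              # 395000
print(round(hours, 2))           # 3.66
print(round(implied_instances))  # 256750
```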

3. Annotation Pipeline and Ground Truth Generation

CRUW uses a novel “camera–radar coordinate alignment” pipeline to generate radar-space labels largely without manual intervention (Wang et al., 2021, Wang et al., 2020). The multi-stage process includes:

Camera-based detection: Mask R-CNN detects and classifies objects in RGB frames, estimating 2D boxes, instance masks, and class labels.
CFAR-based radar peak detection: The Constant False Alarm Rate (CFAR) algorithm finds peaks in each radar frame, forming candidates in (range, azimuth) space.
Bilateral coordinate projection: Closed-form 3D transformations map detected objects in both directions between camera pixels and radar bird's-eye-view (BEV) coordinates (r, θ). The global extrinsic calibration ($T_{cr}$) and ground-plane optimization keep the two views geometrically consistent.
Fusion and clustering: Gaussian probability maps in RA space are computed for each detected object, integrating camera and radar peak information. Spatial clustering via DBSCAN refines the radar object centroids.
Per-class “ConfMaps”: Each frame receives a separate 2D confidence Gaussian map (128×128) per class, where the center denotes the (range, azimuth) cell of the object and the standard deviation $\sigma$ encodes its spatial scale.
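Three of the steps above (CFAR peak finding, camera-to-radar projection, and per-class ConfMap rendering) can be sketched in NumPy as follows. This is not the CRUW annotation code: the cell-averaging CFAR variant, its window sizes and false-alarm rate, applying CFAR to power (|RA|²), the 4×4 homogeneous form of $T_{cr}$, the axis convention (x right, y down, z forward), the class index order, and the per-object σ are all illustrative assumptions. The Mask R-CNN and DBSCAN stages are omitted.

```python
import numpy as np


def ca_cfar_2d(power, guard=2, train=4, pfa=1e-4):
    """Cell-averaging CFAR over a 2D range-azimuth power map.

    Each cell under test is compared with the mean of the training cells in a
    square window around it (guard cells and the cell itself excluded).
    Border cells without a full window are skipped.
    """
    h, w = power.shape
    k = guard + train
    n_train = (2 * k + 1) ** 2 - (2 * guard + 1) ** 2
    # Threshold multiplier for false-alarm rate `pfa` under exponential noise power.
    alpha = n_train * (pfa ** (-1.0 / n_train) - 1.0)
    peaks = []
    for i in range(k, h - k):
        for j in range(k, w - k):
            window = power[i - k:i + k + 1, j - k:j + k + 1].sum()
            inner = power[i - guard:i + guard + 1, j - guard:j + guard + 1].sum()
            noise = (window - inner) / n_train
            if power[i, j] > alpha * noise:
                peaks.append((i, j))
    return peaks


def camera_to_radar_polar(p_cam, T_cr):
    """Map a 3D camera-frame point to radar BEV (range in m, azimuth in rad).

    Assumes T_cr is a 4x4 homogeneous camera-to-radar transform; height (y)
    is dropped and azimuth is measured from the forward (z) axis.
    """
    x, _, z, _ = T_cr @ np.append(np.asarray(p_cam, dtype=float), 1.0)
    return float(np.hypot(x, z)), float(np.arctan2(x, z))


def render_confmap(shape, center, sigma):
    """2D Gaussian centered on (range_bin, azimuth_bin) with std `sigma` bins."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def class_confmaps(objects, num_classes=3, shape=(128, 128)):
    """One map per class; overlapping objects of the same class merge by max."""
    maps = np.zeros((num_classes, *shape), dtype=np.float32)
    for cls_idx, center, sigma in objects:
        maps[cls_idx] = np.maximum(maps[cls_idx], render_confmap(shape, center, sigma))
    return maps


# Usage on synthetic data (class indices 0/1/2 = car/pedestrian/cyclist, assumed).
rng = np.random.default_rng(0)
power = rng.exponential(1.0, size=(128, 128))  # noise-like background power
power[40, 70] = 200.0                           # one strong synthetic reflector
peaks = ca_cfar_2d(power)
r, theta = camera_to_radar_polar([3.0, 1.5, 4.0], np.eye(4))  # r = 5.0 m
maps = class_confmaps([(1, (40, 70), 2.0)])
```

A Gaussian target, rather than a single hot cell, gives the network a smooth training signal and tolerates small localization errors in the propagated labels.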

All “vision-hard” frames (strong occlusion, night, glare, blur) and a 10% subset of standard test frames undergo manual verification and correction. The final annotation protocol provides class, 2D position, and auxiliary kinematic attributes (e.g., speed), supporting both detection and tracking tasks (Wang et al., 2021).

4. Data Structure, Preprocessing, and Access

Each CRUW sequence contains synchronized directories for left/right camera images, radar tensors, and annotation text files. Radar data are provided as pre-processed .npy tensors (range×azimuth×channels), while annotations follow a KITTI-style plain text format, enumerating frame index, unique object ID, class, spatial coordinates, velocities, and other properties.
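A minimal parser for such KITTI-style annotation files might look like the following. The column order used here (frame index, object ID, class, range, azimuth, two velocity components, then any remaining fields) is a hypothetical layout inferred from the field list above, not the documented CRUW schema, and all function names are illustrative; check the official devkit before relying on it. The `.npy` radar tensors themselves can be read with `numpy.load`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL column order, inferred from the field list in the text:
#   frame_id object_id class range_m azimuth_deg v_range v_azimuth [extra ...]
# The official CRUW files may order, name, or scale these fields differently.


@dataclass
class RadarObject:
    frame_id: int
    object_id: int
    cls: str
    range_m: float
    azimuth_deg: float
    velocity: Tuple[float, float]
    extra: List[str]


def parse_annotation_line(line: str) -> RadarObject:
    parts = line.split()
    if len(parts) < 7:
        raise ValueError(f"expected >= 7 fields, got {len(parts)}: {line!r}")
    return RadarObject(
        frame_id=int(parts[0]),
        object_id=int(parts[1]),
        cls=parts[2],
        range_m=float(parts[3]),
        azimuth_deg=float(parts[4]),
        velocity=(float(parts[5]), float(parts[6])),
        extra=parts[7:],
    )


def parse_annotations(lines: Iterable[str]) -> Dict[int, List[RadarObject]]:
    """Group objects by frame index, skipping blank lines and '#' comments."""
    frames: Dict[int, List[RadarObject]] = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        obj = parse_annotation_line(line)
        frames.setdefault(obj.frame_id, []).append(obj)
    return frames


def load_annotation_file(path: str) -> Dict[int, List[RadarObject]]:
    with open(path) as f:
        return parse_annotations(f)


# Usage with made-up lines (values are illustrative, not from CRUW).
sample = [
    "0 1 Car 12.4 -8.5 0.0 0.0",
    "0 2 Pedestrian 6.1 20.0 1.2 -0.3",
    "1 1 Car 12.1 -8.4 -0.3 0.0",
]
by_frame = parse_annotations(sample)
```

Keeping the stable object ID in each record is what lets the same files serve tracking as well as per-frame detection.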

Preprocessing for deep learning models involves the following pipeline (Dalbah et al., 2023):

The RA radar tensors are reshaped to $B \times 2 \times T \times C \times H \times W$ (batch size, real/imaginary channels, temporal window, chirps, range and azimuth bins).
Channel–chirp–time merging (denoted “M-Net”) reduces the input dimension from $(2, C)$ to $C_h$ channels.
Temporal downsampling (via 3D convolution) collapses $T \to 1$ , accelerating inference with negligible AP loss (<1%).
Input normalization is minimal: the network consumes the RA values (real and imaginary channels) as provided, without log-scaling or standardization. CFAR is used only for annotation, not as a model preprocessing step.
Post-backbone, the temporal dimension is restored for output “ConfMap” generation.
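The tensor bookkeeping in the steps above can be traced with a shape-level NumPy sketch. The per-pixel linear mix is only a stand-in for a learned merging module such as M-Net, averaging stands in for the strided 3D convolution that collapses $T \to 1$, and repetition stands in for learned temporal upsampling; none of this is the RODNet or RadarFormer implementation, and the sizes ($T = 4$, $C_h = 32$) are arbitrary.

```python
import numpy as np


def merge_channels_and_chirps(x, weight):
    """(B, 2, T, C, H, W) -> (B, C_h, T, H, W).

    Folds the real/imaginary axis and the chirp axis into 2*C channels, then
    mixes them per pixel with `weight` of shape (C_h, 2*C). This linear mix is
    a stand-in for a learned merging module.
    """
    b, two, t, c, h, w = x.shape
    folded = x.transpose(0, 2, 1, 3, 4, 5).reshape(b, t, two * c, h, w)
    mixed = np.einsum("kc,btchw->btkhw", weight, folded)
    return mixed.transpose(0, 2, 1, 3, 4)


def collapse_time(x):
    """(B, C_h, T, H, W) -> (B, C_h, 1, H, W); averaging stands in for a strided 3D conv."""
    return x.mean(axis=2, keepdims=True)


def restore_time(x, t):
    """(B, C_h, 1, H, W) -> (B, C_h, T, H, W); repetition stands in for learned upsampling."""
    return np.repeat(x, t, axis=2)


rng = np.random.default_rng(0)
B, T, C, H, W, C_h = 2, 4, 4, 128, 128, 32
x = rng.standard_normal((B, 2, T, C, H, W)).astype(np.float32)
weight = (rng.standard_normal((C_h, 2 * C)) / np.sqrt(2 * C)).astype(np.float32)

feat = merge_channels_and_chirps(x, weight)  # (2, 32, 4, 128, 128)
pooled = collapse_time(feat)                 # (2, 32, 1, 128, 128)
restored = restore_time(pooled, T)           # (2, 32, 4, 128, 128)
```

Collapsing time early means the backbone processes one feature volume per clip instead of $T$, which is where the inference speed-up comes from.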

The dataset and code are distributed under a CC BY-NC-SA 4.0 license at https://www.cruwdataset.org/, with loading, visualization, and evaluation tools available at https://github.com/CRUWdataset/CRUW-Bench (Wang et al., 2021).

5. Evaluation Protocols and Metrics

CRUW evaluation centers on localization and classification. The core detection metric is Object Location Similarity (OLS), which plays the role that IoU plays in image-based detection and is modeled on the Object Keypoint Similarity (OKS) used in COCO keypoint evaluation:

$\mathrm{OLS}(i,j) = \exp\Bigl(-\frac{d_{ij}^2}{2(s_j \kappa_{cls})^2}\Bigr)$

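where $d_{ij}$ is the Euclidean distance in meters between detection $i$ and ground-truth object $j$, $s_j$ is the distance of ground-truth object $j$ from the sensor, and $\kappa_{cls}$ is a class-specific tolerance (Wang et al., 2021, Wang et al., 2020). Scaling by $s_j$ makes matching more tolerant for distant objects, whose RA cells cover a larger area. At a given threshold $t$, a detection is matched to a ground-truth object when their OLS exceeds $t$; the loosest threshold is 0.5.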
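The formula can be evaluated directly once radar coordinates are converted to BEV meters. In this sketch the tolerance `KAPPA = 0.1` is a hypothetical value, not an official per-class constant, and the conversion assumes azimuth is measured from the forward axis. The example shows that the same 0.5 m error scores higher for a farther object: the near detection matches at thresholds 0.5 through 0.85 but not at 0.9.

```python
import math


def polar_to_xy(r, theta_rad):
    """Radar (range, azimuth) -> BEV (x, y) in meters; azimuth from the forward (y) axis."""
    return r * math.sin(theta_rad), r * math.cos(theta_rad)


def ols(det_xy, gt_xy, kappa_cls):
    """Object Location Similarity between one detection and one ground-truth object."""
    d2 = (det_xy[0] - gt_xy[0]) ** 2 + (det_xy[1] - gt_xy[1]) ** 2  # d_ij^2
    s = math.hypot(gt_xy[0], gt_xy[1])  # s_j: ground-truth distance to the sensor
    return math.exp(-d2 / (2.0 * (s * kappa_cls) ** 2))


KAPPA = 0.1  # hypothetical tolerance, not the official per-class value
near = ols((0.5, 10.0), (0.0, 10.0), KAPPA)  # 0.5 m error at 10 m
far = ols((0.5, 20.0), (0.0, 20.0), KAPPA)   # same error at 20 m
print(round(near, 3), round(far, 3))          # 0.882 0.969
```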

For each class and for each OLS threshold $t \in \{0.5, 0.55, \ldots, 0.9\}$:

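Precision and recall are tabulated at threshold $t$.
AP is computed from the precision–recall curve at each threshold and then averaged over all thresholds; AR is the recall averaged over thresholds in the same way.
Class-averaged mAP and mAR summarize overall detection performance.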
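The per-threshold procedure above can be sketched for one class. This assumes COCO-style greedy matching (detections claim ground truth in descending score order) and all-point interpolated AP; the official CRUW evaluation may interpolate differently and accumulates detections over all frames before computing AP, whereas this toy covers a single frame. The OLS matrix and scores are made up, and `num_gt` must be positive.

```python
import numpy as np

THRESHOLDS = np.linspace(0.5, 0.9, 9)  # 0.50, 0.55, ..., 0.90


def match_at_threshold(ols_matrix, scores, t):
    """Greedy matching; returns TP flags ordered by descending detection score.

    A detection matches the unclaimed ground-truth object with the highest
    OLS, provided that OLS exceeds t.
    """
    ols_matrix = np.asarray(ols_matrix, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    claimed = set()
    tp = []
    for d in order:
        best_g, best_val = -1, t
        for g in range(ols_matrix.shape[1]):
            if g not in claimed and ols_matrix[d, g] > best_val:
                best_g, best_val = g, ols_matrix[d, g]
        if best_g >= 0:
            claimed.add(best_g)
        tp.append(best_g >= 0)
    return np.array(tp, dtype=bool)


def average_precision(tp, num_gt):
    """All-point interpolated AP from score-ordered TP flags."""
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(~tp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]  # monotone precision envelope
    idx = np.nonzero(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def ap_ar(ols_matrix, scores, thresholds=THRESHOLDS):
    """Mean AP and mean recall over OLS thresholds for one class."""
    num_gt = np.asarray(ols_matrix).shape[1]
    aps, recalls = [], []
    for t in thresholds:
        tp = match_at_threshold(ols_matrix, scores, t)
        aps.append(average_precision(tp, num_gt))
        recalls.append(tp.sum() / num_gt)
    return float(np.mean(aps)), float(np.mean(recalls))


# Toy example: 3 detections x 2 ground-truth objects (OLS values made up).
ols_matrix = [[0.95, 0.10],
              [0.20, 0.72],
              [0.60, 0.00]]
scores = [0.9, 0.8, 0.3]
mean_ap, mean_ar = ap_ar(ols_matrix, scores)
print(round(mean_ap, 4), round(mean_ar, 4))  # 0.7778 0.7778
```

Here the second detection (OLS 0.72) counts as a match only up to $t = 0.7$, so five of the nine thresholds give AP 1.0 and four give 0.5, averaging to 7/9.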

Additional metrics include Mean Absolute Localization Error (MAE) and Detection Quality F1 (DQF1), which incorporates MAE, precision, and recall.

A representative benchmarking table (Wang et al., 2020):

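<table> <tr> <th>Method</th> <th>Overall AP</th> <th>Overall AR</th> </tr> <tr> <td>Decision Tree</td> <td>4.70%</td> <td>44.26%</td> </tr> <tr> <td>CFAR+ResNet</td> <td>40.49%</td> <td>60.56%</td> </tr> <tr> <td>CFAR+VGG16</td> <td>40.73%</td> <td>72.88%</td> </tr> <tr> <td>RODNet</td> <td>83.76%</td> <td>85.62%</td> </tr> </table>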

These protocols follow the COCO-style evaluation of leading vision benchmarks (multiple similarity thresholds, with AP and AR averaged across them) while accounting for radar-specific localization ambiguity and noise (Wang et al., 2021, Dalbah et al., 2023).

6. Comparative Analysis and Challenges

CRUW is notable for its high frame rate, large sample count, and minimalist, pure RA-space radar signal representation. It contrasts with other public radar datasets:

Radiate: Multi-modal (radar, camera, lidar) with adverse weather, but more complex data formats (RA, RD, RAD).
VoD (View-of-Delft): Lower radar frame rate (10 Hz) and coarser angular bins; includes Doppler.
nuScenes: Aggregates radar as pseudo-LiDAR point clouds in coarsely binned BEV format; no real/imaginary channel distinction; object maps restricted to a single class (Dalbah et al., 2023).

CRUW’s technical challenges include:

Low SNR and severe speckle in RA heatmaps, requiring robust CFAR peak detection and strong model regularization.
The absence of elevation (vertical) cues: returns are collapsed to BEV, so objects that overlap in BEV but differ in height are hard to separate.
Temporal consistency is of minor importance; aggressive downsampling of temporal cues has minimal detrimental effect on detection accuracy.
CNN-only architectures like RODNet experience high inter-run variance and instability, while transformer-based backbones exhibit more stable convergence and generally higher AP (Dalbah et al., 2023).

7. Applications, Extensions, and Future Directions

CRUW serves as a primary benchmarking platform for radar-only perception models and has catalyzed the development of advanced detection architectures (e.g., RODNet (Wang et al., 2020), RadarFormer (Dalbah et al., 2023)). Its influence extends to:

Supporting research on detection in weather-degraded visibility, where optical sensors fail, although the public splits themselves are recorded in clear, daytime conditions.
Providing a reproducible platform for multi-object tracking, sensor fusion, and self-supervised learning research.
Extension toward richer multi-modal datasets (e.g., CRUW3D (Wang et al., 2023)), which include LiDAR, expanded object classes (van, truck, bus), and broader sensor coverage.

Future improvements may include higher-resolution Doppler data, more diverse environmental conditions, annotation of rare and edge-case events, and unified detection/tracking benchmarks. The pipeline and its systematic annotation methodology transfer directly to new sites or sensor configurations, provided accurate synchronization and calibration are maintained.

CRUW remains a central asset underpinning progress in RF-centric autonomous navigation, radar deep learning, and robust, real-world urban perception.