RadarHD Benchmark for Radar Point Cloud Super-Resolution

Updated 29 September 2025
  • RadarHD Benchmark is a standard evaluation framework that uses synchronized radar and lidar data to assess high-resolution point cloud reconstruction.
  • It employs quantitative metrics like Chamfer Distance and Modified Hausdorff Distance to measure spatial fidelity and reconstruction accuracy across varied conditions.
  • The benchmark supports radar super-resolution methods through pretrained priors, radar-specific conditioning, and dual-space loss functions to enhance performance.

The RadarHD Benchmark refers to a set of protocols and datasets for quantitatively evaluating the ability of computational models to reconstruct high-resolution, lidar-like point clouds from mmWave radar data, with an emphasis on direct comparison between reconstructed radar point clouds and ground truth lidar across diverse environments and signal conditions. This benchmark emerged to address the growing demand for rigorous, task-specific metrics in radar-based perception, particularly as radar-only and radar-centric robotics have become practical for applications where lidar or vision systems are hindered by adverse environmental factors such as fog, dust, or smoke.

1. Definition and Benchmarking Protocol

The RadarHD Benchmark centers on assessing the spatial fidelity of radar-derived point clouds versus reference lidar scans. Each entry in the benchmark consists of radar input data (typically in polar or heatmap representation) and a synchronized lidar point cloud ground truth. The principal task is to generate a dense, high-resolution point cloud from the radar input that faithfully matches the spatial distribution and structure of corresponding lidar frames. This paradigm is specifically designed to measure the effectiveness of super-resolution, domain translation, and geometric hallucination methods in reconstructing scene structure from radar’s sparse and artifact-prone signals.

Quantitative evaluation on the RadarHD Benchmark relies on point cloud similarity metrics, with two principal measures:

  • Chamfer Distance (CD):

CD(P,Q) = \frac{1}{|P|}\sum_{p \in P} \min_{q \in Q} \lVert p-q \rVert_2 + \frac{1}{|Q|}\sum_{q \in Q} \min_{p \in P} \lVert q-p \rVert_2

This symmetric metric sums the mean nearest-neighbor Euclidean distances computed in both directions between the reconstructed point cloud P and the lidar ground truth Q.

  • Modified Hausdorff Distance (MHD):

Unlike classical Hausdorff distance, which measures the maximal deviation between sets, MHD is based on mean nearest-neighbor distances, providing robustness to outliers and noise in sparse radar data.
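For reference, the commonly used formulation of MHD (following Dubuisson and Jain; the benchmark's own specification may differ in detail) takes the larger of the two directed mean nearest-neighbor distances:

MHD(P,Q) = \max\left(\frac{1}{|P|}\sum_{p \in P} \min_{q \in Q} \lVert p-q \rVert_2,\ \frac{1}{|Q|}\sum_{q \in Q} \min_{p \in P} \lVert q-p \rVert_2\right)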

Lower values for CD and MHD signal closer alignment between radar and reference lidar clouds, capturing both the coverage of the physical scene and the precision in geometric placement of reconstructed points.
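As an illustration of these definitions, the following minimal Python sketch computes both metrics for BEV point sets using NumPy and SciPy nearest-neighbor queries; it is not the benchmark's official evaluation code, and the function names are placeholders.

```python
# Minimal sketch of the two benchmark metrics for (N, d) point arrays.
# Not the official RadarHD evaluation code; function names are placeholders.
import numpy as np
from scipy.spatial import cKDTree

def nn_distances(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Euclidean distance from each point in `src` to its nearest neighbor in `dst`."""
    dists, _ = cKDTree(dst).query(src)
    return dists

def chamfer_distance(P: np.ndarray, Q: np.ndarray) -> float:
    """Symmetric Chamfer Distance: sum of the two directed mean NN distances."""
    return nn_distances(P, Q).mean() + nn_distances(Q, P).mean()

def modified_hausdorff(P: np.ndarray, Q: np.ndarray) -> float:
    """Modified Hausdorff Distance: max of the two directed mean NN distances."""
    return max(nn_distances(P, Q).mean(), nn_distances(Q, P).mean())

# Usage: radar_pts and lidar_pts would be (N, 2) BEV point sets in meters.
# cd = chamfer_distance(radar_pts, lidar_pts)
# mhd = modified_hausdorff(radar_pts, lidar_pts)
```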

2. Benchmark Data and Input Representation

The RadarHD Benchmark is instantiated using a large-scale multimodal dataset of synchronized radar-lidar pairs collected from low-cost, single-chip mmWave radar sensors (e.g., TI AWR1843) and high-resolution lidar devices (such as the Ouster OS0/OS1). Radar data is provided in minimally processed form, typically as I/Q samples or as low-thresholded polar heatmaps (e.g., 64×256 grids) that deliberately avoid aggressive denoising, thereby retaining both strong and weak returns along with characteristic artifacts such as sinc-like spreading patterns and "ghost" reflections.
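To illustrate how such a polar heatmap might be turned into a point set for downstream evaluation, the sketch below thresholds the grid and maps occupied bins to Cartesian coordinates. The axis ordering (azimuth by range), bin resolution, field of view, and threshold are assumptions for illustration, not values fixed by the benchmark.

```python
# Illustrative sketch (not the benchmark's preprocessing pipeline): convert a
# low-thresholded polar radar heatmap into a BEV point set.
# Axis ordering, bin sizes, FOV, and threshold below are assumed values.
import numpy as np

def polar_heatmap_to_points(heatmap: np.ndarray,
                            range_res_m: float = 0.04,   # assumed 4 cm range bins
                            fov_deg: float = 120.0,      # assumed azimuth field of view
                            threshold: float = 0.5) -> np.ndarray:
    """Return an (N, 2) array of Cartesian (x, y) points for bins above `threshold`."""
    n_az, n_rng = heatmap.shape                          # e.g., 64 azimuth x 256 range bins
    azimuths = np.deg2rad(np.linspace(-fov_deg / 2, fov_deg / 2, n_az))
    ranges = np.arange(n_rng) * range_res_m
    az_idx, rng_idx = np.nonzero(heatmap > threshold)
    r, theta = ranges[rng_idx], azimuths[az_idx]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)
```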

Reference lidar clouds serve both as reconstruction targets and as the metric anchor for all quantitative evaluations. Notably, all datasets under the RadarHD Benchmark protocol maintain time and pose alignment to support frame-level correspondence for pixel-to-point and point-to-point comparisons.
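As a sketch of the frame-level correspondence this alignment enables, the snippet below pairs each radar frame with its nearest lidar frame in time; it assumes monotonically increasing timestamps in seconds and omits pose interpolation and extrinsic calibration, which the benchmark protocol also requires.

```python
# Hedged sketch of radar-lidar frame matching by timestamp; assumes sorted
# timestamps in seconds. Pose interpolation and extrinsics are omitted.
import numpy as np

def match_nearest_frames(radar_ts: np.ndarray, lidar_ts: np.ndarray,
                         max_gap_s: float = 0.05):
    """Return (radar_index, lidar_index) pairs whose time gap is within `max_gap_s`."""
    pairs = []
    idx = np.searchsorted(lidar_ts, radar_ts)
    for i, j in enumerate(idx):
        candidates = [k for k in (j - 1, j) if 0 <= k < len(lidar_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(lidar_ts[k] - radar_ts[i]))
        if abs(lidar_ts[best] - radar_ts[i]) <= max_gap_s:
            pairs.append((i, best))
    return pairs
```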

3. Evaluation Methodology and Metric Significance

Chamfer Distance and Modified Hausdorff Distance intrinsically capture different aspects of geometric accuracy:

  • CD is sensitive to both coverage (does the radar cloud span the same physical locations as the lidar cloud?) and clutter (are extraneous points hallucinated?).
  • MHD emphasizes the mean-case proximity between sets, particularly important for robust evaluation under noisy, artifact-prone radar input.

These metrics are chosen for their resilience to noise and outliers—problems endemic to radar imaging—while still penalizing both missing key scene structures and excessive false returns.

The use of CD and MHD makes the benchmark suitable for both strict super-resolution evaluation and for the assessment of generalization to new scenes and conditions, as shown by reporting not only indoor environment statistics but also cross-domain (e.g., campus building) results.

4. State-of-the-Art Baselines and Results

Recent entries on the RadarHD Benchmark reveal significant progress in radar point cloud super-resolution:

| Model | # Radar Frames | Mean CD (m) | Mean MHD (m) | Radar Input Resolution | Notes |
|---|---|---|---|---|---|
| Single-frame RadarHD | 1 | 0.56 | 0.45 | 4 cm native | Baseline (VAE-based) |
| Multi-frame RadarHD | 5–41 | ≈0.35–0.38 | ≈0.28–0.29 | 4 cm native | Multi-temporal input, pixel diffusion |
| RadarSFD (Ours) | 1 | 0.35 | 0.28 | 4 cm native | Latent diffusion, pretrained priors |
| RAL’24 (Latent Diff.) | 1 | 0.38 | 0.29 | 4 cm native | Competing single-frame latent method |

For example, RadarSFD achieves a mean CD of 0.35 m and mean MHD of 0.28 m on single radar-frame input, surpassing both the single-frame VAE RadarHD baseline (0.56 m / 0.45 m) and another latent diffusion model (0.38 m / 0.29 m). Notably, this performance is also competitive with multi-frame models (up to 41-frame stacks). This demonstrates that well-regularized, pretrained latent models with proper radar conditioning can recover fine spatial structure (“narrow gaps, fine walls”) from minimal input, and that CD/MHD can discriminate these effects.

A plausible implication is that continued refinement of pretrained priors and loss functions, even without temporal stacking or SAR techniques, could close the residual gap with multi-frame or lidar-based methods—an important direction for compact robotic platforms.

5. Algorithmic Design: Priors, Conditioning, and Losses

The benchmark has precipitated innovations in network design for radar super-resolution:

  • Pretrained Priors: Incorporating geometric knowledge from models like Marigold—a monocular depth estimator—has significantly reduced geometric errors. Initializing U-Nets with vision-trained priors guides the network to reconstruct detailed scene geometry from sparse radar features.
  • Radar Conditioning: Radar input is mapped to a BEV latent (channel-wise) and concatenated with the network’s backbone at multiple stages, enhancing the model’s ability to align sparse, artifacted observations with prior knowledge of scene structure.
  • Dual-Space Loss Functions: Combining loss terms in both the latent/noise-prediction space (the diffusion target) and the pixel (voxel) reconstruction space, specifically additive terms of latent MSE, L1, SSIM, and LPIPS, significantly improves reconstruction sharpness while constraining hallucination (plausible-looking but incorrect details); a minimal sketch of such a composite loss follows this list.
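The sketch below illustrates one way such a dual-space objective can be composed; the loss weights and the `ssim` and `lpips` callables are placeholders for illustration, not the weights or implementations used by any particular published method.

```python
# Hedged sketch of a dual-space training objective: latent noise-prediction MSE
# plus pixel-space L1, SSIM, and LPIPS terms. Weights and the `ssim`/`lpips`
# callables are placeholders, not values from the cited methods.
import torch.nn.functional as F

def dual_space_loss(pred_noise, true_noise,        # diffusion targets in latent space
                    pred_bev, target_bev,          # decoded BEV images in pixel space
                    ssim, lpips,                   # assumed callables returning scalar losses
                    w_latent=1.0, w_l1=1.0, w_ssim=0.5, w_lpips=0.5):
    latent_term = F.mse_loss(pred_noise, true_noise)   # latent/noise-prediction MSE
    l1_term = F.l1_loss(pred_bev, target_bev)          # pixel-space L1
    ssim_term = ssim(pred_bev, target_bev)             # structural-similarity loss term
    lpips_term = lpips(pred_bev, target_bev)           # perceptual (LPIPS) loss term
    return (w_latent * latent_term + w_l1 * l1_term
            + w_ssim * ssim_term + w_lpips * lpips_term)
```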

Ablation studies confirm that both pretrained initialization and dual loss are critical for the best results on CD and MHD, as removal of either meaningfully degrades reconstruction fidelity.

6. Qualitative and Generalization Findings

RadarHD Benchmark results are not limited to in-domain test scenes. Leading methods have demonstrated the ability to generalize, reconstructing fine geometric features in previously unseen environments (e.g., from office/lobby training to a campus building test). Qualitative visualizations highlight accurate recovery of:

  • Thin structural elements (e.g., walls)
  • Narrow passages and doorways
  • Persistent suppression of transient “ghost” returns

This generalization capacity is measured by maintaining competitive CD and MHD across all tested environments, indicating that the benchmark genuinely assesses transferable geometric understanding from radar.

7. Significance and Future Directions

The RadarHD Benchmark has established a standard for evaluating radar super-resolution pipelines in the absence of motion or synthetic aperture aggregation. Its influence can be seen in the design choices—pretrained priors, radar-specific latent conditioning, and tailored loss functions—that dominate leading submissions. By offering quantitative and qualitative assessment across domains and sensor modalities, it has become a de facto touchstone for compact, radar-centric robotics platforms and methods seeking to approach lidar-level fidelity under severe scene or environmental constraints.

Going forward, likely research directions include tighter integration of physics-based radar modeling in pretrained priors, improved artifact suppression, and broader ecological validity by extending the benchmark to include outdoor, highly non-stationary, or crowded scenes. The standardized, traceable metric reporting of the RadarHD Benchmark ensures ongoing rigorous, reproducible comparison between future methods and enables systematic progress in high-resolution radar scene reconstruction.
