
Radar-Only Models Overview

Updated 31 December 2025
  • Radar-only models are autonomous perception systems that rely solely on radar data, excelling in adverse environments thanks to penetrative, low-cost sensing.
  • They employ diverse data representations, including 2D/3D point clouds, raw tensors, and BEV grids, processed with both geometric and deep learning methods.
  • These models achieve competitive accuracy on tasks such as object detection, segmentation, and ego-motion estimation while mitigating challenges like limited resolution and data sparsity.

Radar-only models are autonomous perception and reasoning systems that process radar sensor data exclusively, without incorporating information from other modalities such as LiDAR, cameras, or GNSS. These models are constructed to leverage radar’s unique physical characteristics: penetrative capability through adverse environmental phenomena (fog, dust, rain), extended range, insensitivity to lighting, and suitability for low-cost, robust deployment. Recent research demonstrates that foundational, detection, segmentation, mapping, and scene understanding tasks can be successfully addressed using radar-only pipelines—sometimes matching or exceeding traditional multisensor benchmarks in adverse conditions.

1. Sensor Data Representations and Preprocessing

Radar-only models operate on a variety of input data forms, influenced by sensor design and downstream task requirements:

  • 2D/3D Point Clouds: Many automotive and robotics radars deliver point clouds with range, azimuth, sometimes elevation, and radial velocity (Doppler). For example, Navtech CIR-DEV-X produces 2D polar scans at 4 Hz with 7.5 cm range resolution (Overbye et al., 2023); 3+1D imaging radars give ego-motion compensated (x, y, z, v_r) returns (Palmer et al., 9 Sep 2024).
  • Raw Range-Azimuth-Doppler Tensors: Deep learning pipelines ingest multi-channel tensors after FFT processing, often retaining real/imaginary samples per antenna, chirp, or time slice (Dalbah et al., 2023). Four-dimensional cubes (e.g., 256×64×8×2) preserve most phase/amplitude information (Huang et al., 15 Sep 2025).
  • Polar/Bird’s-Eye-View Grids: For object detection or occupancy mapping, sparse point clouds are discretized into 2D polar grids or rasterized into BEV Cartesian grids, supplying binary occupancy, point-count, mean Doppler, or RCS features (Wei et al., 2023).
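
The BEV rasterization described in the last bullet can be illustrated with a minimal sketch; the grid extents, cell size, and feature layout below are illustrative assumptions rather than a specification from the cited works.

```python
import numpy as np

def rasterize_bev(points, x_range=(0.0, 100.0), y_range=(-50.0, 50.0), cell=0.5):
    """Rasterize a radar point cloud into a BEV feature grid.

    points: (N, 4) array of [x, y, radial_velocity, rcs] per detection.
    Returns an (H, W, 4) grid holding binary occupancy, point count,
    mean Doppler, and mean RCS per cell.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((nx, ny, 4), dtype=np.float32)

    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    for i, j, vr, rcs in zip(ix[valid], iy[valid], points[valid, 2], points[valid, 3]):
        grid[i, j, 0] = 1.0           # binary occupancy
        grid[i, j, 1] += 1.0          # point count
        grid[i, j, 2] += vr           # accumulate Doppler
        grid[i, j, 3] += rcs          # accumulate RCS

    counts = np.maximum(grid[..., 1:2], 1.0)
    grid[..., 2:4] /= counts          # turn accumulated sums into per-cell means
    return grid
```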

Preprocessing typically includes intensity thresholding (e.g., τ = 0.26 in (Overbye et al., 2023)), (optional) denoising (morphological operations, median filtering), motion compensation (deskewing via odometry), and feature engineering for task-specific cues.
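
A minimal sketch of the thresholding and polar-to-Cartesian conversion, assuming a 2D polar power scan indexed by (azimuth bin, range bin); the resolution and threshold values echo the figures quoted above, but the interface is hypothetical.

```python
import numpy as np

def polar_scan_to_points(scan, range_res=0.075, tau=0.26):
    """Threshold a 2D polar power scan and convert detections to Cartesian points.

    scan: (n_azimuth, n_range) array of normalized return intensities.
    range_res: metres per range bin (7.5 cm for the Navtech scans cited above).
    tau: intensity threshold; weaker returns are discarded.
    """
    n_az, _ = scan.shape
    az = np.linspace(0.0, 2.0 * np.pi, n_az, endpoint=False)   # one azimuth angle per row
    a_idx, r_idx = np.nonzero(scan > tau)                      # detections above threshold
    r = r_idx * range_res
    x = r * np.cos(az[a_idx])
    y = r * np.sin(az[a_idx])
    return np.stack([x, y, scan[a_idx, r_idx]], axis=1)        # (M, 3): x, y, intensity
```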

2. Fundamental Model Architectures

Radar-only models span classical geometric, optimization-based, and deep neural architectures:

  • Robust Geometric Estimation: RANSAC plane fitting for ground estimation; k-d tree clustering for obstacle detection; Euclidean subset selection, bounding-box size thresholds (Overbye et al., 2023).
  • Graph-based Ego-Motion Estimation: Keypoint extraction from saliency in 2D radar scans; graph matching optimization (eigenvector relaxation, greedy rounding) for feature correspondence; closed-form 2D/3-DoF SVD motion estimation (Cen et al., 2019); a sketch of the SVD step follows this list.
  • Deep Learning for Object Detection: Transformer and CNN backbones adapt vision architectures (RadarFormer, MaXViT) using channel-chirp-time merging and multi-axis attention (Dalbah et al., 2023). Point cloud detectors (Complex-YOLO, PointPillars, DSVT-P) operate on rasterized BEV or voxelized radar point clouds (Lee, 2020, Palmer et al., 9 Sep 2024).
  • Inverse Sensor and Occupancy Models: Polar-grid deep ISMs based on ResNet plus dual-attention decoders (PAM/CAM), evidential masses for free/occupied/unknown, temporal Dempster-Shafer fusion for dynamic mapping (Wei et al., 2023).
  • Transformer-based Spatial-Temporal Networks: RadarMOSEVE introduces object and scenario-level self-attention plus cross-frame temporal attention, explicitly integrating radial Doppler channels for segmentation and ego-velocity estimation (Pang et al., 22 Feb 2024).
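
The closed-form SVD motion step referenced in the ego-motion bullet above reduces to a Kabsch/Procrustes fit over already-matched keypoints; the sketch below assumes the graph-matching stage has produced the correspondences and handles only the 2D rigid alignment.

```python
import numpy as np

def estimate_rigid_motion_2d(src, dst):
    """Closed-form 2D rigid transform (R, t) aligning matched keypoints.

    src, dst: (N, 2) arrays of corresponding keypoints from consecutive scans.
    Returns R (2x2) and t (2,) such that dst ≈ src @ R.T + t in the
    least-squares sense (Kabsch/Procrustes solution).
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                        # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```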

Foundational radar models (GRT, RadarFM) use transformer encoders with 4D patch tokenization; cross-modal knowledge distillation and curriculum pretraining (LEROjD) optimize radar performance for 3D object detection by leveraging alternate modalities during training (Huang et al., 15 Sep 2025, Mishra et al., 26 Nov 2025, Palmer et al., 9 Sep 2024).

3. Learning Objectives, Loss Functions, and Training Protocols

Radar-only frameworks employ task-specific and multi-task learning objectives:

  • Occupancy and Segmentation Losses: Binary cross-entropy (range-squared weighting), class-balanced Dice loss, weighted softmax cross entropy for multi-class semantics, or evidential fusion rules (Dempster-Shafer) (Wei et al., 2023, Huang et al., 15 Sep 2025).
  • Detection and Box Regression: YOLO-style joint loss with objectness, class, and smooth-L1 location penalties; mAP at 3D IoU thresholds for quantitative benchmarking (Lee, 2020, Palmer et al., 9 Sep 2024).
  • Contrastive and Hash-Aware Losses: RadarFM employs hash-aware CLIP loss functions quantifying fine-grained continuous similarity between scenes, weighted by spatial bin Hamming distances and kernel widths (Mishra et al., 26 Nov 2025).
  • Velocity Estimation Losses: Mean squared error, Doppler consistency regularizer to align per-point velocities with the ego-motion estimate (Pang et al., 22 Feb 2024).
  • Cross-Modal Knowledge Distillation: Feature map and logit matching, pseudo-label fusion, weight initialization from lidar-teacher models, cyclical super-convergence scheduling (Palmer et al., 9 Sep 2024).
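
As a generic illustration of the feature-map and logit matching terms used in cross-modal distillation, the sketch below combines an MSE feature loss with a temperature-scaled KL term; the weighting, temperature, and tensor shapes are assumptions, not the exact formulation of LEROjD.

```python
import torch.nn.functional as F

def radar_distillation_loss(student_feat, teacher_feat,
                            student_logits, teacher_logits,
                            temperature=2.0, alpha=0.5):
    """Feature-map matching plus soft-logit matching against a frozen teacher."""
    # feature-map matching between the radar student and the lidar-trained teacher
    feat_loss = F.mse_loss(student_feat, teacher_feat.detach())

    # temperature-scaled KL divergence on the class logits
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    return alpha * feat_loss + (1.0 - alpha) * kd_loss
```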

Critical hyperparameters—learning rate, batch size, dropout, temporal window stride, and class frequency balancing—are tuned per dataset/task, and data augmentation protocols aggressively randomize spatial, temporal, and feature dimensions to address radar sparsity.
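
The kind of spatial randomization referred to above can be as simple as the sketch below (global yaw jitter, mirror flips, and point dropout); the specific transforms and ranges are illustrative assumptions rather than the protocol of any cited work.

```python
import numpy as np

def augment_radar_points(points, rng):
    """Randomly rotate, flip, and drop points in an (N, 4) [x, y, z, v_r] cloud."""
    points = points.copy()

    theta = rng.uniform(-np.pi / 8, np.pi / 8)          # small global yaw jitter
    c, s = np.cos(theta), np.sin(theta)
    points[:, :2] = points[:, :2] @ np.array([[c, -s], [s, c]]).T

    if rng.random() < 0.5:                              # mirror about the x-axis
        points[:, 1] *= -1.0                            # radial velocity is unchanged

    keep = rng.random(len(points)) > 0.1                # drop ~10% of points
    return points[keep]

# Usage (hypothetical input cloud): rng = np.random.default_rng(0); augment_radar_points(cloud, rng)
```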

4. Benchmark Datasets, Evaluation Metrics, and Comparative Results

A range of public and private datasets underpins radar-only development; the principal sources are those cited alongside the models in the summary table below.

Performance is assessed using precision, recall, mAP at 3D IoU ≥ 0.5, mean intersection-over-union per class, object location similarity (OLS), velocity estimation MAE/MSE, and advanced localization-aware metrics (cell-wise precision/recall in native radar bins) (Mishra et al., 26 Nov 2025, Pang et al., 22 Feb 2024).
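
The mAP thresholding in these benchmarks reduces, per prediction, to an IoU test against the best-matching ground-truth box; the axis-aligned BEV version below is a simplified stand-in, since the cited benchmarks generally use rotated boxes and full 3D IoU.

```python
def bev_iou_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned BEV boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# A prediction counts as a true positive when IoU with its matched box is >= 0.5.
```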

Notable results (mean AP, recall, IoU):

  • RadarFormer achieves 77.18% AP, 83.45% AR on CRUW, with 2× inference speed and 10× parameter reduction vs. prior models (Dalbah et al., 2023).
  • GRT exhibits log-linear scaling: test loss improves by 20% per 10× increase in data; raw tensor inputs outperform lossy CFAR/AoA representations by ~28–31% (Huang et al., 15 Sep 2025).
  • RadarMOSEVE delivers mIoU 70.2% (static 73.3, moving 67.2), velocity MAE 0.182 m/s, outpacing LiDAR-based and point-transformer baselines (Pang et al., 22 Feb 2024).
  • Multi-stage and KD-based radar detection in LEROjD raises mAP by up to 4.2 points (SR), generalizes across architectures without core modification (Palmer et al., 9 Sep 2024).
  • Radar-only odometry matches or exceeds visual odometry in structured and unstructured terrain with median errors as low as 2.08 cm/0.0597° (city) (Cen et al., 2019).

5. Applications and Real-World Deployments

Radar-only models are deployed for multiple autonomous systems and perception tasks:

  • Off-road Local Navigation: Real-time ground-plane and obstacle mapping enables autonomous traversal of 350 m off-road at 2.5 m/s, with a longer detection range than lidar and robust operation across mixed terrains without manual intervention (Overbye et al., 2023).
  • Ego-Motion Estimation: Scan-matching via graph-based radar keypoints yields GPS-level odometry accuracy, with resilience to weather and false returns (Cen et al., 2019).
  • Occupancy and Dynamic Mapping: Deep ISMs infer free/occupied/unknown states directly from radar and fuse Doppler cues into dynamic grid maps; multi-radar systems achieve mount-agnostic, low-latency mapping without retraining (Wei et al., 2023); a per-cell fusion sketch follows this list.
  • Moving Object Segmentation and Ego-Velocity Estimation: RadarMOSEVE simultaneously segments static/moving targets and estimates the robot's velocity, remaining robust in marine and urban environments (Pang et al., 22 Feb 2024).
  • Object Detection: Transformer and pillar/voxel-based detectors identify cars, pedestrians, and cyclists within sparse radar point clouds, with curriculum and distillation on lidar visibility boosting radar-only mAP (Dalbah et al., 2023, Palmer et al., 9 Sep 2024).
  • Foundational Scene Understanding: RadarFM and GRT pretrain generalizable scene representations enabling cross-task transfer—scene captioning, semantic segmentation, occupancy prediction—using only radar inputs (Huang et al., 15 Sep 2025, Mishra et al., 26 Nov 2025).
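
The per-cell evidential update behind the dynamic mapping bullet above can be sketched with Dempster's rule over the frame {free, occupied}; the mass values and dictionary interface here are generic, not the exact formulation of (Wei et al., 2023).

```python
def fuse_cell_masses(m1, m2):
    """Dempster's rule of combination for one grid cell.

    Each mass assignment is a dict with keys 'F' (free), 'O' (occupied),
    and 'U' (unknown, i.e. the whole frame {F, O}); values sum to 1.
    """
    conflict = m1["F"] * m2["O"] + m1["O"] * m2["F"]
    norm = 1.0 - conflict
    if norm <= 0.0:
        return {"F": 0.0, "O": 0.0, "U": 1.0}   # total conflict: fall back to ignorance

    return {
        "F": (m1["F"] * m2["F"] + m1["F"] * m2["U"] + m1["U"] * m2["F"]) / norm,
        "O": (m1["O"] * m2["O"] + m1["O"] * m2["U"] + m1["U"] * m2["O"]) / norm,
        "U": (m1["U"] * m2["U"]) / norm,
    }

# Example: a cell first seen as likely free, then weakly occupied.
print(fuse_cell_masses({"F": 0.7, "O": 0.1, "U": 0.2},
                       {"F": 0.2, "O": 0.3, "U": 0.5}))
```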

6. Challenges, Limitations, and Future Directions

Radar-only perception faces inherent limitations:

  • Physical Resolution Constraints: Angular and vertical resolution remain low for single-chip and automotive imaging radars; narrow pillars and small objects near the resolution limit may be missed (Overbye et al., 2023, Huang et al., 15 Sep 2025).
  • Sparsity and Noise: Radar returns are orders of magnitude sparser and noisier than lidar returns; multi-frame accumulation and deep spatial-temporal attention are required for robust feature learning (Pang et al., 22 Feb 2024); see the accumulation sketch after this list.
  • Ground Truth and Annotation Bias: Supervised learning relies on camera/lidar fusion labels, which propagate annotation noise into radar-only heatmap regression (Dalbah et al., 2023).
  • Scaling Laws: Data requirements for foundational pretraining are steep; GRT and RadarFM models are projected to saturate after 100M samples (3000 h), consistent with large vision foundation model trends (Huang et al., 15 Sep 2025).
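
The multi-frame accumulation noted in the sparsity bullet above amounts to warping past sweeps into the newest frame before stacking them; the sketch below assumes SE(2) relative poses are already available from odometry.

```python
import numpy as np

def accumulate_sweeps(sweeps, rel_poses):
    """Accumulate past radar sweeps in the coordinate frame of the newest one.

    sweeps: list of (N_i, 2) point arrays, ordered oldest to newest.
    rel_poses: list of 3x3 SE(2) matrices mapping each sweep's frame into the
               newest sweep's frame (identity for the newest sweep).
    Returns an (sum N_i, 3) array of [x, y, age] points, age 0 = newest.
    """
    merged = []
    for age, (pts, T) in enumerate(zip(reversed(sweeps), reversed(rel_poses))):
        homo = np.hstack([pts, np.ones((len(pts), 1))])    # homogeneous coordinates
        warped = (T @ homo.T).T[:, :2]                     # warp into the newest frame
        merged.append(np.hstack([warped, np.full((len(pts), 1), float(age))]))
    return np.vstack(merged)
```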

Priority future directions include: scalable collection of raw radar cubes (I/Q streams), joint Doppler-aware temporal transformers, semi-supervised or self-supervised learning methods to mitigate label noise, and radar-specific positional encodings. Early results suggest that curriculum design and cross-modal knowledge transfer during training can partially bridge the sparseness and annotation gap, facilitating high-performance radar-only inference with generic backbone architectures (Palmer et al., 9 Sep 2024, Mishra et al., 26 Nov 2025).


Summary Table: Key Results and Architectures

| Model/Method | Data Type | Backbone/Approach | Key Metrics/Results | Reference |
| --- | --- | --- | --- | --- |
| Off-Road Navigation | 2D polar scans | RANSAC, k-d tree clustering | 2.5 m/s, 55 m range, 0.3 m elevation error | (Overbye et al., 2023) |
| Ego-Motion | 2D radar scans | Keypoint + graph matching | Median error 0.052–0.089 m, 0.06–0.13° | (Cen et al., 2019) |
| RadarFormer | 3D multi-chirp tensor | MaXViT transformer | 77.18% AP, 83.45% AR, 12 FPS, 6.4M params | (Dalbah et al., 2023) |
| GRT | 4D radar tensors | Transformer encoder-decoder | −20% loss per 10× data, 0.98 m BEV error | (Huang et al., 15 Sep 2025) |
| RadarMOSEVE | 4D point clouds | Radar transformer (object + scenario attention) | mIoU 70.2%, velocity MAE 0.182 m/s | (Pang et al., 22 Feb 2024) |
| Deep ISM | Sparse polar grids | ResNet + DANet dual attention | p(ĥ = O \| O) = 38.7%, mount-agnostic | (Wei et al., 2023) |
| LEROjD | 3+1D radar points | PointPillars/DSVT-P/VoxelRCNN | +3–4 pp mAP gain via curriculum/KD | (Palmer et al., 9 Sep 2024) |
| RadarFM | Range–angle heatmaps | CLIP ViT-B/16 + GPT-2 | Cell-wise F1, spatial scene embedding | (Mishra et al., 26 Nov 2025) |

Radar-only models now encompass high-resolution mapping, ego-motion, dynamic occupancy, object detection, segmentation, and unified scene representation. Ongoing foundational work is extending their capabilities towards robust, scalable perception in the absence of lidar/camera data, especially under adverse or cost-constrained operational regimes.
