RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar (2405.14014v4)
Abstract: The 3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across object categories and shapes. Current methods rely predominantly on LiDAR or camera inputs for 3D occupancy prediction, which makes them susceptible to adverse weather and limits the all-weather deployment of self-driving cars. To improve perception robustness, we leverage recent advances in automotive radar and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. Our method, RadarOcc, circumvents the limitations of sparse radar point clouds by directly processing the 4D radar tensor, thus preserving essential scene details. RadarOcc addresses the challenges posed by voluminous and noisy 4D radar data through Doppler-bin descriptors, sidelobe-aware spatial sparsification, and range-wise self-attention. To minimize the interpolation errors associated with direct coordinate transformations, we also devise a spherical-based feature encoding followed by spherical-to-Cartesian feature aggregation. We benchmark baseline methods of distinct modalities on the public K-Radar dataset. The results demonstrate RadarOcc's state-of-the-art performance in radar-based 3D occupancy prediction, with promising results even when compared against LiDAR- or camera-based methods. We also present qualitative evidence of 4D radar's superior robustness in adverse weather and examine the impact of key pipeline components through ablation studies.
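The abstract names two operations whose basic form can be sketched independently of the paper's implementation details: spatially sparsifying the dense 4D radar tensor by keeping only the strongest returns per range bin, and mapping the tensor's native spherical coordinates (range, azimuth, elevation) into Cartesian space. The sketch below is a rough illustration under assumed conventions (azimuth measured in the x-y plane, elevation from it; top-k selection as a stand-in for the paper's sidelobe-aware sparsification), not the authors' actual method:

```python
import numpy as np

def spherical_to_cartesian(r, az, el):
    """Map radar spherical coordinates to Cartesian (x, y, z).

    r: range in meters; az, el: azimuth and elevation in radians.
    Convention assumed here: azimuth rotates in the x-y plane,
    elevation is measured up from that plane.
    """
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([np.asarray(x), np.asarray(y), np.asarray(z)], axis=-1)

def topk_per_range(power, k):
    """Keep the k strongest angular cells in each range bin.

    power: array of shape (R, A, E) -- one power slice of the
    4D radar tensor (range x azimuth x elevation).
    Returns a boolean mask of the same shape. This is a simple
    stand-in for the paper's sidelobe-aware sparsification.
    """
    num_ranges = power.shape[0]
    flat = power.reshape(num_ranges, -1)
    # Indices of the k largest entries per range bin (unordered).
    idx = np.argpartition(flat, -k, axis=1)[:, -k:]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask.reshape(power.shape)
```

Selecting per range bin (rather than globally) reflects that radar return power falls off with range, so a single global threshold would discard most far-field returns.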
- Fangqiang Ding
- Xiangyu Wen
- Yunzhou Zhu
- Yiming Li
- Chris Xiaoxuan Lu