RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection (2403.16440v1)

Published 25 Mar 2024 in cs.CV

Abstract: Three-dimensional object detection is one of the key tasks in autonomous driving. To reduce costs in practice, low-cost multi-view cameras have been proposed to replace expensive LiDAR sensors for 3D object detection. However, it is difficult to achieve highly accurate and robust 3D object detection with cameras alone. An effective solution is to combine multi-view cameras with an economical millimeter-wave radar sensor for more reliable multi-modal 3D object detection. In this paper, we introduce RCBEVDet, a radar-camera fusion 3D object detection method in the bird's eye view (BEV). Specifically, we first design RadarBEVNet for radar BEV feature extraction. RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section (RCS) aware BEV encoder. In the dual-stream radar backbone, a point-based encoder and a transformer-based encoder extract radar features, with an injection and extraction module facilitating communication between the two encoders. The RCS-aware BEV encoder uses RCS as a prior of object size to scatter point features in BEV. In addition, we present a Cross-Attention Multi-layer Fusion module that automatically aligns the multi-modal BEV features from radar and camera with a deformable attention mechanism and then fuses them with channel and spatial fusion layers. Experimental results show that RCBEVDet achieves new state-of-the-art radar-camera fusion results on the nuScenes and View-of-Delft (VoD) 3D object detection benchmarks. Furthermore, RCBEVDet outperforms all real-time camera-only and radar-camera 3D object detectors while running faster, at 21-28 FPS. The source code will be released at https://github.com/VDIGPKU/RCBEVDet.
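The RCS-aware scattering idea in the abstract can be sketched in a few lines: each radar point deposits its feature into the BEV grid over a neighborhood whose extent grows with the point's RCS, using RCS as a rough object-size prior. This is a minimal numpy illustration under assumed simplifications (a direct RCS-to-radius mapping and additive scattering), not the paper's actual encoder.

```python
import numpy as np

def rcs_aware_scatter(points, feats, rcs, grid=32, cell=1.0):
    """Scatter per-point radar features into a BEV grid.

    points: (N, 2) array of (x, y) positions in meters.
    feats:  (N, C) array of per-point features.
    rcs:    (N,) array of radar cross-section values; larger RCS
            spreads the feature over a larger cell neighborhood
            (hypothetical size prior, simplified from the paper).
    """
    C = feats.shape[1]
    bev = np.zeros((grid, grid, C))
    for (x, y), f, r in zip(points, feats, rcs):
        cx, cy = int(x / cell), int(y / cell)
        radius = max(0, int(round(r)))  # assumed RCS -> cell-radius mapping
        for i in range(cx - radius, cx + radius + 1):
            for j in range(cy - radius, cy + radius + 1):
                if 0 <= i < grid and 0 <= j < grid:
                    bev[i, j] += f  # additive scattering into the neighborhood
    return bev
```

A point with RCS 1 fills a 3x3 neighborhood, while a point with RCS 0 occupies a single cell, so larger objects leave a correspondingly larger footprint in the BEV feature map.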

Authors (10)
  1. Zhiwei Lin
  2. Zhe Liu
  3. Zhongyu Xia
  4. Xinhao Wang
  5. Yongtao Wang
  6. Shengxiang Qi
  7. Yang Dong
  8. Nan Dong
  9. Le Zhang
  10. Ce Zhu
Citations (12)
