
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation (2403.19104v1)

Published 28 Mar 2024 in cs.CV and cs.RO

Abstract: In the field of 3D object detection for autonomous driving, LiDAR-Camera (LC) fusion is the top-performing sensor configuration. However, the relatively high cost of LiDAR hinders adoption of this technology in consumer automobiles. Camera and radar, by contrast, are already commonly deployed on vehicles on the road today, but the performance of Camera-Radar (CR) fusion falls behind that of LC fusion. In this work, we propose Camera-Radar Knowledge Distillation (CRKD) to bridge the performance gap between LC and CR detectors with a novel cross-modality KD framework. We use the Bird's-Eye-View (BEV) representation as the shared feature space to enable effective knowledge distillation. To accommodate the unique cross-modality KD path, we propose four distillation losses to help the student learn crucial features from the teacher model. We present extensive evaluations on the nuScenes dataset to demonstrate the effectiveness of the proposed CRKD framework. The project page for CRKD is https://song-jingyu.github.io/CRKD.

