LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion (2307.00724v4)
Abstract: As an emerging technology and a relatively affordable device, the 4D imaging radar has already been confirmed effective in performing 3D object detection in autonomous driving. Nevertheless, the sparsity and noisiness of 4D radar point clouds hinder further performance improvement, and in-depth studies of its fusion with other modalities are lacking. On the other hand, as a new image view transformation strategy, "sampling" has been applied in a few image-based detectors and shown to outperform the widely applied "depth-based splatting" proposed in Lift-Splat-Shoot (LSS), even without image depth prediction. However, the potential of "sampling" is not fully unleashed. This paper investigates the "sampling" view transformation strategy for camera and 4D imaging radar fusion-based 3D object detection. In the proposed LiDAR Excluded Lean (LXL) model, predicted image depth distribution maps and radar 3D occupancy grids are generated from image perspective view (PV) features and radar bird's eye view (BEV) features, respectively. They are sent to the core of LXL, called "radar occupancy-assisted depth-based sampling", to aid image view transformation. We demonstrate that more accurate view transformation can be performed by introducing image depths and radar information to enhance the "sampling" strategy. Experiments on the VoD and TJ4DRadSet datasets show that the proposed method outperforms state-of-the-art 3D object detection methods by a significant margin without bells and whistles. Ablation studies demonstrate that our method performs the best among different enhancement settings.
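To make the "radar occupancy-assisted depth-based sampling" idea concrete, the following is a minimal sketch under simplifying assumptions: each 3D voxel is projected to a pixel and a depth bin (projection indices are assumed precomputed), and the sampled image feature is weighted by the product of the predicted depth probability and the radar occupancy value. All function and variable names here are hypothetical illustrations, not the paper's actual implementation, which may fuse the two cues differently.

```python
import numpy as np

def occupancy_assisted_sampling(img_feat, depth_dist, radar_occ, proj_uv, proj_d):
    """Sample PV image features into a 3D voxel grid, weighted by
    predicted depth probability and radar occupancy.

    img_feat:   (C, H, W)  perspective-view image features
    depth_dist: (D, H, W)  per-pixel depth distribution over D bins
    radar_occ:  (X, Y, Z)  radar 3D occupancy grid, values in [0, 1]
    proj_uv:    (X, Y, Z, 2) integer (u, v) pixel coords of voxel centers
    proj_d:     (X, Y, Z)  integer depth-bin index of voxel centers
    returns:    (C, X, Y, Z) voxel features
    """
    X, Y, Z = radar_occ.shape
    C = img_feat.shape[0]
    vox = np.zeros((C, X, Y, Z))
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                u, v = proj_uv[x, y, z]
                d = proj_d[x, y, z]
                # Fuse the two cues: depth probability at this pixel/bin
                # times the radar-predicted occupancy of this voxel.
                w = depth_dist[d, v, u] * radar_occ[x, y, z]
                vox[:, x, y, z] = w * img_feat[:, v, u]
    return vox
```

In a real detector the triple loop would be replaced by vectorized gather operations (e.g. advanced indexing or `grid_sample`-style bilinear sampling), but the weighting logic is the point: voxels that are implausible under either the image depth prediction or the radar occupancy receive near-zero image features.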