FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention (2405.11682v1)
Abstract: Camera, LiDAR, and radar are common perception sensors for autonomous driving tasks. Robust 3D object detection is best achieved by fusing these sensors, yet exploiting their complementary strengths remains challenging because each sensor has its own characteristics. In this paper, we propose FADet, a multi-sensor 3D detection network that targets the characteristics of each sensor through local featured attention modules. For camera images, we propose a dual-attention-based sub-module; for LiDAR point clouds, a triple-attention-based sub-module; and for radar point features, a mixed-attention-based sub-module. With these local featured attention sub-modules, FADet detects effectively in long-tail and complex scenes across camera, LiDAR, and radar input. On the nuScenes validation dataset, FADet achieves state-of-the-art performance on LiDAR-camera object detection with 71.8% NDS and 69.0% mAP, and on radar-camera object detection with 51.7% NDS and 40.3% mAP. Code will be released at https://github.com/ZionGo6/FADet.
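Since the abstract names a dual-attention sub-module for the camera branch, a minimal sketch may help make the idea concrete. The code below is an illustrative assumption, not the authors' released implementation: it wires up a simplified DANet-style dual attention (a spatial self-attention branch plus a channel self-attention branch, each with a residual connection, summed at the end) in PyTorch. The class names `DualAttention`, `SpatialAttention`, and `ChannelAttention` and the toy tensor shapes are hypothetical.

```python
# Hypothetical sketch of a DANet-style dual-attention sub-module for camera
# features; NOT the FADet authors' code. Names and shapes are illustrative.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel self-attention branch: attends across feature channels."""

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        feats = x.view(b, c, -1)                             # (B, C, HW)
        attn = torch.softmax(feats @ feats.transpose(1, 2), dim=-1)  # (B, C, C)
        out = (attn @ feats).view(b, c, h, w)
        return out + x                                       # residual connection


class SpatialAttention(nn.Module):
    """Position self-attention branch: attends across spatial locations."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                                     # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).transpose(1, 2)  # (B, HW, C/8)
        k = self.key(x).view(b, -1, h * w)                    # (B, C/8, HW)
        v = self.value(x).view(b, -1, h * w)                  # (B, C, HW)
        attn = torch.softmax(q @ k, dim=-1)                   # (B, HW, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return out + x                                        # residual connection


class DualAttention(nn.Module):
    """Camera-branch refinement: sum of spatial and channel attention outputs."""

    def __init__(self, channels):
        super().__init__()
        self.spatial = SpatialAttention(channels)
        self.channel = ChannelAttention()

    def forward(self, x):
        return self.spatial(x) + self.channel(x)


if __name__ == "__main__":
    cam_feats = torch.randn(2, 64, 32, 32)   # toy camera feature map
    refined = DualAttention(64)(cam_feats)
    print(refined.shape)                     # torch.Size([2, 64, 32, 32])
```

Under the same assumption, the LiDAR triple-attention and radar mixed-attention sub-modules described in the abstract would follow this pattern of refining per-modality features before fusion.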
Authors: Ziang Guo, Zakhar Yagudin, Selamawit Asfaw, Artem Lykov, Dzmitry Tsetserukou