Real-Time and Robust 3D Object Detection Within Road-Side LiDARs Using Domain Adaptation (2204.00132v2)
Abstract: This work addresses the challenges of domain adaptation for 3D object detection with infrastructure LiDARs. We design DASE-ProPillars, a model that detects vehicles in infrastructure-based LiDARs in real time. Our model uses PointPillars as the baseline and adds several modules to improve 3D detection performance. To demonstrate the effectiveness of the proposed modules, we train and evaluate DASE-ProPillars on two datasets: the open-source A9-Dataset and a semi-synthetic infrastructure dataset created within the Regensburg Next project. Several sets of experiments for each module show that our model outperforms the SE-ProPillars baseline on both the real A9 test set and a semi-synthetic A9 test set, while maintaining an inference speed of 45 Hz (22 ms). Finally, we apply domain adaptation from the semi-synthetic A9-Dataset to the semi-synthetic dataset of the Regensburg Next project through transfer learning, achieving a 3D mAP of 93.49% on the Car class of the target test set using 40 recall positions.
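As a rough illustration of the transfer-learning step described in the abstract (pretrain on the semi-synthetic A9 source domain, then fine-tune on the Regensburg Next target domain), the PyTorch-style sketch below shows the general pattern. It is a minimal sketch under stated assumptions: the detector object, the checkpoint path, and the convention that a forward pass in training mode returns a dict of losses are hypothetical placeholders, not the authors' actual implementation.

```python
# Minimal sketch of domain adaptation via transfer learning (assumed
# pattern, not the authors' code): initialize a PointPillars-style
# detector with weights pretrained on the source domain, then fine-tune
# every layer on the target-domain training split.
import torch
from torch.utils.data import DataLoader

def fine_tune(model, source_ckpt_path, target_train_set,
              epochs=20, lr=1e-4, batch_size=4):
    # Load source-domain weights (e.g. trained on the semi-synthetic
    # A9-Dataset); the checkpoint path is a placeholder.
    model.load_state_dict(torch.load(source_ckpt_path))

    loader = DataLoader(target_train_set, batch_size=batch_size,
                        shuffle=True)
    # A smaller learning rate than in pretraining is typical for
    # fine-tuning, so source-domain features are adjusted rather than
    # overwritten.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            # Assumption: in training mode the detector returns a dict
            # of loss terms (classification, box regression, ...).
            loss_dict = model(batch)
            total_loss = sum(loss_dict.values())
            total_loss.backward()
            optimizer.step()
    return model
```

Note that this training-time procedure does not affect the 45 Hz (22 ms) inference speed quoted above, which is a property of the detector's forward pass alone.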
- C. Creß, W. Zimmer, L. Strand, M. Fortkord, S. Dai, V. Lakshminarasimhan, and A. Knoll, “A9-Dataset: Multi-sensor infrastructure-based dataset for mobility research,” arXiv preprint, 2022.
- A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast encoders for object detection from point clouds,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12697–12705.
- W. Zheng, W. Tang, L. Jiang, and C.-W. Fu, “SE-SSD: Self-ensembling single-stage object detector from point cloud,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14494–14503.
- Z. Liu, X. Zhao, T. Huang, R. Hu, Y. Zhou, and X. Bai, “TANet: Robust 3D object detection from point clouds with triple attention,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 11677–11684.
- A. Krämmer, C. Schöller, D. Gulati, V. Lakshminarasimhan, F. Kurz, D. Rosenbaum, C. Lenz, and A. Knoll, “Providentia - A large-scale sensor system for the assistance of autonomous vehicles and its evaluation,” Journal of Field Robotics, 2022.
- V. Lakshminarasimhan and A. Knoll, “C-V2X resource deployment architecture based on moving network convoys,” in 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring). IEEE, 2020, pp. 1–6.
- C. Creß and A. C. Knoll, “Intelligent transportation systems with the use of external infrastructure: A literature survey,” arXiv preprint arXiv:2112.05615, 2021.
- V. R. Torunsky, “Pilotprojekt mit Vorbildcharakter” [Pilot project with exemplary character], p. 1.
- A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Conference on Robot Learning. PMLR, 2017, pp. 1–16.
- ASAM e.V., “OpenLABEL V1.0.0 standardization project.” [Online]. Available: https://www.asam.net/project-detail/asam-openlabel-v100/
- J. Wu, W. Zimmer, and A. Knoll, “Real-time LiDAR-based 3D object detection on the Providentia++ test stretch using a single-stage architecture,” Master’s thesis, Technische Universität München, 2021, unpublished.
- W. Zimmer, E. Ercelik, X. Zhou, J. J. Diaz Ortiz, and A. Knoll, “A survey of robust 3D object detection methods in point clouds,” arXiv preprint, 2022.
- S. Shi, X. Wang, and H. Li, “PointRCNN: 3D object proposal generation and detection from point cloud,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 770–779.
- C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep hierarchical feature learning on point sets in a metric space,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 5105–5114.
- Z. Yang, Y. Sun, S. Liu, and J. Jia, “3DSSD: Point-based 3D single-stage object detector,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11040–11048.
- Y. Yan, Y. Mao, and B. Li, “SECOND: Sparsely embedded convolutional detection,” Sensors, vol. 18, no. 10, p. 3337, 2018.
- B. Graham, M. Engelcke, and L. Van Der Maaten, “3D semantic segmentation with submanifold sparse convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9224–9232.
- C. He, H. Zeng, J. Huang, X.-S. Hua, and L. Zhang, “Structure aware single-stage 3D object detection from point cloud,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11873–11882.
- W. Zheng, W. Tang, S. Chen, L. Jiang, and C.-W. Fu, “CIA-SSD: Confident IoU-aware single-stage object detector from point cloud,” arXiv preprint arXiv:2012.03015, 2020.
- J. Mao, M. Niu, H. Bai, X. Liang, H. Xu, and C. Xu, “Pyramid R-CNN: Towards better performance and adaptability for 3D object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2723–2732.
- P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in perception for autonomous driving: Waymo Open Dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2446–2454.
- J. Mao, Y. Xue, M. Niu, H. Bai, J. Feng, X. Liang, H. Xu, and C. Xu, “Voxel Transformer for 3D object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
- Q. Xu, Y. Zhong, and U. Neumann, “Behind the curtain: Learning occluded shapes for 3D object detection,” arXiv preprint arXiv:2112.02205, 2021.
- A. Xiao, J. Huang, D. Guan, F. Zhan, and S. Lu, “SynLiDAR: Learning from synthetic LiDAR sequential point cloud for semantic segmentation,” arXiv preprint arXiv:2107.05399, 2021.
- L. T. Triess, M. Dreissig, C. B. Rist, and J. M. Zöllner, “A survey on deep domain adaptation for LiDAR perception,” arXiv preprint arXiv:2106.02377, 2021.
- D. Jia, A. Hermans, and B. Leibe, “Domain and modality gaps for LiDAR-based person detection on mobile robots,” arXiv preprint arXiv:2106.11239, 2021.
- Y. Wang, X. Chen, Y. You, L. E. Li, B. Hariharan, M. E. Campbell, K. Q. Weinberger, and W. Chao, “Train in Germany, test in the USA: Making 3D object detectors generalize,” arXiv preprint arXiv:2005.08139, 2020.
- J. Yang, S. Shi, Z. Wang, H. Li, and X. Qi, “ST3D: Self-training for unsupervised domain adaptation on 3D object detection,” arXiv preprint arXiv:2103.05346, 2021.
- K. G. Derpanis, “Overview of the RANSAC algorithm,” Image Rochester NY, vol. 4, no. 1, pp. 2–3, 2010.
- Y. Eldar, M. Lindenbaum, M. Porat, and Y. Y. Zeevi, “The farthest point strategy for progressive image sampling,” IEEE Transactions on Image Processing, vol. 6, no. 9, pp. 1305–1315, 1997.
- J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.
- S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning. PMLR, 2015, pp. 448–456.
- V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
- L. Yi, B. Gong, and T. A. Funkhouser, “Complete & Label: A domain adaptation approach to semantic segmentation of LiDAR point clouds,” arXiv preprint arXiv:2007.08488, 2020.
- M. Jaritz, T. Vu, R. de Charette, É. Wirbel, and P. Pérez, “xMUDA: Cross-modal unsupervised domain adaptation for 3D semantic segmentation,” arXiv preprint arXiv:1911.12676, 2019.
- T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
- H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621–11631.