CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network (2311.15241v2)

Published 26 Nov 2023 in cs.CV and cs.RO

Abstract: The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual intervention, which has proven to be cumbersome and costly. Learning-based online calibration methods have been proposed, but their performance is barely satisfactory in most cases. These methods usually suffer from issues such as sparse feature maps, unreliable cross-modality association, inaccurate calibration parameter regression, etc. In this paper, to address these issues, we propose CalibFormer, an end-to-end network for automatic LiDAR-camera calibration. We aggregate multiple layers of camera and LiDAR image features to achieve high-resolution representations. A multi-head correlation module is utilized to identify correlations between features more accurately. Lastly, we employ transformer architectures to estimate accurate calibration parameters from the correlation information. Our method achieved a mean translation error of $0.8751 \mathrm{cm}$ and a mean rotation error of $0.0562^{\circ}$ on the KITTI dataset, surpassing existing state-of-the-art methods and demonstrating strong robustness, accuracy, and generalization capabilities.


Summary

  • The paper introduces CalibFormer, an end-to-end Transformer network that automates LiDAR-camera calibration with high precision.
  • It integrates multi-layer feature aggregation and a multi-head correlation module to effectively align LiDAR and camera data.
  • Experiments on the KITTI dataset confirm its robustness, achieving a mean translation error of 0.8751 cm and a mean rotation error of 0.0562°.

An Analysis of "CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network"

The proliferation of autonomous driving technology has engendered significant interest in the precise calibration of disparate sensor modalities. The paper "CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network" by Yuxuan Xiao et al. addresses the crucial problem of calibrating LiDAR and camera sensors. The novelty of this work lies in its fully automated, end-to-end calibration network leveraging Transformer architectures.

Core Contributions

The researchers introduce CalibFormer, an automated calibration network designed to address the limitations of existing methods, which often suffer from sparse feature maps, unreliable cross-modality association, and inaccurate parameter regression. Specifically, the innovative aspects of CalibFormer include:

  1. High-Resolution Feature Aggregation: Leveraging multi-layer features from both LiDAR and camera data ensures a fine-grained representation, crucial for precise calibration.
  2. Multi-Head Correlation Module: This module accurately identifies correspondences between features across modalities, which is essential given the inherently different data characteristics of LiDAR and camera sensors (a minimal sketch of this idea follows the list).
  3. Transformer-Based Parameter Estimation: The use of Swin Transformer encoders and traditional Transformer decoders facilitates effective extraction and utilization of correlation features, thereby enhancing the accuracy of the estimated calibration parameters.
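To make the correlation idea concrete, the sketch below computes a window-based, multi-head correlation volume between a camera feature map and a (possibly misaligned) LiDAR feature map. It is a minimal NumPy illustration of the general technique, not the authors' implementation; the head count, window radius, normalization, and feature shapes are assumptions chosen for readability.

```python
import numpy as np

def multi_head_correlation(cam_feat, lidar_feat, num_heads=4, radius=3):
    """Window-based multi-head correlation (illustrative sketch, not the paper's code).

    cam_feat, lidar_feat: (C, H, W) feature maps from the two branches.
    Returns a volume of shape (num_heads, (2*radius+1)**2, H, W): for every
    camera pixel, the per-head similarity to each LiDAR feature inside a
    (2*radius+1) x (2*radius+1) search window.
    """
    C, H, W = cam_feat.shape
    assert C % num_heads == 0, "channels must split evenly across heads"
    d = C // num_heads                                  # channels per head
    cam = cam_feat.reshape(num_heads, d, H, W)
    # Pad the LiDAR features so the search window never leaves the map.
    pad = np.pad(lidar_feat, ((0, 0), (radius, radius), (radius, radius)))
    lid = pad.reshape(num_heads, d, H + 2 * radius, W + 2 * radius)

    win = 2 * radius + 1
    corr = np.empty((num_heads, win * win, H, W), dtype=cam_feat.dtype)
    k = 0
    for dy in range(win):
        for dx in range(win):
            shifted = lid[:, :, dy:dy + H, dx:dx + W]   # (heads, d, H, W)
            # Dot product over channels, scaled per head.
            corr[:, k] = np.einsum("hcij,hcij->hij", cam, shifted) / np.sqrt(d)
            k += 1
    return corr

# Toy usage with random features (shapes are placeholders).
cam = np.random.randn(64, 32, 96).astype(np.float32)
lidar = np.random.randn(64, 32, 96).astype(np.float32)
print(multi_head_correlation(cam, lidar).shape)   # (4, 49, 32, 96)
```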

Quantitative Performance

The empirical results highlight the robustness and efficacy of CalibFormer. On the KITTI dataset, the network achieves a mean translation error of 0.8751 cm and a mean rotation error of 0.0562°, surpassing existing state-of-the-art methods. These results demonstrate the network's ability to maintain high accuracy under significant initial miscalibrations, underscoring its practical robustness. Ablation studies further validate the contributions of each module, showing that the multi-head correlation module and the Transformer-based parameter estimation each contribute significantly to the overall accuracy.
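For readers unfamiliar with how such numbers are typically obtained, the snippet below shows one common convention for computing translation and rotation errors from a predicted versus ground-truth extrinsic matrix. The paper's exact evaluation code is not reproduced here; treat this as an illustrative convention, not the authors' definition.

```python
import numpy as np

def calibration_errors(T_pred, T_gt):
    """Translation error (cm) and rotation error (deg) between two 4x4 extrinsics.

    Assumes translations are expressed in metres; this mirrors a common way of
    reporting LiDAR-camera calibration accuracy.
    """
    # Residual transform mapping the prediction onto the ground truth.
    T_err = np.linalg.inv(T_gt) @ T_pred
    t_err_cm = np.linalg.norm(T_err[:3, 3]) * 100.0            # metres -> cm
    # Rotation angle of the residual rotation matrix (axis-angle magnitude).
    cos_angle = np.clip((np.trace(T_err[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    r_err_deg = np.degrees(np.arccos(cos_angle))
    return t_err_cm, r_err_deg
```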

Methodological Details

CalibFormer is built with a well-defined structure:

  • Input Data Preprocessing: The LiDAR point cloud is projected onto the image plane to generate a two-channel LiDAR image (see the projection sketch after this list).
  • Feature Extraction: Using an enhanced version of the Deep Layer Aggregation (DLA) network, the system extracts fine-grained features from both LiDAR and camera inputs.
  • Feature Matching: The multi-head correlation module computes correlations between misaligned features with a window-based approach to enhance precision.
  • Parameter Regression: Leveraging Swin Transformer encoders and traditional Transformer decoders, the network regresses the translation and rotation parameters required for calibration.
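As a concrete illustration of the preprocessing step, the sketch below projects a LiDAR point cloud onto the image plane to form a two-channel LiDAR image. The paper only states that a two-channel image is produced; the choice of depth and reflectance intensity as the two channels, along with the function and variable names, are assumptions made for illustration.

```python
import numpy as np

def project_lidar_to_image(points, intensities, T_init, K, height, width):
    """Project a LiDAR point cloud into a two-channel (depth, intensity) image.

    points:      (N, 3) LiDAR points in the sensor frame (metres).
    intensities: (N,)   per-point reflectance values.
    T_init:      (4, 4) initial (possibly miscalibrated) LiDAR-to-camera extrinsic.
    K:           (3, 3) camera intrinsic matrix.
    """
    # Transform points into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam_pts = (T_init @ pts_h.T).T[:, :3]
    in_front = cam_pts[:, 2] > 0.1                    # keep points ahead of the camera
    cam_pts, intensities = cam_pts[in_front], intensities[in_front]

    # Perspective projection onto the image plane.
    uv = (K @ cam_pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)

    lidar_image = np.zeros((2, height, width), dtype=np.float32)
    lidar_image[0, v[valid], u[valid]] = cam_pts[valid, 2]     # depth channel
    lidar_image[1, v[valid], u[valid]] = intensities[valid]    # intensity channel
    return lidar_image
```

The resulting dense two-channel image lets the LiDAR branch reuse standard image feature extractors, which is what allows the shared DLA-style backbone described above to process both modalities.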

Theoretical and Practical Implications

From a theoretical perspective, the integration of Transformer networks for sensor calibration presents a robust mechanism for correlating features across vastly different data modalities. The ability to compute multi-dimensional correlations and leverage high-resolution feature representations underscores the potential for further advancements in this domain.

Practically, CalibFormer’s performance on the KITTI dataset suggests its applicability in real-world autonomous driving applications. The end-to-end nature of the network eliminates manual intervention, thereby reducing operational costs and time delays associated with traditional calibration methods. The network’s design also hints at scalability, potentially extending to other sensor modalities or environments with minimal adjustments.

Future Directions

The promising results from CalibFormer open several avenues for future research. Harnessing additional data modalities, such as radar, or integrating temporal data through recurrent networks could further boost calibration accuracy and robustness. Moreover, investigating the training of such networks on diverse datasets could enhance their generalization capabilities, making them more suitable for varied and unpredictable urban environments.

In conclusion, the paper by Xiao et al. presents significant advancements in the domain of LiDAR-camera calibration through the introduction of CalibFormer. The network's architecture, combining fine-grained feature extraction with advanced Transformer-based correlation and parameter estimation, sets a new benchmark in autonomous sensor calibration. The strong empirical results, along with the modularity of the proposed method, establish a firm foundation for both theoretical exploration and practical application in autonomous driving systems.