
CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration (2402.00129v4)

Published 31 Jan 2024 in cs.CV and cs.RO

Abstract: LiDARs are widely used for mapping and localization in dynamic environments. However, their high cost limits their widespread adoption. On the other hand, monocular localization in LiDAR maps using inexpensive cameras is a cost-effective alternative for large-scale deployment. Nevertheless, most existing approaches struggle to generalize to new sensor setups and environments, requiring retraining or fine-tuning. In this paper, we present CMRNext, a novel approach for camera-LiDAR matching that is independent of sensor-specific parameters, generalizable, and can be used in the wild for monocular localization in LiDAR maps and camera-LiDAR extrinsic calibration. CMRNext exploits recent advances in deep neural networks for matching cross-modal data and standard geometric techniques for robust pose estimation. We reformulate the point-pixel matching problem as an optical flow estimation problem and solve the Perspective-n-Point problem based on the resulting correspondences to find the relative pose between the camera and the LiDAR point cloud. We extensively evaluate CMRNext on six different robotic platforms, including three publicly available datasets and three in-house robots. Our experimental evaluations demonstrate that CMRNext outperforms existing approaches on both tasks and effectively generalizes to previously unseen environments and sensor setups in a zero-shot manner. We make the code and pre-trained models publicly available at http://cmrnext.cs.uni-freiburg.de.
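
The abstract's two-stage pipeline (cross-modal matching cast as optical flow estimation, followed by Perspective-n-Point pose estimation on the resulting correspondences) can be illustrated with a minimal sketch of the geometric second stage. This is not the authors' released implementation: the function name pose_from_flow, its argument layout, and the choice of OpenCV's RANSAC-wrapped EPnP solver are illustrative assumptions, and the learned flow network is stubbed out entirely.

# Minimal sketch (assumed structure, not CMRNext's code) of the PnP step:
# given LiDAR points rendered at an initial pose guess and a predicted flow
# field that shifts their pixels onto the real camera image, recover the
# camera pose with RANSAC + EPnP.
import numpy as np
import cv2

def pose_from_flow(lidar_points, proj_pixels, flow, K, valid_mask):
    """lidar_points: (N, 3) 3D map points visible in the rendered view.
    proj_pixels:  (N, 2) their pixel locations at the initial pose guess.
    flow:         (N, 2) predicted displacement onto the camera image
                  (in CMRNext this comes from the learned flow network).
    K:            (3, 3) camera intrinsic matrix.
    valid_mask:   (N,) boolean mask of correspondences to trust."""
    obj_pts = lidar_points[valid_mask].astype(np.float64)
    img_pts = (proj_pixels + flow)[valid_mask].astype(np.float64)

    # RANSAC rejects outlier correspondences produced by the flow estimate.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP, reprojectionError=2.0, iterationsCount=1000)
    if not ok:
        raise RuntimeError("PnP failed to recover a pose")

    R, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T, inliers                      # camera-from-map pose, inlier ids

Conceptually, the same routine serves both tasks described in the abstract: for localization the recovered pose places the camera in the LiDAR map, while for extrinsic calibration it yields the camera-LiDAR transform.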

Authors (2)
  1. Daniele Cattaneo (21 papers)
  2. Abhinav Valada (117 papers)
Citations (5)
