VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks (2402.13609v2)
Abstract: In recent years, object-oriented simultaneous localization and mapping (SLAM) has attracted increasing attention for its ability to provide high-level semantic information while remaining computationally efficient. Some researchers have attempted to improve localization accuracy by integrating modeled object residuals into bundle adjustment. However, few have achieved better results than feature-based visual SLAM systems, because generic coarse object models such as cuboids or ellipsoids are less accurate than feature points. In this paper, we propose VOOM, a Visual Object Odometry and Mapping framework that uses high-level objects and low-level points as hierarchical landmarks in a coarse-to-fine manner, instead of using object residuals directly in bundle adjustment. First, we introduce an improved observation model and a novel data association method for dual quadrics, which we employ to represent physical objects; this facilitates the creation of a 3D map that closely reflects reality. Next, we use object information to improve the data association of feature points and update the map accordingly. In the visual object odometry backend, the updated map is employed to further optimize the camera pose and the objects. Meanwhile, local bundle adjustment is performed over object- and point-based covisibility graphs in our visual object mapping process. Experiments show that VOOM outperforms both object-oriented SLAM systems and feature-based SLAM systems such as ORB-SLAM2 in terms of localization accuracy. The implementation of our method is available at https://github.com/yutongwangBIT/VOOM.git.
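The observation model for dual-quadric landmarks mentioned above rests on a standard geometric fact used in quadric-based SLAM: a 3D object represented as a dual quadric Q* (a 4x4 symmetric matrix) projects into the image as a dual conic C* = P Q* P^T, where P = K [R | t] is the camera projection matrix, so each detected object ellipse constrains the quadric. A minimal sketch of this projection (illustrative only, not the authors' implementation; all function names here are hypothetical):

```python
import numpy as np

def dual_ellipsoid(center, axes):
    """Dual quadric (4x4) of an axis-aligned ellipsoid with the given
    center and semi-axis lengths: Q* = T diag(a^2, b^2, c^2, -1) T^T."""
    Q = np.diag([axes[0] ** 2, axes[1] ** 2, axes[2] ** 2, -1.0])
    T = np.eye(4)
    T[:3, 3] = center          # translate the ellipsoid to its center
    return T @ Q @ T.T

def project_quadric(Q_star, K, R, t):
    """Project a dual quadric to a dual conic (3x3) via C* = P Q* P^T,
    normalized so the (2,2) entry is 1 for easy comparison."""
    P = K @ np.hstack([R, t.reshape(3, 1)])
    C_star = P @ Q_star @ P.T
    return C_star / C_star[2, 2]

# Example: an ellipsoid 5 m in front of a camera at the origin.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
Q_star = dual_ellipsoid(center=[0.0, 0.0, 5.0], axes=[0.5, 0.3, 0.4])
C_star = project_quadric(Q_star, K, np.eye(3), np.zeros(3))
# The projected ellipse is centered at the principal point (320, 240),
# read off the normalized dual conic as (C*[0,2], C*[1,2]).
```

In a full system, the residual between this predicted ellipse and the ellipse fitted to an object detection is what drives the quadric update; VOOM's contribution is how this observation model and the data association are designed, for which the paper itself should be consulted.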
- Yutong Wang
- Chaoyang Jiang
- Xieyuanli Chen