
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks (2402.13609v2)

Published 21 Feb 2024 in cs.RO and cs.CV

Abstract: In recent years, object-oriented simultaneous localization and mapping (SLAM) has attracted increasing attention due to its ability to provide high-level semantic information while maintaining computational efficiency. Some researchers have attempted to enhance localization accuracy by integrating the modeled object residuals into bundle adjustment. However, few have demonstrated better results than feature-based visual SLAM systems, as the generic coarse object models, such as cuboids or ellipsoids, are less accurate than feature points. In this paper, we propose a Visual Object Odometry and Mapping framework VOOM using high-level objects and low-level points as the hierarchical landmarks in a coarse-to-fine manner instead of directly using object residuals in bundle adjustment. Firstly, we introduce an improved observation model and a novel data association method for dual quadrics, employed to represent physical objects. It facilitates the creation of a 3D map that closely reflects reality. Next, we use object information to enhance the data association of feature points and consequently update the map. In the visual object odometry backend, the updated map is employed to further optimize the camera pose and the objects. Meanwhile, local bundle adjustment is performed utilizing the objects and points-based covisibility graphs in our visual object mapping process. Experiments show that VOOM outperforms both object-oriented SLAM and feature points SLAM systems such as ORB-SLAM2 in terms of localization. The implementation of our method is available at https://github.com/yutongwangBIT/VOOM.git.

Authors (3)
  1. Yutong Wang (50 papers)
  2. Chaoyang Jiang (10 papers)
  3. Xieyuanli Chen (77 papers)
Citations (3)

Summary

  • The paper introduces a SLAM approach that unites dual quadrics and ORB feature points as hierarchical landmarks.
  • It employs a normalized Wasserstein distance-based observation model to improve object and feature association.
  • Experiments show that VOOM outperforms state-of-the-art systems such as ORB-SLAM2 in localization accuracy, including on challenging dynamic sequences.

Enhancing Localization Accuracy in SLAM with VOOM: A Hierarchical Approach Using Objects and Points

Introduction to VOOM

In the pursuit of more precise and robust SLAM systems, integrating high-level semantic information from objects alongside low-level point features has shown promise. This paper introduces the Visual Object Odometry and Mapping (VOOM) framework, which uses dual quadrics to represent high-level physical objects and ORB feature points as low-level landmarks within a single SLAM pipeline. By organizing these landmarks hierarchically and pairing an improved observation model with a novel data association method, VOOM surpasses the localization accuracy of both object-oriented and feature point-based SLAM systems, including the widely used ORB-SLAM2.

Key Contributions

The VOOM framework makes three main contributions:

  • A hierarchical combination of dual quadrics and feature points as landmarks, yielding a more detailed and accurate mapping and localization system.
  • Effective algorithms for object optimization and association, together with a method for associating objects with map points, which produce a 3D map that more closely reflects reality (the dual-quadric representation behind this is sketched after this list).
  • A comprehensive set of experiments demonstrating VOOM's superior localization accuracy across various sequences compared with state-of-the-art SLAM methods.
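
To make the dual-quadric landmark concrete, here is a minimal sketch of how an ellipsoidal object can be encoded as a 4x4 dual quadric and projected into an image as a dual conic (an ellipse). It follows the standard dual-quadric formulation used in object-oriented SLAM (e.g., QuadricSLAM); the function names and example camera intrinsics are illustrative assumptions, not VOOM's actual code.

```python
import numpy as np

def dual_quadric(R, t, axes):
    """Build a 4x4 dual quadric Q* for an ellipsoid with rotation R (3x3),
    center t (3,), and semi-axis lengths axes (3,)."""
    Z = np.eye(4)                      # homogeneous pose of the ellipsoid
    Z[:3, :3] = R
    Z[:3, 3] = t
    Q_unit = np.diag([axes[0]**2, axes[1]**2, axes[2]**2, -1.0])
    return Z @ Q_unit @ Z.T            # Q* = Z diag(a^2, b^2, c^2, -1) Z^T

def project_to_dual_conic(Q_star, P):
    """Project a dual quadric with a 3x4 camera matrix P. The result C* is a
    3x3 dual conic, i.e. an ellipse in the image, from which the ellipse
    center and axes can be recovered."""
    C_star = P @ Q_star @ P.T
    return C_star / -C_star[2, 2]      # normalize so C*[2,2] = -1

# Illustrative example: an axis-aligned ellipsoid seen by a camera at the origin.
K = np.array([[520.0, 0.0, 320.0], [0.0, 520.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # [R|t] = [I|0]
Q = dual_quadric(np.eye(3), np.array([0.0, 0.0, 4.0]), np.array([0.5, 0.3, 0.2]))
C = project_to_dual_conic(Q, P)
```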

Technical Breakdown

Visual Object Odometry and Mapping

VOOM takes RGB-D images as input, runs instance segmentation to detect objects, and extracts ORB feature points for pose prediction. A distinctive aspect of VOOM is its use of the normalized Wasserstein distance in a dual quadric-based observation model, which improves the accuracy of object-level mapping and localization. The object-level associations in turn guide the association of feature points with their corresponding map points, enabling accurate and efficient map updates and optimization of the camera pose and objects.
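
As a rough illustration of such an observation model: the 2-Wasserstein distance between two 2D Gaussians (one fitted to the projected dual conic, one to the detected ellipse) has a closed form, and the normalized variant of Wang et al. (2021) maps it to a bounded similarity score. The sketch below is a plausible implementation under those assumptions; the Gaussian-fitting step is omitted and the scale constant C is a placeholder, not a value from the VOOM paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_squared(mu1, Sigma1, mu2, Sigma2):
    """Closed-form squared 2-Wasserstein distance between two Gaussians."""
    root = sqrtm(Sigma2)
    cross = sqrtm(root @ Sigma1 @ root)
    return (np.sum((mu1 - mu2) ** 2)
            + np.trace(Sigma1 + Sigma2 - 2.0 * np.real(cross)))

def normalized_wasserstein(mu1, Sigma1, mu2, Sigma2, C=30.0):
    """Map the distance to a (0, 1] similarity, as in Wang et al. 2021.
    C sets the distance scale; 30.0 is an arbitrary placeholder."""
    d = np.sqrt(wasserstein2_squared(mu1, Sigma1, mu2, Sigma2))
    return float(np.exp(-d / C))

# Example: two image ellipses modeled as 2D Gaussians (center, covariance).
mu_a, Sig_a = np.array([320.0, 240.0]), np.diag([40.0**2, 25.0**2])
mu_b, Sig_b = np.array([330.0, 238.0]), np.diag([38.0**2, 27.0**2])
print(normalized_wasserstein(mu_a, Sig_a, mu_b, Sig_b))
```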

Enhanced Data Association

Unlike traditional methods that rely on IoU metrics for object data association, VOOM employs the Wasserstein distance, which is sensitive to the shape, orientation, and scale of objects. This makes the association more robust, particularly for small objects or for objects whose projections barely overlap across frames, where IoU provides no gradient of similarity. Additionally, VOOM's object association integrates with its object optimization process, further refining overall SLAM performance.
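
A minimal way to turn such a similarity into an association step is to score every detection against every mapped object and solve the resulting assignment problem, rejecting matches below a threshold. This is a generic sketch, not VOOM's exact association logic; `normalized_wasserstein` is the function from the previous sketch, and the threshold is a placeholder.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(detections, map_objects, sim_fn, min_similarity=0.3):
    """Match detected ellipses to mapped objects by maximizing total similarity.
    detections / map_objects: lists of (mu, Sigma) Gaussian ellipse models."""
    if not detections or not map_objects:
        return []
    # Similarity matrix: one row per detection, one column per map object.
    S = np.array([[sim_fn(mu_d, Sig_d, mu_o, Sig_o)
                   for (mu_o, Sig_o) in map_objects]
                  for (mu_d, Sig_d) in detections])
    rows, cols = linear_sum_assignment(-S)   # negate cost to maximize similarity
    # Keep only sufficiently confident matches; unmatched detections could
    # later spawn new object landmarks.
    return [(r, c) for r, c in zip(rows, cols) if S[r, c] >= min_similarity]
```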

Experimental Validation

The framework was evaluated on well-known RGB-D benchmarks and compared with leading SLAM systems, exhibiting superior localization accuracy. The experiments highlight VOOM's robustness, especially in sequences with dynamic objects or where traditional feature-based methods struggle due to a lack of texture or structural complexity.

Future Directions

Looking forward, the integration of a loop closure and relocalization module that effectively utilizes both objects and point features represents a logical extension of the VOOM framework. Such advancements could address long-term stability and accuracy issues in SLAM, paving the way for more reliable autonomous navigation and mapping systems in complex and dynamically changing environments.

Conclusion

The VOOM framework presents a significant step forward in the integration of semantic object information into the SLAM process. By leveraging dual quadrics and feature points in a hierarchical manner, VOOM achieves a new level of localization accuracy, outperforming existing object-oriented and feature-based SLAM systems. Its novel data association method, object-oriented optimization process, and comprehensive experimental validation underscore its potential to enhance both the theoretical understanding and practical applications of SLAM technology. As the research community continues to explore the integration of high-level semantic information into SLAM, VOOM stands out as a promising direction for future developments.
