
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks (2402.13609v2)

Published 21 Feb 2024 in cs.RO and cs.CV

Abstract: In recent years, object-oriented simultaneous localization and mapping (SLAM) has attracted increasing attention due to its ability to provide high-level semantic information while maintaining computational efficiency. Some researchers have attempted to enhance localization accuracy by integrating the modeled object residuals into bundle adjustment. However, few have demonstrated better results than feature-based visual SLAM systems, as the generic coarse object models, such as cuboids or ellipsoids, are less accurate than feature points. In this paper, we propose a Visual Object Odometry and Mapping framework VOOM using high-level objects and low-level points as the hierarchical landmarks in a coarse-to-fine manner instead of directly using object residuals in bundle adjustment. Firstly, we introduce an improved observation model and a novel data association method for dual quadrics, employed to represent physical objects. It facilitates the creation of a 3D map that closely reflects reality. Next, we use object information to enhance the data association of feature points and consequently update the map. In the visual object odometry backend, the updated map is employed to further optimize the camera pose and the objects. Meanwhile, local bundle adjustment is performed utilizing the objects and points-based covisibility graphs in our visual object mapping process. Experiments show that VOOM outperforms both object-oriented SLAM and feature points SLAM systems such as ORB-SLAM2 in terms of localization. The implementation of our method is available at https://github.com/yutongwangBIT/VOOM.git.

Authors (3)
  1. Yutong Wang (50 papers)
  2. Chaoyang Jiang (10 papers)
  3. Xieyuanli Chen (77 papers)
Citations (3)

Summary

  • The paper introduces a SLAM approach that unites dual quadrics and ORB feature points as hierarchical landmarks.
  • It employs a normalized Wasserstein distance-based observation model to improve object and feature association.
  • Experiments show that VOOM outperforms state-of-the-art systems such as ORB-SLAM2 in localization accuracy, including on challenging dynamic sequences.

Enhancing Localization Accuracy in SLAM with VOOM: A Hierarchical Approach Using Objects and Points

Introduction to VOOM

In the pursuit of more precise and robust SLAM systems, integrating high-level semantic information from objects alongside low-level point features has shown promise. This paper introduces the Visual Object Odometry and Mapping (VOOM) framework, which uses dual quadrics to represent high-level physical objects and ORB feature points as low-level landmarks within a single SLAM pipeline. By organizing these landmarks hierarchically and pairing an improved observation model with a novel data association method, VOOM surpasses the localization accuracy of both object-oriented and feature point-based SLAM systems, including the widely used ORB-SLAM2.

Key Contributions

The VOOM framework makes three main contributions:

  • A hierarchical combination of dual quadrics and feature points as landmarks, yielding a more detailed and accurate mapping and localization system.
  • Effective algorithms for object optimization and association, together with a method for associating objects with map points, which produce a 3D map that more closely reflects reality (the dual-quadric representation behind this is sketched after this list).
  • A comprehensive set of experiments demonstrating VOOM's superior localization accuracy across various sequences compared with state-of-the-art SLAM methods.
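
To make the dual-quadric landmark concrete, here is a minimal sketch of how an ellipsoidal object can be encoded as a 4x4 dual quadric and projected into an image as a dual conic (an ellipse). It follows the standard dual-quadric formulation used in object-oriented SLAM (e.g., QuadricSLAM); the function names and example camera intrinsics are illustrative assumptions, not VOOM's actual code.

```python
import numpy as np

def dual_quadric(R, t, axes):
    """Build a 4x4 dual quadric Q* for an ellipsoid with rotation R (3x3),
    center t (3,), and semi-axis lengths axes (3,)."""
    Z = np.eye(4)                      # homogeneous pose of the ellipsoid
    Z[:3, :3] = R
    Z[:3, 3] = t
    Q_unit = np.diag([axes[0]**2, axes[1]**2, axes[2]**2, -1.0])
    return Z @ Q_unit @ Z.T            # Q* = Z diag(a^2, b^2, c^2, -1) Z^T

def project_to_dual_conic(Q_star, P):
    """Project a dual quadric with a 3x4 camera matrix P. The result C* is a
    3x3 dual conic, i.e. an ellipse in the image, from which the ellipse
    center and axes can be recovered."""
    C_star = P @ Q_star @ P.T
    return C_star / -C_star[2, 2]      # normalize so C*[2,2] = -1

# Illustrative example: an axis-aligned ellipsoid seen by a camera at the origin.
K = np.array([[520.0, 0.0, 320.0], [0.0, 520.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # [R|t] = [I|0]
Q = dual_quadric(np.eye(3), np.array([0.0, 0.0, 4.0]), np.array([0.5, 0.3, 0.2]))
C = project_to_dual_conic(Q, P)
```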

Technical Breakdown

Visual Object Odometry and Mapping

VOOM takes RGB-D images as input, runs instance segmentation to detect objects, and extracts ORB feature points for pose prediction. A distinctive aspect of VOOM is its use of the normalized Wasserstein distance in a dual quadric-based observation model, which improves the accuracy of object-level mapping and localization. The object-level associations in turn guide the association of feature points with their corresponding map points, enabling accurate and efficient map updates and optimization of the camera pose and objects.
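
As a rough illustration of such an observation model: the 2-Wasserstein distance between two 2D Gaussians (one fitted to the projected dual conic, one to the detected ellipse) has a closed form, and the normalized variant of Wang et al. (2021) maps it to a bounded similarity score. The sketch below is a plausible implementation under those assumptions; the Gaussian-fitting step is omitted and the scale constant C is a placeholder, not a value from the VOOM paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_squared(mu1, Sigma1, mu2, Sigma2):
    """Closed-form squared 2-Wasserstein distance between two Gaussians."""
    root = sqrtm(Sigma2)
    cross = sqrtm(root @ Sigma1 @ root)
    return (np.sum((mu1 - mu2) ** 2)
            + np.trace(Sigma1 + Sigma2 - 2.0 * np.real(cross)))

def normalized_wasserstein(mu1, Sigma1, mu2, Sigma2, C=30.0):
    """Map the distance to a (0, 1] similarity, as in Wang et al. 2021.
    C sets the distance scale; 30.0 is an arbitrary placeholder."""
    d = np.sqrt(wasserstein2_squared(mu1, Sigma1, mu2, Sigma2))
    return float(np.exp(-d / C))

# Example: two image ellipses modeled as 2D Gaussians (center, covariance).
mu_a, Sig_a = np.array([320.0, 240.0]), np.diag([40.0**2, 25.0**2])
mu_b, Sig_b = np.array([330.0, 238.0]), np.diag([38.0**2, 27.0**2])
print(normalized_wasserstein(mu_a, Sig_a, mu_b, Sig_b))
```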

Enhanced Data Association

Unlike traditional methods that rely on IoU metrics for object data association, VOOM employs the Wasserstein distance, which is sensitive to the shape, orientation, and scale of objects. This makes the association more robust, particularly for small objects or for objects whose projections barely overlap across frames, where IoU provides no gradient of similarity. Additionally, VOOM's object association integrates with its object optimization process, further refining overall SLAM performance.
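
A minimal way to turn such a similarity into an association step is to score every detection against every mapped object and solve the resulting assignment problem, rejecting matches below a threshold. This is a generic sketch, not VOOM's exact association logic; `normalized_wasserstein` is the function from the previous sketch, and the threshold is a placeholder.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(detections, map_objects, sim_fn, min_similarity=0.3):
    """Match detected ellipses to mapped objects by maximizing total similarity.
    detections / map_objects: lists of (mu, Sigma) Gaussian ellipse models."""
    if not detections or not map_objects:
        return []
    # Similarity matrix: one row per detection, one column per map object.
    S = np.array([[sim_fn(mu_d, Sig_d, mu_o, Sig_o)
                   for (mu_o, Sig_o) in map_objects]
                  for (mu_d, Sig_d) in detections])
    rows, cols = linear_sum_assignment(-S)   # negate cost to maximize similarity
    # Keep only sufficiently confident matches; unmatched detections could
    # later spawn new object landmarks.
    return [(r, c) for r, c in zip(rows, cols) if S[r, c] >= min_similarity]
```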

Experimental Validation

The framework was evaluated on well-known RGB-D benchmarks and compared with leading SLAM systems, exhibiting superior localization accuracy. The experiments highlight VOOM's robustness, especially in sequences with dynamic objects or where traditional feature-based methods struggle due to a lack of texture or structural complexity.

Future Directions

Looking forward, the integration of a loop closure and relocalization module that effectively utilizes both objects and point features represents a logical extension of the VOOM framework. Such advancements could address long-term stability and accuracy issues in SLAM, paving the way for more reliable autonomous navigation and mapping systems in complex and dynamically changing environments.

Conclusion

The VOOM framework presents a significant step forward in the integration of semantic object information into the SLAM process. By leveraging dual quadrics and feature points in a hierarchical manner, VOOM achieves a new level of localization accuracy, outperforming existing object-oriented and feature-based SLAM systems. Its novel data association method, object-oriented optimization process, and comprehensive experimental validation underscore its potential to enhance both the theoretical understanding and practical applications of SLAM technology. As the research community continues to explore the integration of high-level semantic information into SLAM, VOOM stands out as a promising direction for future developments.
