Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping (2209.13274v2)

Published 27 Sep 2022 in cs.RO and cs.CV

Abstract: A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real-time. None of the previous learning-based and non-learning-based visual SLAMs satisfy all needs due to the intrinsic limitations of their components. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully collaborates with implicit neural representation and visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with the monocular camera since it only needs RGB inputs, making it widely applicable to the real world. Results show that our SLAM is up to 800x faster than the strong baseline with superior rendering outcomes. Code link: https://github.com/MarvinChung/Orbeez-SLAM.

Summary

The paper presents a novel SLAM system that combines ORB visual odometry with NeRF-based mapping for real-time, pre-training-free operation using only RGB data.
The system employs ray-casting triangulation to efficiently generate dense, detailed maps and achieves competitive Absolute Trajectory Error metrics.
Experimental results show that Orbeez-SLAM outperforms its baselines in speed and rendering quality, which makes it well suited to robotics and AR applications.

Overview of Orbeez-SLAM: A Monocular Visual SLAM System

The paper presents Orbeez-SLAM, an advanced monocular visual SLAM system that integrates ORB features with NeRF-based mapping, allowing for real-time operation and pre-training-free scene adaptation. The proposed system addresses the limitations of traditional SLAM systems, which primarily focus on localization accuracy with sparse mapping, and learning-based SLAMs, which often require pre-training or depth input.

Orbeez-SLAM leverages ORB-SLAM2 as its core visual odometry mechanism and enhances it with a Neural Radiance Field (NeRF) to create dense, detailed maps of the environment in real-time. The system operates with only RGB data, making it adaptable to various real-world applications without relying on additional depth sensors.
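To make the idea of a NeRF-based dense map more concrete, the following is a minimal sketch of the standard NeRF volume-rendering quadrature, which turns density and color samples along a camera ray into a pixel color and an expected depth. It is not taken from the Orbeez-SLAM code base; the function name `render_ray` and its argument layout are assumptions made for illustration.

```python
import math


def render_ray(densities, colors, t_vals):
    """Composite samples along one ray with NeRF-style quadrature.

    densities: non-negative density (sigma) at each sample
    colors:    (r, g, b) tuple at each sample
    t_vals:    sample distances along the ray, strictly increasing
               (at least two samples are expected)
    Returns (rgb, expected_depth, accumulated_opacity).
    """
    transmittance = 1.0  # fraction of light that has survived so far
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    opacity = 0.0
    n = len(t_vals)
    for i in range(n):
        # Interval length to the next sample; the last sample reuses
        # the previous spacing (an illustrative choice).
        if i + 1 < n:
            delta = t_vals[i + 1] - t_vals[i]
        else:
            delta = t_vals[i] - t_vals[i - 1] if i > 0 else 0.0
        alpha = 1.0 - math.exp(-densities[i] * delta)
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * colors[i][c]
        depth += weight * t_vals[i]
        opacity += weight
        transmittance *= 1.0 - alpha
    return tuple(rgb), depth, opacity
```

A ray that passes through empty space and then hits a nearly opaque sample returns that sample's color and distance, which is how a trained radiance field can provide dense depth from RGB-only input.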

Technical Contributions

Integration of VO and NeRF: By integrating visual odometry (VO) from ORB-SLAM2 with the NeRF-based mapping framework, Orbeez-SLAM ensures accurate pose estimation and efficient map construction. This dual approach enables the system to operate effectively with a monocular camera setup, avoiding the dependency on depth data that other similar systems require.
Real-time and Pre-training-free Operation: The method is designed to function in real-time without needing pre-training, utilizing fast NeRF optimization implemented on the instant-ngp platform. This allows for immediate deployment in novel environments, a significant advancement over previous NeRF-SLAM integrations that necessitate depth input or suffer from slow convergence times.
Ray-casting Triangulation: The paper introduces a novel ray-casting triangulation method within the NeRF framework that efficiently generates dense map points in real-time. This technique obviates the extensive pre-processing required in other methods, enhancing computational efficiency and map detail.
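The summary does not spell out how ray-casting triangulation works internally, so the sketch below is only one plausible reading of the idea: cast rays into a density field, find the first sample that looks like a surface, and promote a location to a map point once enough rays agree on it. The voxel counter, the thresholds `sigma_min` and `min_hits`, and all function names are hypothetical and should not be read as the paper's implementation.

```python
import math
from collections import defaultdict


def voxel_index(point, voxel_size):
    """Integer voxel coordinates containing a 3D point."""
    return tuple(int(math.floor(c / voxel_size)) for c in point)


def first_hit(origin, direction, density_fn, t_near, step, n_steps, sigma_min):
    """March along a ray; return the first sample whose density is high enough."""
    for k in range(n_steps):
        t = t_near + k * step
        p = tuple(o + t * d for o, d in zip(origin, direction))
        if density_fn(p) >= sigma_min:
            return p
    return None


def ray_cast_map_points(rays, density_fn, voxel_size=0.1, min_hits=3,
                        t_near=0.05, step=0.05, n_steps=100, sigma_min=1.0):
    """Return surface points whose voxel was hit by at least min_hits rays."""
    hits = defaultdict(int)   # voxel index -> number of rays stopped there
    map_points = {}           # voxel index -> representative surface point
    for origin, direction in rays:
        p = first_hit(origin, direction, density_fn,
                      t_near, step, n_steps, sigma_min)
        if p is None:
            continue  # the ray never met dense enough space
        key = voxel_index(p, voxel_size)
        hits[key] += 1
        if hits[key] >= min_hits and key not in map_points:
            map_points[key] = p
    return list(map_points.values())
```

Requiring several agreeing rays before a point is accepted is a simple way to suppress spurious points while the density field is still being optimized; the actual acceptance rule used by the authors may differ.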

Experimental Results

The authors evaluate Orbeez-SLAM on the TUM RGB-D, ScanNet, and Replica benchmarks. Its Absolute Trajectory Error (ATE) is competitive with both deep learning-based and traditional visual SLAM baselines. Because Orbeez-SLAM builds its tracking on ORB-SLAM2, ORB-SLAM2 serves as an upper bound on pose estimation performance; Orbeez-SLAM achieves comparable ATE while also delivering full scene reconstructions.
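For readers unfamiliar with the metric, ATE is usually reported as the root-mean-square distance between estimated and ground-truth camera positions. The sketch below assumes the two trajectories are already time-associated and expressed in the same frame; benchmark tools normally perform an alignment step first (including scale for monocular systems), which is omitted here. The function name `ate_rmse` is an illustrative choice.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between two aligned trajectories.

    estimated, ground_truth: equal-length sequences of (x, y, z) positions.
    Assumes poses are already associated by timestamp and aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_error = 0.0
    for est, gt in zip(estimated, ground_truth):
        squared_error += sum((e - g) ** 2 for e, g in zip(est, gt))
    return math.sqrt(squared_error / len(estimated))
```

For example, two poses with position errors of 0 m and 2 m give a mean squared error of 2 and therefore an ATE RMSE of about 1.414 m.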

Orbeez-SLAM produces high-quality renderings, with better depth and image metrics than its competitors, particularly in the RGB-only setting. It also runs 360 to 800 times faster than NICE-SLAM on the tested benchmarks.
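The summary does not name the depth and image metrics. Two common choices for this kind of evaluation are PSNR for rendered images and mean absolute (L1) error for rendered depth, so the sketch below uses them as assumed examples. Images and depth maps are represented as flat lists of values, and ground-truth depths of zero are treated as invalid, which is a convention assumed here.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def depth_l1(predicted, ground_truth):
    """Mean absolute depth error over pixels with valid (positive) ground truth."""
    valid = [(p, g) for p, g in zip(predicted, ground_truth) if g > 0.0]
    if not valid:
        raise ValueError("no valid ground-truth depth values")
    return sum(abs(p - g) for p, g in valid) / len(valid)
```

Higher PSNR and lower depth L1 both indicate renderings that are closer to the reference data.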

Implications and Future Directions

The integration of NeRF-based mapping with VO in Orbeez-SLAM marks significant progress in the development of spatial AI systems capable of real-time processing. The advancements presented in this work suggest several practical applications, from autonomous robotics in domestic environments to augmented reality systems that require fine-grained scene mapping and understanding.

The paper also identifies open problems, in particular performance in large, compartmentalized environments, as shown by the difficulties encountered on the ScanNet dataset. Adapting to large scenes without compromising real-time operation remains a priority for future iterations of the system.

Overall, Orbeez-SLAM contributes a real-time visual SLAM framework that adapts to new scenes without pre-training and balances accuracy, speed, and scene completeness.
