
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM (2007.11898v2)

Published 23 Jul 2020 in cs.RO

Abstract: This paper presents ORB-SLAM3, the first system able to perform visual, visual-inertial and multi-map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models. The first main novelty is a feature-based tightly-integrated visual-inertial SLAM system that fully relies on Maximum-a-Posteriori (MAP) estimation, even during the IMU initialization phase. The result is a system that operates robustly in real-time, in small and large, indoor and outdoor environments, and is 2 to 5 times more accurate than previous approaches. The second main novelty is a multiple map system that relies on a new place recognition method with improved recall. Thanks to it, ORB-SLAM3 is able to survive to long periods of poor visual information: when it gets lost, it starts a new map that will be seamlessly merged with previous maps when revisiting mapped areas. Compared with visual odometry systems that only use information from the last few seconds, ORB-SLAM3 is the first system able to reuse in all the algorithm stages all previous information. This allows to include in bundle adjustment co-visible keyframes, that provide high parallax observations boosting accuracy, even if they are widely separated in time or if they come from a previous mapping session. Our experiments show that, in all sensor configurations, ORB-SLAM3 is as robust as the best systems available in the literature, and significantly more accurate. Notably, our stereo-inertial SLAM achieves an average accuracy of 3.6 cm on the EuRoC drone and 9 mm under quick hand-held motions in the room of TUM-VI dataset, a setting representative of AR/VR scenarios. For the benefit of the community we make public the source code.

Authors (5)
  1. Carlos Campos
  2. Richard Elvira
  3. Juan J. Gómez Rodríguez
  4. José M. M. Montiel
  5. Juan D. Tardós
Citations (2,363)

Summary

  • The paper presents a novel MAP-based visual-inertial SLAM system that is 2 to 5 times more accurate than previous methods.
  • The system integrates multi-map management with enhanced place recognition to effectively recover from long periods of visual degradation.
  • ORB-SLAM3 supports diverse sensor configurations including monocular, stereo, and RGB-D, enabling versatile applications in autonomous navigation and AR/VR.

Overview of ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM

Introduction

ORB-SLAM3 aims to advance the capabilities of Simultaneous Localization and Mapping (SLAM) by integrating visual, visual-inertial, and multi-map functionalities into a robust open-source library. This paper, authored by Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós, presents a comprehensive system that supports monocular, stereo, and RGB-D cameras with both pin-hole and fisheye lens models. Its primary contributions are a tightly integrated visual-inertial SLAM system and a multiple-map management system, both underpinned by a robust place recognition method.

Key Contributions

  1. Visual-Inertial SLAM: ORB-SLAM3 incorporates a novel feature-based visual-inertial SLAM system that relies entirely on Maximum-a-Posteriori (MAP) estimation, including during the IMU initialization phase. This approach yields robust real-time operation in small and large, indoor and outdoor environments, with accuracy two to five times better than prior methods.
  2. Multiple Map Management: The system introduces an enhanced place recognition method that significantly improves recall, enabling ORB-SLAM3 to handle long periods of visual degradation by starting new maps that seamlessly integrate with prior maps upon revisitation.
  3. Comprehensive Configurations: ORB-SLAM3 supports a range of sensor configurations—monocular, stereo, RGB-D—and various lens models, providing versatility across different application scenarios.
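Schematically, the tightly coupled MAP estimation behind contribution 1 amounts to minimizing, over the keyframe states and map points, a sum of inertial and visual residuals. The notation below follows common visual-inertial practice and is illustrative rather than a verbatim reproduction of the paper's equations:

```latex
\min_{\bar{\mathcal{X}}_k,\,\mathcal{P}}
  \sum_{i=1}^{k} \big\| \mathbf{r}_{\mathcal{I}_{i-1,i}} \big\|^{2}_{\boldsymbol{\Sigma}_{\mathcal{I}_{i-1,i}}}
  \;+\;
  \sum_{j} \sum_{i \in \mathcal{K}^{j}} \rho_{\mathrm{Hub}}\!\left( \big\| \mathbf{r}_{ij} \big\|^{2}_{\boldsymbol{\Sigma}_{ij}} \right)
```

Here the first sum runs over IMU preintegration residuals between consecutive keyframes, and the second over reprojection residuals of each map point in the keyframes observing it, robustified with a Huber kernel. Using this single probabilistic cost from initialization onward is what distinguishes the approach from filtering or loosely coupled alternatives.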

Experimental Results

The system's robustness and accuracy were empirically validated, primarily on the EuRoC drone dataset, with additional hand-held evaluations on TUM-VI. The experiments demonstrated that ORB-SLAM3 consistently outperforms leading contemporary SLAM systems in all sensor configurations.

  • Monocular SLAM: Compared to ORB-SLAM2 and DSO, ORB-SLAM3 showed significantly enhanced robustness and precision, effectively managing challenging sequences and tracking losses.
  • Stereo SLAM: The stereo configurations of ORB-SLAM3 yielded accuracies up to four times better than VINS-Fusion and SVO.
  • Monocular-Inertial SLAM: The monocular-inertial version surpassed MSCKF, OKVIS, and ROVIO, achieving superior robustness and precision, especially in complex sequences.
  • Stereo-Inertial SLAM: This configuration achieved top-tier accuracy surpassing BASALT, effectively managing even the sequences with missing frames from the EuRoC dataset.
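Accuracy in comparisons like these is typically reported as absolute trajectory error (ATE) RMSE after rigidly aligning the estimated trajectory to ground truth. A minimal sketch of that metric, assuming both trajectories are time-synchronized N×3 position arrays (function names here are illustrative, not part of any evaluation toolkit):

```python
import numpy as np

def align_se3(est, gt):
    """Least-squares rigid (rotation + translation) alignment of est onto gt
    via the Kabsch/SVD procedure."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(E.T @ G)          # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Root-mean-square absolute trajectory error after alignment."""
    R, t = align_se3(est, gt)
    err = (est @ R.T + t) - gt
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```

For noiseless data related by a rigid transform, `ate_rmse` is zero up to floating-point precision; on real output it summarizes drift over the whole trajectory, which is how figures such as "3.6 cm on EuRoC" are obtained.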

Theoretical and Practical Implications

ORB-SLAM3's contributions extend both theoretical and practical frontiers in SLAM research. The integration of data association at all time scales (short-term, mid-term, long-term, and multi-map) addresses fundamental challenges in SLAM. The MAP-based visual-inertial initialization provides robust and rapid sensor calibration, easing deployment in practice.
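The multi-map behaviour, losing tracking, starting a fresh map, and merging on a later revisit, can be pictured with a small bookkeeping sketch. This is a hypothetical illustration of the idea only; the class and method names are invented and do not correspond to ORB-SLAM3's actual Atlas API:

```python
class Atlas:
    """Illustrative multi-map container: one active map plus stored inactive maps."""

    def __init__(self):
        self.active_map = []    # keyframes of the map currently being built
        self.stored_maps = []   # inactive maps kept for later merging

    def add_keyframe(self, kf):
        self.active_map.append(kf)

    def on_tracking_lost(self):
        # Freeze the current map and start a fresh one instead of discarding data.
        if self.active_map:
            self.stored_maps.append(self.active_map)
        self.active_map = []

    def on_place_recognition(self, matched_map_idx):
        # A revisit of a stored map: merge it into the active map so all past
        # observations become available again to tracking and bundle adjustment.
        merged = self.stored_maps.pop(matched_map_idx)
        self.active_map = merged + self.active_map
```

The key design point this sketch captures is that no information is thrown away on tracking failure; earlier maps wait passively until place recognition links them back in, which is what lets bundle adjustment use widely separated, high-parallax observations.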

From a practical perspective, ORB-SLAM3's public release as an open-source library facilitates its adoption and further improvement by the research community. The system's performance under various configurations—monocular, stereo, mono-inertial, and stereo-inertial—provides flexibility for diverse applications ranging from autonomous navigation to augmented and virtual reality (AR/VR).

Future Directions

Future research might explore photometric methods to enhance the system's performance in low-texture environments, addressing one of ORB-SLAM3's key limitations. Additionally, investigating hybrid techniques that combine the advantages of both feature-based and direct methods could further improve robustness and accuracy in diverse scenarios.

Conclusion

ORB-SLAM3 sets a new benchmark in SLAM systems by providing a robust, accurate, and versatile solution that integrates visual and inertial data across multiple maps. Its exceptional performance in empirical evaluations underscores its potential for broad application in both research and industry.

By combining state-of-the-art MAP estimation, robust place recognition, and comprehensive support for multiple sensor configurations, ORB-SLAM3 stands as a significant advancement in the field of visual and visual-inertial SLAM. This work not only enhances the understanding of SLAM system design but also catalyzes further research and development in creating more efficient and reliable autonomous navigation systems.