- The paper introduces a novel SLAM system that integrates HF-Net to extract both local and global features, enhancing performance in complex environments.
- It employs CPU-optimized techniques using SIMD and OpenVINO to achieve real-time processing without the need for GPU acceleration.
- Enhanced loop closure detection and re-localization strategies improve accuracy, reducing absolute trajectory error (ATE) and improving overall trajectory accuracy on benchmark datasets.
DXSLAM: A Robust Visual SLAM System with Deep Features
The paper presents DXSLAM, a visual Simultaneous Localization and Mapping (SLAM) system that leverages deep convolutional neural networks (CNNs) for real-time feature extraction. Traditional SLAM systems often rely on hand-crafted features such as SIFT, ORB, or Shi-Tomasi, which can fail in complex or changing environments. DXSLAM addresses this limitation by integrating CNN-generated features into the SLAM framework, improving both robustness and efficiency.
System Design and Methodology
DXSLAM uses a deep CNN, the HF-Net architecture, to automatically extract both local and global features from images. HF-Net outputs keypoints with local descriptors, plus a single global descriptor for the entire image; these feed the core SLAM components, including pose tracking and loop closure detection (LCD), and make feature detection and description more robust to changes in illumination, background, and viewpoint.
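As a rough sketch of the interface such a front end exposes, the stand-in below mimics HF-Net's three outputs: keypoints, 256-D local descriptors, and a 4096-D NetVLAD-style global descriptor. The random values are placeholders for real network inference, and the function name is illustrative, not taken from the paper's code:

```python
import numpy as np

def extract_features(image, num_keypoints=500):
    """Stand-in for HF-Net inference: returns the three outputs the
    SLAM front end consumes. Descriptor dimensions follow HF-Net
    (256-D local, 4096-D global); the random values are placeholders,
    not the result of running the real model."""
    h, w = image.shape[:2]
    rng = np.random.default_rng(0)
    # Keypoints as (x, y) pixel coordinates inside the image.
    keypoints = rng.uniform([0, 0], [w, h], size=(num_keypoints, 2))
    # Unit-normalized local descriptors, one row per keypoint.
    local_desc = rng.standard_normal((num_keypoints, 256)).astype(np.float32)
    local_desc /= np.linalg.norm(local_desc, axis=1, keepdims=True)
    # A single unit-normalized global descriptor for the whole image.
    global_desc = rng.standard_normal(4096).astype(np.float32)
    global_desc /= np.linalg.norm(global_desc)
    return keypoints, local_desc, global_desc
```

In the real system, the keypoints and local descriptors drive frame-to-frame pose tracking, while the global descriptor is reserved for place recognition.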
To ensure efficient operation on resource-constrained devices, the system exploits the SIMD instructions of modern CPUs, with inference optimized through Intel's OpenVINO toolkit. This optimization allows DXSLAM to run in real time without GPU acceleration, a significant step toward deploying SLAM on mobile robots.
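Much of the CPU-side gain comes from vectorization: descriptor matching reduces to dense dot products that SIMD units execute many lanes at a time. The NumPy sketch below illustrates the principle by expressing nearest-neighbour matching as a single matrix product (NumPy delegates this to BLAS kernels that use SIMD code paths); it illustrates the idea, not DXSLAM's actual implementation:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, min_sim=0.7):
    """Nearest-neighbour matching of unit-normalized descriptors.
    A scalar loop would perform N*M*256 multiply-adds one at a time;
    the vectorized form computes the whole (N, M) cosine-similarity
    matrix in one call, which the CPU executes with SIMD instructions.
    The 0.7 similarity threshold is an illustrative choice."""
    sim = desc_a @ desc_b.T                      # (N, M) cosine similarities
    nn = sim.argmax(axis=1)                      # best match in B for each row of A
    keep = sim[np.arange(len(desc_a)), nn] >= min_sim
    return np.flatnonzero(keep), nn[keep]        # matched index pairs
```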
Key Contributions and Results
DXSLAM's architecture includes several novel contributions:
- Feature Extraction: By employing HF-Net, DXSLAM produces high-quality keypoint and global features. The design achieves enhanced robustness against environmental shifts compared to traditional hand-crafted features.
- Loop Closure Detection: The LCD module combines global and local features to reliably recognize previously visited locations. This dual-level approach outperforms the BoW-based detection used in systems such as ORB-SLAM2, reducing false positives and improving precision and recall.
- Re-localization Method: The system implements a re-localization strategy using global feature-based image retrieval coupled with group matching, achieving higher success rates and lower computational costs than methods based on conventional Bag of Words (BoW).
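The retrieval-then-verification idea behind both LCD and re-localization can be sketched as a two-stage routine: a cheap global-descriptor search shortlists candidate keyframes, and local-descriptor matching confirms them. The thresholds and the mutual-nearest-neighbour check (standing in for the paper's group matching and geometric verification) are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def detect_loop(query_global, query_local, kf_globals, kf_locals,
                top_k=3, sim_thresh=0.8, min_inliers=30):
    """Two-stage place recognition sketch. All descriptors are assumed
    unit-normalized; thresholds are illustrative, not from the paper."""
    # Stage 1: rank keyframes by global-descriptor cosine similarity.
    sims = kf_globals @ query_global
    candidates = np.argsort(sims)[::-1][:top_k]
    # Stage 2: verify candidates by counting mutual nearest-neighbour
    # local matches (a stand-in for group matching + geometry checks).
    for idx in candidates:
        if sims[idx] < sim_thresh:
            continue
        s = query_local @ kf_locals[idx].T
        fwd = s.argmax(axis=1)                  # query -> keyframe
        bwd = s.argmax(axis=0)                  # keyframe -> query
        mutual = np.flatnonzero(bwd[fwd] == np.arange(len(fwd)))
        if len(mutual) >= min_inliers:
            return int(idx), len(mutual)        # accepted loop candidate
    return None, 0                              # no loop detected
```

In the full system, the verified correspondences would then feed pose estimation to close the loop or re-localize the camera; global retrieval keeps the expensive local matching confined to a handful of candidates, which is what makes the approach cheaper than exhaustive BoW-style search.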
Experimental validation on the OpenLORIS-Scene and TUM RGB-D datasets highlights the strength of DXSLAM, showing that it surpasses traditional systems under challenging conditions such as low light and dynamic scenes. The reported reduction in absolute trajectory error (ATE) further indicates its robustness and precision.
Implications and Future Directions
The integration of deep learning into SLAM systems, as demonstrated by DXSLAM, signals a promising shift toward more adaptive and intelligent localization. In particular, the efficient CPU implementation makes it viable for real-world robotic systems that require on-device processing.
Future research may focus on optimizing CNN architectures specifically for SLAM applications, potentially improving feature extraction and system resilience further. Integrating DXSLAM capabilities into more advanced SLAM systems, like ORB-SLAM3, may also yield additional performance gains by combining state-of-the-art algorithms across different SLAM functionalities.
In conclusion, DXSLAM represents a valuable step forward in using deep learning to improve SLAM performance in challenging, dynamic environments, laying groundwork for future advances in robotic autonomy and navigation.