
DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes (1806.05620v2)

Published 14 Jun 2018 in cs.CV

Abstract: The assumption of scene rigidity is typical in SLAM algorithms. Such a strong assumption limits the use of most visual SLAM systems in populated real-world environments, which are the target of several relevant applications like service robotics or autonomous vehicles. In this paper we present DynaSLAM, a visual SLAM system that, building over ORB-SLAM2 [1], adds the capabilities of dynamic object detection and background inpainting. DynaSLAM is robust in dynamic scenarios for monocular, stereo and RGB-D configurations. We are capable of detecting the moving objects either by multi-view geometry, deep learning or both. Having a static map of the scene allows inpainting the frame background that has been occluded by such dynamic objects. We evaluate our system in public monocular, stereo and RGB-D datasets. We study the impact of several accuracy/speed trade-offs to assess the limits of the proposed methodology. DynaSLAM outperforms the accuracy of standard visual SLAM baselines in highly dynamic scenarios. And it also estimates a map of the static parts of the scene, which is a must for long-term applications in real-world environments.

Citations (740)

Summary

  • The paper presents an enhanced SLAM system that combines CNN-based segmentation with multi-view geometry to accurately detect dynamic objects.
  • It employs a lightweight tracking module that minimizes reprojection error, ensuring precise camera localization amid moving elements.
  • The system inpaints occluded backgrounds from previous views, facilitating reliable mapping for augmented reality and long-term environmental monitoring.

DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes

The paper "DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes" represents a notable advancement in the field of Simultaneous Localization and Mapping (SLAM), particularly addressing the challenges posed by dynamic environments. Traditionally, SLAM algorithms operate under the assumption of a static scene, which significantly hampers their efficacy in scenarios with moving objects. Such limitations restrict the application of visual SLAM systems in real-world environments where dynamic elements like pedestrians and vehicles are prevalent.

Contributions and Methodology

DynaSLAM builds on the established ORB-SLAM2 framework, enhancing it with capabilities for dynamic object detection and background inpainting across monocular, stereo, and RGB-D camera configurations. The core contributions and methodologies of DynaSLAM can be summarized as follows:

  1. Dynamic Object Detection: DynaSLAM employs a dual approach combining multi-view geometry and deep learning to robustly detect moving objects. For the monocular and stereo configurations, a Mask R-CNN convolutional network provides pixel-wise semantic segmentation of a priori dynamic objects (e.g., people and vehicles). In the RGB-D case, the system combines this segmentation with a multi-view geometric consistency check to refine the motion segmentation and to detect moving instances that the CNN does not label as dynamic (a sketch of such a geometric check follows this list).
  2. Low-Cost Tracking Module: A lightweight tracking algorithm localizes the camera within the constructed map efficiently, even in the presence of dynamic content. This module projects map points into the current frame and minimizes the reprojection error over the static matches to optimize the camera pose (a minimal pose-refinement sketch also appears after this list).
  3. Background Inpainting: DynaSLAM reconstructs the occluded background in frames where dynamic objects have been detected and removed. This reconstruction reuses RGB and depth information from prior views (a simple warping sketch is included after this list), making the system suitable for applications such as virtual reality and place recognition in long-term mapping scenarios.
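
The multi-view geometry test in item 1 can be pictured as a depth-consistency check: a keypoint seen in an earlier keyframe is reprojected into the current frame under the static-world assumption, and a large gap between the predicted and the measured depth flags it as moving. Below is a minimal sketch of that idea, assuming a pinhole intrinsics matrix `K`, a dense depth image for the current frame, and hypothetical helper names; the depth and parallax thresholds echo values discussed in the paper, but the code is illustrative and is not DynaSLAM's implementation.

```python
import numpy as np

def backproject(u, v, z, K):
    """Lift pixel (u, v) with depth z to a 3D point in the camera frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def project(P, K):
    """Project a 3D point in the camera frame to pixel coordinates and depth."""
    u = K[0, 0] * P[0] / P[2] + K[0, 2]
    v = K[1, 1] * P[1] / P[2] + K[1, 2]
    return u, v, P[2]

def is_dynamic(kp_uv, kp_depth, T_kf_to_cur, K, cur_depth_map,
               depth_thresh=0.4, max_parallax_deg=30.0):
    """Flag a keyframe keypoint as dynamic via depth consistency in the current frame."""
    # 3D point in the keyframe camera, then expressed in the current camera.
    P_kf = backproject(kp_uv[0], kp_uv[1], kp_depth, K)
    P_cur = (T_kf_to_cur @ np.append(P_kf, 1.0))[:3]

    # Rays from each camera centre to the point, both in current-frame coordinates.
    # A large parallax angle usually signals occlusion rather than motion, so skip it.
    ray_from_kf = P_cur - T_kf_to_cur[:3, 3]
    cos_angle = ray_from_kf @ P_cur / (np.linalg.norm(ray_from_kf) * np.linalg.norm(P_cur))
    parallax = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    if parallax > max_parallax_deg:
        return False

    # Depth predicted by the static-world assumption vs. depth actually measured.
    u, v, z_proj = project(P_cur, K)
    ui, vi = int(round(u)), int(round(v))
    if not (0 <= vi < cur_depth_map.shape[0] and 0 <= ui < cur_depth_map.shape[1]):
        return False
    z_meas = cur_depth_map[vi, ui]
    return (z_proj - z_meas) > depth_thresh
```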
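
Item 2's low-cost tracking step is essentially a small nonlinear least-squares problem: given matches between map points already labelled static and keypoints in the new frame, find the camera pose that minimizes the reprojection error. The hedged sketch below uses SciPy; the function and parameter names are assumptions for illustration, and DynaSLAM itself relies on ORB-SLAM2's optimization machinery rather than code like this.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project_points(points_w, rvec, tvec, K):
    """Project world points into the image for a given world-to-camera pose."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    P_cam = points_w @ R.T + tvec
    uv = P_cam[:, :2] / P_cam[:, 2:3]                       # normalized coordinates
    return uv @ np.array([[K[0, 0], 0], [0, K[1, 1]]]) + np.array([K[0, 2], K[1, 2]])

def refine_pose(points_w, observations, rvec0, tvec0, K):
    """Refine the camera pose by minimizing reprojection error over static map points.

    points_w     -- Nx3 map points classified as static
    observations -- Nx2 matched keypoint locations in the current frame
    rvec0, tvec0 -- initial pose guess (axis-angle rotation, translation)
    """
    def residuals(x):
        uv = project_points(points_w, x[:3], x[3:], K)
        return (uv - observations).ravel()

    x0 = np.hstack([rvec0, tvec0])
    # A robust (Huber) loss limits the influence of any dynamic matches
    # that slipped past the segmentation and geometric checks.
    result = least_squares(residuals, x0, loss="huber", f_scale=2.0)
    return result.x[:3], result.x[3:]
```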
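
Item 3's background inpainting can be approximated by warping colour from earlier RGB-D keyframes into the regions that the dynamic-object masks left empty. The naive per-pixel warp below reuses the `backproject` and `project` helpers from the first sketch and assumes known relative poses; it is only an illustration of the idea, not the paper's inpainting pipeline, and it leaves gaps wherever no keyframe observed the occluded background.

```python
def inpaint_from_keyframe(cur_rgb, cur_mask, kf_rgb, kf_depth, T_kf_to_cur, K):
    """Fill masked (dynamic) pixels of the current frame with colour warped from a keyframe.

    cur_mask -- boolean image, True where a dynamic object was segmented out.
    Reuses backproject() and project() from the geometric-check sketch above.
    """
    out = cur_rgb.copy()
    h, w = kf_depth.shape
    for v in range(h):
        for u in range(w):
            z = kf_depth[v, u]
            if z <= 0:                      # no depth measured at this keyframe pixel
                continue
            P_kf = backproject(u, v, z, K)  # keyframe pixel -> 3D point
            P_cur = (T_kf_to_cur @ np.append(P_kf, 1.0))[:3]
            if P_cur[2] <= 0:               # point falls behind the current camera
                continue
            uc, vc, _ = project(P_cur, K)
            ui, vi = int(round(uc)), int(round(vc))
            # Paint only pixels that were masked out as dynamic in the current frame.
            if 0 <= vi < out.shape[0] and 0 <= ui < out.shape[1] and cur_mask[vi, ui]:
                out[vi, ui] = kf_rgb[v, u]
    return out
```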

Experimental Validation

The efficacy of DynaSLAM is validated through extensive experiments on public datasets, including the TUM RGB-D dataset and the KITTI dataset. The experiments benchmark DynaSLAM against several state-of-the-art dynamic SLAM solutions, with results highlighting substantial improvements in both tracking accuracy and robustness.

TUM RGB-D Dataset

The TUM RGB-D dataset, featuring indoor sequences with varying degrees of dynamic content, serves as a challenging benchmark for DynaSLAM. The system's performance metrics, particularly the Absolute Trajectory RMSE (Root Mean Square Error), illustrate its superior accuracy in highly dynamic scenarios compared to standard RGB-D SLAM systems. For instance, in sequences with significant motion (e.g., 'walking_xyz'), DynaSLAM achieved an RMSE of 0.015m, a notable improvement over the 0.459m RMSE of RGB-D ORB-SLAM2 without motion detection.
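
The Absolute Trajectory RMSE quoted above is computed, roughly, by rigidly aligning the estimated trajectory to the ground truth and taking the root-mean-square of the remaining translational differences at corresponding timestamps. The sketch below shows one way such a metric can be computed, assuming the two position arrays are already time-synchronized; the function name is illustrative and this is not the TUM benchmark tool itself. For monocular runs, a similarity alignment that also estimates scale would be used instead.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """Absolute Trajectory Error (RMSE) after rigid alignment of the trajectories.

    est_xyz, gt_xyz -- Nx3 arrays of time-synchronized camera positions.
    """
    # Center both trajectories.
    mu_e, mu_g = est_xyz.mean(axis=0), gt_xyz.mean(axis=0)
    E, G = est_xyz - mu_e, gt_xyz - mu_g

    # Best-fit rotation via SVD of the cross-covariance matrix (Kabsch/Horn).
    U, _, Vt = np.linalg.svd(E.T @ G)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = Vt.T @ S @ U.T

    # Apply the alignment and compute the RMSE of the residual translations.
    aligned = (R @ E.T).T + mu_g
    err = np.linalg.norm(aligned - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```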

KITTI Dataset

In the KITTI dataset, which comprises urban and highway driving scenarios, DynaSLAM’s performance was evaluated for both stereo and monocular configurations. While its accuracy is comparable to ORB-SLAM2 in static and low-dynamic scenes, DynaSLAM demonstrates enhanced robustness in environments with considerable dynamic objects. For example, in the 'KITTI 01' sequence, involving moving vehicles, DynaSLAM markedly reduced trajectory errors.

Implications and Future Directions

The implications of DynaSLAM’s advancements are multifaceted:

  1. Enhanced Robustness in Real-World Applications: By effectively managing dynamic content, DynaSLAM extends the applicability of visual SLAM systems to real-world scenarios, including service robotics and autonomous driving in populated environments.
  2. Reusability of Static Maps: The ability to construct and maintain static maps without dynamic elements supports the development of SLAM systems for long-term applications, facilitating persistent environment monitoring and iterative map updates.
  3. Potential for Augmented Reality: The background inpainting capabilities of DynaSLAM, which generate occlusion-free synthetic frames, hold promise for augmented and virtual reality applications where consistency and scene integrity are critical.

Future work on DynaSLAM could focus on real-time optimization, incorporating RGB-based motion detection to distinguish between movable and moving objects dynamically, and enhancing the realism of synthetic frames using advanced inpainting techniques, such as those involving Generative Adversarial Networks (GANs).

In conclusion, DynaSLAM represents a significant step forward in visual SLAM systems, particularly in managing dynamic scenes, thus broadening the horizon for practical applications in complex, real-world environments.