
NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments (2401.01189v2)

Published 2 Jan 2024 in cs.RO and cs.AI

Abstract: Neural implicit representations have been explored to enhance visual SLAM algorithms, especially in providing high-fidelity dense maps. Existing methods operate robustly in static scenes but struggle with the disruption caused by moving objects. In this paper, we present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments. We propose a new approach to enhance inaccurate regions in semantic masks, particularly in marginal areas. Utilizing the geometric information present in depth images, this method enables accurate removal of dynamic objects, thereby reducing the probability of camera drift. Additionally, we introduce a keyframe selection strategy for dynamic scenes, which enhances camera tracking robustness against large-scale objects and improves the efficiency of mapping. Experiments on publicly available RGB-D datasets demonstrate that our method outperforms competitive neural SLAM approaches in tracking accuracy and mapping quality in dynamic environments.


Summary

  • The paper presents a novel SLAM approach that refines semantic masks and introduces a keyframe selection strategy to filter out dynamic objects.
  • It integrates depth-guided segmentation and ray sampling techniques to achieve detailed, efficient, and stable 3D reconstructions.
  • Benchmark evaluations demonstrate improved tracking accuracy and mapping quality, though real-time performance is limited by segmentation speed.

Introduction to NID-SLAM

The advent of SLAM (Simultaneous Localization and Mapping) using RGB-D cameras has been pivotal for 3D environmental mapping. The integration of neural implicit representations, particularly neural radiance fields (NeRF), has enhanced the details and coherence of these maps. Yet, a significant challenge arises when dynamic objects enter the scene, causing tracking inaccuracies and map inconsistencies. NID-SLAM steps in as a solution for robust mapping and tracking in dynamic environments.

Advancing SLAM in Dynamic Environments

NID-SLAM addresses the shortcomings of current neural SLAM systems in dynamic settings. By refining semantic masks and leveraging depth information, it accurately removes dynamic elements from scenes, which significantly improves tracking and mapping. The work also introduces a keyframe selection approach tailored to dynamic scenarios. These advancements are shown to outperform existing neural SLAM methods, particularly under large-scale object movement.
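As a rough illustration of depth-guided mask refinement, the sketch below expands a semantic mask into its uncertain edge band only where depth agrees with the object. The function name, band width, and depth tolerance are assumptions for illustration, not values from the paper:

```python
import numpy as np
from scipy import ndimage


def refine_mask_with_depth(mask: np.ndarray, depth: np.ndarray,
                           band: int = 5, tol: float = 0.1) -> np.ndarray:
    """Illustrative depth-guided refinement of a binary semantic mask.

    Segmentation networks are often inaccurate at object boundaries, so
    pixels in a narrow band around the mask edge are added to the mask
    when their depth is close to the object's median depth.
    """
    dilated = ndimage.binary_dilation(mask, iterations=band)
    edge_band = dilated & ~mask                      # uncertain marginal area
    obj_depth = np.median(depth[mask]) if mask.any() else 0.0
    # Accept edge pixels whose depth is within a relative tolerance of the object.
    consistent = np.abs(depth - obj_depth) < tol * max(obj_depth, 1e-6)
    return mask | (edge_band & consistent)
```

In this sketch the geometric cue is simply depth similarity to the object's median; the paper's actual refinement uses the geometric information in depth images in its own way.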

Technical Innovations in NID-SLAM

Several key technical contributions have been made in NID-SLAM that together enhance its performance:

  • Depth-guided semantic segmentation improves the accuracy of dynamic object detection, with special attention paid to refining edge areas.
  • Background inpainting repairs occluded backgrounds using static information from the environment when dynamic objects are removed.
  • A novel keyframe selection strategy optimizes the inclusion of frames that contain less dynamic content and have a low overlap with prior keyframes, enhancing stability and mapping detail.
  • The scene representation harnesses multiresolution geometric and color feature grids, facilitating highly detailed reconstructions.
  • Ray sampling during rendering focuses on surfaces and eliminates non-contributing points, ensuring efficiency and accuracy.
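The keyframe selection idea above can be sketched as a simple scoring rule that favors frames with few dynamic pixels and low overlap with existing keyframes. The weights and threshold here are illustrative assumptions, not the paper's values:

```python
import numpy as np


def keyframe_score(dyn_mask: np.ndarray, overlap: float,
                   w_dyn: float = 0.7, w_ov: float = 0.3) -> float:
    """Toy scoring rule for dynamic-scene keyframe selection.

    Frames with few dynamic pixels and low overlap with previous
    keyframes score higher.
    """
    dyn_ratio = dyn_mask.mean()           # fraction of pixels marked dynamic
    return w_dyn * (1.0 - dyn_ratio) + w_ov * (1.0 - overlap)


def select_keyframes(candidates, threshold: float = 0.8):
    """Return indices of candidate (dyn_mask, overlap) pairs above threshold."""
    return [i for i, (m, ov) in enumerate(candidates)
            if keyframe_score(m, ov) > threshold]
```

For example, a static frame with little overlap scores near 1.0 and is kept, while a frame dominated by dynamic content is rejected.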

Performance Evaluation and Limitations

Benchmarking on standard RGB-D datasets demonstrates NID-SLAM's improvements in mapping quality and tracking accuracy in dynamic environments. An ablation study further validates the individual contributions of the proposed components: depth revision, the sampling strategy, and keyframe selection. The system's main limitation is that its real-time performance depends on the speed of the segmentation network. Future work could focus on balancing segmentation speed against quality, and on exploiting neural network predictions to attain even better background inpainting results.
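The surface-focused sampling strategy can be sketched as follows: most samples are drawn in a narrow band around the measured depth, where the scene surface lies, with a few uniform samples covering the rest of the ray. The band width, sample counts, and far bound are illustrative assumptions, not values from the paper:

```python
import numpy as np


def sample_along_ray(depth: float, n_near: int = 8, n_uniform: int = 4,
                     near_band: float = 0.05, far: float = 5.0,
                     rng=None) -> np.ndarray:
    """Illustrative surface-focused ray sampling.

    Concentrates samples near the depth measurement, where the implicit
    representation changes, and adds a few uniform samples along the ray.
    """
    rng = rng or np.random.default_rng(0)
    near = rng.uniform(depth - near_band, depth + near_band, n_near)
    uniform = rng.uniform(0.0, far, n_uniform)
    return np.sort(np.concatenate([near, uniform]))
```

Concentrating samples near the measured surface is what lets such systems skip non-contributing points in free space, trading a small risk of missing unmodeled geometry for rendering efficiency.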
