- The paper presents a novel loop closure module that significantly reduces drift and improves camera pose estimation in Gaussian Splatting SLAM.
- It integrates a multi-RGB-D camera setup with advanced keyframe selection and Gaussian densification to boost rendering quality.
- Experimental results demonstrate notable PSNR improvements and superior depth estimation accuracy on both synthetic and real-world datasets.
Robust Gaussian Splatting SLAM by Leveraging Loop Closure
This paper proposes a robust Gaussian Splatting SLAM (GSS) system tailored for rotating devices equipped with multiple RGB-D cameras. The principal focus is on enabling accurate localization and photorealistic rendering performance through a novel loop closure module designed specifically for Gaussian Splatting techniques. The proposed methodologies are evaluated extensively on both synthetic and real-world datasets, highlighting their efficacy in enhancing camera pose estimation and rendering quality.
Background and Motivation
Simultaneous Localization and Mapping (SLAM) systems are cornerstones of robotics and computer vision, providing foundational capabilities for navigation and environmental understanding. Recent advancements have integrated neural radiance fields, such as NeRF, into SLAM methods to enhance novel view rendering. Despite these advancements, conventional SLAM systems remain challenged by issues such as tracking drift, particularly when employed with rotating RGB-D camera setups.
Gaussian splatting, with its explicit point-based representation, offers an effective alternative. However, state-of-the-art GSS methods, largely designed around handheld sensors, still grapple with drift and mapping errors when adapted to more demanding applications. This paper addresses these challenges by introducing a Gaussian Splatting SLAM architecture that incorporates a loop closure module, enhancing both localization accuracy and the photorealism of the resulting map.
Methodology
The proposed system architecture is designed to handle inputs from rotating multiple RGB-D cameras. It includes three main components: camera pose tracking, keyframe selection and Gaussian densification, and a loop closure module.
3D Gaussian Splatting
The 3D Gaussian representation G = [μ, S, U, c, o] plays a critical role: the mean vector μ, scaling matrix S, rotation matrix U, color c, and opacity o together define a Gaussian in 3D space. Gaussian parameters are optimized by rendering their projections onto the 2D image plane and updating them through differentiable operations.
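For reference, under the standard 3D Gaussian splatting formulation (written with the paper's notation G = [μ, S, U, c, o]), the covariance and the per-pixel rendering take the following form; this is the generic formulation rather than a derivation specific to this paper.

```latex
% Covariance assembled from the scaling matrix S and rotation matrix U
\Sigma = U S S^{\top} U^{\top}, \qquad
G(\mathbf{x}) = \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\mu)^{\top}\Sigma^{-1}(\mathbf{x}-\mu)\Big)

% Per-pixel color from front-to-back alpha blending of the projected Gaussians
C = \sum_{i=1}^{N} c_i\, \alpha_i \prod_{j=1}^{i-1} \big(1-\alpha_j\big)
```

Because the rendered color and depth are differentiable with respect to μ, S, U, c, and o, both the map and, during tracking, the camera poses can be refined by gradient descent.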
Camera Pose Tracking
The tracking stage leverages RGB-D inputs from three cameras; a motion model provides an initial pose estimate for each frame. Photometric and geometric residuals between the observed and rendered images form the loss function, which is minimized to refine the current camera poses. A joint loss incorporates constraints from overlapping camera views, yielding refined poses even in complex, dynamic environments.
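A minimal sketch of how such a joint photometric-geometric tracking loss could be assembled over a multi-camera rig is shown below; the renderer interface, pose parameterization, and weighting term lambda_geo are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def tracking_loss(render_fn, gaussians, cam_poses, observations, lambda_geo=0.5):
    """Joint photometric + geometric loss over a multi-camera rig.

    render_fn(gaussians, pose) -> (rgb, depth) is a differentiable renderer,
    cam_poses is a list of per-camera pose tensors (requires_grad=True), and
    observations is a list of (rgb, depth) captures. All names here are
    illustrative, not the paper's API.
    """
    loss = 0.0
    for pose, (obs_rgb, obs_depth) in zip(cam_poses, observations):
        rend_rgb, rend_depth = render_fn(gaussians, pose)
        valid = obs_depth > 0                       # skip pixels with no depth
        photometric = (rend_rgb - obs_rgb).abs().mean()
        geometric = (rend_depth - obs_depth)[valid].abs().mean()
        loss = loss + photometric + lambda_geo * geometric
    return loss

# One tracking step: refine the current poses by gradient descent, e.g.
#   optimizer = torch.optim.Adam(cam_poses, lr=1e-3)
#   optimizer.zero_grad(); tracking_loss(...).backward(); optimizer.step()
```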
Keyframe Selection and Gaussian Densification
The system selects keyframes for optimizing Gaussian parameters, much as traditional SLAM selects keyframes for mapping, and adds a Gaussian densification step: regions of the map with low Gaussian coverage are detected and new Gaussians are generated there, enhancing map density and stability.
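One plausible way to realize this densification, sketched below, is to threshold the renderer's accumulated opacity and back-project poorly covered pixels into new Gaussians; the threshold, the map API (gaussians.add), and the tensor layouts are hypothetical.

```python
import torch

def densify(gaussians, rgb, depth, accum_alpha, pose, intrinsics, alpha_thresh=0.5):
    """Spawn new Gaussians where the current map barely covers the view.

    accum_alpha is the per-pixel accumulated opacity from the renderer;
    pixels below alpha_thresh are treated as under-represented. The threshold
    and the gaussians.add(...) call are illustrative assumptions.
    """
    mask = (accum_alpha < alpha_thresh) & (depth > 0)
    v, u = torch.nonzero(mask, as_tuple=True)          # pixel rows / columns
    z = depth[v, u]
    fx, fy, cx, cy = intrinsics
    # Back-project the under-represented pixels to camera, then world space
    pts_cam = torch.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], dim=-1)
    R, t = pose[:3, :3], pose[:3, 3]
    pts_world = pts_cam @ R.T + t
    gaussians.add(means=pts_world, colors=rgb[v, u])   # hypothetical map API
    return gaussians
```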
Loop Closure
The loop closure module is pivotal in addressing accumulated drift:
- Loop Detection: Gaussians are categorized by timestamp into historical and novel groups. A novel loop detection strategy considers both the co-visibility and the SSIM distance between images rendered from these two Gaussian sets (a sketch follows this list).
- Pose Graph Optimization: Lightweight pose graph optimization corrects camera pose drift using relative transformations between keyframes.
- Gaussian Updating and Bundle Adjustment: Anisotropic Gaussians associated with respective anchor frames are updated based on optimized poses. Finally, a two-stage bundle adjustment scheme refines poses using photometric and geometric constraints for global consistency.
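As a rough illustration of the detection step above, the co-visibility ratio and SSIM score between the two renderings might be combined as follows; the interfaces, thresholds, and visibility masks are assumptions rather than the paper's implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity

def detect_loop(render_fn, hist_gaussians, novel_gaussians, keyframe_pose,
                covis_thresh=0.6, ssim_thresh=0.8):
    """Flag a loop-closure candidate for one keyframe pose.

    hist_gaussians / novel_gaussians are the timestamp-split Gaussian sets;
    render_fn(gaussians, pose) returns an RGB image and a boolean visibility
    mask. Thresholds and the co-visibility definition are illustrative.
    """
    rgb_hist, vis_hist = render_fn(hist_gaussians, keyframe_pose)
    rgb_new, vis_new = render_fn(novel_gaussians, keyframe_pose)

    # Co-visibility: fraction of the novel view also covered by historical Gaussians
    covis = np.logical_and(vis_hist, vis_new).sum() / max(int(vis_new.sum()), 1)

    # Appearance consistency (the SSIM distance would be 1 - sim)
    sim = structural_similarity(rgb_hist, rgb_new, channel_axis=-1, data_range=1.0)

    return covis > covis_thresh and sim > ssim_thresh
```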
Experimental Results
Quantitative and qualitative evaluations demonstrate that the proposed method significantly surpasses existing state-of-the-art GSS methods. Reported metrics include PSNR, SSIM, and LPIPS for rendering quality, and the L1 error for depth estimation. On the virtual datasets, the method reaches a PSNR of up to 38.678 dB even under noisy conditions. Real-world dataset evaluations further confirm the method's robustness, showing superior rendering and depth estimation accuracy.
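For reference, the two simplest of these metrics can be computed as below (a minimal sketch assuming images normalized to [0, 1] and depth maps with zeros marking invalid pixels).

```python
import numpy as np

def psnr(rendered, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered and a ground-truth image."""
    mse = np.mean((rendered - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def depth_l1(rendered_depth, gt_depth):
    """Mean absolute depth error over pixels with valid ground truth."""
    valid = gt_depth > 0
    return np.abs(rendered_depth[valid] - gt_depth[valid]).mean()
```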
Tables \ref{virtual w/o jitters}, \ref{virtual}, and \ref{real data} offer comprehensive comparisons across various metrics, while Figures \ref{visual render} and \ref{real render} provide visual evidence of the rendering enhancements achieved by incorporating the loop closure module.
Implications and Future Work
The proposed GSS system not only addresses critical drift issues but also enables more accurate and photorealistic scene reconstructions in dynamic and complex environments. The integration of efficient Gaussian map updating and robust loop closure presents a strong framework for future developments in SLAM systems.
Future work can explore extending this approach to dynamic scenes using 4D Gaussian methods, incorporating motion constraints for dynamic objects, and achieving robust tracking and rendering in broader environmental contexts.
In summary, this paper provides a comprehensive solution to typical SLAM system challenges, leveraging Gaussian splatting and loop closure to significantly enhance both practical and theoretical aspects of camera pose estimation and novel view rendering.