- The paper presents a novel loop closure module that significantly reduces drift and improves camera pose estimation in Gaussian Splatting SLAM.
- It integrates a multi-RGB-D camera setup with advanced keyframe selection and Gaussian densification to boost rendering quality.
- Experimental results demonstrate notable PSNR improvements and superior depth estimation accuracy on both synthetic and real-world datasets.
Robust Gaussian Splatting SLAM by Leveraging Loop Closure
This paper proposes a robust Gaussian Splatting SLAM (GSS) system tailored for rotating devices equipped with multiple RGB-D cameras. The principal focus is on enabling accurate localization and photorealistic rendering performance through a novel loop closure module designed specifically for Gaussian Splatting techniques. The proposed methodologies are evaluated extensively on both synthetic and real-world datasets, highlighting their efficacy in enhancing camera pose estimation and rendering quality.
Background and Motivation
Simultaneous Localization and Mapping (SLAM) systems are cornerstones of robotics and computer vision, providing foundational capabilities for navigation and environmental understanding. Recent advancements have integrated neural radiance fields, such as NeRF, into SLAM methods to enhance novel view rendering. Despite these advancements, conventional SLAM systems remain challenged by issues such as tracking drift, particularly when employed with rotating RGB-D camera setups.
Gaussian splatting, with its explicit point-based representation, offers an effective alternative. However, state-of-the-art GSS methods, largely designed around handheld sensors, still grapple with drift and mapping errors when adapted to more demanding applications. This paper addresses these challenges by introducing a Gaussian Splatting SLAM architecture that incorporates a loop closure module, enhancing both localization accuracy and the photorealism of the resulting map.
Methodology
The proposed system architecture is designed to handle inputs from rotating multiple RGB-D cameras. It includes three main components: camera pose tracking, keyframe selection and Gaussian densification, and a loop closure module.
3D Gaussian Splatting
The 3D Gaussian representation G = [μ, S, U, c, o] plays a critical role: the mean vector μ, scaling matrix S, rotation matrix U, color c, and opacity o together define a Gaussian in 3D space. Gaussian parameters are optimized by rendering their projections onto the 2D image plane and updating them through differentiable operations.
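For reference, under the standard 3D Gaussian splatting formulation (written with the paper's notation G = [μ, S, U, c, o]), the covariance and the per-pixel rendering take the following form; this is the generic formulation rather than a derivation specific to this paper.

```latex
% Covariance assembled from the scaling matrix S and rotation matrix U
\Sigma = U S S^{\top} U^{\top}, \qquad
G(\mathbf{x}) = \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\mu)^{\top}\Sigma^{-1}(\mathbf{x}-\mu)\Big)

% Per-pixel color from front-to-back alpha blending of the projected Gaussians
C = \sum_{i=1}^{N} c_i\, \alpha_i \prod_{j=1}^{i-1} \big(1-\alpha_j\big)
```

Because the rendered color and depth are differentiable with respect to μ, S, U, c, and o, both the map and, during tracking, the camera poses can be refined by gradient descent.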
Camera Pose Tracking
The tracking stage leverages RGB-D inputs from three cameras; a motion model provides an initial pose estimate for each frame. Photometric and geometric residuals between the observed and rendered images form the loss function, which is minimized to refine the current camera poses. A joint loss incorporates constraints from overlapping camera views, yielding refined poses even in complex, dynamic environments.
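A minimal sketch of how such a joint photometric-geometric tracking loss could be assembled over a multi-camera rig is shown below; the renderer interface, pose parameterization, and weighting term lambda_geo are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def tracking_loss(render_fn, gaussians, cam_poses, observations, lambda_geo=0.5):
    """Joint photometric + geometric loss over a multi-camera rig.

    render_fn(gaussians, pose) -> (rgb, depth) is a differentiable renderer,
    cam_poses is a list of per-camera pose tensors (requires_grad=True), and
    observations is a list of (rgb, depth) captures. All names here are
    illustrative, not the paper's API.
    """
    loss = 0.0
    for pose, (obs_rgb, obs_depth) in zip(cam_poses, observations):
        rend_rgb, rend_depth = render_fn(gaussians, pose)
        valid = obs_depth > 0                       # skip pixels with no depth
        photometric = (rend_rgb - obs_rgb).abs().mean()
        geometric = (rend_depth - obs_depth)[valid].abs().mean()
        loss = loss + photometric + lambda_geo * geometric
    return loss

# One tracking step: refine the current poses by gradient descent, e.g.
#   optimizer = torch.optim.Adam(cam_poses, lr=1e-3)
#   optimizer.zero_grad(); tracking_loss(...).backward(); optimizer.step()
```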
Keyframe Selection and Gaussian Densification
The system selects keyframes for optimizing Gaussian parameters, much as traditional SLAM selects keyframes for mapping, and adds a Gaussian densification step: regions of the map with low Gaussian coverage are detected and new Gaussians are generated there, enhancing map density and stability.
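One plausible way to realize this densification, sketched below, is to threshold the renderer's accumulated opacity and back-project poorly covered pixels into new Gaussians; the threshold, the map API (gaussians.add), and the tensor layouts are hypothetical.

```python
import torch

def densify(gaussians, rgb, depth, accum_alpha, pose, intrinsics, alpha_thresh=0.5):
    """Spawn new Gaussians where the current map barely covers the view.

    accum_alpha is the per-pixel accumulated opacity from the renderer;
    pixels below alpha_thresh are treated as under-represented. The threshold
    and the gaussians.add(...) call are illustrative assumptions.
    """
    mask = (accum_alpha < alpha_thresh) & (depth > 0)
    v, u = torch.nonzero(mask, as_tuple=True)          # pixel rows / columns
    z = depth[v, u]
    fx, fy, cx, cy = intrinsics
    # Back-project the under-represented pixels to camera, then world space
    pts_cam = torch.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], dim=-1)
    R, t = pose[:3, :3], pose[:3, 3]
    pts_world = pts_cam @ R.T + t
    gaussians.add(means=pts_world, colors=rgb[v, u])   # hypothetical map API
    return gaussians
```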
Loop Closure
The loop closure module is pivotal in addressing accumulated drift:
- Loop Detection: Gaussians are categorized by timestamp into historical and novel groups. A novel loop detection strategy considers both the co-visibility and the SSIM distance between images rendered from these two Gaussian sets (a sketch follows this list).
- Pose Graph Optimization: Lightweight pose graph optimization corrects camera pose drift using relative transformations between keyframes.
- Gaussian Updating and Bundle Adjustment: Anisotropic Gaussians associated with respective anchor frames are updated based on optimized poses. Finally, a two-stage bundle adjustment scheme refines poses using photometric and geometric constraints for global consistency.
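As a rough illustration of the detection step above, the co-visibility ratio and SSIM score between the two renderings might be combined as follows; the interfaces, thresholds, and visibility masks are assumptions rather than the paper's implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity

def detect_loop(render_fn, hist_gaussians, novel_gaussians, keyframe_pose,
                covis_thresh=0.6, ssim_thresh=0.8):
    """Flag a loop-closure candidate for one keyframe pose.

    hist_gaussians / novel_gaussians are the timestamp-split Gaussian sets;
    render_fn(gaussians, pose) returns an RGB image and a boolean visibility
    mask. Thresholds and the co-visibility definition are illustrative.
    """
    rgb_hist, vis_hist = render_fn(hist_gaussians, keyframe_pose)
    rgb_new, vis_new = render_fn(novel_gaussians, keyframe_pose)

    # Co-visibility: fraction of the novel view also covered by historical Gaussians
    covis = np.logical_and(vis_hist, vis_new).sum() / max(int(vis_new.sum()), 1)

    # Appearance consistency (the SSIM distance would be 1 - sim)
    sim = structural_similarity(rgb_hist, rgb_new, channel_axis=-1, data_range=1.0)

    return covis > covis_thresh and sim > ssim_thresh
```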
Experimental Results
Quantitative and qualitative evaluations demonstrate that the proposed method significantly surpasses existing state-of-the-art GSS methods. Reported metrics include PSNR, SSIM, and LPIPS for rendering quality, and the L1 error for depth estimation. On the virtual datasets, the method reaches a PSNR of up to 38.678 dB even under noisy conditions. Real-world dataset evaluations further confirm the method's robustness, showing superior rendering and depth estimation accuracy.
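For reference, the two simplest of these metrics can be computed as below (a minimal sketch assuming images normalized to [0, 1] and depth maps with zeros marking invalid pixels).

```python
import numpy as np

def psnr(rendered, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered and a ground-truth image."""
    mse = np.mean((rendered - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def depth_l1(rendered_depth, gt_depth):
    """Mean absolute depth error over pixels with valid ground truth."""
    valid = gt_depth > 0
    return np.abs(rendered_depth[valid] - gt_depth[valid]).mean()
```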
Tables \ref{virtual w/o jitters}, \ref{virtual}, and \ref{real data} offer comprehensive comparisons across various metrics, while Figures \ref{visual render} and \ref{real render} provide visual evidence of the rendering enhancements achieved by incorporating the loop closure module.
Implications and Future Work
The proposed GSS system not only addresses critical drift issues but also enables more accurate and photorealistic scene reconstructions in dynamic and complex environments. The integration of efficient Gaussian map updating and robust loop closure presents a strong framework for future developments in SLAM systems.
Future work can explore extending this approach to dynamic scenes using 4D Gaussian methods, incorporating motion constraints for dynamic objects, and achieving robust tracking and rendering in broader environmental contexts.
In summary, this paper provides a comprehensive solution to typical SLAM system challenges, leveraging Gaussian splatting and loop closure to significantly enhance both practical and theoretical aspects of camera pose estimation and novel view rendering.