
Gaussian Splatting SLAM (2312.06741v2)

Published 11 Dec 2023 in cs.CV and cs.RO

Abstract: We present the first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but the hardest setup for Visual SLAM. Our method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Designed for challenging monocular settings, our approach is seamlessly extendable to RGB-D SLAM when an external depth sensor is available. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation but also reconstruction of tiny and even transparent objects.

Authors (4)
  1. Hidenobu Matsuki
  2. Riku Murai
  3. Paul H. J. Kelly
  4. Andrew J. Davison
Citations (150)

Summary

  • The paper introduces the first real-time SLAM system using 3D Gaussian splatting, overcoming limitations of traditional 3D representations.
  • It derives analytical camera pose Jacobians and employs geometric regularization to ensure robust tracking and accurate scene reconstruction.
  • Experimental results on TUM RGB-D and Replica datasets demonstrate state-of-the-art performance and high-fidelity view synthesis.

An Expert Review of "Gaussian Splatting SLAM"

The publication "Gaussian Splatting SLAM" introduces a novel application of 3D Gaussian Splatting (3DGS) to real-time Simultaneous Localization and Mapping (SLAM) for monocular and RGB-D cameras. The method uses Gaussians as the sole 3D representation, unifying the essential SLAM processes of tracking, mapping, and rendering in a single system. This work extends 3DGS from offline use cases to live camera operation, introducing several computational innovations to make that transition possible.
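To make "Gaussians as the sole 3D representation" concrete, the sketch below shows a minimal parameterisation of one 3D Gaussian primitive (mean, orientation, per-axis scale, opacity, colour) and its covariance, following the standard R S S^T R^T factorisation used in the 3DGS literature. The class and field names are illustrative, not the paper's actual data structures.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    """One anisotropic 3D Gaussian primitive (names are illustrative)."""
    mean: np.ndarray       # (3,) centre in world coordinates
    log_scale: np.ndarray  # (3,) per-axis scale, stored in log space
    quat: np.ndarray       # (4,) unit quaternion (w, x, y, z) for orientation
    opacity: float         # alpha in [0, 1]
    color: np.ndarray      # (3,) RGB

    def covariance(self) -> np.ndarray:
        """Sigma = R S S^T R^T, the usual splatting parameterisation."""
        w, x, y, z = self.quat / np.linalg.norm(self.quat)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(np.exp(self.log_scale))  # S is diagonal, so S @ S = S^2
        return R @ S @ S @ R.T
```

Storing the scale in log space keeps it positive under unconstrained gradient updates, one reason this parameterisation optimises smoothly.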

Main Contributions

The paper presents several key contributions:

  1. First Real-Time SLAM System Using 3DGS: This work pioneers the use of 3D Gaussian splats as the exclusive 3D representation in a real-time visual SLAM system. Prior systems have relied on diverse scene representations, including meshes, voxel grids, and neural fields, which often suffer from high memory demands or inflexibility. By employing a smooth, continuously differentiable Gaussian representation, the system inherits the flexibility and efficiency of point clouds while enabling direct alignment for camera tracking and mapping.
  2. Analytical Derivation of Camera Pose Jacobians: To enable real-time tracking, the paper derives the analytic Jacobian of the rendering function with respect to camera pose, allowing efficient integration with existing differentiable rendering techniques. This lets the tracker converge rapidly by optimizing directly against the Gaussian map, rather than relying on camera poses pre-estimated by an offline Structure from Motion (SfM) system.
  3. Geometric Regularization Through Gaussian Shape Constraints: The authors propose a novel regularization term to maintain geometric consistency during incremental reconstruction. This isotropic regularization reduces artifacts by discouraging excessive elongation of Gaussians along the viewing direction, which is particularly beneficial for capturing correct geometry in textureless regions or when viewpoints are sparse.
  4. Incremental 3D Scene Reconstruction and Keyframe Management: The paper details a keyframe selection strategy and Gaussian resource management that limit computational cost without sacrificing the accuracy of global map updates. Gaussian pruning mechanisms and scene geometry constraints support efficient memory usage while keeping the architecture simple.
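Contribution 2 above amounts to optimising the camera pose directly on the SE(3) manifold. The sketch below shows the generic pattern, assuming the standard exponential-map update from Lie theory; the paper's actual Jacobians and optimiser differ, so treat this only as an illustration of manifold pose updates.

```python
import numpy as np

def hat(w):
    """so(3) hat operator: 3-vector -> skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(xi):
    """Exponential map from a 6-vector xi = (rho, phi) to a 4x4 transform."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    Phi = hat(phi)
    if theta < 1e-8:  # small-angle limit
        R, V = np.eye(3) + Phi, np.eye(3) + 0.5 * Phi
    else:
        R = (np.eye(3) + np.sin(theta) / theta * Phi
             + (1 - np.cos(theta)) / theta**2 * Phi @ Phi)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * Phi
             + (theta - np.sin(theta)) / theta**3 * Phi @ Phi)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def update_pose(T_cw, grad_xi, lr=1e-3):
    """One manifold gradient step on the camera pose: T <- exp(-lr * grad) T."""
    return exp_se3(-lr * grad_xi) @ T_cw
```

The gradient `grad_xi` would come from backpropagating a photometric error through the splatting renderer; deriving that Jacobian analytically, rather than numerically, is what the paper highlights.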

Experimental Results

The paper presents extensive evaluations on multiple datasets, notably the TUM RGB-D and Replica datasets. The proposed system achieves state-of-the-art absolute trajectory error (ATE) in both monocular and RGB-D SLAM scenarios. Notably, in the RGB-D case it surpasses existing neural implicit SLAM methods, which are commonly criticized for the computational inefficiency of neural rendering.
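The ATE metric mentioned above is standard: estimated camera positions are rigidly aligned to ground truth, and the RMSE of the residuals is reported. A minimal sketch of that computation follows (the benchmarks' official tooling additionally handles timestamp association and, for monocular runs, scale):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) after rigid (Kabsch) alignment.
    est, gt: (N, 3) arrays of corresponding camera positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g               # centre both trajectories
    H = E.T @ G                                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation est -> gt
    aligned = E @ R.T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

A trajectory that differs from ground truth only by a rigid motion therefore scores an ATE of zero, which is exactly the invariance the metric is designed to have.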

The novel view synthesis achieved through 3DGS delivers high-fidelity renderings at frame rates that significantly outpace other rendering-based SLAM methods, as demonstrated by the reported rendering performance metrics. This rapid rendering capability is crucial for applications demanding low-latency responses.
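One reason splatting renders quickly: each pixel's colour is produced by front-to-back alpha compositing of depth-sorted Gaussian contributions, with no per-sample network evaluation. A simplified sketch of the compositing step for a single pixel (projection, sorting, and the 2D Gaussian falloff are omitted):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha blending of depth-sorted splat contributions.
    colors: list of (3,) RGB arrays; alphas: matching opacities in [0, 1]."""
    C = np.zeros(3)   # accumulated colour
    T = 1.0           # remaining transmittance
    for c, a in zip(colors, alphas):
        C += T * a * c
        T *= (1.0 - a)
        if T < 1e-4:  # early termination once the pixel is opaque
            break
    return C
```

Because each contribution is a cheap multiply-accumulate and opaque pixels terminate early, the whole image can be rasterised at high frame rates, in contrast to ray-marched neural fields.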

Implications and Future Directions

Adopting Gaussian splatting as the core representation for SLAM opens new directions in both academic and applied robotics and spatial AI. The smooth, flexible nature of Gaussian representations supports robust, scalable real-time mapping, paving the way towards systems that handle even larger-scale outdoor environments.

Future research directions may include integrating loop closure capabilities within this framework, enabling effective performance on long trajectory sequences with reduced drift. Furthermore, optimizing computational efficiency for full real-time operations and exploring deformations of Gaussian representations to encapsulate dynamic environments remain open challenges.

In summary, "Gaussian Splatting SLAM" enriches the SLAM landscape by transcending conventional limitations with innovative 3D representation techniques. This work is a testament to the potential of Gaussian approaches in evolving intelligent sensing systems and represents a meaningful stride towards comprehensive real-time spatial understanding.
