
Gaussian Splatting SLAM (2312.06741v2)

Published 11 Dec 2023 in cs.CV and cs.RO

Abstract: We present the first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but the hardest setup for Visual SLAM. Our method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Designed for challenging monocular settings, our approach is seamlessly extendable to RGB-D SLAM when an external depth sensor is available. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation but also reconstruction of tiny and even transparent objects.

Authors (4)
  1. Hidenobu Matsuki
  2. Riku Murai
  3. Paul H. J. Kelly
  4. Andrew J. Davison
Citations (150)

Summary

  • The paper introduces the first real-time SLAM system using 3D Gaussian splatting, overcoming limitations of traditional 3D representations.
  • It derives analytical camera pose Jacobians and employs geometric regularization to ensure robust tracking and accurate scene reconstruction.
  • Experimental results on TUM RGB-D and Replica datasets demonstrate state-of-the-art performance and high-fidelity view synthesis.

An Expert Review of "Gaussian Splatting SLAM"

The publication "Gaussian Splatting SLAM" introduces a novel application of 3D Gaussian Splatting (3DGS) to real-time Simultaneous Localization and Mapping (SLAM) for monocular and RGB-D cameras. The method uses Gaussians as the sole 3D representation, unifying the essential SLAM processes of tracking, mapping, and rendering in a single system. This work extends 3DGS from offline use cases to live camera operation, introducing several computational innovations to make that transition possible.
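To make "Gaussians as the sole 3D representation" concrete, the sketch below shows a minimal parameterisation of one 3D Gaussian primitive (mean, orientation, per-axis scale, opacity, colour) and its covariance, following the standard R S S^T R^T factorisation used in the 3DGS literature. The class and field names are illustrative, not the paper's actual data structures.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    """One anisotropic 3D Gaussian primitive (names are illustrative)."""
    mean: np.ndarray       # (3,) centre in world coordinates
    log_scale: np.ndarray  # (3,) per-axis scale, stored in log space
    quat: np.ndarray       # (4,) unit quaternion (w, x, y, z) for orientation
    opacity: float         # alpha in [0, 1]
    color: np.ndarray      # (3,) RGB

    def covariance(self) -> np.ndarray:
        """Sigma = R S S^T R^T, the usual splatting parameterisation."""
        w, x, y, z = self.quat / np.linalg.norm(self.quat)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(np.exp(self.log_scale))  # S is diagonal, so S @ S = S^2
        return R @ S @ S @ R.T
```

Storing the scale in log space keeps it positive under unconstrained gradient updates, one reason this parameterisation optimises smoothly.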

Main Contributions

The paper presents several key contributions:

  1. First Real-Time SLAM System Using 3DGS: This work pioneers the use of 3D Gaussian splats as the exclusive 3D representation in a real-time visual SLAM system. Prior systems have relied on diverse scene representations, including meshes, voxel grids, and neural fields, which often suffer from high memory demands or inflexibility. By employing a smooth, continuously differentiable Gaussian representation, the system inherits the flexibility and efficiency of point clouds while enabling direct alignment for camera tracking and mapping.
  2. Analytical Derivation of Camera Pose Jacobians: To enable real-time tracking, the paper derives the analytic Jacobian of the rendering function with respect to camera pose, allowing efficient integration with existing differentiable rendering techniques. This lets the tracker converge rapidly by optimizing directly against the Gaussian map, rather than relying on camera poses pre-estimated by an offline Structure from Motion (SfM) system.
  3. Geometric Regularization Through Gaussian Shape Constraints: The authors propose a novel regularization term to maintain geometric consistency during incremental reconstruction. This isotropic regularization reduces artifacts by discouraging excessive elongation of Gaussians along the viewing direction, which is particularly beneficial for capturing correct geometry in textureless regions or when viewpoints are sparse.
  4. Incremental 3D Scene Reconstruction and Keyframe Management: The paper details a keyframe selection strategy and Gaussian resource management that limit computational cost without sacrificing the accuracy of global map updates. Gaussian pruning mechanisms and scene geometry constraints support efficient memory usage while keeping the architecture simple.
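Contribution 2 above amounts to optimising the camera pose directly on the SE(3) manifold. The sketch below shows the generic pattern, assuming the standard exponential-map update from Lie theory; the paper's actual Jacobians and optimiser differ, so treat this only as an illustration of manifold pose updates.

```python
import numpy as np

def hat(w):
    """so(3) hat operator: 3-vector -> skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(xi):
    """Exponential map from a 6-vector xi = (rho, phi) to a 4x4 transform."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    Phi = hat(phi)
    if theta < 1e-8:  # small-angle limit
        R, V = np.eye(3) + Phi, np.eye(3) + 0.5 * Phi
    else:
        R = (np.eye(3) + np.sin(theta) / theta * Phi
             + (1 - np.cos(theta)) / theta**2 * Phi @ Phi)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * Phi
             + (theta - np.sin(theta)) / theta**3 * Phi @ Phi)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def update_pose(T_cw, grad_xi, lr=1e-3):
    """One manifold gradient step on the camera pose: T <- exp(-lr * grad) T."""
    return exp_se3(-lr * grad_xi) @ T_cw
```

The gradient `grad_xi` would come from backpropagating a photometric error through the splatting renderer; deriving that Jacobian analytically, rather than numerically, is what the paper highlights.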

Experimental Results

The paper presents extensive evaluations on multiple datasets, notably the TUM RGB-D and Replica datasets. The proposed system achieves state-of-the-art absolute trajectory error (ATE) in both monocular and RGB-D SLAM scenarios. Notably, in the RGB-D case it surpasses existing neural implicit SLAM methods, which are commonly criticized for the computational inefficiency of neural rendering.
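The ATE metric mentioned above is standard: estimated camera positions are rigidly aligned to ground truth, and the RMSE of the residuals is reported. A minimal sketch of that computation follows (the benchmarks' official tooling additionally handles timestamp association and, for monocular runs, scale):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) after rigid (Kabsch) alignment.
    est, gt: (N, 3) arrays of corresponding camera positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g               # centre both trajectories
    H = E.T @ G                                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation est -> gt
    aligned = E @ R.T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

A trajectory that differs from ground truth only by a rigid motion therefore scores an ATE of zero, which is exactly the invariance the metric is designed to have.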

The novel view synthesis achieved through 3DGS delivers high-fidelity renderings at frame rates that significantly outpace other rendering-based SLAM methods, as demonstrated by the reported rendering performance metrics. This rapid rendering capability is crucial for applications demanding low-latency responses.
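One reason splatting renders quickly: each pixel's colour is produced by front-to-back alpha compositing of depth-sorted Gaussian contributions, with no per-sample network evaluation. A simplified sketch of the compositing step for a single pixel (projection, sorting, and the 2D Gaussian falloff are omitted):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha blending of depth-sorted splat contributions.
    colors: list of (3,) RGB arrays; alphas: matching opacities in [0, 1]."""
    C = np.zeros(3)   # accumulated colour
    T = 1.0           # remaining transmittance
    for c, a in zip(colors, alphas):
        C += T * a * c
        T *= (1.0 - a)
        if T < 1e-4:  # early termination once the pixel is opaque
            break
    return C
```

Because each contribution is a cheap multiply-accumulate and opaque pixels terminate early, the whole image can be rasterised at high frame rates, in contrast to ray-marched neural fields.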

Implications and Future Directions

Adopting Gaussian splatting as the core representation for SLAM opens new directions in both academic and applied robotics and spatial AI. The smooth, flexible nature of Gaussian representations supports robust, scalable real-time mapping, paving the way towards systems that handle even larger-scale outdoor environments.

Future research directions may include integrating loop closure capabilities within this framework, enabling effective performance on long trajectory sequences with reduced drift. Furthermore, optimizing computational efficiency for full real-time operations and exploring deformations of Gaussian representations to encapsulate dynamic environments remain open challenges.

In summary, "Gaussian Splatting SLAM" enriches the SLAM landscape by transcending conventional limitations with innovative 3D representation techniques. This work is a testament to the potential of Gaussian approaches in evolving intelligent sensing systems and represents a meaningful stride towards comprehensive real-time spatial understanding.
