GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting (2311.11700v4)
Abstract: In this paper, we introduce GS-SLAM, the first method to use a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system, which achieves a better balance between efficiency and accuracy. Compared to recent SLAM methods that employ neural implicit representations, our method uses a real-time differentiable splatting rendering pipeline that significantly speeds up map optimization and RGB-D rendering. Specifically, we propose an adaptive expansion strategy that adds new 3D Gaussians or deletes noisy ones in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential for extending the 3D Gaussian representation to reconstruct a whole scene, rather than synthesizing a static object as in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique selects reliable 3D Gaussian representations to optimize the camera pose, reducing runtime and making estimation more robust. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. Project page: https://gs-slam.github.io/.
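To make the "add" half of the adaptive expansion strategy concrete, the following is a minimal sketch and not the authors' implementation. It assumes that new 3D Gaussians are seeded at pixels where the current map explains the incoming RGB-D frame poorly, judged here by low accumulated opacity or a large gap between rendered and sensor depth. The function names, both thresholds, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical choices for illustration; the abstract does not specify them. Deleting noisy Gaussians is not shown.

```python
from typing import List, Tuple

Grid = List[List[float]]  # row-major image: grid[v][u]


def select_unreliable_pixels(
    opacity: Grid,
    rendered_depth: Grid,
    sensor_depth: Grid,
    opacity_thresh: float = 0.5,    # assumed value, not from the paper
    depth_err_thresh: float = 0.1,  # assumed value (metres), not from the paper
) -> List[Tuple[int, int]]:
    """Return (row, col) pixels that the current map renders poorly."""
    picks = []
    for v, row in enumerate(sensor_depth):
        for u, d in enumerate(row):
            if d <= 0.0:
                continue  # skip pixels without a valid sensor depth
            unseen = opacity[v][u] < opacity_thresh
            wrong = abs(rendered_depth[v][u] - d) > depth_err_thresh
            if unseen or wrong:
                picks.append((v, u))
    return picks


def backproject(
    pixels: List[Tuple[int, int]],
    sensor_depth: Grid,
    fx: float, fy: float, cx: float, cy: float,
) -> List[Tuple[float, float, float]]:
    """Lift selected pixels to camera-frame 3D points (pinhole model)."""
    points = []
    for v, u in pixels:
        z = sensor_depth[v][u]
        points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


if __name__ == "__main__":
    opacity = [[0.9, 0.2], [0.9, 0.9]]
    rendered = [[1.0, 1.0], [2.0, 1.0]]
    sensor = [[1.0, 1.0], [1.0, 0.0]]
    picks = select_unreliable_pixels(opacity, rendered, sensor)
    print(picks)
    print(backproject(picks, sensor, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

In a full system the returned points would be transformed by the estimated camera pose and used as the means of newly initialized Gaussians; that step depends on details the abstract does not give.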