MotionGS: Compact Gaussian Splatting SLAM by Motion Filter (2405.11129v2)

Published 18 May 2024 in cs.CV

Abstract: With their high-fidelity scene representation capability, Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have attracted considerable attention in the SLAM field. Recently, there has been a surge in NeRF-based SLAM, while work on 3DGS-based SLAM remains sparse. This paper presents a novel 3DGS-based SLAM approach that fuses deep visual features, dual keyframe selection, and 3DGS. Compared with existing methods, the proposed tracking is achieved by feature extraction and a motion filter on each frame. The joint optimization of poses and 3D Gaussians runs through the entire mapping process. Additionally, coarse-to-fine pose estimation and a compact Gaussian scene representation are implemented by dual keyframe selection and novel loss functions. Experimental results demonstrate that the proposed algorithm not only outperforms existing methods in tracking and mapping, but also uses less memory.

References (40)
  1. Simultaneous localization and mapping: A survey of current trends in autonomous driving. IEEE Transactions on Intelligent Vehicles, 2:194–220, 2017.
  2. Mam3slam: Towards underwater-robust multi-agent visual slam. Ocean Engineering, 302, 2024.
  3. A slam-based 6dof controller with smooth auto-calibration for virtual reality. The Visual Computer, 39:1–14, 06 2022.
  4. Visual slam algorithms and their application for ar, mapping, localization and wayfinding. Array, 15:100–222, 2022.
  5. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Transactions on Robotics, 37(6):1874–1890, 2021.
  6. Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
  7. Real-time 3d reconstruction in dynamic scenes using point-based fusion. In 2013 International Conference on 3D Vision - 3DV 2013, pages 1–8, 2013.
  8. Real-time scalable dense surfel mapping. In 2019 International Conference on Robotics and Automation (ICRA), pages 6919–6925, 2019.
  9. Elasticfusion: Real-time dense slam and light source estimation. The International Journal of Robotics Research, 35(14):1697–1716, 2016.
  10. Ovpc mesh: 3d free-space representation for local ground vehicle navigation. In 2019 International Conference on Robotics and Automation (ICRA), pages 8648–8654, 2019.
  11. Surfelmeshing: Online surfel-based mesh reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10):2494–2507, 2020.
  12. Real-time 3d reconstruction at scale using voxel hashing. ACM Trans. Graph., 32(6), nov 2013.
  13. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5459–5469, June 2022.
  14. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5501–5510, June 2022.
  15. Kinectfusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pages 127–136, 2011.
  16. Nerf: Representing scenes as neural radiance fields for view synthesis, 2020.
  17. imap: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6229–6238, October 2021.
  18. Vox-fusion: Dense tracking and mapping with voxel-based neural implicit representation. In 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 499–507, 2022.
  19. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12786–12796, June 2022.
  20. Eslam: Efficient dense slam system based on hybrid representation of signed distance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17408–17419, June 2023.
  21. Go-slam: Global optimization for consistent 3d instant reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3727–3737, October 2023.
  22. Co-slam: Joint coordinate and sparse parametric encodings for neural real-time slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13293–13302, June 2023.
  23. Point-slam: Dense neural point cloud-based slam. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18433–18444, October 2023.
  24. Nerf-slam: Real-time dense monocular slam with neural radiance fields. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3437–3444, 2023.
  25. Gs-slam: Dense visual slam with 3d gaussian splatting, 2024.
  26. 3d gaussian splatting for real-time radiance field rendering, 2023.
  27. Gaussian splatting slam, 2024.
  28. Splatam: Splat, track & map 3d gaussians for dense rgb-d slam, 2024.
  29. Dtam: Dense tracking and mapping in real-time. In 2011 International Conference on Computer Vision, pages 2320–2327, 2011.
  30. Kintinuous : Spatially extended kinectfusion. 01 2012.
  31. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 16558–16569. Curran Associates, Inc., 2021.
  32. Hi-slam: Monocular real-time dense mapping with hybrid implicit fields. IEEE Robotics and Automation Letters, 9(2):1548–1555, 2024.
  33. Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and rgb-d cameras, 2024.
  34. Gaussian-slam: Photo-realistic dense slam with gaussian splatting, 2024.
  35. Rgbd gs-icp slam, 2024.
  36. Compact 3d gaussian representation for radiance field, 2024.
  37. Estimating or propagating gradients through stochastic neurons for conditional computation, 2013.
  38. A micro lie theory for state estimation in robotics, 2021.
  39. A benchmark for the evaluation of rgb-d slam systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 573–580, 2012.
  40. The replica dataset: A digital replica of indoor spaces, 2019.
Summary

  • The paper introduces MotionGS which integrates motion and information filters for selective keyframe processing.
  • It employs compact 3D Gaussian splatting with a penalty term to achieve efficient real-time scene rendering and optimization.
  • Direct pose optimization based on photometric errors enables state-of-the-art tracking accuracy and memory efficiency.

MotionGS: Compact Gaussian Splatting SLAM by Motion Filter

Introduction

SLAM, which stands for Simultaneous Localization and Mapping, is a key technology for enabling machines to understand and navigate unknown environments in real time. It's crucial for applications such as autonomous driving, virtual reality, and augmented reality. The task involves building a map of the environment while simultaneously keeping track of the agent's location within that map.

Traditional SLAM approaches often use point clouds, meshes, or voxels for scene representation. However, these methods can struggle to achieve high-fidelity scene reconstruction. Enter MotionGS, a novel approach using 3D Gaussian Splatting (3DGS) that promises better real-time tracking and high-quality scene reconstruction, all while minimizing memory usage.

Key Innovations in MotionGS

MotionGS combines several advanced techniques to push the boundaries of SLAM performance. Here are the main contributions of the approach:

  1. Dual Keyframe Strategy: Implementing motion and information filters to selectively track and map keyframes, which enhances tracking accuracy and reduces the number of frames that need to be processed.
  2. Compact 3DGS Scene Representation: Using a set of anisotropic Gaussians to represent scenes compactly, thereby achieving efficient rendering and optimization.
  3. Direct Pose Optimization: A method for fine-tuning poses based on photometric errors, leveraging the differentiable rendering framework of 3DGS.

Dual Keyframe Strategy

The dual keyframe strategy is one of the standout innovations in MotionGS. This involves:

  • Motion Filter: This extracts features from each frame and selects keyframes based on motion vectors. A frame is promoted to a motion keyframe when its motion exceeds a threshold or when a maximum frame interval since the last keyframe has elapsed.
  • Information Filter: This maintains a sliding window of information keyframes for mapping. It selects keyframes based on their geometric content and their overlap with previously selected frames.

Both filters work together to ensure that only the most critical frames are used for tracking and mapping, enhancing performance and reducing computational load.
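As a rough illustration of how such a motion filter might work, the sketch below promotes a frame to a motion keyframe when the mean displacement of matched features exceeds a threshold, or when too many frames have passed since the last keyframe. The displacement measure, threshold values, and function name are assumptions made for exposition, not the paper's exact criteria.

```python
import numpy as np

def is_motion_keyframe(prev_pts, curr_pts, frames_since_kf,
                       motion_thresh=0.02, max_interval=10):
    """Decide whether the current frame becomes a motion keyframe.

    prev_pts, curr_pts : (N, 2) matched feature locations in normalized
                         image coordinates (assumed input format).
    frames_since_kf    : number of frames since the last keyframe.
    motion_thresh, max_interval : illustrative values, not from the paper.
    """
    # Mean displacement of matched features approximates inter-frame motion.
    mean_motion = np.linalg.norm(curr_pts - prev_pts, axis=1).mean()

    # Promote the frame if it moved enough, or if too much time has passed.
    return bool(mean_motion > motion_thresh or frames_since_kf >= max_interval)
```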

Compact 3DGS Scene Representation

3D Gaussian Splatting represents the scene using a set of Gaussian functions, encapsulating properties like color, opacity, and geometry. This allows for fast and efficient rendering and optimization. To make this process even more efficient, a penalty term is introduced to mask and prune the Gaussians that have minimal impact on rendering quality.

The compact representation balances high-fidelity rendering with memory efficiency by retaining fewer, but more informative, Gaussians.
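One common way to realize such a penalty, used for instance in compact-3DGS follow-up work, is a learnable per-Gaussian mask trained with a straight-through estimator and regularized toward zero; Gaussians whose mask collapses can then be pruned outright. The PyTorch sketch below illustrates that idea only; the threshold, tensor shapes, and weighting are assumptions, not the paper's exact loss.

```python
import torch

def masked_gaussians(opacity, scale, mask_logits, tau=0.01):
    """Illustrative learnable pruning mask over 3D Gaussians.

    opacity     : (N, 1) per-Gaussian opacities
    scale       : (N, 3) per-Gaussian scales
    mask_logits : (N, 1) learnable mask parameters
    tau         : hard-masking threshold (assumed value)
    """
    soft = torch.sigmoid(mask_logits)
    # Straight-through estimator: binary mask in the forward pass,
    # gradients flow through the soft mask in the backward pass.
    hard = (soft > tau).float()
    mask = hard - soft.detach() + soft

    # Masked Gaussians contribute (almost) nothing to rendering
    # and can later be removed outright.
    masked_opacity = mask * opacity
    masked_scale = mask * scale

    # Penalty term that pushes masks toward zero; it would be added to the
    # rendering loss with a trade-off weight chosen by the user.
    penalty = soft.mean()
    return masked_opacity, masked_scale, penalty
```

In training, the total objective would be something like the photometric/rendering loss plus a weighted copy of this penalty, with the weight controlling how aggressively Gaussians are removed.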

Direct Pose Optimization with 3DGS

The MotionGS system employs a direct pose optimization method based on photometric errors between real and rendered images. Every pixel contributes to the photometric loss, and because 3D Gaussian rasterization is differentiable, gradients can be backpropagated to refine the camera pose.
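A minimal sketch of what such direct pose refinement can look like is given below, assuming a differentiable rasterizer render_fn(gaussians, pose) is available (a placeholder, not the paper's actual API). It perturbs the initial pose with a 6-DoF twist and minimizes a dense L1 photometric error with a first-order optimizer; the perturbation convention, loss, and optimizer are assumptions.

```python
import torch

def skew(v):
    """3x3 skew-symmetric matrix of a 3-vector (built so gradients flow)."""
    z = torch.zeros((), dtype=v.dtype, device=v.device)
    return torch.stack([torch.stack([z, -v[2], v[1]]),
                        torch.stack([v[2], z, -v[0]]),
                        torch.stack([-v[1], v[0], z])])

def se3_exp(xi):
    """SE(3) exponential map: 6-vector twist (translation, rotation) -> 4x4 pose."""
    rho, phi = xi[:3], xi[3:]
    theta = torch.sqrt((phi * phi).sum() + 1e-12)  # numerically safe norm
    K = skew(phi / theta)
    I = torch.eye(3, dtype=xi.dtype, device=xi.device)
    R = I + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)
    V = I + ((1 - torch.cos(theta)) / theta) * K \
          + ((theta - torch.sin(theta)) / theta) * (K @ K)
    top = torch.cat([R, (V @ rho).unsqueeze(1)], dim=1)
    bottom = torch.tensor([[0., 0., 0., 1.]], dtype=xi.dtype, device=xi.device)
    return torch.cat([top, bottom], dim=0)

def refine_pose(render_fn, gaussians, image_gt, pose_init, iters=100, lr=1e-3):
    """Refine a camera pose by minimizing photometric error against image_gt."""
    xi = torch.zeros(6, requires_grad=True)        # left-multiplied pose perturbation
    opt = torch.optim.Adam([xi], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        pose = se3_exp(xi) @ pose_init             # perturb the initial pose estimate
        rendered = render_fn(gaussians, pose)      # differentiable 3DGS rendering
        loss = (rendered - image_gt).abs().mean()  # dense L1 photometric error
        loss.backward()
        opt.step()
    return se3_exp(xi.detach()) @ pose_init
```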

Performance Evaluation

MotionGS has been tested on widely used datasets such as TUM RGB-D and Replica, showing strong results across both tracking and mapping tasks.

Key Results:

  1. Tracking Accuracy: MotionGS achieved state-of-the-art performance in tracking accuracy on various sequences from the TUM dataset and was highly competitive on the Replica dataset.
  2. Rendering Quality: Metrics like PSNR, SSIM, and LPIPS demonstrated that MotionGS performs exceptionally well in rendering high-fidelity images.
  3. Memory Efficiency: The approach significantly reduces memory usage compared to traditional and NeRF-based SLAM methods.

Notably, on the TUM RGB-D dataset, MotionGS achieved an average absolute trajectory error (ATE) of 1.46 cm, outperforming other 3DGS- and NeRF-based methods.
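For context, ATE is conventionally reported as the root-mean-square translational error after rigidly aligning the estimated trajectory to the ground truth (Horn's closed-form alignment). The sketch below is a generic implementation of that standard metric, not code from the paper.

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE absolute trajectory error after rigid (rotation + translation)
    alignment of the estimated positions to the ground truth.

    est, gt : (N, 3) arrays of time-associated camera positions.
    """
    # Center both trajectories.
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g

    # Closed-form optimal rotation (Horn / Umeyama) via SVD,
    # with a sign correction to avoid reflections.
    U, _, Vt = np.linalg.svd(E.T @ G)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_g - R @ mu_e

    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```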

Implications and Future Directions

The high efficiency and accuracy of MotionGS open up exciting possibilities for real-time applications in robotics and augmented/virtual reality. By demonstrating the practicality of 3D Gaussian Splatting for SLAM, this paper sets the stage for the development of more robust, compact, and accurate mapping systems.

Future research may explore extending this work to multi-sensor setups for larger-scale outdoor environments, further enhancing its applicability and robustness.

Overall, MotionGS showcases a promising direction for SLAM research, squeezing out higher performance from existing hardware and pushing the boundary of what's possible in real-time scene understanding.