
MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting (2409.13055v1)

Published 19 Sep 2024 in cs.RO and cs.CV

Abstract: Real-time SLAM with dense 3D mapping is computationally challenging, especially on resource-limited devices. The recent development of 3D Gaussian Splatting (3DGS) offers a promising approach for real-time dense 3D reconstruction. However, existing 3DGS-based SLAM systems struggle to balance hardware simplicity, speed, and map quality. Most systems excel in one or two of the aforementioned aspects but rarely achieve all. A key issue is the difficulty of initializing 3D Gaussians while concurrently conducting SLAM. To address these challenges, we present Monocular GSO (MGSO), a novel real-time SLAM system that integrates photometric SLAM with 3DGS. Photometric SLAM provides dense structured point clouds for 3DGS initialization, accelerating optimization and producing more efficient maps with fewer Gaussians. As a result, experiments show that our system generates reconstructions with a balance of quality, memory efficiency, and speed that outperforms the state-of-the-art. Furthermore, our system achieves all results using RGB inputs. We evaluate the Replica, TUM-RGBD, and EuRoC datasets against current live dense reconstruction systems. Not only do we surpass contemporary systems, but experiments also show that we maintain our performance on laptop hardware, making it a practical solution for robotics, A/R, and other real-time applications.

Summary

  • The paper presents MGSO, a monocular SLAM system that integrates photometric SLAM with 3D Gaussian Splatting to balance map quality, memory efficiency, and real-time performance.
  • It builds on Direct Sparse Odometry to initialize the dense reconstruction, achieving competitive PSNR and SSIM on benchmarks such as Replica and EuRoC.
  • The approach keeps dense mapping practical on resource-limited hardware such as laptops, with applications in AR, robotics, and real-time scene reconstruction.

Monocular Real-Time Photometric SLAM with Efficient 3D Gaussian Splatting

The paper "MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting" addresses a significant challenge in real-time Simultaneous Localization and Mapping (SLAM) with dense 3D reconstruction, particularly on resource-limited devices. Existing systems that perform dense 3D mapping typically struggle to balance hardware requirements, processing speed, and map quality.

Problem Statement and Methodology

Dense SLAM systems have historically followed one of two approaches: decoupled or coupled. Decoupled methods are efficient, but their tracking runs independently of the reconstruction process, which often leads to suboptimal dense maps. Coupled methods synchronize tracking and mapping but are usually slow, because accurate localization and high-quality mapping both require time-intensive computation.

The MGSO (Monocular GSO) system introduced in this paper uses photometric SLAM to initialize 3D Gaussian Splatting (3DGS), achieving a better balance of map quality, memory efficiency, and real-time performance.

Core Components

The SLAM module of MGSO is built on Direct Sparse Odometry (DSO), a technique that selects a sparse set of high-gradient pixels and optimizes the camera pose through photometric tracking. This approach suits 3D Gaussian Splatting because it yields the dense, structured point clouds needed to initialize 3DGS effectively. The system also samples an additional set of non-tracked high-gradient points to increase point cloud density, which accelerates the initialization and convergence of 3DGS.
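To make "high-gradient pixels" concrete, the following is a minimal sketch of gradient-based point selection on a grayscale image. It is an illustration only, not the DSO or MGSO implementation: the function name `select_high_gradient_pixels` and the single global `threshold` are assumptions made here for simplicity, and a real photometric SLAM front end would typically use more elaborate, spatially distributed selection.

```python
import math


def select_high_gradient_pixels(image, threshold):
    """Return (row, col) of interior pixels whose gradient magnitude exceeds threshold.

    image: list of equal-length rows of grayscale intensities.
    threshold: hypothetical global cutoff on the gradient magnitude.
    """
    height, width = len(image), len(image[0])
    selected = []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            if math.hypot(gx, gy) > threshold:
                selected.append((r, c))
    return selected


# Toy 5x5 image with a vertical edge between columns 1 and 2.
toy_image = [[0, 0, 100, 100, 100] for _ in range(5)]
edge_points = select_high_gradient_pixels(toy_image, threshold=25.0)
```

Only the pixels next to the intensity edge are kept. In a pipeline like the one described, candidates of this kind would then be assigned depths and contribute to the point cloud that seeds the Gaussians.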

The dense reconstruction module employs 3DGS, which models the environment as a collection of 3D Gaussians. Views are rendered by projecting the Gaussians onto the image plane, and the Gaussian parameters are optimized to minimize the photometric error against the input images. To improve real-time performance, MGSO trains with a Gaussian pyramid: the 3D Gaussians are first optimized at a coarse level and then progressively refined at finer levels.
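The coarse-to-fine idea can be sketched as follows, assuming the pyramid is an image pyramid built from each training frame (the summary does not spell this out). The helper names `blur_121`, `downsample`, and `build_pyramid` are hypothetical, the blur is a simple [1, 2, 1] / 4 binomial filter, and the actual 3DGS optimization step is omitted. How MGSO schedules iterations across levels is not stated here.

```python
def blur_121(values):
    """Apply a [1, 2, 1] / 4 binomial filter to a 1D sequence, replicating edges."""
    n = len(values)
    blurred = []
    for i in range(n):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, n - 1)]
        blurred.append((left + 2 * values[i] + right) / 4.0)
    return blurred


def downsample(image):
    """Blur along rows, then along columns, then keep every second pixel."""
    rows_blurred = [blur_121(row) for row in image]
    cols_blurred = [blur_121(list(col)) for col in zip(*rows_blurred)]
    blurred = [list(row) for row in zip(*cols_blurred)]
    return [row[::2] for row in blurred[::2]]


def build_pyramid(image, levels):
    """Return pyramid levels ordered from coarsest to finest (the original image)."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


# An 8x8 constant image yields 2x2, 4x4, and 8x8 training targets.
frame = [[10.0] * 8 for _ in range(8)]
coarse_to_fine = build_pyramid(frame, levels=3)
level_shapes = [(len(level), len(level[0])) for level in coarse_to_fine]
```

A training loop would iterate over `coarse_to_fine` in order, comparing renders against the low-resolution targets first so that early iterations are cheap, then switching to finer targets to recover detail.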

Experimental Results and Analysis

MGSO was benchmarked against other state-of-the-art 3DGS-based SLAM systems on the Replica, EuRoC MAV, and TUM-RGBD datasets. The results consistently showed that MGSO generated reconstructions with PSNR and SSIM values superior or comparable to those of competitors such as Photo-SLAM, while maintaining significantly smaller map sizes and real-time frame rates.

  • On the Replica dataset, MGSO achieved an average PSNR of 31.41 dB and an SSIM of 0.89 on a desktop setup, and a slightly higher PSNR of 31.90 dB on a laptop, while keeping the map size to approximately 4.6 MB.
  • On the EuRoC dataset, MGSO improved on Photo-SLAM's PSNR and SSIM, reaching a PSNR of 20.31 dB and an SSIM of 0.76 while keeping memory usage low at around 8.3 MB.
  • The TUM-RGBD dataset results further underscored MGSO's reconstruction quality, posting average PSNR and SSIM improvements over comparable systems.
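For readers unfamiliar with the metric, PSNR compares a rendered image with its reference via the mean squared error: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below is a generic implementation for grayscale images stored as nested lists. The function name `psnr` and the default `max_value=255.0` are choices made here, not details of the paper's evaluation code.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized grayscale images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, ren_row in zip(reference, rendered)
        for a, b in zip(ref_row, ren_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[100.0, 100.0], [100.0, 100.0]]
rendered = [[125.5, 125.5], [125.5, 125.5]]  # uniform error of 25.5 intensity levels
score = psnr(reference, rendered)  # MSE = 25.5**2, so MAX**2 / MSE = 100
```

Because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in mean squared error, which is why a gap of a few dB between systems is meaningful.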

Implications and Future Directions

The research demonstrates the feasibility and advantages of combining photometric SLAM with 3DGS in a monocular SLAM system, offering substantial improvements in memory efficiency and real-time performance. Relying on a monocular camera widens the practical applicability of the approach to domains such as augmented reality (AR), autonomous robotics, and other real-time applications where depth sensors may not be viable.

Future work could explore loop closure to improve global map consistency, as well as adaptive re-rendering strategies for dynamically changing scenes. Such advances could further improve the precision and adaptability of MGSO, making it better suited to the complex, large-scale environments typical of robotics and AR applications.
