MotionGS: Compact Gaussian Splatting SLAM by Motion Filter (2405.11129v2)

Published 18 May 2024 in cs.CV

Abstract: With their high-fidelity scene representation capability, Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have attracted considerable attention in the SLAM field. Recently, there has been a surge in NeRF-based SLAM, while work on 3DGS-based SLAM remains sparse. This paper presents a novel 3DGS-based SLAM approach that fuses deep visual features, dual keyframe selection, and 3DGS. Compared with existing methods, the proposed tracking is achieved by feature extraction and a motion filter on each frame. The joint optimization of poses and 3D Gaussians runs through the entire mapping process. Additionally, coarse-to-fine pose estimation and a compact Gaussian scene representation are implemented by dual keyframe selection and novel loss functions. Experimental results demonstrate that the proposed algorithm not only outperforms existing methods in tracking and mapping, but also uses less memory.

References (40)
  1. Simultaneous localization and mapping: A survey of current trends in autonomous driving. IEEE Transactions on Intelligent Vehicles, 2:194–220, 2017.
  2. Mam3slam: Towards underwater-robust multi-agent visual slam. Ocean Engineering, 302, 2024.
  3. A slam-based 6dof controller with smooth auto-calibration for virtual reality. The Visual Computer, 39:1–14, 06 2022.
  4. Visual slam algorithms and their application for ar, mapping, localization and wayfinding. Array, 15:100–222, 2022.
  5. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Transactions on Robotics, 37(6):1874–1890, 2021.
  6. Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
  7. Real-time 3d reconstruction in dynamic scenes using point-based fusion. In 2013 International Conference on 3D Vision - 3DV 2013, pages 1–8, 2013.
  8. Real-time scalable dense surfel mapping. In 2019 International Conference on Robotics and Automation (ICRA), pages 6919–6925, 2019.
  9. Elasticfusion: Real-time dense slam and light source estimation. The International Journal of Robotics Research, 35(14):1697–1716, 2016.
  10. Ovpc mesh: 3d free-space representation for local ground vehicle navigation. In 2019 International Conference on Robotics and Automation (ICRA), pages 8648–8654, 2019.
  11. Surfelmeshing: Online surfel-based mesh reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10):2494–2507, 2020.
  12. Real-time 3d reconstruction at scale using voxel hashing. ACM Trans. Graph., 32(6), nov 2013.
  13. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5459–5469, June 2022.
  14. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5501–5510, June 2022.
  15. Kinectfusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pages 127–136, 2011.
  16. Nerf: Representing scenes as neural radiance fields for view synthesis, 2020.
  17. imap: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6229–6238, October 2021.
  18. Vox-fusion: Dense tracking and mapping with voxel-based neural implicit representation. In 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 499–507, 2022.
  19. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12786–12796, June 2022.
  20. Eslam: Efficient dense slam system based on hybrid representation of signed distance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17408–17419, June 2023.
  21. Go-slam: Global optimization for consistent 3d instant reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3727–3737, October 2023.
  22. Co-slam: Joint coordinate and sparse parametric encodings for neural real-time slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13293–13302, June 2023.
  23. Point-slam: Dense neural point cloud-based slam. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18433–18444, October 2023.
  24. Nerf-slam: Real-time dense monocular slam with neural radiance fields. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3437–3444, 2023.
  25. Gs-slam: Dense visual slam with 3d gaussian splatting, 2024.
  26. 3d gaussian splatting for real-time radiance field rendering, 2023.
  27. Gaussian splatting slam, 2024.
  28. Splatam: Splat, track & map 3d gaussians for dense rgb-d slam, 2024.
  29. Dtam: Dense tracking and mapping in real-time. In 2011 International Conference on Computer Vision, pages 2320–2327, 2011.
  30. Kintinuous : Spatially extended kinectfusion. 01 2012.
  31. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 16558–16569. Curran Associates, Inc., 2021.
  32. Hi-slam: Monocular real-time dense mapping with hybrid implicit fields. IEEE Robotics and Automation Letters, 9(2):1548–1555, 2024.
  33. Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and rgb-d cameras, 2024.
  34. Gaussian-slam: Photo-realistic dense slam with gaussian splatting, 2024.
  35. Rgbd gs-icp slam, 2024.
  36. Compact 3d gaussian representation for radiance field, 2024.
  37. Estimating or propagating gradients through stochastic neurons for conditional computation, 2013.
  38. A micro lie theory for state estimation in robotics, 2021.
  39. A benchmark for the evaluation of rgb-d slam systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 573–580, 2012.
  40. The replica dataset: A digital replica of indoor spaces, 2019.
Summary

  • The paper introduces MotionGS which integrates motion and information filters for selective keyframe processing.
  • It employs compact 3D Gaussian splatting with a penalty term to achieve efficient real-time scene rendering and optimization.
  • Direct pose optimization based on photometric errors enables state-of-the-art tracking accuracy and memory efficiency.

MotionGS: Compact Gaussian Splatting SLAM by Motion Filter

Introduction

SLAM, which stands for Simultaneous Localization and Mapping, is a key technology for enabling machines to understand and navigate unknown environments in real time. It's crucial for applications such as autonomous driving, virtual reality, and augmented reality. The task involves building a map of the environment while simultaneously keeping track of the agent's location within that map.

Traditional SLAM approaches often use point clouds, meshes, or voxels for scene representation. However, these methods can struggle to achieve high-fidelity scene reconstruction. Enter MotionGS, a novel approach using 3D Gaussian Splatting (3DGS) that promises better real-time tracking and high-quality scene reconstruction, all while minimizing memory usage.

Key Innovations in MotionGS

MotionGS combines several advanced techniques to push the boundaries of SLAM performance. Here are the main contributions of the approach:

  1. Dual Keyframe Strategy: Implementing motion and information filters to selectively track and map keyframes, which enhances tracking accuracy and reduces the number of frames that need to be processed.
  2. Compact 3DGS Scene Representation: Using a set of anisotropic Gaussians to represent scenes compactly, thereby achieving efficient rendering and optimization.
  3. Direct Pose Optimization: A method for fine-tuning poses based on photometric errors, leveraging the differentiable rendering framework of 3DGS.

Dual Keyframe Strategy

The dual keyframe strategy is one of the standout innovations in MotionGS. This involves:

  • Motion Filter: This extracts features from each frame and selects keyframes based on motion vectors. A frame is promoted to a motion keyframe when its motion exceeds a threshold or when a maximum frame interval since the last keyframe has elapsed.
  • Information Filter: This maintains a sliding window of information keyframes for mapping. It selects keyframes based on their geometric content and their overlap with previously selected frames.

Both filters work together to ensure that only the most critical frames are used for tracking and mapping, enhancing performance and reducing computational load.
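As a rough illustration of how such a motion filter might work, the sketch below promotes a frame to a motion keyframe when the mean displacement of matched features exceeds a threshold, or when too many frames have passed since the last keyframe. The displacement measure, threshold values, and function name are assumptions made for exposition, not the paper's exact criteria.

```python
import numpy as np

def is_motion_keyframe(prev_pts, curr_pts, frames_since_kf,
                       motion_thresh=0.02, max_interval=10):
    """Decide whether the current frame becomes a motion keyframe.

    prev_pts, curr_pts : (N, 2) matched feature locations in normalized
                         image coordinates (assumed input format).
    frames_since_kf    : number of frames since the last keyframe.
    motion_thresh, max_interval : illustrative values, not from the paper.
    """
    # Mean displacement of matched features approximates inter-frame motion.
    mean_motion = np.linalg.norm(curr_pts - prev_pts, axis=1).mean()

    # Promote the frame if it moved enough, or if too much time has passed.
    return bool(mean_motion > motion_thresh or frames_since_kf >= max_interval)
```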

Compact 3DGS Scene Representation

3D Gaussian Splatting represents the scene using a set of Gaussian functions, encapsulating properties like color, opacity, and geometry. This allows for fast and efficient rendering and optimization. To make this process even more efficient, a penalty term is introduced to mask and prune the Gaussians that have minimal impact on rendering quality.

The compact representation balances high-fidelity rendering with memory efficiency by retaining fewer, but more informative, Gaussians.
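One common way to realize such a penalty, used for instance in compact-3DGS follow-up work, is a learnable per-Gaussian mask trained with a straight-through estimator and regularized toward zero; Gaussians whose mask collapses can then be pruned outright. The PyTorch sketch below illustrates that idea only; the threshold, tensor shapes, and weighting are assumptions, not the paper's exact loss.

```python
import torch

def masked_gaussians(opacity, scale, mask_logits, tau=0.01):
    """Illustrative learnable pruning mask over 3D Gaussians.

    opacity     : (N, 1) per-Gaussian opacities
    scale       : (N, 3) per-Gaussian scales
    mask_logits : (N, 1) learnable mask parameters
    tau         : hard-masking threshold (assumed value)
    """
    soft = torch.sigmoid(mask_logits)
    # Straight-through estimator: binary mask in the forward pass,
    # gradients flow through the soft mask in the backward pass.
    hard = (soft > tau).float()
    mask = hard - soft.detach() + soft

    # Masked Gaussians contribute (almost) nothing to rendering
    # and can later be removed outright.
    masked_opacity = mask * opacity
    masked_scale = mask * scale

    # Penalty term that pushes masks toward zero; it would be added to the
    # rendering loss with a trade-off weight chosen by the user.
    penalty = soft.mean()
    return masked_opacity, masked_scale, penalty
```

In training, the total objective would be something like the photometric/rendering loss plus a weighted copy of this penalty, with the weight controlling how aggressively Gaussians are removed.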

Direct Pose Optimization with 3DGS

The MotionGS system employs a direct pose optimization method based on photometric errors between real and rendered images. Every pixel contributes to the photometric loss, and because 3D Gaussian rasterization is differentiable, gradients can be backpropagated to refine the camera pose.
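A minimal sketch of what such direct pose refinement can look like is given below, assuming a differentiable rasterizer render_fn(gaussians, pose) is available (a placeholder, not the paper's actual API). It perturbs the initial pose with a 6-DoF twist and minimizes a dense L1 photometric error with a first-order optimizer; the perturbation convention, loss, and optimizer are assumptions.

```python
import torch

def skew(v):
    """3x3 skew-symmetric matrix of a 3-vector (built so gradients flow)."""
    z = torch.zeros((), dtype=v.dtype, device=v.device)
    return torch.stack([torch.stack([z, -v[2], v[1]]),
                        torch.stack([v[2], z, -v[0]]),
                        torch.stack([-v[1], v[0], z])])

def se3_exp(xi):
    """SE(3) exponential map: 6-vector twist (translation, rotation) -> 4x4 pose."""
    rho, phi = xi[:3], xi[3:]
    theta = torch.sqrt((phi * phi).sum() + 1e-12)  # numerically safe norm
    K = skew(phi / theta)
    I = torch.eye(3, dtype=xi.dtype, device=xi.device)
    R = I + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)
    V = I + ((1 - torch.cos(theta)) / theta) * K \
          + ((theta - torch.sin(theta)) / theta) * (K @ K)
    top = torch.cat([R, (V @ rho).unsqueeze(1)], dim=1)
    bottom = torch.tensor([[0., 0., 0., 1.]], dtype=xi.dtype, device=xi.device)
    return torch.cat([top, bottom], dim=0)

def refine_pose(render_fn, gaussians, image_gt, pose_init, iters=100, lr=1e-3):
    """Refine a camera pose by minimizing photometric error against image_gt."""
    xi = torch.zeros(6, requires_grad=True)        # left-multiplied pose perturbation
    opt = torch.optim.Adam([xi], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        pose = se3_exp(xi) @ pose_init             # perturb the initial pose estimate
        rendered = render_fn(gaussians, pose)      # differentiable 3DGS rendering
        loss = (rendered - image_gt).abs().mean()  # dense L1 photometric error
        loss.backward()
        opt.step()
    return se3_exp(xi.detach()) @ pose_init
```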

Performance Evaluation

MotionGS has been tested on widely used datasets such as TUM RGB-D and Replica, showing strong results across both tracking and mapping tasks.

Key Results:

  1. Tracking Accuracy: MotionGS achieved state-of-the-art performance in tracking accuracy on various sequences from the TUM dataset and was highly competitive on the Replica dataset.
  2. Rendering Quality: Metrics like PSNR, SSIM, and LPIPS demonstrated that MotionGS performs exceptionally well in rendering high-fidelity images.
  3. Memory Efficiency: The approach significantly reduces memory usage compared to traditional and NeRF-based SLAM methods.

Notably, on the TUM RGB-D dataset, MotionGS achieved an average absolute trajectory error (ATE) of 1.46 cm, outperforming other 3DGS- and NeRF-based methods.
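For context, ATE is conventionally reported as the root-mean-square translational error after rigidly aligning the estimated trajectory to the ground truth (Horn's closed-form alignment). The sketch below is a generic implementation of that standard metric, not code from the paper.

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE absolute trajectory error after rigid (rotation + translation)
    alignment of the estimated positions to the ground truth.

    est, gt : (N, 3) arrays of time-associated camera positions.
    """
    # Center both trajectories.
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g

    # Closed-form optimal rotation (Horn / Umeyama) via SVD,
    # with a sign correction to avoid reflections.
    U, _, Vt = np.linalg.svd(E.T @ G)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_g - R @ mu_e

    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```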

Implications and Future Directions

The high efficiency and accuracy of MotionGS open up exciting possibilities for real-time applications in robotics and augmented/virtual reality. By demonstrating the practicality of 3D Gaussian Splatting for SLAM, this paper sets the stage for the development of more robust, compact, and accurate mapping systems.

Future research may explore extending this work to multi-sensor setups for larger-scale outdoor environments, further enhancing its applicability and robustness.

Overall, MotionGS showcases a promising direction for SLAM research, squeezing out higher performance from existing hardware and pushing the boundary of what's possible in real-time scene understanding.