
Gaussian-Plus-SDF SLAM: High-fidelity 3D Reconstruction at 150+ fps (2509.11574v1)

Published 15 Sep 2025 in cs.CV

Abstract: While recent Gaussian-based SLAM methods achieve photorealistic reconstruction from RGB-D data, their computational performance remains a critical bottleneck. State-of-the-art techniques operate at less than 20 fps, significantly lagging behind geometry-centric approaches like KinectFusion (hundreds of fps). This limitation stems from the heavy computational burden: modeling scenes requires numerous Gaussians and complex iterative optimization to fit RGB-D data, where insufficient Gaussian counts or optimization iterations cause severe quality degradation. To address this, we propose a Gaussian-SDF hybrid representation, combining a colorized Signed Distance Field (SDF) for smooth geometry and appearance with 3D Gaussians to capture underrepresented details. The SDF is efficiently constructed via RGB-D fusion (as in geometry-centric methods), while Gaussians undergo iterative optimization. Our representation enables drastic Gaussian reduction (50% fewer) by avoiding full-scene Gaussian modeling, and efficient Gaussian optimization (75% fewer iterations) through targeted appearance refinement. Building upon this representation, we develop GPS-SLAM (Gaussian-Plus-SDF SLAM), a real-time 3D reconstruction system achieving over 150 fps on real-world Azure Kinect sequences -- delivering an order-of-magnitude speedup over state-of-the-art techniques while maintaining comparable reconstruction quality. We will release the source code and data to facilitate future research.

Summary

The paper introduces a hybrid Gaussian-Plus-SDF representation that combines dense SDF geometry with sparse 3D Gaussians for high-fidelity 3D reconstruction.
The method pairs efficient SDF raycasting with targeted Gaussian refinement, using about 50% fewer Gaussians and 75% fewer optimization iterations while preserving quality.
Experiments report 252 fps on Replica and 151 fps on the real-world Indoor dataset, with reconstruction metrics competitive with state-of-the-art Gaussian-based SLAM.

Gaussian-Plus-SDF SLAM: High-Fidelity 3D Reconstruction at 150+ fps

Introduction and Motivation

The paper introduces GPS-SLAM, a real-time RGB-D SLAM system that achieves high-fidelity 3D reconstruction at over 150 fps by combining a colorized Signed Distance Field (SDF) with a sparse set of 3D Gaussians. This hybrid representation addresses the computational bottleneck of prior Gaussian-based SLAM systems, which need dense Gaussian modeling and extensive iterative optimization and therefore run at less than 20 fps. In contrast, geometry-centric methods like KinectFusion run at hundreds of fps but suffer from poor photometric quality due to color fusion artifacts. GPS-SLAM draws on the strengths of both paradigms, using the SDF for efficient geometry and base color, and Gaussians for targeted appearance refinement.

Figure 1: The Gaussian-Plus-SDF representation uses the SDF for 3D structure and initial color, while 3D Gaussians correct residual color errors, enabling high-fidelity reconstruction with a small number of Gaussians.

Hybrid Scene Representation: Gaussian-Plus-SDF

The core innovation is the Gaussian-Plus-SDF representation, which consists of:

SDF Volume: Stores truncated signed distances and RGB colors on surface voxels, providing a smooth geometric and photometric base.
3D Gaussians: A sparse set of radiance field primitives, each parameterized by position, scale, rotation, opacity, and spherical harmonics (SH) coefficients, optimized to correct color errors and capture high-frequency details not represented by the SDF.

The rendering pipeline is a two-pass process:

SDF Raycasting: Standard per-pixel raycasting through the SDF volume yields a surface depth map and an initial color image via trilinear interpolation.
Gaussian Splatting: Gaussians are rendered with depth culling using the SDF depth map, and their colors are blended order-independently into the image. The final color is a weighted combination of SDF and Gaussian contributions.
Figure 3: The rendering process first raycasts the SDF for geometry and base color, then splats Gaussians for appearance refinement, with order-independent blending and depth culling.
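
The two passes can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the function name `composite_pixel`, the depth tolerance `eps`, the `(color, weight, depth)` tuple layout, and the specific blend (a normalized weighted sum in which the SDF color carries unit weight) are all hypothetical. The summary only states that blending is order-independent and that the final color is a weighted combination of the two sources.

```python
def composite_pixel(sdf_color, sdf_depth, gaussians, eps=0.05):
    """Blend an SDF base color with Gaussian contributions at one pixel.

    gaussians: iterable of (color, weight, depth) tuples, where weight stands
    for the Gaussian's opacity times its 2D falloff at this pixel
    (hypothetical layout, chosen for illustration only).
    """
    acc = [float(c) for c in sdf_color]  # SDF color enters with unit weight (assumption)
    total_w = 1.0
    for color, weight, depth in gaussians:
        # Depth culling: skip Gaussians lying behind the raycast SDF surface.
        if depth > sdf_depth + eps:
            continue
        for k in range(3):
            acc[k] += weight * color[k]
        total_w += weight
    # The blend is a plain sum, so it does not depend on the order in which
    # Gaussians are visited (up to floating-point rounding).
    return tuple(a / total_w for a in acc)


# One red Gaussian in front of the surface, one green Gaussian behind it.
pixel = composite_pixel(
    sdf_color=(0.5, 0.5, 0.5),
    sdf_depth=2.0,
    gaussians=[((1.0, 0.0, 0.0), 1.0, 1.9), ((0.0, 1.0, 0.0), 1.0, 3.0)],
)
```

Because no sorting by depth is needed in a blend of this kind, each Gaussian can be processed independently, which is what makes the sort-free rendering described under Implementation Details possible.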

This design enables a drastic reduction in the number of Gaussians (up to 50% fewer) and optimization iterations (up to 75% fewer), as the SDF provides a strong geometric prior and initial color, relegating Gaussians to localized correction.

GPS-SLAM System Architecture

The GPS-SLAM system is built on the InfiniTAM framework and processes each RGB-D frame in three stages:

Camera Tracking: Standard ICP-based alignment using the SDF for pose estimation.
SDF Fusion: Depth and color integration into the SDF volume via hash table-based fusion.
Gaussian Optimization: Targeted addition, optimization, and removal of Gaussians based on photometric error.
Figure 2: GPS-SLAM pipeline: SDF fusion for geometry and color, followed by targeted Gaussian addition and optimization in regions with significant color errors.
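
For the SDF fusion stage, KinectFusion-style systems commonly update each voxel with a weighted running average of the truncated signed distance and color. The sketch below shows that update for one voxel. It is an assumption about the general technique rather than a description of the GPS-SLAM or InfiniTAM code; `fuse_voxel` and `max_weight` are hypothetical names, and the hash-table lookup that locates the voxel is omitted.

```python
def fuse_voxel(tsdf, color, weight, new_tsdf, new_color, max_weight=100.0):
    """Weighted running-average update for one voxel (KinectFusion-style).

    tsdf, color, weight: values currently stored in the voxel.
    new_tsdf, new_color: measurement from the current RGB-D frame.
    Returns the updated (tsdf, color, weight).
    """
    w_new = weight + 1.0  # each observation counts with weight 1 (assumption)
    fused_tsdf = (tsdf * weight + new_tsdf) / w_new
    fused_color = tuple((c * weight + n) / w_new for c, n in zip(color, new_color))
    # Capping the weight keeps the voxel responsive to later observations.
    return fused_tsdf, fused_color, min(w_new, max_weight)


# Two observations of the same voxel with different colors.
state = fuse_voxel(0.0, (0.0, 0.0, 0.0), 0.0, 0.5, (1.0, 0.5, 0.0))
state = fuse_voxel(*state, new_tsdf=-0.5, new_color=(0.0, 0.5, 1.0))
```

The second update averages two different colors into gray, which illustrates why pure color fusion tends to blur appearance and why the Gaussians are needed to correct the residual color error.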

Gaussian Management

Addition: Gaussians are added only in regions with significant color error and insufficient Gaussian coverage, determined by thresholding the difference between rendered and observed color and the accumulated Gaussian weight.
Optimization: Gaussians are optimized using $L_1$ color loss over a set of keyframes, balancing recent and global views to avoid overfitting and catastrophic forgetting.
Removal: Redundant Gaussians (low opacity, excessively large/small scale) are pruned after each optimization cycle.
Figure 4: Optimization view selection samples both global keyframes and recent local frames to ensure robust Gaussian optimization.
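
The three management rules above can be sketched as simple predicates. Every threshold and name here (`err_thresh`, `cov_thresh`, `min_opacity`, `min_scale`, `max_scale`, `n_global`, `n_recent`) is a hypothetical placeholder; the summary does not give the values or the exact sampling scheme used by the authors.

```python
import random


def needs_gaussian(rendered, observed, gauss_weight, err_thresh=0.1, cov_thresh=0.5):
    """Addition rule: large color error and weak Gaussian coverage at a pixel."""
    err = sum(abs(r - o) for r, o in zip(rendered, observed)) / 3.0  # mean L1 error
    return err > err_thresh and gauss_weight < cov_thresh


def keep_gaussian(opacity, scale, min_opacity=0.05, min_scale=0.001, max_scale=0.5):
    """Removal rule: prune low-opacity or excessively large/small Gaussians."""
    return opacity >= min_opacity and min_scale <= scale <= max_scale


def select_views(keyframes, recent, n_global, n_recent, rng):
    """View selection: random global keyframes plus the most recent frames."""
    global_views = rng.sample(keyframes, min(n_global, len(keyframes)))
    local_views = recent[max(0, len(recent) - n_recent):]
    return global_views + local_views


rng = random.Random(0)  # fixed seed so the sketch is reproducible
views = select_views(list(range(10)), [10, 11, 12, 13], n_global=3, n_recent=2, rng=rng)
```

Mixing global keyframes with recent frames reflects the stated goal: recent views fit newly observed regions, while global views guard against forgetting earlier parts of the scene.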

Implementation Details

Parallelization: Sort-free Gaussian rendering enables per-Gaussian parallelization, avoiding atomic operations and improving both forward and backward pass efficiency.
Implementation and Hardware: The system is implemented in C++ with LibTorch and custom CUDA kernels, and is evaluated on an AMD 9950X3D CPU and an NVIDIA RTX 4090 GPU.
Optimization: Gaussian optimization is performed every 10 frames with 20 iterations per cycle; SDF voxel size is set to 0.5 cm for high geometric fidelity.
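
The optimization schedule implies a small amortized cost per frame, which the following sketch makes explicit. The helper name `optimization_schedule` is hypothetical, and whether a cycle is triggered on frames 10, 20, ... or on frames 0, 10, ... is an assumption; only the interval of 10 frames and the 20 iterations per cycle come from the text.

```python
def optimization_schedule(num_frames, interval=10, iters_per_cycle=20):
    """Return the frames that trigger Gaussian optimization and the total iterations."""
    trigger_frames = [f for f in range(1, num_frames + 1) if f % interval == 0]
    return trigger_frames, len(trigger_frames) * iters_per_cycle


frames, total_iters = optimization_schedule(100)
iters_per_frame = total_iters / 100  # amortized optimization iterations per frame
```

Under these assumptions a 100-frame sequence runs 10 cycles and 200 iterations, which averages to 2 optimization iterations per frame.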

Experimental Results

Runtime and Memory

GPS-SLAM achieves:

252 fps on Replica (synthetic), 151 fps on Indoor (real-world), and 79 fps on high-resolution ScanNet++.
Significantly lower memory usage than other Gaussian-based methods, with only 4 GB required on the Replica and Indoor datasets.
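
Converting the reported frame rates into per-frame time budgets helps put them in context. The snippet below is plain arithmetic on the numbers quoted above; the 20 fps baseline is the upper bound the abstract gives for prior Gaussian-based SLAM, so the resulting speedup is a lower bound, and the helper name `frame_time_ms` is illustrative.

```python
def frame_time_ms(fps):
    """Average per-frame processing time in milliseconds."""
    return 1000.0 / fps


reported_fps = {"Replica": 252, "Indoor": 151, "ScanNet++": 79}
for name, fps in reported_fps.items():
    print(f"{name}: {frame_time_ms(fps):.2f} ms per frame")

# Prior Gaussian-based SLAM is stated to run below 20 fps, so 151 fps on the
# real-world sequences is at least a 151 / 20 = 7.55x speedup.
min_speedup = reported_fps["Indoor"] / 20
```

The budgets come to roughly 4.0 ms, 6.6 ms, and 12.7 ms per frame, and all tracking, fusion, and amortized Gaussian optimization must fit within them.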

Reconstruction Quality

Rendering: Comparable PSNR, SSIM, and LPIPS to state-of-the-art methods, with substantial improvement over GS-ICP SLAM at similar speeds.
Geometry: Superior accuracy and completion metrics on ScanNet++ due to high-resolution SDF fusion.
Tracking: ICP-based tracking yields competitive ATE RMSE on synthetic data; performance on real-world data is limited by depth sensor noise, similar to other ICP-based systems.
Figure 5: Qualitative comparison on the Indoor Dataset shows GPS-SLAM achieves high-fidelity reconstruction, while baselines exhibit blurring or artifacts.

Figure 6: Ablation comparing SDF-only and Gaussian-only rendering demonstrates the necessity of both components for robust appearance modeling.
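
For readers unfamiliar with the metrics named above, PSNR and ATE RMSE follow standard definitions, sketched here in plain Python. The function names and the flat-list image layout are illustrative choices, and the trajectories passed to `ate_rmse` are assumed to be already aligned; the paper's evaluation scripts may differ in such details.

```python
import math


def psnr(img_a, img_b, max_val=1.0):
    """PSNR in dB between two equally sized flat sequences of intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def ate_rmse(est_positions, gt_positions):
    """RMSE of translational error between aligned camera trajectories."""
    sq_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(est_positions, gt_positions)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


quality_db = psnr([0.0, 0.0], [0.1, 0.1])
drift = ate_rmse([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 1)])
```

Higher PSNR means a rendered image is closer to the captured one, while lower ATE RMSE means the estimated camera path stays closer to ground truth.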

Ablation Studies

SDF Voxel Size: Smaller voxels improve both geometry and appearance. The impact on speed is minimal because the Gaussian count drops as the SDF becomes finer.
Sort-Free Rendering: Yields a 17% system speedup by accelerating both forward and backward passes.
Failure Cases: SDF geometric errors (e.g., thin structures) can cause color leakage, as Gaussians rely on SDF depth for culling.
Figure 7: SDF voxel size ablation: smaller voxels yield higher fidelity and tracking accuracy.

Figure 8: Failure case: SDF holes in thin panels cause color to be incorrectly rendered on the opposite side.

Implications and Future Directions

The Gaussian-Plus-SDF paradigm demonstrates that hybrid representations can achieve both high-fidelity appearance and real-time performance, overcoming the trade-offs inherent in pure SDF or Gaussian approaches. The system's modularity allows for parallel SDF and Gaussian processing, and the targeted Gaussian management strategy minimizes computational overhead.

Practical implications include immediate feedback for interactive scanning, scalability to high-resolution and large-scale scenes, and efficient memory usage. The approach is particularly well-suited for robotics, AR/VR, and real-time 3D content creation.

Limitations include reliance on SDF geometry for depth culling, which can propagate geometric errors to the appearance model, and the absence of global pose optimization, leading to drift in large-scale environments.

Future work should address:

Integration of global pose optimization (e.g., loop closure) for large-scale consistency.
Extension to outdoor and LiDAR-based scenarios, leveraging the generality of the hybrid representation.
Adaptive Gaussian management strategies for further efficiency and robustness.

Conclusion

GPS-SLAM advances the state of the art in real-time, high-fidelity RGB-D SLAM by introducing a hybrid Gaussian-Plus-SDF representation. The system achieves an order-of-magnitude speedup over previous Gaussian-based methods while maintaining comparable reconstruction quality and, on ScanNet++, superior geometric accuracy. The work demonstrates the efficacy of combining efficient geometric priors with targeted radiance field refinement, and lays a foundation for future research in scalable, photorealistic, real-time 3D scene reconstruction.