Overview of ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration
The paper presents a framework for simultaneous localization and mapping (SLAM) that combines sparse Time-of-Flight (ToF) depth measurements with multi-frame integration to achieve dense SLAM. The approach, termed ToF-Splatting, builds on 3D Gaussian Splatting to provide a robust solution for scenarios with limited computational and power budgets, typical of mobile and augmented reality (AR)/virtual reality (VR) systems.
Technical Highlights
ToF-Splatting leverages sparse depth data from low-resolution ToF sensors that provide only a handful of measurements per frame, sometimes as few as 64 points (e.g., an 8×8 zone array), which poses significant challenges for traditional SLAM systems. The key contribution of the framework is the effective fusion of this sparse depth with monocular cues and multi-view geometry in a unified, end-to-end system.
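To make this input format concrete, the sketch below shows one plausible way such zone measurements could be projected into a sparse depth map aligned with the color image. The 8×8 layout, the per-zone ray-direction representation, and the function name are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def splat_tof_zones(zone_depths, zone_dirs, K, h, w):
    """Project sparse ToF zone measurements into a sparse depth map.

    zone_depths: (N,) metric ranges, one per ToF zone (N is tiny, e.g. 64).
    zone_dirs:   (N, 3) unit ray directions of each zone in the camera frame.
    K:           (3, 3) pinhole intrinsics of the color camera.
    Returns an (h, w) map that is zero everywhere except at the few pixels
    hit by a ToF ray, i.e. the sparse depth the SLAM system consumes.
    """
    # Scale each unit ray to its measured range to get a 3D point,
    # then project into the color image with the pinhole intrinsics K.
    pts = zone_dirs * zone_depths[:, None]            # (N, 3), camera frame
    uv = (K @ pts.T).T                                # homogeneous pixel coords
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)
    sparse = np.zeros((h, w), dtype=np.float32)
    for (u, v), z in zip(uv, pts[:, 2]):
        if 0 <= u < w and 0 <= v < h and z > 0:
            sparse[v, u] = z                          # z-depth at the hit pixel
    return sparse
```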
The architecture consists of three core modules:
- Tracking Frontend: estimates ego-motion from photometric and geometric information extracted from color images and the integrated sparse depth measurements. Keyframes are selected via a novelty score derived from rendering opacity, keeping processing and mapping efficient (see the opacity sketch after this list).
- Mapping Backend: embeds keyframes into a coherent 3D Gaussian Splatting model. New Gaussians are seeded selectively based on the same opacity analysis, reducing initialization errors and outliers.
- Multi-Frame Integration Module: extends the Depth on Demand framework to combine multi-view information with monocular cues, yielding a depth prediction mechanism resilient to noisy inputs and depth sparsity. An outlier-filtering strategy keeps the sparse depth integration consistent (a sketch also follows below).
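The role rendering opacity plays in both the frontend and the backend can be sketched as follows. Assuming the splatting renderer returns a per-pixel accumulated alpha, low alpha marks regions the current map does not yet cover; the thresholds and function names here are illustrative assumptions, not the paper's.

```python
import torch

OPACITY_THRESH = 0.5   # assumed cutoff; the paper's value may differ

def novelty_from_opacity(rendered_opacity: torch.Tensor) -> float:
    """Fraction of pixels the current map fails to explain.

    rendered_opacity: (H, W) accumulated alpha from splatting the current
    Gaussian map into the candidate frame's pose. Low alpha means the map
    has little or no geometry covering that pixel.
    """
    uncovered = (rendered_opacity < OPACITY_THRESH).float()
    return uncovered.mean().item()

def select_keyframe_and_seed(rendered_opacity, novelty_min=0.1):
    novelty = novelty_from_opacity(rendered_opacity)
    is_keyframe = novelty > novelty_min
    # Seed new Gaussians only where the map is transparent, so existing,
    # well-constrained regions are not polluted with duplicate points.
    seed_mask = rendered_opacity < OPACITY_THRESH
    return is_keyframe, seed_mask
```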
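Similarly, a minimal version of the outlier filtering might compare each sparse ToF sample against the network's dense prediction and discard samples with large relative disagreement. The relative-error test and threshold below are assumptions; the paper's exact criterion may differ.

```python
import torch

def filter_sparse_depth(sparse_depth, predicted_depth, rel_thresh=0.15):
    """Reject sparse ToF samples that disagree with the current prediction.

    sparse_depth:    (H, W), zero where no ToF measurement exists.
    predicted_depth: (H, W), dense depth from the multi-frame network.
    Keeps a measurement only if it lies within rel_thresh relative error
    of the prediction, so gross ToF outliers never enter the fusion.
    """
    valid = sparse_depth > 0
    rel_err = torch.zeros_like(sparse_depth)
    rel_err[valid] = (sparse_depth[valid] - predicted_depth[valid]).abs() \
                     / predicted_depth[valid].clamp(min=1e-6)
    keep = valid & (rel_err < rel_thresh)
    return torch.where(keep, sparse_depth, torch.zeros_like(sparse_depth))
```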
Experimental Evaluation
The paper evaluates ToF-Splatting on several datasets: the real-world ZJUL5 dataset, known for its challenging, noisy ToF data; the synthetic Replica dataset of indoor scenes; and the TUM RGB-D dataset with its diverse indoor sequences. Against existing RGB-D and monocular SLAM methods, ToF-Splatting consistently demonstrates superior tracking accuracy, mapping quality, and resilience to sparse and noisy depth, outperforming baselines on Absolute Trajectory Error (ATE) and F-score metrics.
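For reference, ATE is typically reported as the RMSE of translational residuals after rigidly aligning the estimated trajectory to ground truth. A minimal NumPy version is sketched below, using the closed-form Kabsch/Umeyama SE(3) alignment without scale correction (monocular baselines are often additionally scale-aligned; depth-equipped methods like ToF-Splatting generally do not need this).

```python
import numpy as np

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Absolute Trajectory Error (RMSE) after rigid alignment.

    est, gt: (N, 3) estimated and ground-truth camera positions with
    one-to-one correspondence. Aligns est to gt with the closed-form
    Kabsch/Umeyama solution (rotation plus translation), then reports
    the RMSE of the residual translational error, the standard ATE.
    """
    mu_e, mu_g = est.mean(0), gt.mean(0)
    E, G = est - mu_e, gt - mu_g                 # centered trajectories
    U, _, Vt = np.linalg.svd(E.T @ G)            # cross-covariance SVD
    S = np.diag([1, 1, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = (U @ S @ Vt).T                           # rotation mapping est to gt
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(1).mean()))
```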
Implications and Future Directions
ToF-Splatting extends dense SLAM to the constrained hardware common in consumer electronics and mobile AR/VR devices. As computational and energy budgets tighten, frameworks like ToF-Splatting offer a pragmatic path to high-quality, dense visual navigation and mapping.
Looking ahead, real-time performance remains a key challenge: current implementations are not yet real-time, but advances in model optimization and parallel computing could close the gap. Additionally, integrating dynamic elements into the mapping process, building on the method's robustness under varying scene conditions, could extend its utility to even more complex environments, such as dynamic urban landscapes.
This research opens new avenues for exploiting sparse depth sensors, which are affordable and energy-efficient, thus broadening the scope of SLAM applications across various domains, including robotics, autonomous navigation, and interactive virtual environments.