
Triangle Splatting SLAM

Updated 7 June 2026
  • The paper introduces a dynamic 'triangle soup' representation that uses differentiable rendering for efficient and photorealistic RGB-D SLAM.
  • It employs restricted Delaunay triangulation to extract an editable, connected mesh on-the-fly, supporting live deformation and collision checking.
  • Experimental results demonstrate competitive camera tracking and superior geometry accuracy on standard benchmarks, validating its practical performance.

Triangle Splatting SLAM is a dense RGB-D simultaneous localization and mapping (SLAM) system that leverages differentiable triangles as explicit 3D map primitives. Its core innovation is the use of a dynamic “triangle soup” representation optimized online, providing both photorealistic rendering and explicit geometry amenable to downstream tasks such as simulation and mesh editing. Triangle Splatting SLAM employs online differentiable rendering of this triangle soup for both camera tracking and map optimization, and can extract a connected mesh on-the-fly via restricted Delaunay triangulation, supporting live mesh deformation and collision checking. Experimental results demonstrate state-of-the-art geometry accuracy and competitive camera-tracking performance on standard benchmarks (Fry et al., 29 May 2026).

1. System Pipeline and Map Representation

The system maintains a live triangle soup map $M = (V, F)$, where $V$ is the set of 3D vertices and $F$ the connectivity of the triangles. The pipeline operates in a single process, executing three tightly interleaved stages per RGB-D frame:

  1. Tracking: Estimation of the 6-DOF camera pose $T_{CW}$ via minimization of a tracking energy.
  2. Keyframing: Keyframe selection based on pose change and triangle visibility; addition of keyframes with new triangles back-projected from depth.
  3. Mapping: Joint optimization of past keyframe poses and triangle parameters; densification and pruning of triangles; optional mesh extraction.

The end-to-end pipeline is single-threaded: each incoming frame passes through tracking, keyframing, and mapping in sequence.

The mapping stage periodically extracts a mesh using restricted Delaunay triangulation, converting the “soup” into a connected surface suitable for simulation or editing.
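The three stages can be sketched as a single-threaded loop. This is a structural sketch only: `run_slam`, `SlamState`, the callbacks `track`, `is_keyframe`, `add_keyframe`, `optimize_map`, `extract_mesh`, and the `mesh_every` parameter are hypothetical names introduced for illustration, not the authors' API.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SlamState:
    pose: Any = None                                     # current camera pose T_CW
    keyframes: List[int] = field(default_factory=list)   # indices of keyframes
    meshes_extracted: int = 0                            # number of mesh extractions


def run_slam(frames, track, is_keyframe, add_keyframe, optimize_map,
             extract_mesh=None, mesh_every=0):
    """Run tracking -> keyframing -> mapping once per frame, in one thread."""
    state = SlamState()
    for idx, frame in enumerate(frames):
        # 1. Tracking: estimate the pose of the current frame.
        state.pose = track(state, frame)
        # 2. Keyframing: decide whether this frame becomes a keyframe.
        if is_keyframe(state, idx, frame):
            add_keyframe(state, idx, frame)   # e.g. back-project new triangles
            state.keyframes.append(idx)
            # 3. Mapping: runs whenever a keyframe is added.
            optimize_map(state)
            # Optional periodic mesh extraction (assumed schedule).
            if extract_mesh and mesh_every and len(state.keyframes) % mesh_every == 0:
                extract_mesh(state)
                state.meshes_extracted += 1
    return state
```

With stub callbacks that select every fifth frame as a keyframe, ten frames yield keyframes 0 and 5 and, for `mesh_every=2`, a single mesh extraction.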

2. Differentiable Triangle Splatting

Each triangle $F_m = (i, j, k)$ is stored with three world-space vertices $v_i = (x_i, y_i, z_i, c_i, o_i)$, where $c_i$ is color and $o_i$ is opacity. Differentiable triangle splatting comprises:

  • Projection into image space: $v_I = \pi(T_{CW} v_W)$.
  • Signed-Distance Field: The image-space signed distance function $\phi(p)$ is computed relative to the projected triangle edges and converted into a smooth per-pixel coverage (window) function defined with reference to the triangle incentre.
  • Alpha-composite Rendering: Each pixel color is rendered by alpha compositing over the triangles.
  • Photometric Loss: The photometric error is accumulated over all pixels.
  • Backpropagation: Gradients are computed for vertex positions and appearance, using analytic derivatives (Eq. 4 and the window function, Eq. 3), the pinhole model, and Jacobians with respect to the camera pose (Eq. 7).

This differentiable pipeline enables gradient-based optimization of geometry and color parameters directly from image and depth supervision.
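The forward pass of these steps can be illustrated in plain Python. The sketch below makes several assumptions that are not confirmed by the text above: the signed distance is taken as the maximum over the three edge half-plane distances (negative inside), the window is `max(0, phi(p) / phi(s)) ** sigma` with `s` the incentre, compositing is standard front-to-back, and the photometric error is a mean absolute difference. All function names and the `sigma` parameter are hypothetical.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)


def edge_sdf(p, tri):
    """Max over the three edge half-plane distances: negative inside the triangle."""
    vals = []
    for k in range(3):
        a, b, c = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
        nx, ny = b[1] - a[1], a[0] - b[0]            # a normal of edge a->b
        norm = math.hypot(nx, ny)
        nx, ny = nx / norm, ny / norm
        # Flip so that the opposite vertex c lies on the negative (inside) side.
        if nx * (c[0] - a[0]) + ny * (c[1] - a[1]) > 0:
            nx, ny = -nx, -ny
        vals.append(nx * (p[0] - a[0]) + ny * (p[1] - a[1]))
    return max(vals)


def incentre(tri):
    """Incentre: vertices weighted by the lengths of their opposite edges."""
    A, B, C = tri
    a, b, c = math.dist(B, C), math.dist(A, C), math.dist(A, B)
    w = a + b + c
    return ((a * A[0] + b * B[0] + c * C[0]) / w,
            (a * A[1] + b * B[1] + c * C[1]) / w)


def coverage(p, tri, sigma=1.0):
    """Assumed window: 1 at the incentre, 0 on and outside the edges."""
    ratio = edge_sdf(p, tri) / edge_sdf(incentre(tri), tri)
    return max(0.0, ratio) ** sigma


def composite(samples):
    """Front-to-back alpha compositing of (color, alpha) pairs, nearest first."""
    color, transmittance = 0.0, 1.0
    for c, alpha in samples:
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return color


def photometric_l1(rendered, observed):
    """Mean absolute difference over all pixels (flat lists of intensities)."""
    return sum(abs(r - o) for r, o in zip(rendered, observed)) / len(rendered)
```

For the 3-4-5 right triangle with vertices (0, 0), (4, 0), (0, 3), the incentre is (1, 1) with inradius 1, so the assumed window evaluates to 1 at (1, 1), 0.5 at (0.5, 0.5), and 0 outside the triangle. In the actual system these operations run in a differentiable rasterizer so that gradients flow back to vertices, colors, opacities, and poses.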

3. Camera Tracking with Photometric and Depth Alignment

Camera tracking solves for $T_{CW}$ per frame by minimizing a joint energy with two terms:

  • a combined photometric/structural loss (Eq. 11),
  • a depth loss (Eq. 12) that aligns rendered and observed depths.

Two tunable hyperparameters balance the terms.

Approximately 100 gradient-descent steps are performed per frame, using analytic pose Jacobians for efficiency.
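The optimization pattern (a fixed budget of gradient steps on a weighted two-term energy, with analytic gradients) can be sketched on a toy problem. Everything here is an assumption for illustration: the pose is reduced to a 3-vector translation, the two terms are simple quadratics standing in for the photometric and depth losses, and `track_pose`, `toy_gradients`, `w_photo`, `w_depth`, and `lr` are hypothetical names and values.

```python
def track_pose(pose0, grad_photo, grad_depth, w_photo=0.8, w_depth=1.0,
               steps=100, lr=0.5):
    """Plain gradient descent on w_photo * E_photo + w_depth * E_depth."""
    pose = list(pose0)
    for _ in range(steps):
        gp, gd = grad_photo(pose), grad_depth(pose)
        grad = [w_photo * a + w_depth * b for a, b in zip(gp, gd)]
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return pose


def toy_gradients(target):
    """Analytic gradients of two toy quadratic terms (stand-ins, not the paper's losses)."""
    def grad_photo(t):
        # Gradient of 0.5 * ((tx - ax)^2 + (ty - ay)^2): image-plane offset.
        return [t[0] - target[0], t[1] - target[1], 0.0]

    def grad_depth(t):
        # Gradient of 0.5 * (tz - az)^2: offset along the optical axis.
        return [0.0, 0.0, t[2] - target[2]]

    return grad_photo, grad_depth


grad_photo, grad_depth = toy_gradients(target=(0.10, -0.05, 0.20))
estimate = track_pose([0.0, 0.0, 0.0], grad_photo, grad_depth)
```

In this toy setting each step shrinks the remaining error by a constant factor, so 100 steps bring `estimate` to the target within floating-point precision. A real tracker would differentiate the rendered image and depth with respect to the full 6-DOF pose.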

4. Online Mapping, Densification, and Optimization

Mapping proceeds whenever a new keyframe is added. New triangles are back-projected from depth and assigned spatial support and normals from sensor data (Eq. 14). Optimization minimizes a mapping energy over vertices, colors, opacities, and past keyframe poses. The energy includes two geometric regularizers:

  • a normal term (Eq. 15) that penalizes normal misalignments,
  • a shape term (Eq. 16) that encourages near-equilateral triangles.

Optimization uses Adam with separate learning rates for positions, colors, and poses. Densification (blur-split, Loop subdivision) and pruning (opacity- and area-based) maintain map quality and efficiency.
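To make the shape regularization and pruning concrete, the sketch below uses a common mesh-quality measure, $4\sqrt{3}\,A / (a^2 + b^2 + c^2)$, which equals 1 for an equilateral triangle and approaches 0 for slivers. This measure is a stand-in chosen for illustration and is not claimed to be the paper's Eq. 16; the pruning rule and its thresholds `min_opacity` and `max_area` are likewise hypothetical.

```python
import math


def triangle_area(a, b, c):
    """Area of a 3D triangle from the cross product of two edge vectors."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def equilateral_quality(a, b, c):
    """4*sqrt(3)*area / sum of squared edge lengths: 1 if equilateral, near 0 for slivers."""
    sq_edges = math.dist(a, b) ** 2 + math.dist(b, c) ** 2 + math.dist(c, a) ** 2
    return 4.0 * math.sqrt(3.0) * triangle_area(a, b, c) / sq_edges


def prune(triangles, min_opacity=0.05, max_area=1.0):
    """Keep triangles that are opaque enough and not too large (assumed rule).

    Each triangle is a dict with 'verts' (three 3D points) and 'opacity'.
    """
    return [t for t in triangles
            if t["opacity"] >= min_opacity
            and triangle_area(*t["verts"]) <= max_area]
```

A regularizer built on such a measure could penalize `1 - equilateral_quality(...)`, which vanishes only for equilateral triangles.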

5. On-the-Fly Mesh Extraction with Restricted Delaunay

To convert the triangle soup into a manifold mesh, restricted Delaunay triangulation is applied:

  • Vertices are selected by thresholding their mean opacity.
  • Delaunay tetrahedralisation is constructed in 3D. Only surface faces separating inside/outside are retained.
  • Triangles exceeding the projected area threshold or fully occluded from all keyframes are pruned.
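The face-selection step can be sketched independently of how the tetrahedra are built. The function below assumes the tetrahedra are given as vertex-index 4-tuples (for example, the `simplices` array of `scipy.spatial.Delaunay`) and that each tetrahedron already carries an inside/outside label; how that label is computed is not specified above, so `inside` is a hypothetical input, as is the function name `surface_faces`.

```python
from collections import defaultdict
from itertools import combinations


def surface_faces(tets, inside):
    """Return faces that separate an inside tetrahedron from the outside.

    tets:   list of 4-tuples of vertex ids
    inside: list of bools, one per tetrahedron
    A face is kept if exactly one adjacent tetrahedron is inside; the other
    side is then either an outside tetrahedron or the exterior of the hull.
    Faces are returned as sorted vertex-id triples (orientation is not fixed).
    """
    owners = defaultdict(list)
    for t, tet in enumerate(tets):
        for face in combinations(sorted(tet), 3):
            owners[face].append(t)
    faces = [face for face, ts in owners.items()
             if sum(1 for t in ts if inside[t]) == 1]
    return sorted(faces)
```

For two tetrahedra `(0, 1, 2, 3)` and `(1, 2, 3, 4)` sharing face `(1, 2, 3)`, labelling only the first as inside returns its four faces, while labelling both as inside drops the shared face and returns the six outer faces.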

Incremental mesh updating is also supported.

This allows efficient online mesh extraction and supports real-time mesh-based editing, deformation, and collision checking.

6. Implementation, Hyperparameters, and System Characteristics

The implementation utilizes a custom CUDA/C++ differentiable rasterizer for triangle splatting, with the SLAM loop managed in PyTorch. Hardware used includes an NVIDIA RTX 4090 GPU and AMD Ryzen 9 9950X CPU. Operational metrics are:

  • Frame time: 430–1225 ms (0.8–2.3 FPS on TUM-RGBD).
  • Map sizes: 24k–152k triangles; 4.4–16.4 MB checkpoint size; 0.5–1.25 GB GPU memory.
  • Hyperparameters (Replica benchmarks): 100 tracking iterations per frame; separate learning rates for rotation, translation, and mapping features and vertices; four loss weights; keyframing at 5-frame intervals, translation threshold 0.08 m, overlap 0.95; densification and pruning thresholds as specified in the paper.
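The keyframing values listed above can be combined into a selection test along the following lines. The combination rule is an assumption: the candidate is checked only once at least `interval` frames have passed, and it becomes a keyframe if the camera has translated more than the threshold or the visibility overlap with the last keyframe has dropped below the overlap threshold. The function name and argument names are hypothetical.

```python
def is_keyframe(frames_since_keyframe, translation_m, overlap,
                interval=5, trans_thresh=0.08, overlap_thresh=0.95):
    """Assumed rule: gate on the frame interval, then test motion OR overlap."""
    if frames_since_keyframe < interval:
        return False
    return translation_m > trans_thresh or overlap < overlap_thresh
```

For example, `is_keyframe(5, 0.10, 0.99)` is true because the camera moved 10 cm, while `is_keyframe(5, 0.01, 0.99)` is false because neither condition is met.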

7. Evaluation and Comparative Results

Triangle Splatting SLAM achieves:

  • Camera tracking (TUM-RGBD dataset; absolute trajectory error, cm):

| Method | fr1/desk | fr2/xyz | fr3/office | Avg |
|---|---|---|---|---|
| MonoGS-2D | 1.58 | 1.20 | 1.83 | 1.54 |
| Ours | 1.77 | 1.12 | 1.83 | 1.57 |

  • 3D geometry (Replica; Chamfer distance in cm, L1 depth in cm):

| Method | Chamfer Avg ↓ | Depth L1 Avg ↓ |
|---|---|---|
| MonoGS-2D* + TSDF | 1.36 | 0.74 |
| Ours + TSDF | 0.95 | 0.68 |
| Ours + Delaunay (pruned) | 1.14 | |

  • Mesh extraction time (Replica; seconds, avg):

| Method | Time [s] Avg ↓ |
|---|---|
| Ours + TSDF | 33.44 |
| Ours + Delaunay | 11.18 |
| Ours + Delaunay (pruned) | 15.66 |
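For readers unfamiliar with the two headline metrics, a minimal brute-force sketch follows. It reflects one common convention for each and is not necessarily the evaluation protocol behind the numbers above: the Chamfer distance is taken as the average of the two directed mean nearest-neighbour distances, and the trajectory error is an RMSE over positions that are assumed to be already time-associated and aligned (standard ATE evaluation performs that alignment first). Function names are hypothetical.

```python
import math


def _nearest(p, cloud):
    """Distance from point p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean of the two directed mean NN distances."""
    a_to_b = sum(_nearest(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest(q, a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)


def ate_rmse(estimated, ground_truth):
    """RMSE of position errors between paired, pre-aligned trajectories."""
    sq = [math.dist(e, g) ** 2 for e, g in zip(estimated, ground_truth)]
    return math.sqrt(sum(sq) / len(sq))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is adequate for small examples; mesh-scale evaluation would typically use a spatial index.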

The method provides live mapping with explicit, editable mesh geometry, supporting photorealistic novel-view rendering and mesh-based downstream tasks, while achieving state-of-the-art 3D geometric accuracy and camera-tracking comparable to established SLAM systems (Fry et al., 29 May 2026).
