Triangle Splatting SLAM
- The paper introduces a dynamic 'triangle soup' representation that uses differentiable rendering for efficient and photorealistic RGB-D SLAM.
- It employs restricted Delaunay triangulation to extract an editable, connected mesh on-the-fly, supporting live deformation and collision checking.
- Experiments on TUM-RGBD and Replica show camera tracking competitive with MonoGS-2D and lower Chamfer and depth error on reconstructed geometry.
Triangle Splatting SLAM is a dense RGB-D simultaneous localization and mapping (SLAM) system that leverages differentiable triangles as explicit 3D map primitives. Its core innovation is the use of a dynamic “triangle soup” representation optimized online, providing both photorealistic rendering and explicit geometry amenable to downstream tasks such as simulation and mesh editing. Triangle Splatting SLAM employs online differentiable rendering of this triangle soup for both camera tracking and map optimization, and can extract a connected mesh on-the-fly via restricted Delaunay triangulation, supporting live mesh deformation and collision checking. Experimental results demonstrate state-of-the-art geometry accuracy and competitive camera-tracking performance on standard benchmarks (Fry et al., 29 May 2026).
1. System Pipeline and Map Representation
The system maintains a live triangle soup map consisting of a set of 3D vertices and the connectivity that groups them into triangles. The pipeline operates in a single process, executing three tightly interleaved stages per RGB-D frame:
- Tracking: Estimation of the 6-DOF camera pose via minimization of a tracking energy.
- Keyframing: Keyframe selection based on pose change and triangle visibility; addition of keyframes with new triangles back-projected from depth.
- Mapping: Joint optimization of past keyframe poses and triangle parameters; densification and pruning of triangles; optional mesh extraction.
The pipeline is end-to-end and single-threaded: the three stages run in sequence for each incoming frame.
The mapping stage periodically extracts a mesh using restricted Delaunay triangulation, converting the “soup” into a connected surface suitable for simulation or editing.
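The per-frame loop described above can be sketched as follows. This is a minimal illustration rather than the authors' code: `SlamState`, `run_slam`, the stage callbacks, and the `mesh_every` parameter are hypothetical names, and the stages themselves are passed in as plain functions so that only the control flow is shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class SlamState:
    """Hypothetical container for the live map, keyframes and an event log."""
    triangles: List[Any] = field(default_factory=list)
    keyframes: List[Any] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def run_slam(frames, track: Callable, is_keyframe: Callable,
             add_keyframe: Callable, optimize_map: Callable,
             extract_mesh: Optional[Callable] = None,
             mesh_every: int = 0) -> SlamState:
    """Run tracking, keyframing and mapping in sequence for every frame."""
    state = SlamState()
    pose = None
    for index, frame in enumerate(frames):
        # 1. Tracking: estimate the camera pose against the current map.
        pose = track(state, frame, pose)
        state.log.append(f"track:{index}")
        # 2. Keyframing: decide whether this frame should extend the map.
        if is_keyframe(state, frame, pose):
            add_keyframe(state, frame, pose)
            state.log.append(f"keyframe:{index}")
            # 3. Mapping: runs only after a new keyframe has been added.
            optimize_map(state)
            state.log.append(f"map:{index}")
            # Optional mesh extraction every `mesh_every` keyframes.
            if extract_mesh and mesh_every and len(state.keyframes) % mesh_every == 0:
                extract_mesh(state)
                state.log.append(f"mesh:{index}")
    return state
```

Because everything runs in one thread, tracking of the next frame only starts once mapping for the current keyframe has finished, which is consistent with the per-frame times reported later in the article.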
2. Differentiable Triangle Splatting
Each triangle is stored as three world-space vertices together with a color and an opacity. Differentiable triangle splatting comprises:
- Projection into image space: each vertex is projected into the image using the camera pose and the pinhole model.
- Signed-Distance Field: The image-space signed distance function is computed relative to the projected triangle edges and drives a smooth per-pixel coverage (window) function defined with respect to the triangle incentre.
- Alpha-composite Rendering: Each pixel color is rendered by alpha compositing over the triangles.
- Photometric Loss: The photometric error is accumulated over all pixels.
- Backpropagation: Gradients are computed for vertex positions and appearance, leveraging analytic derivatives (Eq. 4 and window function Eq. 3), the pinhole model, and analytic pose Jacobians (Eq. 7).
This differentiable pipeline enables gradient-based optimization of geometry and color parameters directly from image and depth supervision.
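The forward pass of these steps can be illustrated for a single pixel. The sketch below assumes the window function of the original Triangle Splatting formulation, `max(0, phi(p) / phi(s)) ** sigma`, where `phi` is the maximum signed distance to the three edge lines and `s` is the incentre. It also assumes front-to-back compositing with `alpha = opacity * window`. The function names, the `sigma` parameter, and the pre-sorted triangle list are illustrative assumptions, and the source does not confirm that the SLAM system uses exactly this form.

```python
import math


def incentre(tri):
    """Incentre of a 2D triangle: vertices weighted by opposite edge lengths."""
    a_pt, b_pt, c_pt = tri
    a = math.dist(b_pt, c_pt)
    b = math.dist(c_pt, a_pt)
    c = math.dist(a_pt, b_pt)
    total = a + b + c
    return ((a * a_pt[0] + b * b_pt[0] + c * c_pt[0]) / total,
            (a * a_pt[1] + b * b_pt[1] + c * c_pt[1]) / total)


def signed_distance(p, tri):
    """Max signed distance to the edge lines: negative inside, positive outside."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    # Winding sign so that the edge normals always point outwards.
    sign = 1.0 if (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0) > 0 else -1.0
    best = -math.inf
    for (ax, ay), (bx, by) in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
        ex, ey = bx - ax, by - ay
        length = math.hypot(ex, ey)  # assumes a non-degenerate triangle
        nx, ny = sign * ey / length, -sign * ex / length
        best = max(best, nx * (p[0] - ax) + ny * (p[1] - ay))
    return best


def window(p, tri, sigma=1.0):
    """Smooth coverage: 1 at the incentre, 0 on the boundary and outside."""
    ratio = signed_distance(p, tri) / signed_distance(incentre(tri), tri)
    return max(0.0, ratio) ** sigma


def composite(pixel, triangles, sigma=1.0):
    """Front-to-back alpha compositing over (tri, rgb, opacity), nearest first."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for tri, rgb, opacity in triangles:
        alpha = opacity * window(pixel, tri, sigma)
        for k in range(3):
            color[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha
    return color
```

In a differentiable renderer the same arithmetic is evaluated per pixel on the GPU, and gradients flow back through `alpha` to the vertex positions via the signed distance.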
3. Camera Tracking with Photometric and Depth Alignment
Camera tracking solves for the 6-DOF camera pose of each frame by minimizing a joint energy that combines two weighted terms:
- a combined photometric/structural loss (Eq. 11),
- a depth loss (Eq. 12) that aligns rendered and observed depths.
The weight of each term is a tunable hyperparameter.
Approximately 100 gradient-descent steps are performed per frame, using analytic pose Jacobians for efficiency.
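As a toy illustration of minimizing such a weighted energy with a fixed number of gradient steps, the sketch below combines a photometric and a depth term and runs plain gradient descent. Several things here are assumptions and not the paper's method: both terms use a mean absolute error, the structural part of the photometric loss is omitted, the gradient is taken by central finite differences in place of the analytic pose Jacobians, and all function and parameter names are hypothetical.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def tracking_energy(rendered_rgb, observed_rgb, rendered_depth, observed_depth,
                    w_photo, w_depth):
    """Weighted sum of a photometric term and a depth term (assumed L1)."""
    return (w_photo * l1_mean(rendered_rgb, observed_rgb)
            + w_depth * l1_mean(rendered_depth, observed_depth))


def minimise(energy, params, steps=100, lr=0.01, eps=1e-4):
    """Plain gradient descent using central finite differences."""
    params = list(params)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            hi = params[:]
            lo = params[:]
            hi[i] += eps
            lo[i] -= eps
            grad.append((energy(hi) - energy(lo)) / (2.0 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params
```

For example, with `energy = lambda p: tracking_energy(rgb, rgb_obs, [d + p[0] for d in depth], depth_obs, 0.9, 0.1)` the call `minimise(energy, [0.0])` adjusts a single depth offset. A real tracker would optimize the full camera pose and re-render the map at every step.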
4. Online Mapping, Densification, and Optimization
Mapping proceeds whenever a new keyframe is added. New triangles are back-projected from depth and assigned spatial support and normals via sensor data (Eq. 14). Optimization is performed by minimizing the mapping energy over vertices, colors, opacities, and past keyframe poses. The terms of this energy include:
- a normal term (Eq. 15) that penalizes normal misalignments,
- a shape term (Eq. 16) that encourages triangles to be close to equilateral.
Optimization uses Adam with separate learning rates for positions, colors, and poses. Densification (blur-split, Loop subdivision) and pruning (opacity- and area-based) maintain map quality and efficiency.
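Opacity- and area-based pruning, together with a measure of how equilateral a triangle is, can be sketched as below. The thresholds, the dictionary layout, and the function names are hypothetical. The quality measure `4 * sqrt(3) * area / (sum of squared edge lengths)` is a standard mesh-quality ratio used here as a stand-in, and is not necessarily the term of Eq. 16.

```python
import math


def triangle_area(a, b, c):
    """Area of a 3D triangle from half the norm of the edge cross product."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def equilateral_quality(a, b, c):
    """1.0 for an equilateral triangle, approaching 0.0 for a degenerate one."""
    squared_edges = sum(math.dist(p, q) ** 2 for p, q in ((a, b), (b, c), (c, a)))
    if squared_edges == 0.0:
        return 0.0
    return 4.0 * math.sqrt(3.0) * triangle_area(a, b, c) / squared_edges


def prune(triangles, min_opacity, max_area, min_area=0.0):
    """Keep triangles that are opaque enough and whose area is within bounds."""
    kept = []
    for tri in triangles:
        area = triangle_area(*tri["vertices"])
        if tri["opacity"] >= min_opacity and min_area <= area <= max_area:
            kept.append(tri)
    return kept
```

A regularizer could, for instance, penalize `1 - equilateral_quality(...)` so that sliver triangles are discouraged during optimization.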
5. On-the-Fly Mesh Extraction with Restricted Delaunay
To convert the triangle soup into a manifold mesh, restricted Delaunay triangulation is applied:
- Vertices whose mean opacity passes a threshold are selected.
- Delaunay tetrahedralisation is constructed in 3D. Only surface faces separating inside/outside are retained.
- Triangles exceeding the projected area threshold or fully occluded from all keyframes are pruned.
Incremental mesh updating is supported.
This allows efficient online mesh extraction and supports real-time mesh-based editing, deformation, and collision checking.
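The step of keeping only the faces that separate inside from outside can be illustrated independently of how the tetrahedralisation is built. The sketch below assumes the tetrahedra are given as 4-tuples of vertex indices with a boolean inside/outside label each. How those labels are obtained is not covered here, and `surface_faces` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations


def surface_faces(tetrahedra, inside):
    """Return faces bordering exactly one inside tetrahedron.

    A face shared by two inside tetrahedra is interior and is dropped.
    A face with no neighbour (convex-hull boundary) is treated as
    bordering the outside. Faces are returned as sorted index triples,
    so no consistent orientation is computed.
    """
    inside_count = defaultdict(int)
    for tet, is_inside in zip(tetrahedra, inside):
        if not is_inside:
            continue
        for face in combinations(sorted(tet), 3):
            inside_count[face] += 1
    return sorted(face for face, count in inside_count.items() if count == 1)
```

For two tetrahedra `(0, 1, 2, 3)` and `(1, 2, 3, 4)` that share the face `(1, 2, 3)`, labelling only the first as inside keeps all four of its faces, while labelling both as inside drops the shared face and keeps the six outer ones.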
6. Implementation, Hyperparameters, and System Characteristics
The implementation utilizes a custom CUDA/C++ differentiable rasterizer for triangle splatting, with the SLAM loop managed in PyTorch. Hardware used includes an NVIDIA RTX 4090 GPU and AMD Ryzen 9 9950X CPU. Operational metrics are:
- Frame time: 430–1225 ms (0.8–2.3 FPS on TUM-RGBD).
- Map sizes: 24k–152k triangles; 4.4–16.4 MB checkpoint size; 0.5–1.25 GB GPU memory.
- Hyperparameters (Replica benchmarks): 100 tracking iterations per frame; separate learning rates for rotation, translation, and mapping features and vertices; fixed loss weights; keyframing at 5-frame intervals, translation threshold 0.08 m, overlap 0.95; densification and pruning thresholds as specified in the paper.
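One plausible way to combine the keyframing settings listed above (5-frame interval, 0.08 m translation threshold, 0.95 overlap) is sketched below. The source does not state how the three criteria interact, so the rule used here is an assumption: a frame becomes a keyframe if enough frames have passed and either the camera has moved far enough or the visible-triangle overlap has dropped. Measuring overlap as intersection over union of visible triangle IDs is also an assumption, and all names are hypothetical.

```python
import math


def should_add_keyframe(frame_index, last_kf_index, position, last_kf_position,
                        visible_ids, last_kf_visible_ids,
                        interval=5, translation_threshold=0.08,
                        overlap_threshold=0.95):
    """Decide whether the current frame should become a keyframe."""
    # Respect the minimum spacing between keyframes.
    if frame_index - last_kf_index < interval:
        return False
    # Pose change: Euclidean distance between camera centres, in metres.
    moved = math.dist(position, last_kf_position) > translation_threshold
    # Triangle visibility: intersection over union of visible triangle IDs.
    union = visible_ids | last_kf_visible_ids
    overlap = len(visible_ids & last_kf_visible_ids) / len(union) if union else 1.0
    return moved or overlap < overlap_threshold
```

With these defaults, a camera that has moved 0.1 m five frames after the last keyframe triggers a new keyframe even if it still sees the same triangles.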
7. Evaluation and Comparative Results
Triangle Splatting SLAM achieves:
- Camera tracking (TUM-RGBD dataset; absolute trajectory error, cm):
| Method | fr1/desk | fr2/xyz | fr3/office | Avg |
|---|---|---|---|---|
| MonoGS-2D | 1.58 | 1.20 | 1.83 | 1.54 |
| Ours | 1.77 | 1.12 | 1.83 | 1.57 |
- 3D geometry (Replica; Chamfer distance in cm, L1 depth in cm):
| Method | Chamfer Avg ↓ | Depth L1 Avg ↓ |
|---|---|---|
| MonoGS-2D* + TSDF | 1.36 | 0.74 |
| Ours + TSDF | 0.95 | 0.68 |
| Ours + Delaunay (pruned) | 1.14 | — |
- Mesh extraction time (Replica; seconds, avg):
| Method | Time [s] Avg ↓ |
|---|---|
| Ours + TSDF | 33.44 |
| Ours + Delaunay | 11.18 |
| Ours + Delaunay (pruned) | 15.66 |
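For readers unfamiliar with the Chamfer distance used in the geometry table above, a brute-force version between two point sets is sketched below. The convention assumed here (unsquared Euclidean distances, averaged over both directions) is one common choice and may differ from the evaluation protocol behind the reported numbers. `chamfer_distance` is a hypothetical helper, and its quadratic cost makes it suitable only for small point sets.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance, both ways."""
    def mean_nearest(src, dst):
        # For each source point, distance to its closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a))
```

In practice the two point sets would be sampled from the reconstructed and ground-truth meshes, and a spatial index would replace the inner `min` loop.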
The method provides live mapping with explicit, editable mesh geometry, supporting photorealistic novel-view rendering and mesh-based downstream tasks, while achieving state-of-the-art 3D geometric accuracy and camera-tracking comparable to established SLAM systems (Fry et al., 29 May 2026).