
Geometry-Aware Gaussian Surfel Fusion

Updated 4 December 2025
  • The paper introduces geometry-aware Gaussian surfel fusion, merging anisotropic 2D Gaussian surfels with probabilistic multi-view updates for photorealistic mapping and robust pose tracking.
  • It employs a detailed 2D surfel representation with explicit geometric regularization and uncertainty modeling, enabling adaptive rendering and precise multiview fusion.
  • The approach outperforms traditional point-based and voxel-based methods, achieving millimeter-level accuracy and high FPS performance in rigorous experimental benchmarks.

Geometry-aware Gaussian surfel fusion is a class of methodologies that combine anisotropic Gaussian primitives flattened into surfels—planar disk-like patches aligned with local surface geometry—with multi-view, probabilistic, and differentiable fusion rules. This paradigm achieves photorealistic real-time mapping, robust pose tracking, and highly precise surface reconstruction. The current state-of-the-art approaches leverage both RGB-D and LiDAR sensors and employ learnable “2D Gaussian surfels,” adaptive rendering schemes, uncertainty-aware fusion, and explicit geometric regularization to address fundamental limitations of earlier point-based, voxel-based, and 3D Gaussian splatting schemes.

1. Mathematical Representation and Surfel Parameterization

Geometry-aware Gaussian surfel fusion adopts a surfel representation that is embedded within a 2D tangent plane but retains 3D geometric and appearance information. A surfel is characterized by:

  • Center position $p_k \in \mathbb{R}^3$
  • Two principal tangent directions $t_{u_k}, t_{v_k} \in \mathbb{R}^3$
  • Scales $s_{u_k}, s_{v_k} > 0$ along these axes
  • Normal $t_{w_k} = t_{u_k} \times t_{v_k}$; the rotation matrix $R_k = [t_{u_k}, t_{v_k}, t_{w_k}]$
  • Opacity (alpha) $\alpha_k$
  • Appearance coefficients $c_k$

The spatial probability distribution is modeled by a 2D Gaussian within the surfel plane, yielding a covariance $\Sigma_k = \mathrm{diag}(s_{u_k}^2, s_{v_k}^2)$, where a point on the surfel is $P_k(u,v) = p_k + s_{u_k} t_{u_k} u + s_{v_k} t_{v_k} v$, with $(u,v) \sim \mathcal{N}(0, I_2)$. Rendering proceeds via front-to-back alpha compositing with kernel $G_k(u,v) = \exp\left[-\frac{1}{2}(u^2 + v^2)\right]$ and per-surfel weights

$$\omega_k(x) = \alpha_k G_k(u,v) \prod_{j<k}\left[1 - \alpha_j G_j(u,v)\right]$$

where the color, depth, and normal at pixel $x$ are

$$C(x) = \sum_k \omega_k(x)\, c_k, \qquad D(x) = \frac{\sum_k \omega_k(x)\, z_k}{\sum_k \omega_k(x)}, \qquad N(x) = \sum_k \omega_k(x)\, t_{w_k}$$

as compiled in S³LAM (Fan et al., 28 Jul 2025), GauS-SLAM (Su et al., 3 May 2025), and EGG-Fusion (Pan et al., 1 Dec 2025). Flattening the third axis yields a pure disk surfel (cf. Dai et al., 2024), effectively aligning the representation with the local surface.
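The compositing rule above can be made concrete for a single pixel ray. The following NumPy fragment is an illustrative sketch, assuming depth-sorted surfels with kernel values $G_k$ already evaluated at the pixel; function and argument names are placeholders, not any system's actual renderer.

```python
import numpy as np

def composite_pixel(alphas, gauss, depths, colors, normals):
    """Front-to-back alpha compositing of depth-sorted surfels at one pixel.

    alphas:  (K,)   per-surfel opacity alpha_k
    gauss:   (K,)   Gaussian kernel value G_k(u, v) at this pixel
    depths:  (K,)   per-surfel depth z_k along the ray
    colors:  (K, 3) appearance c_k
    normals: (K, 3) surfel normals t_{w_k}
    """
    a = alphas * gauss                                         # alpha_k * G_k
    trans = np.concatenate(([1.0], np.cumprod(1.0 - a)[:-1]))  # prod_{j<k}(1 - a_j)
    w = a * trans                                              # omega_k(x)
    color = (w[:, None] * colors).sum(axis=0)                  # C(x)
    depth = (w * depths).sum() / max(w.sum(), 1e-12)           # normalized D(x)
    normal = (w[:, None] * normals).sum(axis=0)                # N(x)
    return color, depth, normal, w
```

Note that depth is normalized by the accumulated weight, matching the $D(x)$ expression, while color and normal use the raw weighted sums.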

2. Surfel Fusion, Uncertainty Modeling, and Optimization

Fusion of geometry and appearance evidence from multiple views is accomplished via adaptive, probabilistic update rules operating on all surfel parameters. The core optimization objective typically aggregates photometric ($L_{\mathrm{rgb}}$), depth, and normal-consistency losses, sometimes with explicit geometric or statistical regularization:

$$L_\mathrm{map} = \|C_t - \bar{C}_t\|_1 + \gamma_D \|D_t - \bar{D}_t\|_1 + \gamma_N \sum_x \left[1 - N_t(x)\cdot\bar{N}_t(x)\right]$$

Parameters $(p_k, t_{u_k}, t_{v_k}, s_{u_k}, s_{v_k}, c_k, \alpha_k)$ are updated by gradient descent, fusing new RGB-D or LiDAR evidence. Redundant or unobserved surfels are pruned according to alpha coverage and error thresholds.
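The mapping objective can be sketched directly from its definition. The fragment below is a minimal NumPy illustration; the weights `gamma_d` and `gamma_n` are illustrative defaults, not the values used in the cited systems.

```python
import numpy as np

def mapping_loss(C, C_gt, D, D_gt, N, N_gt, gamma_d=0.5, gamma_n=0.1):
    """L_map = |C - C_gt|_1 + gamma_D |D - D_gt|_1 + gamma_N sum(1 - N . N_gt).

    C: (H, W, 3) rendered color, D: (H, W) rendered depth,
    N: (H, W, 3) rendered unit normals; *_gt are the observed targets.
    Weights gamma_d, gamma_n are illustrative, not the papers' values.
    """
    l_rgb = np.abs(C - C_gt).sum()                       # photometric L1
    l_depth = np.abs(D - D_gt).sum()                     # depth L1
    l_normal = (1.0 - (N * N_gt).sum(axis=-1)).sum()     # normal consistency
    return l_rgb + gamma_d * l_depth + gamma_n * l_normal
```

In practice this scalar would be backpropagated through the differentiable rasterizer to update all surfel parameters.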

Uncertainty is handled by per-surfel covariance tracking, e.g., using information filters (Pan et al., 1 Dec 2025) that update the mean and covariance of each surfel's state $x_i = [p_i;\, n_i]$ in information form $(\Lambda, \eta)$ via

$$\Lambda^t = \Lambda^{t-1} + H^\top \Lambda_z H, \qquad \eta^t = \eta^{t-1} + H^\top \Lambda_z \left(z^t - \bar{z}^t\right)$$

Surfel fusion also extends to pose-graph approaches, where surfel-to-surfel Mahalanobis constraints align patches across keyframes, driving global consistency to sub-pixel levels (Park et al., 31 Jul 2025).
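In information form the fusion step is a two-line update. The sketch below is a minimal NumPy illustration assuming a 6-DoF state $[p;\, n]$ and a generic measurement Jacobian $H$; both are placeholders for exposition, not the exact measurement model of any cited system.

```python
import numpy as np

def info_filter_update(Lam, eta, H, Lam_z, residual):
    """One information-form fusion step for a surfel state x = [p; n].

    Lam:      (6, 6) information matrix (inverse covariance)
    eta:      (6,)   information vector
    H:        (m, 6) measurement Jacobian (placeholder model)
    Lam_z:    (m, m) measurement information
    residual: (m,)   innovation z^t - z_bar^t
    """
    Lam_new = Lam + H.T @ Lam_z @ H          # information matrix update
    eta_new = eta + H.T @ Lam_z @ residual   # information vector update
    return Lam_new, eta_new
```

The fused mean is recovered on demand as the solution of $\Lambda x = \eta$, which is why the information form makes repeated multi-view fusion cheap: each new observation is a rank-limited additive update.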

3. Adaptive Surface Rendering and Multi-View Consistency

Adaptive rendering strategies address ambiguous or noisy regions, sharpening edges and increasing multi-view consistency. For example, S³LAM computes a depth-distortion measure

$$\mathcal{D}_d(x) = \sum_{i,j} \omega_i(x)\, \omega_j(x)\, |z_i - z_j|$$

Exceeding a threshold triggers selection of a dominant surfel $k^* = \arg\max_k \omega_k(x)$ for color and geometry (Fan et al., 28 Jul 2025, Su et al., 3 May 2025). Edge-aware depth blending, such as surface-aware depth adjustment in GauS-SLAM,

$$d_i' = \beta_i d_i + (1-\beta_i) d_m, \qquad \beta_i = \exp\left[-\frac{(d_i - d_m)^2}{B\, \sigma_i^2}\right]$$

suppresses occluded-surfel bias, significantly improving geometry quality under novel viewpoints.
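The distortion test and dominant-surfel fallback can be sketched per pixel. The following NumPy fragment is illustrative only; the threshold `tau` and the fallback policy are assumptions for the example, not the exact rule in S³LAM or GauS-SLAM.

```python
import numpy as np

def depth_distortion(w, z):
    """D_d(x) = sum_{i,j} w_i w_j |z_i - z_j| over the surfels at one pixel."""
    return (w[:, None] * w[None, :] * np.abs(z[:, None] - z[None, :])).sum()

def adaptive_select(w, z, colors, tau):
    """Blend normally, but fall back to the dominant surfel k* when the
    depth distortion exceeds the threshold tau (placeholder policy)."""
    if depth_distortion(w, z) > tau:
        k = int(np.argmax(w))                       # k* = argmax_k omega_k(x)
        return z[k], colors[k]
    zs = (w * z).sum() / max(w.sum(), 1e-12)        # normalized blended depth
    cs = (w[:, None] * colors).sum(axis=0)          # blended color
    return zs, cs
```

Intuitively, a large $\mathcal{D}_d(x)$ means the pixel mixes surfels at very different depths (an edge or occlusion boundary), where blending would smear geometry.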

Multi-view fusion is reinforced by geometric regularization, monocular normal priors (from foundation models), and normal-depth consistency losses. Incorporation of strong monocular normal priors corrects ambiguous regions and stabilizes surfel alignment (Dai et al., 2024, Shen et al., 2024, Yang et al., 20 Aug 2025).

4. Advanced Fusion on Lie Groups and Covariance Control

When fusing pose and orientation uncertainties, Gaussian distributions on the Lie groups SE(3) and SO(3) are mapped into a common tangent space, using parallel transport and curvature corrections for optimal covariance adjustment (Ge et al., 2024). For surfel fusion in pose-graph SLAM, covariance transfer between reference frames leverages the Jacobian of the exponential map:

$$\Sigma_2 = J(\mu_2)^{-1} J(\mu_1)\, \Sigma_1\, J(\mu_1)^\top J(\mu_2)^{-\top}$$

Efficient approximations (parallel transport, curvature corrections) realize near-optimal accuracy with low computational overhead, enabling real-time fusion of position and orientation uncertainties for large surfel sets.
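The transfer rule can be instantiated on SO(3). The sketch below uses the standard left Jacobian of the SO(3) exponential map; this choice of Jacobian convention is an assumption for illustration and may differ from the cited works.

```python
import numpy as np

def hat(w):
    """so(3) hat operator: maps a 3-vector to a skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def left_jacobian(w):
    """Left Jacobian of the SO(3) exponential map at tangent vector w."""
    th = np.linalg.norm(w)
    W = hat(w)
    if th < 1e-8:                                   # small-angle expansion
        return np.eye(3) + 0.5 * W
    return (np.eye(3)
            + (1.0 - np.cos(th)) / th**2 * W
            + (th - np.sin(th)) / th**3 * W @ W)

def transfer_covariance(Sigma1, mu1, mu2):
    """Sigma_2 = J(mu2)^{-1} J(mu1) Sigma_1 J(mu1)^T J(mu2)^{-T}."""
    J1, J2inv = left_jacobian(mu1), np.linalg.inv(left_jacobian(mu2))
    return J2inv @ J1 @ Sigma1 @ J1.T @ J2inv.T
```

As a sanity check, transferring a covariance between identical reference points ($\mu_1 = \mu_2$) leaves it unchanged.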

Covariance control is further enforced through scale bounding with a sigmoid constraint,

$$\sigma_\mathrm{bounded} = \sigma_\mathrm{min} + (\sigma_\mathrm{max} - \sigma_\mathrm{min})\, \mathrm{sigmoid}(s)$$

preventing unconstrained Gaussian growth and yielding compact, crisp representations (Park et al., 31 Jul 2025).
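The bounding itself is a one-liner. In this sketch the bounds `s_min` and `s_max` are arbitrary illustrative values, not the settings used in GSFusion.

```python
import numpy as np

def bounded_scale(s, s_min=1e-3, s_max=0.05):
    """Map an unconstrained optimizer variable s to a scale in (s_min, s_max)
    via sigma = sigma_min + (sigma_max - sigma_min) * sigmoid(s).
    Bounds here are illustrative placeholders."""
    return s_min + (s_max - s_min) / (1.0 + np.exp(-s))
```

Because the sigmoid saturates, gradient descent can never push a surfel's scale outside the interval, which is what prevents degenerate, scene-spanning Gaussians.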

5. Geometry-Aware Fusion in SLAM and Surface Reconstruction

State-of-the-art SLAM systems (S³LAM (Fan et al., 28 Jul 2025), GauS-SLAM (Su et al., 3 May 2025), EGG-Fusion (Pan et al., 1 Dec 2025), GSFusion (Park et al., 31 Jul 2025)) instantiate this fusion pipeline at scale for camera and LiDAR/IMU inputs. Incremental attachment, periodic surfel initialization, local–global map architectures, and fusion-aware bundle adjustment integrate RGB-D and LiDAR evidence into surfel maps. The surfel-centric approach supports sparse-to-dense real-time mapping (24 FPS in EGG-Fusion), robust tracking under severe occlusion, and millimeter-level geometric and pose accuracy.

Comparative results demonstrate that geometry-aware surfel fusion outperforms prior 3D Gaussian Splatting and neural volumetric schemes in surface completeness, normal alignment, tracking robustness, and memory efficiency. Quantitative benchmarks include Replica, ScanNet++, DTU, and Tanks-and-Temples with metrics such as Chamfer distance, normal consistency, PSNR, SSIM, and LPIPS.
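For reference, the Chamfer distance used in these benchmarks can be computed brute-force for small point sets. The sketch below implements one common symmetric variant (sum of mean nearest-neighbor distances in each direction); benchmark suites may use squared distances or other normalizations.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N, 3) and Q (M, 3):
    mean nearest-neighbor distance in each direction, brute force.
    One common variant; benchmarks may differ in normalization."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Real evaluations use KD-trees or GPU nearest-neighbor search instead of the $O(NM)$ pairwise matrix, but the metric is the same.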

6. Extensions: Radiance Field Rendering and Hybrid Architectures

Hybrid bi-scale architectures, such as Gaussian-enhanced Surfels (GES) (Ye et al., 24 Apr 2025), combine opaque 2D surfel layers for coarse geometry with sparse 3D Gaussians for high-frequency appearance. This approach enables sorting-free, ultra-fast rendering (675–1135 FPS) and modular extensions such as anti-aliasing (Mip-GES), storage compaction (Compact-GES), and improved geometry via 2D-GES. Sorting-free blending yields view-consistent images and suppresses “popping” artifacts, while surfel/Gaussian aggregation enables flexible surface smoothing.

Advanced inverse rendering methods further exploit surfel-based representations for material decomposition and photorealistic relighting, using physics-based shading (split-sum approximation), Monte Carlo sampling, and high-frequency specular compensation (Yang et al., 20 Aug 2025).

7. Practical Impact and Experimental Results

Recent systems demonstrate millimeter-level surface reconstruction accuracy, robust geometric tracking, and real-time end-to-end operation. The table below summarizes key quantitative results from Replica, ScanNet++, and DTU (Pan et al., 1 Dec 2025, Fan et al., 28 Jul 2025, Su et al., 3 May 2025):

Method     | Acc (cm, Replica) | Comp (cm, ScanNet++) | Acc (mm, DTU) | FPS  | PSNR (dB) | Storage (MB)
-----------|-------------------|----------------------|---------------|------|-----------|-------------
EGG-Fusion | 0.60              | 0.91                 |               | 24   | 25.70     |
RTG-SLAM   | 0.80              | 1.22                 |               | 15   | 24.77     |
S³LAM      | 0.47              |                      |               | 8    |           |
3DGS       |                   |                      | 1.97          | 675  | 27.38     | 734
2D-GES     |                   |                      | 0.79          |      |           |
GES        |                   |                      |               | 1135 | 27.42     | 185

Qualitative results show sharp edge recovery, minimal color/depth artifacts, smooth surface meshes, and persistent tracking under severe occlusions, with surfel-based SLAM and rendering retaining geometric fidelity and visual consistency across difficult scenarios.

