
QuickSplat: Fast 3D Surface Reconstruction

Updated 6 February 2026
  • QuickSplat is a data-driven 3D surface reconstruction method that uses learned Gaussian initialization and densification to achieve rapid and accurate indoor scene modeling.
  • It replaces slow, heuristic-based per-scene optimization with an end-to-end pipeline that jointly optimizes photometric, depth, occupancy, and distortion objectives.
  • Experimental results show QuickSplat reduces depth errors by up to 48% and is over 300× faster than competing methods while preserving fine scene details.

QuickSplat is a data-driven 3D surface reconstruction method that leverages learned Gaussian initialization and densification to accelerate and enhance large-scale indoor scene modeling from posed multi-view RGB images. By replacing hand-crafted heuristics and slow per-scene optimization with learned, feed-forward components, QuickSplat enables high-accuracy surface reconstruction within seconds, addressing the long-standing challenges of under-observed or textureless regions in traditional pipelines (Liu et al., 8 May 2025).

1. Problem Context and Limitations of Prior Approaches

Surface reconstruction from RGB images is foundational to applications in computer vision, graphics, mixed reality, and robotics. Existing methods can be divided into volumetric/implicit-function-based approaches (such as NeRF, UNISURF, and MonoSDF) and surface-oriented Gaussian Splatting methods (e.g., 3DGS, 2DGS, SuGaR, GS2Mesh). Volumetric and implicit-function approaches optimize multilayer perceptrons (MLPs) over hundreds of thousands of scene-specific gradient-descent steps ($\sim$30 min to $>$10 h for room-scale scenes), yet remain susceptible to errors and artifacts, particularly floating geometry and holes, in large, untextured, or occluded regions.

Gaussian Splatting methods accelerate novel-view synthesis by representing the scene as a sparse set of 3D Gaussian primitives, reducing render time but retaining the bottleneck of per-scene, heuristic-guided densification, which often leaves holes and curved artifacts on flat structures such as walls and ceilings. Heuristic densification (“grow near high-error rays”) is insufficiently robust in under-observed or textureless regions and can result in incomplete or inaccurate reconstructions (Liu et al., 8 May 2025).

2. Data-Driven Gaussian Initialization

QuickSplat supplants the standard pipeline’s slow, rule-based initialization with a learned, end-to-end dense Gaussian initialization. The initializer network $\theta_I$ predicts a well-distributed set of 2D Gaussians directly from a sparse Structure-from-Motion (SfM) point cloud via a single feed-forward pass.

2.1. Surface Representation

  • Each scene is represented as a set of $N$ Gaussian “splats” $\mathcal{G}=\{\mathbf{g}_i\}_{i=1}^{N}$, where each $\mathbf{g}_i \in \mathbb{R}^{14}$ encodes center ($\mathbf{g}_c\in\mathbb{R}^3$), scale (defining the $3 \times 3$ covariance $\Sigma_i$), rotation (quaternion), opacity ($o_i \in [0,1]$), and diffuse color ($c_i \in [0,1]^3$).
  • Rendering is performed by projecting each Gaussian along a ray $x$ to an elliptical disk via $g_i^{2D}(\mathbf{u}(x)) = \exp\left(-\frac{1}{2}(\mathbf{u}(x)-\boldsymbol\mu_i)^\top\Sigma_i^{-1}(\mathbf{u}(x)-\boldsymbol\mu_i)\right)$ and alpha-blending the contributions along the ray (a minimal sketch of this evaluation and blending follows below).
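A minimal NumPy sketch of this per-ray evaluation and front-to-back compositing; the depth-sorted splat list and helper names are illustrative assumptions, not the paper’s implementation:

```python
import numpy as np

def eval_splat_2d(u, mu, cov):
    """Evaluate g_i^{2D}(u) = exp(-0.5 (u - mu)^T Sigma^{-1} (u - mu))."""
    d = u - mu
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

def alpha_blend(splats, u):
    """Front-to-back alpha compositing along one ray.

    `splats` is assumed pre-sorted by depth; each entry is
    (mu, cov, opacity, rgb) for a Gaussian projected to the image plane.
    """
    color = np.zeros(3)
    transmittance = 1.0
    for mu, cov, opacity, rgb in splats:
        alpha = opacity * eval_splat_2d(u, mu, cov)
        color += transmittance * alpha * np.asarray(rgb, dtype=float)
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early termination once nearly opaque
            break
    return color
```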

2.2. Decoder-Style Initialization

  • The sparse SfM point cloud is voxelized onto a 3D grid (voxel side length $v_d = 4$ cm); occupied voxels are assigned learnable 64-dimensional features.
  • A sparse 3D U-Net encoder–decoder with 4 down/upsampling layers predicts densified features, guided by an occupancy head at each upsampling stage.
  • A small MLP decodes each feature into Gaussians (two splats per voxel); positions are decoded as $\mathbf{g}_c=\mathbf{v}_c + R(2\sigma(\mathbf{z})-\mathbf{1})$ with $R=4v_d$, bounding each splat to lie near its voxel center $\mathbf{v}_c$ (see the sketch below).
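A short sketch of this position decoding, assuming the raw MLP outputs $\mathbf{z}$ arrive as an array with two splats per voxel; array shapes and names are illustrative:

```python
import numpy as np

VOXEL_SIZE = 0.04            # v_d = 4 cm
R = 4 * VOXEL_SIZE           # maximum offset from the voxel center

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_centers(voxel_centers, raw_offsets):
    """Decode splat centers: g_c = v_c + R * (2 * sigmoid(z) - 1).

    voxel_centers: (M, 3) centers of occupied voxels.
    raw_offsets:   (M, 2, 3) raw MLP outputs, two splats per voxel.
    Returns (M, 2, 3) centers, each within +/- R of its voxel center.
    """
    return voxel_centers[:, None, :] + R * (2.0 * sigmoid(raw_offsets) - 1.0)
```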

2.3. Training Losses

The initializer loss combines photometric consistency ($\mathcal{L}_c$), depth accuracy ($\mathcal{L}_d$), occupancy ($\mathcal{L}_\mathrm{occ}$), normal alignment ($\mathcal{L}_n$), and a distortion regularizer ($\mathcal{L}_\mathrm{dist}$), with the training objective:

$$\mathcal{L}(\theta_I) = \mathcal{L}_c + \mathcal{L}_d + \mathcal{L}_\mathrm{occ} + 0.01\,\mathcal{L}_n + 10\,\mathcal{L}_\mathrm{dist}$$

Training is conducted on ScanNet++ (902 training scenes), directly optimizing the quality of scene initialization for robust downstream refinement.
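A one-line sketch of the weighted combination above; the arguments are placeholders for the corresponding rendered-loss terms:

```python
def initializer_loss(l_c, l_d, l_occ, l_n, l_dist):
    """L(theta_I) = L_c + L_d + L_occ + 0.01 * L_n + 10 * L_dist."""
    return l_c + l_d + l_occ + 0.01 * l_n + 10.0 * l_dist
```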

3. Iterative Densification and Joint Optimization

After initialization, QuickSplat employs $T=5$ learned “optimization” steps, each iteration refining scene geometry and introducing new splats in under-modeled areas (a schematic of the loop follows the numbered list below).

3.1. Pipeline Overview

  1. Rendering Gradients: For all training images, compute and aggregate the gradient of the total loss (photometric, depth, distortion) with respect to each voxel feature.
  2. Learned Densifier ($f_D$): A sparse 3D CNN ingests the current features, the rendering gradients, and the step index $t$, predicting candidate new voxel features $\widehat{\mathcal{G}_t}$. Up to $n(t)=20000/2^t$ new voxels are importance-sampled at each step.
  3. Learned Optimizer ($f_O$): Combine $\mathcal{G}_t$ and $\widehat{\mathcal{G}_t}$ into $\overline{\mathcal{G}_t}$, concatenate zero gradients for the new features, and refine all features via a sparse 3D U-Net. The update $\Delta\overline{\mathcal{G}_t} \in [-1,1]$ is bounded, with $\mathcal{G}_{t+1} = \overline{\mathcal{G}_t} + \Delta\overline{\mathcal{G}_t}$.
  4. Loss and Gradient Detachment: The densifier and optimizer are jointly supervised at each step; gradients are detached between steps, following meta-learning protocols.
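A schematic PyTorch sketch of this loop, under stated assumptions: dense tensors stand in for the paper’s sparse voxel features, and `render_losses`, `concat_features`, and the `densifier`/`optimizer_net` signatures are hypothetical placeholders:

```python
import torch

def render_losses(G, images):
    """Placeholder for the photometric/depth/distortion rendering losses;
    a real implementation rasterizes the decoded splats for each view."""
    return (G ** 2).mean() * len(images)

def concat_features(a, b):
    """Concatenate two voxel-feature sets along the voxel dimension."""
    return torch.cat([a, b], dim=0)

def learned_optimization(G0, images, densifier, optimizer_net, T=5):
    """Schematic densify-and-optimize loop of Sec. 3.1."""
    G = G0
    for t in range(T):
        # 1. Rendering gradients w.r.t. current features, aggregated
        #    over the training views.
        G = G.detach().requires_grad_(True)
        (grads,) = torch.autograd.grad(render_losses(G, images), G)

        # 2. Learned densifier proposes up to n(t) = 20000 / 2^t new voxels.
        n_t = 20000 // 2 ** t
        G_new = densifier(G, grads, t, n_t)        # assumed module signature

        # 3. Learned optimizer refines the union; new features get zero
        #    gradients, and the update is bounded to [-1, 1].
        G_bar = concat_features(G, G_new)
        grads_bar = concat_features(grads, torch.zeros_like(G_new))
        delta = optimizer_net(G_bar, grads_bar).clamp(-1.0, 1.0)

        # 4. Detach between steps, as in meta-learning protocols.
        G = (G_bar + delta).detach()
    return G
```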

The learned densifier eliminates heuristic densification, enabling robust, data-driven expansion in regions where photometric information is limited or ambiguous.

4. Experimental Protocol and Performance

4.1. Training and Inference Details

  • Dataset: ScanNet++ (902 training scenes, 20 test scenes).
  • Image Resolution: $360 \times 540$ for training; $720 \times 1080$ for testing.
  • Optimizer: Adam (learning rate $1 \times 10^{-4}$), with batch accumulation from 100 random views per loop.
  • Network Scale: $\sim$68M parameters; 2 Gaussians per 4 cm voxel.
  • Training Regime: Train $\theta_I$ for 3 days (RTX A6000), then jointly train $(\theta_D,\theta_O)$ for 5 steps per loop.
  • Optional Fine-Tuning: 2,000 steps of vanilla SGD on all splats ($\sim$26 s); not strictly necessary for high performance (a sketch follows this list).
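A minimal sketch of this optional fine-tuning, assuming splat parameters are plain tensors; the rendering-loss callable and the SGD learning rate are illustrative assumptions:

```python
import torch

def finetune_splats(splat_params, views, render_losses, steps=2000, lr=0.01):
    """Optional per-scene refinement: vanilla SGD on all splat parameters
    (~26 s in the paper)."""
    params = [p.detach().requires_grad_(True) for p in splat_params]
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = render_losses(params, views)
        loss.backward()
        opt.step()
    return params
```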

4.2. Quantitative Results

QuickSplat’s performance is benchmarked against SuGaR, 2DGS, GS2Mesh, and MonoSDF. Metrics on ScanNet++ (20 test scenes) are summarized:

Method               AbsErr (m)   Acc@2cm   Acc@5cm   Acc@10cm   Chamfer   Time
SuGaR                0.2061       0.1157    0.2774    0.4794     0.2078    3,130 s
2DGS                 0.1127       0.4021    0.6027    0.7422     0.2420    1,796 s
GS2Mesh              0.1212       0.4028    0.6039    0.7406     0.2012    973 s
MonoSDF              0.0569       0.5774    0.8006    0.8850     0.1450    >10 h
QuickSplat w/o opt   0.0732       0.5263    0.7674    0.8583     0.1461    26 s
QuickSplat w/ opt    0.0578       0.5783    0.8035    0.8887     0.1347    124 s

Key outcomes:

  • Depth error (AbsErr) reductions of up to 48% versus prior Gaussian-splatting baselines (0.0578 m vs. 0.1127 m for 2DGS).
  • An 8$\times$ speedup over 2DGS; QuickSplat w/ opt is $>$300$\times$ faster than MonoSDF at matching accuracy.
  • Even without the optional fine-tuning, QuickSplat runs roughly 70$\times$ faster than 2DGS and is more accurate than all Gaussian-splatting baselines.

5. Qualitative Analysis and Observations

QuickSplat consistently reconstructs flat wall regions with high geometric fidelity, eliminating “floating” or curved patches common in baselines. The learned initializer and densifier allow rapid closure of holes in under-observed or textureless areas, such as white corners, and prevent the generation of flying fragments. Fine scene details (e.g., ladders, chair legs), often missed by monocular-depth–guided volumetric methods, are preserved due to the flexible, spatially adaptive Gaussian representation (Liu et al., 8 May 2025).

A plausible implication is that learned Gaussian representations generalize better across scene structures, particularly in challenging illumination and sparsity regimes.

6. Limitations and Future Directions

Mirrors and specular surfaces continue to challenge QuickSplat, resulting in occasional ghost geometry due to the reliance on photometric losses, which favor reconstructing reflections rather than true geometry. The method assumes static scenes and does not model dynamic or semi-dynamic objects. While QuickSplat greatly accelerates the reconstruction pipeline (to seconds or minutes), it is not yet real-time.

Prospective research directions include:

  • Extension to dynamic/semi-dynamic settings via per-frame Gaussian updates.
  • Integration into live SLAM pipelines, such as SplaTAM, for on-the-fly 3D reconstruction.
  • Incorporation of more powerful, expressive data-driven priors such as learned shape grammars, potentially permitting effective initialization from even sparser observations.

7. Summary and Impact

QuickSplat demonstrates that learning both initialization and densification of Gaussian splats, as opposed to relying on per-scene gradient descent and manual heuristics, provides state-of-the-art performance for 3D surface reconstruction of large indoor environments. The method achieves orders-of-magnitude acceleration, significantly reduces depth errors in challenging regions, and produces high-fidelity geometry. Its modular design supports both rapid batch and incremental scene updates, providing a viable platform for future research in scene reconstruction and real-time spatial understanding (Liu et al., 8 May 2025).

References

1. Liu et al., “QuickSplat: Fast 3D Surface Reconstruction,” 8 May 2025.