Two-Stage Gaussian Splatting Framework
- Two-Stage Gaussian Splatting is an explicit volumetric method that segments scenes into foreground and background to optimize outdoor reconstructions.
- It employs specialized spatial segmentation along with photometric, shell, and planarity losses to accurately model both near-field textures and distant geometries.
- The framework improves rendering fidelity, suppresses background artifacts, and supports practical applications like high-quality environment map extraction.
The two-stage Gaussian Splatting framework is an explicit volumetric representation and optimization technique that addresses key challenges in outdoor scene reconstruction, particularly the divergence between well-textured foreground content and the low-detail, unevenly illuminated distant background. By decomposing the scene spatially and optimizing each component with specialized constraints and loss terms, this approach produces higher-fidelity novel view synthesis, effectively suppresses background artifacts, and enables downstream tasks such as environment map extraction for photorealistic rendering.
1. Architectural Overview
The framework divides the scene reconstruction workflow into two distinct stages—a “background” stage and a “foreground” stage—operating on a dual-shell structure:
- Spatial Segmentation: Scenes are segmented into foreground and background using metric depth thresholds (an inner radius $r_{\text{in}}$ and an outer radius $r_{\text{out}}$) obtained from dense, per-pixel depth maps.
- Background Representation: Distant scenery is modeled as a set of Gaussians constrained to reside on a geodesic spherical shell between $r_{\text{in}}$ and $r_{\text{out}}$. This is particularly suited to elements such as skies or distant topography, whose texture and geometry are sparse or ambiguous in Structure-from-Motion (SfM) pipelines.
- Foreground Representation: The foreground, containing richly textured, proximal regions, is reconstructed from a dense point cloud (filtered to depths below $r_{\text{in}}$) and optimized with full photometric and geometric losses.
This staged optimization addresses typical artifacts, such as floaters in the sky or inconsistent background geometry, that arise from using uniform optimization over scenes with drastically different spatial statistics.
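The spatial split described above can be sketched as a simple radius test. The function name and the fixed threshold are illustrative; the paper derives its thresholds from dense per-pixel depth maps rather than a constant:

```python
import numpy as np

def segment_points(points, center, r_in):
    """Split a point cloud into foreground (inside the inner radius)
    and background (beyond it) by metric distance from the scene center."""
    dist = np.linalg.norm(points - center, axis=1)
    near = dist < r_in
    return points[near], points[~near]

# Toy usage: two near points and one distant point, inner radius 10.
pts = np.array([[1.0, 0.0, 0.0], [5.0, 0.0, 0.0], [50.0, 0.0, 0.0]])
fg, bg = segment_points(pts, np.zeros(3), r_in=10.0)
```

The two subsets then follow entirely separate optimization paths, which is the core of the staged design.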
2. Background Gaussian Splatting: Initialization and Optimization
The background stage begins by initializing Gaussian splatting primitives sampled over a geodesic sphere:
- Sampling and Initialization:
- Points are initialized to uniformly cover the shell between $r_{\text{in}}$ and $r_{\text{out}}$.
- Radial distances are set to the observed metric depth (when that depth exceeds $r_{\text{in}}$) or drawn uniformly at random within $[r_{\text{in}}, r_{\text{out}}]$ otherwise.
- Each Gaussian is assigned a view-independent color, derived from the mean RGB of rays intersecting it from all relevant images.
- Optimization Losses:
- Photometric Loss ($\mathcal{L}_{\text{photo}}$): Measures the difference between images rendered from the background set alone and the masked background regions of the original images.
- Shell Loss ($\mathcal{L}_{\text{shell}}$): Forces Gaussians to remain inside the shell,
$$\mathcal{L}_{\text{shell}} = \sum_i \max\bigl(0,\, r_{\text{in}} - \lVert \mu_i - c \rVert\bigr) + \max\bigl(0,\, \lVert \mu_i - c \rVert - r_{\text{out}}\bigr),$$
with $c$ denoting the shell center and $\mu_i$ the position of Gaussian $i$.
- Planarity Loss ($\mathcal{L}_{\text{planar}}$): Flattens each anisotropic Gaussian tangentially against the shell by aligning its shortest axis with the radial direction, discouraging "spiking" towards the scene center:
$$\mathcal{L}_{\text{planar}} = \sum_i \frac{s_{\min,i}}{s_{\max,i} + \epsilon} + \bigl(1 - \lvert \hat{a}_i \cdot \hat{r}_i \rvert\bigr),$$
where $s_{\max,i}$ and $s_{\min,i}$ are the largest and smallest scale factors of Gaussian $i$, $\hat{a}_i$ is the shortest axis (defined in local coordinates and rotated into world space), $\hat{r}_i = (\mu_i - c)/\lVert \mu_i - c \rVert$ is the radial direction, and $\epsilon$ is a small value for numerical stability.
- Pruning: Only Gaussians that are never observed by any camera are pruned, which preserves shell coverage and suppresses the formation of spurious holes in the synthesized background.
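A minimal sketch of the background initialization, using a Fibonacci sphere as a stand-in for the geodesic sampling described above; all names and the clipping of observed depths to the shell are assumptions of the sketch:

```python
import numpy as np

def init_background_shell(n, center, r_in, r_out, depths=None, rng=None):
    """Place n background Gaussian centers on a spherical shell.
    Angular positions come from a Fibonacci sphere (near-uniform coverage);
    radial distance is the observed depth when it exceeds r_in (clipped to
    the shell), otherwise drawn uniformly from [r_in, r_out]."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i          # golden-angle increment
    z = 1.0 - 2.0 * (i + 0.5) / n                   # uniform in z
    s = np.sqrt(1.0 - z * z)
    dirs = np.stack([s * np.cos(phi), s * np.sin(phi), z], axis=1)
    if depths is None:
        r = rng.uniform(r_in, r_out, size=n)
    else:
        r = np.where(depths > r_in, np.clip(depths, r_in, r_out),
                     rng.uniform(r_in, r_out, size=n))
    return center + r[:, None] * dirs

pts = init_background_shell(1000, np.zeros(3), r_in=50.0, r_out=200.0)
radii = np.linalg.norm(pts, axis=1)
# Every initialized center lies inside the shell [r_in, r_out].
```

The view-independent colors would then be assigned by averaging the RGB of the rays that intersect each Gaussian, as described above.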
3. Foreground Gaussian Splatting: Initialization and Refinement
In the foreground stage:
- Point Cloud Source: The framework constructs a point cloud via COLMAP and selects points within the inner radius $r_{\text{in}}$.
- Initialization: Gaussians are placed at the filtered point positions, with appearance and geometric attributes derived from the local image set.
- Optimization:
- The background set is fixed and participates in rendering but not in optimization.
- The foreground set undergoes photometric optimization (with the usual GS rendering loss) to accurately capture local detail and texture.
- A spatial pruning constraint removes foreground Gaussians that drift beyond $r_{\text{in}}$, maintaining a strict spatial partition.
This explicit fixing of the background during foreground optimization decouples the foreground refinement from background errors and produces sharp, artifact-free geometry and appearance in the navigation region.
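One foreground iteration under this scheme can be illustrated as follows. The plain gradient step and all names are assumptions of the sketch; the frozen background simply never appears in the update, only in rendering:

```python
import numpy as np

def foreground_update(fg_pos, grad, center, r_in, lr=0.01):
    """One illustrative foreground iteration: only foreground positions
    receive a (photometric) gradient step, and any Gaussian that drifts
    beyond the inner radius r_in is pruned afterwards to keep the
    foreground/background partition strict."""
    fg_pos = fg_pos - lr * grad
    keep = np.linalg.norm(fg_pos - center, axis=1) < r_in
    return fg_pos[keep]

# Toy usage: the second Gaussian is pushed past r_in and pruned.
pos = np.array([[1.0, 0.0, 0.0], [9.9, 0.0, 0.0]])
g = np.array([[0.0, 0.0, 0.0], [-100.0, 0.0, 0.0]])
new_pos = foreground_update(pos, g, np.zeros(3), r_in=10.0, lr=0.01)
```

Because the background parameters are never touched here, foreground errors cannot propagate into the shell, and vice versa.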
4. Loss Formulations and Optimization Details
The key losses for the framework are summarized as follows:
| Term | Stage | Mathematical Formulation | Purpose |
|---|---|---|---|
| $\mathcal{L}_{\text{photo}}$ | Both | Image-space rendering loss | Drives visual fidelity |
| $\mathcal{L}_{\text{shell}}$ | Background | As above (shell constraint) | Constrains background Gaussians to the shell |
| $\mathcal{L}_{\text{planar}}$ | Background | As above (tangential orientation) | Discourages radial "spikes" in the representation |
Further, the method adopts a custom pruning strategy for the background (removing only never-observed Gaussians) and enforces spatial filtering for the foreground (removing primitives leaving the navigation region). The optimization sequence for the background is completed prior to any foreground updates, with the background Gaussians then frozen.
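The two geometric terms could be realized along the following lines. This is a sketch consistent with the symbol definitions given earlier (shell radii, scale factors, shortest axis, stability term), not the paper's exact formulas:

```python
import numpy as np

def shell_loss(mu, center, r_in, r_out):
    """Hinge penalty that is zero while background Gaussian centers mu
    stay inside the shell [r_in, r_out] and grows linearly outside it."""
    d = np.linalg.norm(mu - center, axis=1)
    return np.sum(np.maximum(r_in - d, 0.0) + np.maximum(d - r_out, 0.0))

def planarity_loss(s_max, s_min, a_hat, r_hat, eps=1e-8):
    """Encourage shell-aligned, flattened Gaussians: penalize thickness
    via the s_min / s_max ratio and misalignment between the world-space
    shortest axis a_hat and the radial direction r_hat."""
    thickness = s_min / (s_max + eps)
    misalign = 1.0 - np.abs(np.sum(a_hat * r_hat, axis=1))
    return np.sum(thickness + misalign)

# A Gaussian inside the shell, flattened, with its shortest axis radial:
mu = np.array([[0.0, 0.0, 100.0]])
print(shell_loss(mu, np.zeros(3), 50.0, 200.0))          # 0.0
print(planarity_loss(np.array([5.0]), np.array([0.05]),
                     np.array([[0.0, 0.0, 1.0]]),
                     np.array([[0.0, 0.0, 1.0]])))       # ~0.01
```

Both penalties vanish exactly in the configuration the background stage is driving towards: centers on the shell, disks flat against it.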
5. Empirical Evaluation and Observed Improvements
Experimental results on NerfStudio, Tanks and Temples, Fields, and Tobacco datasets confirm:
- Superior perceptual quality: Higher SSIM and lower LPIPS relative to official GS and hierarchical GS baselines.
- Artifact suppression: Significantly reduced background floaters and visible seams; more accurate sky and horizon modeling.
- Extreme viewpoint robustness: Consistency and scene completeness maintained even under viewpoint extrapolation, owing to the decoupled spatial optimization.
- Efficiency: The approach allows rendering systems (including integration with real-time engines such as Unreal and Unity) to avoid background overfitting and batch artifacts, supporting real-time immersive environments.
6. Environment Map Extraction and Additional Applications
A notable consequence of the shell-based background representation is that the optimized background Gaussians can be directly rasterized to generate panoramic, cube, or spherical maps, constituting a high-dynamic-range background that is devoid of proximate objects. This enables:
- Automatic, object-free environment maps for photorealistic lighting and mixed-reality composition.
- Enhanced scene relighting and background replacement with a guarantee that the environment map contains only distant or “infinite” elements.
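As a toy illustration of the extraction step, background Gaussian centers can be projected into an equirectangular panorama. This single-pixel splat is a deliberately minimal stand-in for the full Gaussian rasterization described above, and all names are hypothetical:

```python
import numpy as np

def splat_environment_map(centers, colors, origin, width=512, height=256):
    """Nearest-pixel splat of background Gaussian colors into an
    equirectangular image: each primitive contributes one pixel along
    its viewing direction from `origin` (longitude measured from +z)."""
    d = centers - origin
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    lon = np.arctan2(d[:, 0], d[:, 2])                 # [-pi, pi]
    lat = np.arcsin(np.clip(d[:, 1], -1.0, 1.0))       # [-pi/2, pi/2]
    u = ((lon / (2.0 * np.pi) + 0.5) * (width - 1)).astype(int)
    v = ((0.5 - lat / np.pi) * (height - 1)).astype(int)
    img = np.zeros((height, width, 3))
    img[v, u] = colors
    return img

# One white Gaussian straight "up" lands in the top row of the panorama.
img = splat_environment_map(np.array([[0.0, 100.0, 0.0]]),
                            np.array([[1.0, 1.0, 1.0]]),
                            np.zeros(3))
```

Because only shell Gaussians are splatted, the resulting map is guaranteed to contain no foreground content, which is precisely the object-free property the section highlights.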
A plausible implication is that the decoupled approach could streamline interactive relighting workflows and facilitate environment-aware VR/AR applications.
7. Impact and Broader Implications
This two-stage optimization approach introduces explicit inductive bias regarding scene structure that standard GS workflows lack. By segmenting the reconstruction task and applying distinct geometric priors and losses to the background and foreground, the method avoids pathologies encountered when attempting a unified optimization (e.g., background-fogging, disconnected floaters). This design offers a robust pathway for high-fidelity outdoor reconstruction and presents new opportunities for content creation, lighting design, and mixed-reality research.
In summary, the two-stage Gaussian splatting optimization for outdoor scene reconstruction yields a dual-shell representation in which the background and foreground are independently modeled, optimized, and pruned using tailored spatial and photometric losses. This explicit separation circumvents common issues in outdoor scene reconstruction and facilitates perceptually high-quality, artifact-free synthesis as well as easy extraction of environment maps for subsequent rendering or editing applications (Pintani et al., 10 Oct 2025).