- The paper introduces self-constrained priors using TSDF extraction from rendered depth maps to refine 3D Gaussian Splatting for precise surface reconstruction.
- The methodology employs iterative band narrowing, Gaussian removal, and opacity control to achieve lower Chamfer distances and higher PSNR and SSIM across datasets.
- Empirical validations on NeRF-Synthetic, DTU, TNT, and Mip-NeRF 360 demonstrate robust performance and improved visual fidelity for complex scenes.
3D Gaussian Splatting with Self-Constrained Priors: Surface Reconstruction Paradigm
Problem Definition and Motivation
This work addresses the persistent challenge of high-fidelity 3D surface reconstruction from multi-view images using 3D Gaussian Splatting (3DGS), which, despite recent advances, remains limited in geometric accuracy. Prior approaches based on neural radiance fields (NeRF) and their variants, as well as explicit and implicit Gaussian representations, are effective for novel view synthesis but often fail to reconstruct precise surface details, particularly in complex or unstructured scenes. Existing supervision strategies rely on data-driven priors, explicit geometric assumptions, or indirect constraints, and the lack of targeted, geometry-aware supervision leads to artifacts and loss of detail.
Methodology
The central contribution is a self-constrained prior that directly leverages rendered depth maps, derived from the current 3D Gaussian configuration, to impose iterative, geometry-aware constraints. The pipeline involves:
- TSDF Prior Extraction: Rendered depth maps from multi-view images are fused into a Truncated Signed Distance Field (TSDF) grid, which estimates a coarse surface and provides a distance field for evaluating Gaussian proximity to the surface.
- Band Definition and Refinement: A narrow band centered around the estimated surface is defined in the distance field, with its width adaptively reduced during optimization to impose increasingly stringent geometric constraints.
- Constraint Operations:
  - Gaussian Removal: Outlier Gaussians beyond the band are eliminated to compact the distribution.
  - Opacity Control: Within the band, opacities are maximized for on-surface Gaussians and minimized at the boundaries, using a geometry-aware loss weighted by distance to the surface.
  - Gaussian Movement: Gaussian centers are projected toward the surface using signed distances and gradients interpolated from the TSDF. For stability, this projection is applied independently of the iterative optimization.
- Periodic Prior Update: The TSDF prior is updated every 5000 iterations in the 30000-iteration optimization, with narrowed bandwidths (τ = 1, 0.5, 0.25), ensuring constraints are synchronized to the most accurate surface estimates.
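The constraint operations above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: an analytic sphere SDF stands in for the interpolated TSDF grid, the linear opacity falloff is an assumed weighting, the mapping of the three bandwidths onto update indices is a guess, and all function names are hypothetical.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Stand-in for a TSDF lookup: signed distance to a sphere at the origin."""
    return np.linalg.norm(points, axis=1) - radius

def sphere_sdf_grad(points):
    """Gradient of the sphere SDF: unit vectors pointing away from the centre."""
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    return points / np.maximum(norms, 1e-12)

def band_width(update_index, widths=(1.0, 0.5, 0.25)):
    """Band half-width tau for the k-th prior update.

    Assumption: one width per update, holding the last value afterwards.
    The source lists the widths but not how they map onto updates.
    """
    return widths[min(update_index, len(widths) - 1)]

def apply_constraints(centers, tau):
    """Apply removal, an opacity target, and projection to Gaussian centers."""
    d = sphere_sdf(centers)

    # Gaussian removal: drop centers whose distance to the surface exceeds the band.
    keep = np.abs(d) <= tau
    centers, d = centers[keep], d[keep]

    # Opacity control: target 1 on the surface, 0 at the band boundary.
    # The linear falloff is an assumption made for illustration.
    opacity_target = 1.0 - np.abs(d) / tau

    # Gaussian movement: step along the SDF gradient by the signed distance,
    # x' = x - d(x) * grad d(x), which lands on the zero level set for an exact SDF.
    projected = centers - d[:, None] * sphere_sdf_grad(centers)
    return projected, opacity_target, keep

centers = np.array([[1.5, 0.0, 0.0],
                    [0.0, 1.2, 0.0],
                    [0.0, 0.0, 0.9],
                    [5.0, 0.0, 0.0]])
projected, opacity_target, keep = apply_constraints(centers, band_width(0))
```

With the widest band, the fourth center is removed as an outlier and the remaining three are moved onto the unit sphere. A real TSDF is truncated and discretized, so a single projection step would generally be approximate rather than exact.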
The overall loss integrates RGB reconstruction, depth consistency, normal regularization, cross-view homography alignment, and self-constrained opacity penalties.
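Written as a weighted sum, such an objective might take the following form. The weights $\lambda$ and the subscript names are notation introduced here for illustration; this summary does not state the actual weighting or the exact form of each term.

```latex
\mathcal{L}
  = \mathcal{L}_{\mathrm{rgb}}
  + \lambda_{d}\,\mathcal{L}_{\mathrm{depth}}
  + \lambda_{n}\,\mathcal{L}_{\mathrm{normal}}
  + \lambda_{h}\,\mathcal{L}_{\mathrm{homography}}
  + \lambda_{o}\,\mathcal{L}_{\mathrm{opacity}}
```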
Empirical Validation
Benchmarks on the NeRF-Synthetic, DTU, Tanks and Temples (TNT), and Mip-NeRF 360 datasets show the method leading prior work on the reported metrics:
- NeRF-Synthetic: Yields lowest Chamfer Distance (CDL1 = 1.87×100) and highest PSNR (34.21) among both explicit and implicit approaches, outperforming reference models like PGSR [1], GS-Pull [4], GS-UDF [3], and QGS [5].
- DTU: Achieves superior mean CD values (0.50) and competitive rendering times, without reliance on learned SDFs or external priors.
- TNT: Outperforms contemporaries in F1-score (mean = 0.51), confirming robustness in large-scale, diverse scenes.
- Mip-NeRF 360: Delivers highest SSIM (0.754 outdoors, 0.933 indoors) and lowest LPIPS (0.200 outdoors, 0.155 indoors), with strong PSNR retention, validating both rendering fidelity and geometric precision.
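For readers unfamiliar with the geometry metrics quoted above, the following brute-force sketch shows one common way to compute a symmetric L1 Chamfer distance and a thresholded F1-score between two point sets. The averaging convention and the function names are assumptions; each benchmark uses its own evaluation protocol, which may differ in sampling, masking, and scaling.

```python
import numpy as np

def nearest_distances(src, dst):
    """Distance from each point in src to its nearest neighbour in dst (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.linalg.norm(diff, axis=2).min(axis=1)

def chamfer_l1(pred, gt):
    """Symmetric Chamfer distance: mean of the two directed mean distances.

    Assumption: some papers sum the two directions instead of averaging them.
    """
    return 0.5 * (nearest_distances(pred, gt).mean()
                  + nearest_distances(gt, pred).mean())

def f1_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    precision = (nearest_distances(pred, gt) < threshold).mean()
    recall = (nearest_distances(gt, pred) < threshold).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5]])
cd = chamfer_l1(pred, gt)
f1 = f1_score(pred, gt, threshold=0.1)
```

Lower Chamfer distance and higher F1 both indicate that the reconstructed surface lies closer to the reference. For large point clouds a spatial index such as a KD-tree would replace the quadratic pairwise computation.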
Ablation studies further confirm the impact of each module, with the absence of the prior, constraint updates, Gaussian removal/projection, and opacity losses all demonstrably degrading reconstruction quality.
Implications and Discussion
The method has several implications:
- Geometrically Targeted Constraints: The self-constrained prior eliminates the dependency on external priors or geometric assumptions, enabling adaptive, scene-specific supervision derived purely from radiance field optimization.
- Iterative Band Narrowing: The coarse-to-fine band refinement implicitly increases spatial coherence and surface convergence, supporting dynamic constraint scheduling for stable optimization.
- Synergy with 3DGS: Integration with 3DGS preserves rendering speed and visual quality, addressing the trade-off between fidelity and efficiency that limited previous approaches.
Practically, this advances mesh and radiance field reconstruction in applications requiring precise geometry, such as virtual reality, robotics, and scientific visualization. Theoretically, it sets a precedent for constraint-driven learning from the model's own outputs, sidestepping the weaknesses of external supervision.
Future Outlook
The adaptability and generalization afforded by geometry-aware, self-constrained priors could be extended to:
- Foundation Model Integration: Combining TSDF-driven constraints with foundation or transformer models for high-level semantic supervision.
- Unstructured Scene Completion: Further work on band definition in heavily unstructured or occluded environments for surface completion.
- Dynamic Scene Reconstruction: Applying band-driven constraints in dynamic or time-varying radiance fields.
Regular updates to the prior and finer grid resolutions remain open avenues for improving fidelity, especially for highly complex topologies.
Conclusion
This paper introduces an iterative, self-constrained prior mechanism for 3D Gaussian Splatting, leveraging TSDF grids extracted from depth maps to impose direct, geometry-aware constraints. The approach delivers strong numerical improvements in both geometric error and rendering fidelity, advancing the trade-off between accuracy and efficiency in surface reconstruction tasks. The paradigm enables targeted supervision without external data, and empirical results establish superiority over state-of-the-art methods across diverse datasets (2603.19682).