- The paper presents a novel hybrid framework combining implicit-structured Gaussian representations with Atlanta-world guided planar regularization for globally consistent 3D reconstruction.
- It leverages multi-modal supervision from semantic, photometric, and geometric priors to achieve higher accuracy and smoother surfaces in both indoor and urban scenes.
- Experimental results demonstrate improved performance over explicit and implicit baselines on benchmarks such as ScanNet and MatrixCity, yielding detailed, low-noise reconstructions.
AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians
Introduction and Motivation
AtlasGS introduces a hybrid surface reconstruction framework that leverages the Atlanta-world assumption to regularize geometry in both indoor and urban scenes. The method addresses two persistent challenges in multi-view 3D reconstruction: (1) the lack of globally consistent geometric priors for low-texture regions, and (2) the trade-off between the efficiency and detail preservation of explicit Gaussian Splatting (GS) and the smoothness of implicit neural representations. Existing approaches either suffer from discontinuities in reconstructed surfaces (explicit GS) or are computationally inefficient and lack high-frequency detail (implicit SDF-based methods). AtlasGS proposes an implicit-structured GS representation, integrating the strengths of both paradigms, and introduces semantic and structural regularization based on the Atlanta-world model to enforce global consistency.
Methodology
Implicit-Structured Gaussian Representation
AtlasGS constructs a sparse feature grid from SfM points, encoding both geometric and semantic features. Each voxel in the grid predicts attributes for a set of local Gaussians via geometry and semantic MLPs. This implicit-structured approach ensures that the optimization of each Gaussian influences its neighbors, promoting locally coherent geometry while preserving high-frequency details. The rendering pipeline employs surfel rasterization, and semantic attributes are lifted from 2D pseudo-labels to 3D Gaussians, supervised via cross-entropy loss.
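The grid-then-decode idea can be illustrated with a minimal sketch. It is not the authors' implementation: the centroid anchors, the single linear layer with tanh (standing in for the geometry MLP), the feature size, and all function names are assumptions made for illustration. What it does show is that every Gaussian center is a function of a per-voxel feature and shared decoder weights, which is why updating one Gaussian also moves its neighbors.

```python
import math
import random

def voxelize(points, voxel_size):
    """Bucket SfM points by integer voxel key; keep one anchor (the centroid) per occupied voxel."""
    buckets = {}
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets.setdefault(key, []).append(p)
    return {k: tuple(sum(c) / len(v) for c in zip(*v)) for k, v in buckets.items()}

def make_decoder(feat_dim, k, rng):
    """Hypothetical stand-in for the geometry MLP: one linear layer producing 3*k outputs."""
    return [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(3 * k)]

def decode_gaussians(anchor, feature, weights, k, voxel_size):
    """Decode k Gaussian centers as tanh-bounded offsets from the voxel anchor."""
    out = [math.tanh(sum(w * f for w, f in zip(row, feature))) for row in weights]
    centers = []
    for i in range(k):
        offset = out[3 * i:3 * i + 3]
        centers.append(tuple(a + voxel_size * o for a, o in zip(anchor, offset)))
    return centers

rng = random.Random(0)
points = [(0.05, 0.02, 0.01), (0.08, 0.03, 0.02), (0.55, 0.12, 0.0)]  # toy SfM points
voxel_size = 0.1
feat_dim, k = 4, 2  # assumed feature size and Gaussians per voxel

anchors = voxelize(points, voxel_size)
features = {key: [rng.uniform(-1, 1) for _ in range(feat_dim)] for key in anchors}
weights = make_decoder(feat_dim, k, rng)  # shared across all voxels
gaussians = {
    key: decode_gaussians(anchors[key], features[key], weights, k, voxel_size)
    for key in anchors
}
```

In a trainable version the features and decoder weights would be optimized by backpropagating the rendering losses; the sketch only covers the forward decoding step.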
Figure 1: AtlasGS pipeline overview, showing the integration of posed images, SfM points, feature grid construction, attribute decoding, rasterization, and structural regularization via explicit plane indicators.
Atlanta-world Guided Planar Regularization
The Atlanta-world assumption models scenes with a dominant vertical direction (gravity) and multiple horizontal directions, capturing the structural regularity of man-made environments. AtlasGS introduces learnable explicit plane indicators for floor and ceiling, initialized via RANSAC on semantic-lifted points and optimized jointly with Gaussian parameters. Two regularization terms are defined:
- 3D Global Planar Regularization: Enforces normal alignment and planar constraints for Gaussians based on their semantic probabilities, ensuring that wall, floor, and ceiling regions are geometrically consistent with the Atlanta-world planes.
- 2D Local Surface Regularization: Aligns rendered surface normals (derived from depth) with plane indicators, directly regularizing Gaussian positions in poorly defined wall regions.
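One plausible form of the 3D global planar term is sketched below. The exact formulation in the paper may differ: the L1-style penalties, the dictionary layout, and the names `up` and `floor_offset` are assumptions. The sketch weights each penalty by the Gaussian's semantic probability, asks floor normals to be parallel to the vertical axis and floor centers to lie on the floor plane, and asks wall normals to be perpendicular to the vertical axis.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def global_planar_loss(gaussians, up, floor_offset):
    """Semantic-weighted planar penalty (hypothetical form).

    gaussians: list of dicts with 'center', 'normal', 'p_floor', 'p_wall'.
    The floor plane is the set of points x with dot(up, x) == floor_offset.
    """
    total = 0.0
    for g in gaussians:
        cos_up = dot(normalize(g["normal"]), up)
        # Floor: normal parallel to the vertical axis, center on the plane.
        floor_term = (1.0 - abs(cos_up)) + abs(dot(g["center"], up) - floor_offset)
        # Wall: normal perpendicular to the vertical axis.
        wall_term = abs(cos_up)
        total += g["p_floor"] * floor_term + g["p_wall"] * wall_term
    return total / len(gaussians)

up = (0.0, 0.0, 1.0)  # assumed gravity-aligned vertical direction
aligned = [
    {"center": (1.0, 2.0, 0.0), "normal": (0.0, 0.0, 1.0), "p_floor": 1.0, "p_wall": 0.0},
    {"center": (0.0, 3.0, 1.2), "normal": (1.0, 0.0, 0.0), "p_floor": 0.0, "p_wall": 1.0},
]
tilted = [
    {"center": (1.0, 2.0, 0.1), "normal": (0.0, 1.0, 1.0), "p_floor": 1.0, "p_wall": 0.0},
]
loss_aligned = global_planar_loss(aligned, up, floor_offset=0.0)
loss_tilted = global_planar_loss(tilted, up, floor_offset=0.0)
```

A ceiling term would mirror the floor term with its own plane offset. The 2D local term would apply the same alignment penalty to normals computed from rendered depth rather than to per-Gaussian normals.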
The total loss combines photometric, depth, normal, semantic, distortion, normal consistency, and structural regularization terms, with monocular geometry priors incorporated for indoor scenes.
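As a rough illustration, the combined objective can be written as a weighted sum of the named terms. The weights below are placeholders, not values from the paper, and treating the depth and normal terms as the ones driven by monocular priors is an assumption.

```python
# Placeholder weights for illustration only; the paper's values are not given here.
LOSS_WEIGHTS = {
    "photometric": 1.0,
    "depth": 0.5,
    "normal": 0.1,
    "semantic": 0.1,
    "distortion": 0.01,
    "normal_consistency": 0.05,
    "structural": 0.1,
}

def total_loss(terms, weights=LOSS_WEIGHTS, use_monocular_priors=True):
    """Weighted sum of per-term loss values.

    When use_monocular_priors is False (e.g. a scene without such priors),
    the depth and normal terms are skipped (assumed to be the prior-driven ones).
    """
    prior_terms = {"depth", "normal"}
    total = 0.0
    for name, w in weights.items():
        if not use_monocular_priors and name in prior_terms:
            continue
        total += w * terms[name]
    return total

terms = {name: 1.0 for name in LOSS_WEIGHTS}  # dummy per-term values
with_priors = total_loss(terms)
without_priors = total_loss(terms, use_monocular_priors=False)
```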
Experimental Results
Surface Reconstruction Quality
AtlasGS is evaluated on ScanNet, ScanNet++, Replica (indoor), and MatrixCity (urban) datasets. Quantitative metrics include accuracy, completeness, precision, recall, F-score, and Chamfer Distance. AtlasGS consistently outperforms both implicit and explicit baselines, achieving the best scores across all metrics. Notably, it delivers smoother surfaces and captures finer geometric details, especially in low-texture regions where other methods exhibit discontinuities or noise.
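For reference, these metrics are commonly computed between points sampled from the predicted and ground-truth meshes, roughly as sketched below. The 5 cm threshold and the convention of taking Chamfer Distance as the mean of accuracy and completeness are assumptions; benchmarks differ in both, and the paper's evaluation protocol may not match this sketch.

```python
import math

def nearest_dists(src, dst):
    """Distance from each point in src to its nearest neighbor in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def reconstruction_metrics(pred, gt, threshold=0.05):
    d_pred = nearest_dists(pred, gt)  # predicted -> ground truth
    d_gt = nearest_dists(gt, pred)    # ground truth -> predicted
    accuracy = sum(d_pred) / len(d_pred)
    completeness = sum(d_gt) / len(d_gt)
    precision = sum(d < threshold for d in d_pred) / len(d_pred)
    recall = sum(d < threshold for d in d_gt) / len(d_gt)
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom > 0 else 0.0
    return {
        "accuracy": accuracy,
        "completeness": completeness,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "chamfer": 0.5 * (accuracy + completeness),
    }

gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.02), (2.0, 0.0, 0.2)]
metrics = reconstruction_metrics(pred, gt)
```

Lower is better for accuracy, completeness, and Chamfer Distance; higher is better for precision, recall, and F-score. Real evaluations replace the brute-force search with a KD-tree.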
Figure 2: Qualitative comparison of indoor and outdoor reconstruction, highlighting AtlasGS's ability to generate smoother surfaces and finer details compared to baselines.
Figure 3: Indoor scene reconstruction comparison on ScanNet, demonstrating local smoothness and high-frequency detail preservation.
Figure 4: Outdoor scene reconstruction comparison, showing detailed and noise-free surfaces in textureless regions.
Novel View Synthesis
AtlasGS achieves competitive results in novel view synthesis, rendering photorealistic views with accurate geometry and minimal artifacts. While 2DGS attains higher PSNR on synthetic datasets, AtlasGS produces less noisy images and better handles lighting variations in real scenes.
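For readers unfamiliar with the metric, PSNR compares a rendered image with the reference view through the mean squared error. The sketch below assumes pixel intensities in [0, 1] stored as flat lists; the function name is illustrative.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [0.0, 0.5, 1.0, 0.5]
rendered = [0.1, 0.4, 0.9, 0.6]  # every pixel off by 0.1
psnr_value = psnr(rendered, reference)
```

Higher PSNR means lower pixel-wise error, which does not always coincide with visually cleaner images.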
Figure 5: Novel view synthesis results on ScanNet++ and Replica, illustrating high-fidelity rendering and reduced noise.
Ablation Studies
Ablation experiments on ScanNet validate the contribution of each component. Removing either the 3D global planar or 2D local surface regularization degrades geometric quality, confirming their necessity for globally consistent supervision. Excluding depth or normal priors also results in lower F-score and higher Chamfer Distance, underscoring the importance of multi-modal geometric supervision.
Figure 6: Ablation study on ScanNet, showing the impact of structural regularization on wall straightness and overall geometry.
Implementation Considerations
AtlasGS is implemented in PyTorch with custom surfel rasterization. Training is performed on a single NVIDIA RTX 4090D GPU, with typical training times under 30 minutes for indoor scenes. The method is slower than prior Gaussian-based approaches because all Gaussians are implicitly decoded during rendering, but it remains significantly faster than implicit SDF-based methods. Surfaces are extracted using TSDF Fusion after training. The approach relies on a pretrained semantic segmentation model (Mask2Former) for semantic lifting, which may limit generalization to elements outside the model's label set.
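The core of TSDF Fusion is a per-voxel running average of truncated signed distances, with the surface recovered at the zero crossing. The sketch below reduces this to voxels along a single camera ray so the update rule is easy to follow; real pipelines fuse rendered depth maps into a 3D grid and extract a mesh with marching cubes. Function names, the truncation distance, and the unit observation weight are assumptions.

```python
def integrate(tsdf, weight, voxel_depths, observed_depth, trunc):
    """Fuse one depth observation into the voxels along a ray (in place)."""
    for i, z in enumerate(voxel_depths):
        sdf = observed_depth - z  # positive in front of the surface
        if sdf < -trunc:
            continue  # far behind the surface: occluded, leave unchanged
        d = min(1.0, sdf / trunc)  # truncate and normalize to [-1, 1]
        tsdf[i] = (tsdf[i] * weight[i] + d) / (weight[i] + 1.0)
        weight[i] += 1.0

def zero_crossing(tsdf, weight, voxel_depths):
    """Linearly interpolate the depth where the fused TSDF changes sign."""
    for i in range(len(tsdf) - 1):
        if weight[i] > 0 and weight[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None

voxel_depths = [i * 0.1 for i in range(21)]  # voxels from 0 m to 2 m along the ray
tsdf = [0.0] * len(voxel_depths)
weight = [0.0] * len(voxel_depths)
for depth in (1.01, 1.05, 1.03):  # noisy depth observations of one surface
    integrate(tsdf, weight, voxel_depths, depth, trunc=0.3)
surface_depth = zero_crossing(tsdf, weight, voxel_depths)
```

Averaging across views is what suppresses per-view depth noise: the recovered surface lands at the mean of the three observations (about 1.03 m) rather than at any single one.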
Limitations and Future Directions
AtlasGS's reliance on the Atlanta-world assumption and semantic segmentation restricts its applicability to scenes with strong structural regularity and well-defined semantic classes. Training and rendering, while faster than with implicit methods, remain slower than with pure explicit GS approaches. Future work may focus on accelerating the pipeline and on extending the semantic and geometry priors, for example with more general segmentation models such as SAM, for broader applicability.
Conclusion
AtlasGS presents a principled hybrid framework for 3D surface reconstruction, integrating implicit-structured Gaussians with Atlanta-world guided regularization. The method achieves state-of-the-art reconstruction quality in both indoor and urban scenes, balancing geometric smoothness, detail preservation, and rendering efficiency. Its structural regularization strategies effectively address the challenges of low-texture regions, and its modular design facilitates further extension to more diverse environments and priors.