
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting (2508.17811v1)

Published 25 Aug 2025 in cs.GR, cs.AI, cs.CV, and cs.LG

Abstract: Surface reconstruction has been widely studied in computer vision and graphics. However, existing surface reconstruction works struggle to recover accurate scene geometry when the input views are extremely sparse. To address this issue, we propose MeshSplat, a generalizable sparse-view surface reconstruction framework via Gaussian Splatting. Our key idea is to leverage 2DGS as a bridge, which connects novel view synthesis to learned geometric priors and then transfers these priors to achieve surface reconstruction. Specifically, we incorporate a feed-forward network to predict per-view pixel-aligned 2DGS, which enables the network to synthesize novel view images and thus eliminates the need for direct 3D ground-truth supervision. To improve the accuracy of 2DGS position and orientation prediction, we propose a Weighted Chamfer Distance Loss to regularize the depth maps, especially in overlapping areas of input views, and also a normal prediction network to align the orientation of 2DGS with normal vectors predicted by a monocular normal estimator. Extensive experiments validate the effectiveness of our proposed improvement, demonstrating that our method achieves state-of-the-art performance in generalizable sparse-view mesh reconstruction tasks. Project Page: https://hanzhichang.github.io/meshsplat_web


Summary

  • The paper introduces MeshSplat, which employs 2D Gaussian Splatting to bridge novel view synthesis and mesh extraction from sparse views.
  • It leverages a feed-forward MVS architecture with multi-view transformers, U-Net refinement, and specialized losses such as the WCD loss and uncertainty-guided normal supervision to improve geometric accuracy.
  • MeshSplat achieves state-of-the-art results on Re10K and ScanNet by producing smoother, more complete meshes with efficient inference and robust cross-dataset generalization.

MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

Introduction and Motivation

MeshSplat addresses the challenge of generalizable surface reconstruction from sparse multi-view images, a scenario where traditional per-scene optimization methods (e.g., NeRF, 3DGS) and even recent generalizable approaches often fail to recover accurate and complete scene geometry. The core innovation is the use of 2D Gaussian Splatting (2DGS) as an intermediate representation, bridging the gap between novel view synthesis (NVS) and mesh extraction. Unlike 3DGS, which suffers from inconsistent surface intersections across viewpoints, 2DGS provides consistent intersection planes, making it more suitable for surface reconstruction, especially under sparse-view constraints (Figure 1).

Figure 1: 2DGS provides consistent intersection planes across viewpoints, while 3DGS leads to inconsistent surfaces; unregularized 2DGS can still misalign with true surface contours.

Methodology

Architecture Overview

MeshSplat employs a feed-forward, multi-view stereo (MVS)-based architecture. Given a pair of images and their camera parameters, the pipeline consists of:

  1. Feature Extraction: A CNN and multi-view transformer extract per-view feature maps.
  2. Cost Volume Construction: Plane-sweeping generates per-view cost volumes, encoding multi-view correspondences.
  3. Depth and Normal Prediction: U-Net-based refinement yields depth maps; a dedicated normal prediction network estimates per-pixel normals and their uncertainties.
  4. 2DGS Parameterization: Pixel-aligned 2DGS are instantiated using predicted depths (for positions) and normals (for orientations), with the remaining attributes decoded via a convolutional head; the depth-to-position unprojection is sketched after this list.
  5. Supervision and Mesh Extraction: 2DGS are used for differentiable rendering (NVS supervision) and for mesh extraction (Figure 2).

    Figure 2: MeshSplat architecture: multi-view feature extraction, cost volume construction, depth/normal prediction, 2DGS parameterization, and mesh extraction.
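
Step 4 places each Gaussian at the world-space point obtained by unprojecting its pixel's predicted depth. Below is a minimal sketch of that pinhole unprojection, assuming OpenCV-style intrinsics `K` and a camera-to-world matrix `c2w` (the paper's exact conventions may differ):

```python
import torch

def unproject_depth(depth, K, c2w):
    """Lift a depth map to world-space points (pixel-aligned Gaussian centers).

    depth: (H, W) predicted z-depth; K: (3, 3) intrinsics; c2w: (4, 4)
    camera-to-world pose. Returns (H*W, 3) world-space points.
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3).float()
    rays = pix @ torch.linalg.inv(K).T        # camera-space directions, z = 1
    pts_cam = rays * depth.reshape(-1, 1)     # scale each ray by its depth
    pts_h = torch.cat([pts_cam, torch.ones_like(pts_cam[:, :1])], dim=-1)
    return (pts_h @ c2w.T)[:, :3]             # rigid transform to world space
```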

Loss Functions

  • Weighted Chamfer Distance (WCD) Loss: Regularizes 2DGS positions by aligning point clouds from different views, weighted by confidence maps derived from cost volumes. This focuses the loss on regions with reliable correspondences, mitigating the impact of occlusions and non-overlapping areas.
  • Uncertainty-Guided Normal NLL Loss: Supervises normal predictions using pseudo ground-truth normals from a monocular estimator (Omnidata), with per-pixel uncertainty (kappa) modulating the loss. This targets reliable regions and improves orientation accuracy; both geometric losses are sketched in code after this list.
  • Photometric Loss: Combines MSE and LPIPS between rendered and ground-truth images for NVS supervision.
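
Both geometric losses reduce to compact PyTorch sketches. These are illustrative, not the paper's code: the weighting scheme, the vMF-style normal density, and all function names are assumptions.

```python
import torch

def weighted_chamfer_distance(p_a, p_b, w_a, w_b):
    """Confidence-weighted Chamfer distance between point clouds unprojected
    from two views' depth maps.

    p_a: (N, 3), p_b: (M, 3); w_a: (N,), w_b: (M,) per-point confidence
    weights (e.g. from cost-volume matching scores), assumed to sum to 1.
    """
    d = torch.cdist(p_a, p_b)        # (N, M) pairwise Euclidean distances
    nn_ab = d.min(dim=1).values      # each point in A to its nearest in B
    nn_ba = d.min(dim=0).values      # each point in B to its nearest in A
    # Confidence weighting focuses the loss on reliably matched regions.
    return (w_a * nn_ab).sum() + (w_b * nn_ba).sum()

def uncertainty_normal_nll(n_pred, n_pseudo, kappa, eps=1e-6):
    """NLL for unit normals under a von Mises-Fisher density with per-pixel
    concentration kappa (high kappa = confident pixel).

    n_pred, n_pseudo: (..., 3) unit vectors; kappa: (...,) positive.
    """
    cos = (n_pred * n_pseudo).sum(dim=-1).clamp(-1 + eps, 1 - eps)
    # Stable log-normalizer of the vMF density (additive constants dropped):
    # log C(k) = log k - k - log(1 - exp(-2k)).
    log_c = torch.log(kappa + eps) - kappa - torch.log(-torch.expm1(-2 * kappa) + eps)
    # -log C(k) - k*cos: kappa sharpens the alignment penalty where the
    # monocular estimator is trusted and relaxes it where it is not.
    return (-log_c - kappa * cos).mean()
```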

Training and Inference

Training is performed end-to-end with the above losses. During inference, the network predicts 2DGS from sparse input views, which are then used for mesh extraction following the 2DGS pipeline.
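
In the 2DGS pipeline, mesh extraction fuses depth maps rendered from the predicted Gaussians into a TSDF volume and triangulates it. Below is a hedged sketch using Open3D's TSDF integration; `render_rgbd_from_2dgs` and the `cam` attributes are hypothetical stand-ins, and the voxel parameters are illustrative:

```python
import numpy as np
import open3d as o3d

def extract_mesh(gaussians, cameras, width, height):
    vol = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.01,  # illustrative resolution (meters)
        sdf_trunc=0.04,
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
    for cam in cameras:
        # Hypothetical stand-in for rasterizing color and depth from the
        # predicted 2D Gaussians at this pose.
        color, depth = render_rgbd_from_2dgs(gaussians, cam)
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            o3d.geometry.Image(color), o3d.geometry.Image(depth),
            depth_scale=1.0, depth_trunc=8.0, convert_rgb_to_intensity=False)
        intrinsic = o3d.camera.PinholeCameraIntrinsic(
            width, height, cam.fx, cam.fy, cam.cx, cam.cy)
        # integrate() expects a world-to-camera extrinsic.
        vol.integrate(rgbd, intrinsic, np.linalg.inv(cam.c2w))
    return vol.extract_triangle_mesh()
```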

Experimental Results

Quantitative and Qualitative Performance

MeshSplat demonstrates superior performance on the Re10K and ScanNet datasets, outperforming MVSplat, PixelSplat, and NeRF-based baselines in Chamfer Distance, precision, recall, and F1 metrics. Notably, MeshSplat achieves a CD of 0.3566 and F1 of 0.3758 on Re10K, and a CD of 0.2606 and F1 of 0.3824 on ScanNet, with significant improvements in mesh completeness and surface smoothness (Figure 3).

Figure 3: MeshSplat reconstructs smoother and more complete meshes from sparse views compared to MVSplat and other baselines.

Figure 4: Qualitative comparison on Re10K: MeshSplat yields meshes with fewer holes and more regular surfaces.

Figure 5

Figure 5: Zero-shot transfer: MeshSplat generalizes to ScanNet and Replica, maintaining mesh quality across datasets.

Depth and Normal Prediction

MeshSplat achieves lower depth error (AbsRel 0.0910 vs. 0.1692 for MVSplat) and more accurate normal estimation (mean angular error 33.84° vs. 57.16°), indicating more faithful geometry recovery (Figure 6); both metrics are defined in the sketch below.

Figure 6: Visualization of predicted depth, normal, confidence, and kappa maps; confidence maps highlight unreliable regions, kappa maps reflect normal uncertainty.
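
For reference, both reported metrics are a few lines each (a generic sketch; the paper's evaluation masks and conventions may differ):

```python
import torch

def abs_rel(d_pred, d_gt):
    """Mean absolute relative depth error over valid (positive-depth) pixels."""
    valid = d_gt > 0
    return ((d_pred - d_gt).abs() / d_gt)[valid].mean()

def mean_normal_angle_deg(n_pred, n_gt):
    """Mean angular error in degrees between unit normal maps of shape (..., 3)."""
    cos = (n_pred * n_gt).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos)).mean()
```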

Ablation Studies

Ablations confirm the necessity of each component:

  • 2DGS outperforms 3DGS for mesh extraction.
  • WCD loss and the normal prediction network each provide additive improvements in geometric fidelity.
  • Removing either leads to degraded mesh quality and increased artifacts (Figure 7).

    Figure 7: Ablation: each module (2DGS, WCD, normal prediction) incrementally improves mesh quality and normal accuracy.

Implementation Considerations

  • Resource Requirements: MeshSplat is efficient, training on a single NVIDIA A800 GPU and running a forward pass in 0.102 s with only 13.3M parameters.
  • Scalability: The feed-forward design enables application to large-scale datasets and real-world scenes without per-scene optimization.
  • Limitations: MeshSplat struggles in textureless or occluded regions, sometimes producing discontinuous depth maps. It cannot reconstruct surfaces not visible in the input views, a limitation inherent to all non-generative, view-based methods.

Practical and Theoretical Implications

MeshSplat demonstrates that 2DGS, when properly regularized and supervised, can serve as an effective bridge between NVS and surface reconstruction, enabling generalizable, feed-forward mesh extraction from sparse views. The WCD loss and uncertainty-guided normal supervision are critical for robust geometry prediction under limited multi-view constraints. The approach is extensible to other tasks requiring generalizable geometric priors, such as SLAM, AR/VR content creation, and robotics.

Future Directions

Potential avenues for further research include:

  • Integrating generative priors to hallucinate geometry in unobserved regions.
  • Extending the framework to handle dynamic scenes or outdoor environments.
  • Improving robustness in textureless or ambiguous regions via multi-modal or semantic cues.
  • Exploring more efficient architectures for real-time applications.

Conclusion

MeshSplat establishes a new state-of-the-art for generalizable sparse-view surface reconstruction by leveraging 2DGS, WCD loss, and uncertainty-guided normal supervision. The method achieves strong quantitative and qualitative results, robust cross-dataset generalization, and efficient inference, making it a practical solution for real-world 3D reconstruction from limited imagery.
