- The paper introduces GSD, a method that couples Gaussian Splatting with a Diffusion Transformer to enable view-guided 3D reconstruction from a single image.
- It employs a guided sampling process during denoising and an iterative image-polishing loop to raise rendering fidelity while preserving geometric consistency.
- Empirical results show higher PSNR and SSIM and lower LPIPS and Chamfer Distance than state-of-the-art approaches.
View-Guided Gaussian Splatting Diffusion for 3D Reconstruction: An Overview
The paper "GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction" introduces an advanced diffusion model for high-fidelity 3D object reconstruction derived from a single view. This work builds upon the Gaussian Splatting (GS) representation, leveraging its strengths to address existing shortcomings in 3D geometry consistency and rendering quality encountered by prior methods. The combination of GS with a diffusion model enables the generation of 3D objects represented by GS ellipsoids, facilitating efficient, view-guided reconstruction.
Key Contributions
- Utilization of Gaussian Splatting (GS) Representation: The paper adopts GS as its core representation, encoding objects as collections of GS ellipsoids parameterized by position, covariance (scale and rotation), color, and opacity; a minimal parameterization sketch follows this list. Unlike many traditional 3D representations, GS densely encodes both geometry and texture at high resolution.
- Diffusion Transformer (DiT) Integration: A Diffusion Transformer is trained to capture the generative prior of 3D objects in GS space. The authors argue that a transformer models the rich, unordered GS features more effectively than alternative architectures such as PVCNN.
- View-Guided Sampling: During the denoising steps of the diffusion model, a guided sampling process uses gradients from the differentiable splatting function to propagate fine-grained features from the input view into the reconstructed 3D model (see the guided-step sketch after this list).
- Iterative Polishing and Re-Using: To further improve rendering fidelity and GS quality, a 2D diffusion model polishes the rendered images, and the polished images are re-used as guidance for another denoising pass. Iterating this loop (sketched after this list) yields high-quality reconstructions consistent with the input view.
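To make the representation concrete, here is a minimal sketch of the per-Gaussian parameters the paper describes (position, covariance via scale and rotation, color, opacity), flattened into the fixed-width tokens a Diffusion Transformer can denoise. The class name, field sizes, and layout are illustrative assumptions, not the authors' code.

```python
import torch

# Hypothetical container for a set of Gaussian Splatting primitives.
# Field names and sizes are illustrative, not taken from the GSD code.
class GaussianSet:
    def __init__(self, n: int):
        self.positions = torch.zeros(n, 3)   # ellipsoid centers (x, y, z)
        self.scales    = torch.ones(n, 3)    # per-axis extents of the covariance
        self.rotations = torch.zeros(n, 4)   # unit quaternions for orientation
        self.colors    = torch.zeros(n, 3)   # RGB (often SH coefficients in practice)
        self.opacities = torch.zeros(n, 1)   # per-Gaussian alpha

    def as_tensor(self) -> torch.Tensor:
        # Flatten every attribute into one (n, 14) tensor: the kind of
        # fixed-width token sequence a Diffusion Transformer can denoise.
        return torch.cat([self.positions, self.scales, self.rotations,
                          self.colors, self.opacities], dim=-1)
```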
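The view-guided sampling step can then be sketched in the spirit of classifier guidance: estimate the clean Gaussians from the noisy sample, render them from the input camera with a differentiable splatting function, and shift the denoising update against the gradient of an image loss. `model`, `splat_render`, and the DDIM-style update below are assumed interfaces and simplifications, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def guided_step(model, splat_render, x_t, t, alpha_bars, input_view, cam_pose, scale=1.0):
    """One reverse-diffusion step with view guidance: a simplified, DDIM-style
    sketch, not the authors' exact update rule. `model` (noise predictor) and
    `splat_render` (differentiable splatting renderer) are assumed interfaces."""
    x_t = x_t.detach().requires_grad_(True)
    eps = model(x_t, t)                            # predicted noise eps_theta(x_t, t)
    a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
    # DDPM identity: estimate the clean Gaussian parameters x0 from x_t.
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    # Render the estimate from the input camera and score it against the input view.
    loss = F.mse_loss(splat_render(x0_hat, cam_pose), input_view)
    grad = torch.autograd.grad(loss, x_t)[0]       # gradients flow through the splatting function
    # Deterministic update toward x_{t-1}, nudged against the loss gradient.
    x_prev = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps
    return (x_prev - scale * grad).detach()
```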
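Finally, the polish-and-reuse stage alternates between rendering, 2D polishing, and re-guided denoising. In this sketch, `polish_2d` stands in for the 2D diffusion enhancer and `reguide` for a full guided denoising pass (e.g. repeated `guided_step` calls); both are hypothetical interfaces reflecting the paper's description rather than its actual API.

```python
def polish_and_reuse(gaussians, splat_render, polish_2d, reguide, cam_poses, rounds=3):
    """Iterative polish-and-reuse loop sketched from the paper's description.
    `polish_2d` (2D diffusion enhancer) and `reguide` (guided denoising pass
    over the Gaussians) are hypothetical interfaces, not the authors' API."""
    for _ in range(rounds):
        # Render the current reconstruction from a few viewpoints and let the
        # 2D diffusion model sharpen each rendering.
        targets = [polish_2d(splat_render(gaussians, pose)) for pose in cam_poses]
        # Re-use the polished images as fresh guidance views for another pass.
        gaussians = reguide(gaussians, targets, cam_poses)
    return gaussians
```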
Empirical Results
The proposed method delivers clear gains in both novel view synthesis and 3D geometry across objects and datasets. On the CO3Dv2 dataset, GSD outperforms existing methods such as NerFormer, ViewFormer, and SparseFusion on PSNR, SSIM, and LPIPS for novel view synthesis. The GS representation also yields stronger geometric consistency, reflected in a higher F-score and lower Chamfer Distance than approaches like PC2.
For reconstruction from a single view, GSD achieves higher PSNR and lower LPIPS, even surpassing some methods designed for multi-view inputs. Qualitative results show realistic image synthesis and geometrically consistent novel views, further validating the efficacy of GSD.
Implications and Future Directions
Integrating GS with diffusion models marks a promising direction for 3D object reconstruction, combining the explicit, efficiently renderable encoding of GS with the generative strength of the DiT. Because GS supports real-time rendering from arbitrary views, the approach suits augmented reality, virtual reality, and other domains requiring high-quality 3D representations.
Future research may explore scaling this approach to more diverse datasets and generic object categories. Making the iterative denoising process more robust and the model more adaptable to varying input conditions could further improve performance and applicability. Extending the work to large-scale, multi-view datasets might also inform efficient 3D asset creation for practical use cases.
Conclusion
The paper presents a coherent framework for 3D reconstruction that merges the strengths of the Gaussian Splatting representation and diffusion models. The empirical results underscore the effectiveness of view-guided sampling and the iterative polishing strategy, yielding state-of-the-art single-view results on CO3Dv2. GSD is well positioned to impact applications requiring detailed, consistent 3D object reconstructions from limited input views.