- The paper introduces GRM, a feed-forward, transformer-based model that converts sparse-view images into dense 3D Gaussian representations for rapid reconstruction.
- It replaces traditional triplane methods with pixel-aligned 3D Gaussians and a feed-forward generative model to enhance scalability and efficiency.
- Empirical results demonstrate significant gains in PSNR, SSIM, and LPIPS metrics, with promising integration into text-to-3D and image-to-3D generative tasks.
Exploring Efficient 3D Reconstruction and Generation with GRM: A Large Gaussian Reconstruction Model
Introduction to GRM
The recently introduced Gaussian Reconstruction Model (GRM) presents an innovative approach to reconstructing 3D assets from sparse-view images, reducing the time required to roughly 0.1 seconds. The model uses a transformer-based architecture to fuse multi-view information, translating input pixels into pixel-aligned Gaussians. These Gaussians are then unprojected along their camera rays to form a dense set of 3D Gaussians representing the scene. GRM is both scalable and fast, and it delivers reconstruction quality superior to prior feed-forward methods. Its utility also extends to generative tasks, including text-to-3D and image-to-3D, by pairing it with existing multi-view diffusion models.
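To make the unprojection step concrete, here is a minimal sketch (not the authors' code): assuming the network predicts one depth value per pixel, each pixel-aligned Gaussian center is lifted along its camera ray into world space.

```python
# Minimal sketch of unprojecting pixel-aligned Gaussian centers to 3D.
# Assumes a predicted per-pixel depth map; K is the 3x3 intrinsic matrix
# and cam2world the 4x4 camera-to-world pose.
import numpy as np

def unproject_pixel_gaussians(depth, K, cam2world):
    """Lift per-pixel depths to 3D Gaussian centers in world space.

    depth:     (H, W) predicted depth per pixel
    K:         (3, 3) camera intrinsics
    cam2world: (4, 4) camera-to-world transform
    returns:   (H*W, 3) Gaussian centers in world coordinates
    """
    H, W = depth.shape
    # Pixel-center grid in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project through the inverse intrinsics, then scale by depth.
    rays_cam = pix @ np.linalg.inv(K).T          # camera-space ray directions (z=1)
    pts_cam = rays_cam * depth.reshape(-1, 1)    # camera-space points
    # Move to world space with the camera-to-world pose.
    pts_h = np.concatenate([pts_cam, np.ones((H * W, 1))], axis=-1)
    return (pts_h @ cam2world.T)[:, :3]
```

Each remaining Gaussian attribute (rotation, scale, opacity, color) is likewise predicted per pixel, so every input pixel contributes exactly one 3D Gaussian.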
Technical Overview
GRM replaces the conventional triplane scene representation with 3D Gaussians, avoiding the cost of volume rendering. Its design rests on two components: a scene representation built from pixel-aligned 3D Gaussians, and a purely transformer-based network that converts input pixels into those Gaussians. Together, these capture fine-grained spatial detail and encourage consistency across different views, a crucial factor for high-quality reconstruction.
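The PyTorch sketch below illustrates the pixel-to-Gaussian idea at a high level. The layer sizes, the 12-channel parameterization, and the joint-attention layout are illustrative assumptions, not the paper's exact architecture, which also conditions on camera poses and uses a dedicated upsampler.

```python
# Hedged sketch: a ViT-style encoder over multi-view image patches, followed
# by a linear upsampler that emits one Gaussian's raw parameters per pixel.
import torch
import torch.nn as nn

class PixelGaussianHead(nn.Module):
    # Per pixel: depth (1) + rotation quaternion (4) + scale (3)
    # + opacity (1) + RGB color (3) = 12 raw channels (an assumed layout).
    GAUSS_CHANNELS = 12

    def __init__(self, dim=256, patch=16, depth=6, heads=8):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Each token expands back into a patch of per-pixel Gaussian parameters.
        self.upsample = nn.Linear(dim, patch * patch * self.GAUSS_CHANNELS)

    def forward(self, images):                      # images: (B, V, 3, H, W)
        B, V, C, H, W = images.shape
        x = self.embed(images.flatten(0, 1))        # (B*V, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)            # (B*V, N, dim)
        # Attending over all views' tokens jointly is what encourages
        # cross-view consistency. Positional/camera embeddings omitted here.
        x = x.reshape(B, V * x.shape[1], -1)
        x = self.encoder(x)
        g = self.upsample(x)                        # (B, V*N, p*p*12)
        return g.reshape(B, V * H * W, self.GAUSS_CHANNELS)
```

Before rendering, the raw outputs would still need activations (e.g., a sigmoid for opacity, normalization for the quaternion, positive scales) plus the unprojection step shown earlier to place the Gaussian centers.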
Core Contributions
- Efficient Framework: GRM introduces a feed-forward 3D generative model built on 3D Gaussian splatting, enabling rapid, high-quality 3D reconstruction (a toy compositing sketch follows this list).
- Transformer-based Sparse-View Reconstructor: A transformer architecture, including an encoder and an innovative upsampler, is employed for efficient pixel-to-3D Gaussian translation.
- State-of-the-Art Quality and Speed: For object-level 3D reconstruction and when combined with multi-view diffusion models for generative tasks, GRM sets new benchmarks in quality and inference speed.
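To ground the splatting bullet above, the following deliberately simplified renderer shows the front-to-back alpha-compositing rule at the heart of Gaussian splatting. Real renderers rasterize anisotropic Gaussians on the GPU with tile-based sorting; this isotropic CPU toy only illustrates the math.

```python
# Toy Gaussian splatting: project camera-space Gaussians, sort by depth,
# and alpha-composite front to back. Purely illustrative, O(N * H * W).
import numpy as np

def splat(means, colors, opacities, sigma_px, K, H, W):
    """means: (N, 3) camera-space centers; colors: (N, 3); opacities: (N,)."""
    z = means[:, 2]
    order = np.argsort(z)                    # nearest Gaussians composited first
    proj = means @ K.T
    uv = proj[:, :2] / proj[:, 2:3]          # pinhole projection to pixels
    img = np.zeros((H, W, 3))
    transmittance = np.ones((H, W))          # how much light still passes through
    ys, xs = np.mgrid[0:H, 0:W]
    for i in order:
        if z[i] <= 0:                        # skip Gaussians behind the camera
            continue
        # Isotropic 2D Gaussian footprint in pixel space (assumed for simplicity).
        d2 = (xs - uv[i, 0]) ** 2 + (ys - uv[i, 1]) ** 2
        alpha = opacities[i] * np.exp(-0.5 * d2 / sigma_px ** 2)
        img += (transmittance * alpha)[..., None] * colors[i]
        transmittance *= 1.0 - alpha
    return img
```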
Empirical Results
Extensive experiments show that GRM outperforms existing methods by a clear margin. In sparse-view 3D reconstruction from four input images, it improves PSNR and SSIM and lowers LPIPS relative to prior approaches while keeping inference fast. In text-to-3D and image-to-3D generation, GRM paired with suitable multi-view diffusion models likewise leads across quality metrics and user studies.
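For reference, these metrics are typically computed with standard packages. The snippet below is a generic evaluation sketch (assuming scikit-image and the lpips package are installed), not the paper's evaluation code; higher PSNR/SSIM and lower LPIPS indicate better renderings.

```python
# Generic image-quality evaluation: pip install scikit-image lpips torch
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='vgg')  # learned perceptual distance, lower is better

def evaluate(pred, gt):
    """pred, gt: (H, W, 3) float NumPy arrays with values in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return psnr, ssim, lp
```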
Future Directions
Despite its achievements, GRM is a deterministic reconstructor conditioned entirely on its input images, so it cannot hallucinate regions that no view observes; this points to clear directions for future work. Exploring probabilistic frameworks or incorporating hallucinative capabilities could enhance GRM's versatility and reconstruction quality.
Concluding Remarks
The Gaussian Reconstruction Model (GRM) represents a significant step forward in the field of 3D reconstruction and generation. By efficiently transforming sparse-view images into high-fidelity 3D assets and seamlessly integrating with diffusion models for generative tasks, it opens new avenues in digital content creation. Its exemplary performance, underscored by rigorous experimental validation, showcases the transformative potential of combining advanced neural architectures with 3D Gaussian representations.