RelitLRM: Generative Relightable Radiance for Large Reconstruction Models (2410.06231v2)

Published 8 Oct 2024 in cs.CV, cs.GR, and cs.LG

Abstract: We propose RelitLRM, a Large Reconstruction Model (LRM) for generating high-quality Gaussian splatting representations of 3D objects under novel illuminations from sparse (4-8) posed images captured under unknown static lighting. Unlike prior inverse rendering methods requiring dense captures and slow optimization, often causing artifacts like incorrect highlights or shadow baking, RelitLRM adopts a feed-forward transformer-based model with a novel combination of a geometry reconstructor and a relightable appearance generator based on diffusion. The model is trained end-to-end on synthetic multi-view renderings of objects under varying known illuminations. This architecture design makes it possible to effectively decompose geometry and appearance, resolve the ambiguity between material and lighting, and capture the multi-modal distribution of shadows and specularity in the relit appearance. We show our sparse-view feed-forward RelitLRM offers competitive relighting results to state-of-the-art dense-view optimization-based baselines while being significantly faster. Our project page is available at: https://relit-lrm.github.io/.

Summary

  • The paper introduces a transformer-based model that disentangles geometry and appearance to reconstruct and relight 3D objects from as few as 4–8 images.
  • It employs a dual approach with a deterministic geometry reconstructor and a probabilistic diffusion-based relightable appearance generator for enhanced visual fidelity.
  • Empirical evaluations demonstrate that RelitLRM achieves photorealistic rendering with reduced computational demands, promising applications in gaming and augmented reality.

Overview of RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

The paper introduces RelitLRM, a generative Large Reconstruction Model for efficiently reconstructing and relighting 3D objects from only a sparse set of posed images. The model tackles challenges inherent in inverse rendering, particularly captures taken under unknown static lighting, by deploying a feed-forward transformer-based architecture. Notably, it achieves photorealistic relighting from just four to eight input images, using a framework trained end-to-end on synthetic multi-view renderings of objects under varying known illuminations.

Technical Contributions

RelitLRM is engineered to overcome limitations of prior inverse rendering techniques, which often demand dense image captures and lengthy per-scene optimization and are prone to artifacts such as incorrect highlights or baked-in shadows. Its architecture adopts a dual approach: a deterministic geometry reconstructor and a probabilistic relightable appearance generator based on diffusion. This design cleanly decomposes geometry from appearance and addresses the ambiguity between material properties and lighting that has historically challenged optimization-based models.
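To make the two-stage design concrete, the following is a minimal PyTorch sketch of such a feed-forward pipeline: a transformer-based geometry reconstructor cross-attends learned Gaussian queries to input-view tokens, and a lighting-conditioned appearance head predicts per-Gaussian colors. All module names, dimensions, and token layouts are illustrative assumptions rather than the authors' implementation; in the paper the appearance stage is a diffusion model, which is reduced here to a single deterministic pass to keep the sketch short.

```python
# Illustrative two-stage feed-forward relighting pipeline (not the authors' code).
import torch
import torch.nn as nn

class GeometryReconstructor(nn.Module):
    """Deterministic transformer: input-view tokens -> per-Gaussian geometry tokens."""
    def __init__(self, dim=256, num_gaussians=1024):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_gaussians, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        # Per Gaussian: mean (3) + scale (3) + rotation quaternion (4) + opacity (1) = 11
        self.to_geometry = nn.Linear(dim, 11)

    def forward(self, image_tokens):                  # image_tokens: (B, N, dim)
        queries = self.queries.expand(image_tokens.size(0), -1, -1)
        tokens = self.decoder(queries, image_tokens)  # cross-attend to the input views
        return tokens, self.to_geometry(tokens)       # geometry tokens + raw Gaussian params

class AppearanceGenerator(nn.Module):
    """Lighting-conditioned head: geometry tokens + target lighting -> per-Gaussian color.
    The paper uses a diffusion model here; a single deterministic pass keeps the sketch short."""
    def __init__(self, dim=256, light_dim=128):
        super().__init__()
        self.light_proj = nn.Linear(light_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_color = nn.Linear(dim, 3)             # per-Gaussian RGB

    def forward(self, geom_tokens, light_embedding):  # (B, G, dim), (B, light_dim)
        cond = self.light_proj(light_embedding).unsqueeze(1)   # broadcast over Gaussians
        return self.to_color(self.encoder(geom_tokens + cond))

# Toy forward pass with random tensors standing in for encoded views and lighting.
B, dim = 1, 256
image_tokens = torch.randn(B, 4 * 196, dim)           # e.g. 4 views x 196 patch tokens
light_embedding = torch.randn(B, 128)                 # embedding of the target environment map
geom_tokens, gaussian_params = GeometryReconstructor(dim)(image_tokens)
colors = AppearanceGenerator(dim)(geom_tokens, light_embedding)
print(gaussian_params.shape, colors.shape)            # (1, 1024, 11) and (1, 1024, 3)
```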

Key contributions of RelitLRM are:

  1. Transformer-Based Generative Relighting: A novel transformer architecture that disentangles geometric and photometric factors, resolving the ambiguity between material and lighting and capturing how illumination interacts with shape.
  2. Performance and Efficiency: Comparisons with state-of-the-art, dense-view optimization-based baselines show matching or superior relighting quality at a fraction of the computational cost, making the method fast enough for practical use.

Empirical Evaluations

The model’s efficacy is validated on several benchmarks, including Stanford-ORB and TensoIR-Synthetic, where it reconstructs detailed 3D appearance and handles complex lighting environments in a fraction of the time taken by competing methods. These benchmarks also highlight RelitLRM’s input efficiency: it uses drastically fewer views without compromising output quality.

In ablation studies comparing deterministic and probabilistic variants of the appearance decoder, the diffusion-based generator yields higher visual fidelity, particularly in capturing specular highlights, corroborating the value of modeling relit appearance probabilistically.
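The benefit of a probabilistic decoder can be seen in how relit appearance would be sampled. Below is a minimal DDPM-style sampling loop over per-Gaussian colors, conditioned on geometry tokens and a target-lighting embedding; the noise schedule, denoiser interface, and tensor shapes are assumptions for the sketch, not the paper's exact sampler. Because sampling starts from noise, repeated draws can produce distinct, equally plausible shadow and highlight configurations that a deterministic regressor would tend to average away.

```python
# Illustrative conditional DDPM sampling of per-Gaussian colors (assumed interfaces).
import torch

def sample_appearance(denoiser, geom_tokens, light_embedding, steps=50):
    """Ancestral DDPM sampling of per-Gaussian RGB, conditioned on geometry and lighting.

    `denoiser(x_t, t, geom_tokens, light_embedding)` is assumed to predict the noise
    added at step t; its architecture is left abstract here.
    """
    B, G, _ = geom_tokens.shape
    betas = torch.linspace(1e-4, 0.02, steps)        # simple linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(B, G, 3)                          # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, t, geom_tokens, light_embedding)
        # DDPM posterior mean given the predicted noise.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise       # different draws -> different plausible relightings
    return x

# Toy usage: a stand-in denoiser that ignores its conditioning, just to show shapes.
dummy_denoiser = lambda x, t, g, l: torch.zeros_like(x)
colors = sample_appearance(dummy_denoiser, torch.randn(1, 1024, 256), torch.randn(1, 128))
print(colors.shape)                                   # (1, 1024, 3)
```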

Implications and Future Directions

RelitLRM opens promising avenues in gaming and augmented reality, where fast and accurate object relighting is critical. Its low computational cost can allow complex AI-driven imaging techniques to be integrated more broadly into real-time, consumer-facing products. Removing the reliance on known, controlled lighting conditions further simplifies capture setups, broadening the utility of 3D imaging technologies.

Looking ahead, expanding the model to support even higher resolutions and more extensive view counts without sacrificing performance remains a challenge. Future work could explore adaptive approaches to the representation of near-field lighting conditions, potentially enriching the model's usability across diverse environments.

In conclusion, RelitLRM marks a significant stride in the efficient reconstruction and relighting of 3D objects, aligning well with contemporary demands in digital visualization. Its architecture and demonstrated adaptability position it as a valuable tool for advancing visual AI capabilities.
