
GeoNeRF: Generalizing NeRF with Geometry Priors (2111.13539v2)

Published 26 Nov 2021 in cs.CV

Abstract: We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. Our approach consists of two main stages: a geometry reasoner and a renderer. To render a novel view, the geometry reasoner first constructs cascaded cost volumes for each nearby source view. Then, using a Transformer-based attention mechanism and the cascaded cost volumes, the renderer infers geometry and appearance, and renders detailed images via classical volume rendering techniques. This architecture, in particular, allows sophisticated occlusion reasoning, gathering information from consistent source views. Moreover, our method can easily be fine-tuned on a single scene, and renders competitive results with per-scene optimized neural rendering methods with a fraction of computational cost. Experiments show that GeoNeRF outperforms state-of-the-art generalizable neural rendering models on various synthetic and real datasets. Lastly, with a slight modification to the geometry reasoner, we also propose an alternative model that adapts to RGBD images. This model directly exploits the depth information often available thanks to depth sensors. The implementation code is available at https://www.idiap.ch/paper/geonerf.

Authors (3)
  1. Mohammad Mahdi Johari (2 papers)
  2. Yann Lepoittevin (1 paper)
  3. François Fleuret (78 papers)
Citations (180)

Summary

  • The paper introduces a novel framework that leverages geometry priors to generalize NeRF without intensive per-scene optimization.
  • It employs a two-stage architecture with a geometry reasoner constructing cost volumes and a Transformer-based renderer integrating multi-view data.
  • Experimental results show higher PSNR and SSIM with reduced LPIPS scores, outperforming models like IBRNet and MVSNeRF.

GeoNeRF: Generalizing NeRF with Geometry Priors

GeoNeRF presents a generalizable framework for novel view synthesis with Neural Radiance Fields (NeRFs). It addresses a core limitation of NeRFs: per-scene optimization, which is computationally expensive and requires dense imagery. By leveraging geometry priors, GeoNeRF synthesizes novel views without prolonged optimization on each individual scene.

The GeoNeRF architecture incorporates two primary stages: a geometry reasoner and a renderer. The geometry reasoner constructs cascaded cost volumes from nearby views and employs a semi-supervised learning approach to guide the extraction of geometry features. This design allows GeoNeRF to manage sophisticated occlusions, enhancing its ability to gather information from a comprehensive array of source views.
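
The cost-volume idea can be illustrated with a simplified sketch. This is a hypothetical toy, not the paper's implementation: GeoNeRF warps CNN feature maps via homographies and builds cascaded volumes, whereas here `sample_feature` is a stand-in that simply returns a per-view feature value at a candidate depth, and the matching cost is the variance across views (low variance suggests photo-consistency, i.e. a likely surface).

```python
# Simplified plane-sweep cost sketch (hypothetical): for each candidate
# depth, compare per-view feature samples via their variance.
# Low variance = views agree = likely surface at that depth.

def variance_cost(samples):
    """Matching cost for one depth hypothesis: variance across views."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

def build_cost_column(sample_feature, num_views, depth_candidates):
    """Cost over depth hypotheses for a single reference pixel."""
    return [
        variance_cost([sample_feature(v, d) for v in range(num_views)])
        for d in depth_candidates
    ]

# Toy example: all views agree at depth 2.0, disagree elsewhere.
def sample_feature(view, depth):
    return 1.0 if depth == 2.0 else float(view)

costs = build_cost_column(sample_feature, num_views=3,
                          depth_candidates=[1.0, 2.0, 3.0])
best = min(range(len(costs)), key=costs.__getitem__)  # lowest-cost depth index
```

The cascaded design in the paper refines such a volume coarse-to-fine, narrowing the depth range at higher resolutions.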

In the renderer phase, GeoNeRF employs a Transformer-based attention mechanism that is permutation invariant, enabling the integration of data from multiple viewpoints. This yields higher-fidelity inferred geometry and appearance, enabling the synthesis of detailed, accurate images via classical volume rendering techniques.
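
The final classical volume rendering step is standard NeRF-style alpha compositing: densities and colors sampled along a ray are combined front to back with accumulated transmittance. A minimal single-ray sketch (scalar colors for brevity):

```python
import math

def composite_ray(sigmas, colors, deltas):
    """Classical volume rendering along one ray:
    alpha_i = 1 - exp(-sigma_i * delta_i), weight w_i = T_i * alpha_i,
    where transmittance T_i is accumulated front to back."""
    color, transmittance, weights = 0.0, 1.0, []
    for sigma, c, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        color += w * c
        weights.append(w)
        transmittance *= 1.0 - alpha
    return color, weights

# Usage: a near-opaque middle sample dominates the composited color.
color, weights = composite_ray(
    sigmas=[0.1, 50.0, 0.1],
    colors=[0.2, 0.9, 0.5],
    deltas=[1.0, 1.0, 1.0],
)
```

The per-sample weights also define the expected depth along the ray, which is what lets cost-volume geometry supervise the renderer.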

GeoNeRF exhibits significant improvements over existing generalizable frameworks, outperforming models such as IBRNet and MVSNeRF in image quality across various synthetic and real datasets. Numerical evaluations confirm this, with GeoNeRF achieving higher PSNR and SSIM values alongside lower LPIPS scores, indicative of better perceptual quality. It can also be fine-tuned on an individual scene quickly, producing results competitive with per-scene optimized vanilla NeRF at a fraction of the computational cost.
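
For reference, PSNR is derived directly from the mean squared error between rendered and ground-truth pixels; higher is better. A quick illustration (generic metric code, not tied to the paper's evaluation scripts), with images flattened to lists of values in [0, 1]:

```python
import math

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_val]."""
    mse = sum((r - g) ** 2 for r, g in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A uniform error of 0.1 per pixel gives MSE = 0.01, i.e. PSNR = 20 dB.
value = psnr([0.1, 0.6, 0.9], [0.0, 0.5, 1.0])
```

SSIM and LPIPS complement PSNR by measuring structural and learned perceptual similarity, respectively, which pure per-pixel error misses.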

Furthermore, a derivative of GeoNeRF, termed $\text{GeoNeRF}_{\text{+D}}$, introduces compatibility with RGBD inputs, leveraging additional depth information to further augment the geometric reasoning capabilities. This adaptation demonstrates robustness to the quality and sparseness of depth inputs, allowing for reliable synthesis even with incomplete or low-resolution depth data.

Implications of this research extend to practical applications requiring fast and flexible rendering capabilities, reducing the traditional computational constraints associated with NeRF deployment. Theoretically, the GeoNeRF framework establishes a more scalable approach to synthesizing novel views, paving the way for future methodologies that blend geometry reasoning with neural rendering. Future developments may see the refinement of source view selection and adaptive rendering adjustments, offering even more efficiency and adaptability for diverse and complex scene contents.