- The paper introduces a novel rendering method that synthesizes photorealistic images from sparse multiview data without scene-specific optimization.
- It integrates multiview feature aggregation and a ray transformer to estimate radiance and volume density along continuous rays.
- Empirical results show IBRNet surpasses prior view synthesis methods in perceptual quality and fidelity, even with sparse input views.
Overview of IBRNet: Learning Multi-View Image-Based Rendering
The paper "IBRNet: Learning Multi-View Image-Based Rendering" presents a method for synthesizing photo-realistic novel views from a sparse set of posed input images. Unlike neural scene representation methods that must be optimized for each scene, IBRNet generalizes: it produces high-fidelity renderings of novel scenes without scene-specific training.
Methodology
IBRNet integrates elements from classic image-based rendering (IBR) with the volumetric approach of neural radiance fields (NeRF). The method employs a network architecture consisting of a multilayer perceptron (MLP) and a ray transformer to estimate radiance and volume density at continuous 5D locations, i.e., a 3D spatial position plus a 2D viewing direction. Unlike NeRF, which encodes a single scene in its network weights, IBRNet draws this information dynamically from multiple source views at render time.
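To make the render-time sampling concrete, the following is a minimal sketch of how a 3D query point might be projected into the source views and how the gathered image features could be pooled. The function names, tensor shapes, and the simple pinhole projection are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def project_to_views(x_world, K, w2c):
    """Project 3D query points into each source view (simple pinhole model).

    x_world: (P, 3) points, K: (V, 3, 3) intrinsics, w2c: (V, 3, 4) extrinsics.
    Returns (V, P, 2) pixel coordinates. Shapes and names are illustrative.
    """
    homog = torch.cat([x_world, torch.ones_like(x_world[:, :1])], dim=-1)  # (P, 4)
    cam = torch.einsum('vij,pj->vpi', w2c, homog)                          # (V, P, 3)
    pix = torch.einsum('vij,vpj->vpi', K, cam)                             # (V, P, 3)
    return pix[..., :2] / pix[..., 2:].clamp(min=1e-6)                     # (V, P, 2)

def aggregate_features(per_view_feats):
    """PointNet-style pooling over the view axis: element-wise mean and
    variance, so the network can reason about cross-view consistency.

    per_view_feats: (V, P, C) image features sampled at the projections.
    """
    mean = per_view_feats.mean(dim=0)                       # (P, C)
    var = per_view_feats.var(dim=0, unbiased=False)         # (P, C)
    return torch.cat([mean, var], dim=-1)                   # (P, 2C)
```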
Key components include:
- Multiview Feature Aggregation: For each query point, image features from neighboring source views are aggregated. A PointNet-like architecture pools element-wise mean and variance across views; the variance acts as a measure of cross-view consistency, which aids occlusion and visibility reasoning.
- Ray Transformer: This module lets the samples along a ray attend to one another, improving density prediction without relying on precomputed geometry.
- Volume Rendering: Using classic, fully differentiable volume rendering, the method synthesizes the target view by accumulating colors and densities along each ray (a minimal sketch of the ray transformer and the compositing step follows this list).
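The snippet below is a rough sketch of those last two components: standard multi-head self-attention stands in for the ray transformer, and `composite` implements the classic alpha-compositing integral. The feature dimension, head count, and layer shapes are assumptions chosen for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class RayTransformerSketch(nn.Module):
    """Stand-in for the ray transformer: self-attention lets all samples on a
    ray exchange information before per-sample density is predicted."""
    def __init__(self, feat_dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.to_sigma = nn.Linear(feat_dim, 1)

    def forward(self, feats):                    # feats: (rays, samples, feat_dim)
        ctx, _ = self.attn(feats, feats, feats)  # each sample attends to the whole ray
        return torch.relu(self.to_sigma(ctx)).squeeze(-1)  # (rays, samples) densities

def composite(rgb, sigma, t_vals):
    """Classic differentiable volume rendering: accumulate color along each ray.

    rgb: (rays, samples, 3), sigma: (rays, samples), t_vals: (rays, samples) depths.
    """
    deltas = t_vals[:, 1:] - t_vals[:, :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)             # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)   # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)      # (rays, 3) pixel colors
```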
The full network is trained end-to-end on posed multiview images and delivers competitive results even on complex real-world scenes.
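End-to-end training here simply means supervising rendered ray colors against ground-truth pixels. A hypothetical training step might look like the following, where `model`, the batch keys, and the plain L2 photometric loss are illustrative assumptions.

```python
import torch

def training_step(model, batch, optimizer):
    """One hypothetical end-to-end step: render a batch of target-view rays from
    nearby source views, then minimize the photometric error against the ground
    truth. `model`, the batch keys, and the plain L2 loss are assumptions."""
    optimizer.zero_grad()
    pred_rgb = model(rays=batch['rays'],              # rays through target pixels
                     src_images=batch['src_images'],  # neighboring posed source views
                     src_poses=batch['src_poses'])
    loss = torch.mean((pred_rgb - batch['target_rgb']) ** 2)  # per-ray photometric L2
    loss.backward()
    optimizer.step()
    return loss.item()
```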
Results and Comparative Analysis
Empirical evaluations show that, when trained across diverse datasets, IBRNet outperforms prior state-of-the-art systems at rendering high-resolution images of unseen scenes. On the Real Forward-Facing dataset it achieves better perceptual quality and fidelity than LLFF, and it approaches NeRF's quality when fine-tuned per scene. The experiments also show that IBRNet maintains its performance as the density of source views varies significantly.
Implications and Future Work
By combining image-based interpolation with a learned scene representation, IBRNet addresses limitations of existing methods that either demand dense input views or require lengthy per-scene optimization. This has direct implications for applications such as interactive environments and real-time rendering systems.
Future research could focus on further improving network efficiency and on mechanisms for handling extremely sparse inputs, improving scalability across wider domains.
Conclusion
IBRNet offers a generalized, efficient, and high-quality approach to multi-view image-based rendering, marking a significant contribution to the field. Its blend of IBR principles with neural modeling opens avenues for future work, particularly in broadening its applicability to real-time and large-scale scene rendering.