Baking Neural Radiance Fields for Real-Time View Synthesis (2103.14645v1)

Published 26 Mar 2021 in cs.CV and cs.GR

Abstract: Neural volumetric representations such as Neural Radiance Fields (NeRF) have emerged as a compelling technique for learning to represent 3D scenes from images with the goal of rendering photorealistic images of the scene from unobserved viewpoints. However, NeRF's computational requirements are prohibitive for real-time applications: rendering views from a trained NeRF requires querying a multilayer perceptron (MLP) hundreds of times per ray. We present a method to train a NeRF, then precompute and store (i.e. "bake") it as a novel representation called a Sparse Neural Radiance Grid (SNeRG) that enables real-time rendering on commodity hardware. To achieve this, we introduce 1) a reformulation of NeRF's architecture, and 2) a sparse voxel grid representation with learned feature vectors. The resulting scene representation retains NeRF's ability to render fine geometric details and view-dependent appearance, is compact (averaging less than 90 MB per scene), and can be rendered in real-time (higher than 30 frames per second on a laptop GPU). Actual screen captures are shown in our video.

Authors (5)
  1. Peter Hedman (21 papers)
  2. Pratul P. Srinivasan (38 papers)
  3. Ben Mildenhall (41 papers)
  4. Jonathan T. Barron (89 papers)
  5. Paul Debevec (20 papers)
Citations (510)

Summary

Overview of "Baking Neural Radiance Fields for Real-Time View Synthesis"

This paper presents advances in neural volumetric representations, specifically targeting the computational challenges that Neural Radiance Fields (NeRF) pose for real-time applications. NeRF is a prominent method for rendering photorealistic images of 3D scenes, but its intensive computational requirements make it impractical for applications requiring real-time interaction.

Methodology

The paper introduces the Sparse Neural Radiance Grid (SNeRG), which facilitates real-time rendering on commodity hardware. This representation is achieved through two primary innovations:

  1. Reformulated NeRF Architecture: The authors propose a deferred NeRF architecture that limits the expensive network evaluation to once per ray rather than hundreds of times along it. The representation outputs a diffuse color and a feature vector encoding view-dependent effects; both are composited along the ray before a small MLP runs once per ray (see the rendering sketch after this list).
  2. Sparse Voxel Grid Representation: The method precomputes and stores the radiance field in a sparse 3D voxel grid of learned feature vectors. A sparsity regularization term during training encourages opacity to concentrate around actual scene surfaces, reducing memory requirements for both storage and rendering (a sample penalty is also sketched after this list).
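The deferred-shading idea can be made concrete with a short sketch. The NumPy code below assumes density, diffuse color, and a feature vector have already been sampled along a ray; `specular_mlp` stands in for the small view-dependence network, and all names are illustrative rather than the authors' code.

```python
import numpy as np

def composite_deferred(sigmas, diffuse_rgb, features, deltas, view_dir, specular_mlp):
    """Deferred rendering: alpha-composite per-sample diffuse colors and
    feature vectors along a ray, then run the view-dependent MLP once.

    sigmas:       (S,)   volume densities at the S samples along the ray
    diffuse_rgb:  (S, 3) per-sample diffuse colors
    features:     (S, F) per-sample view-dependence feature vectors
    deltas:       (S,)   distances between adjacent samples
    view_dir:     (3,)   normalized viewing direction
    specular_mlp: callable (feature, view_dir) -> RGB residual
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance
    weights = alphas * trans                                        # compositing weights

    accum_rgb = (weights[:, None] * diffuse_rgb).sum(axis=0)   # accumulated diffuse color
    accum_feat = (weights[:, None] * features).sum(axis=0)     # accumulated feature vector

    # The expensive step happens once per ray, not once per sample.
    return accum_rgb + specular_mlp(accum_feat, view_dir)
```

The summary does not give the exact form of the sparsity regularizer; a Cauchy-style penalty on sampled densities is one common way to realize such a prior, sketched here with illustrative hyperparameters.

```python
import numpy as np

def sparsity_loss(sigmas, lam=1e-5, c=0.5):
    """Cauchy-style sparsity penalty on sampled volume densities.

    log(1 + sigma^2 / c) grows slowly for large densities, so genuine
    surfaces stay cheap while low-density "fog" is pushed toward zero,
    concentrating opacity around scene geometry.
    """
    return lam * np.log1p(sigmas ** 2 / c).sum()
```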

Implementation and Results

The implementation of SNeRG involves "baking" the trained NeRF into the new grid structure: the trained model is evaluated on a dense voxel grid, and empty space is culled so that rendering can skip it entirely, as in the sketch below.
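Here is a minimal sketch of this culling step, assuming a `query_fn` that wraps the trained model and returns per-point density, diffuse color, and features; the resolution, block size, and threshold are illustrative rather than the paper's settings.

```python
import numpy as np

def bake_to_sparse_grid(query_fn, resolution=128, block=16, alpha_thresh=0.005):
    """Bake a trained deferred NeRF into a block-sparse voxel grid.

    query_fn maps (N, 3) points in the unit cube to per-point
    (density, diffuse_rgb, feature) arrays.
    """
    voxel_size = 1.0 / resolution
    centers = (np.arange(resolution) + 0.5) * voxel_size
    grid = np.stack(np.meshgrid(centers, centers, centers, indexing="ij"), -1)

    sigma, diffuse, feat = query_fn(grid.reshape(-1, 3))
    alpha = 1.0 - np.exp(-sigma * voxel_size)  # opacity over one voxel-length step
    occupied = (alpha > alpha_thresh).reshape((resolution,) * 3)

    # Mark macro-blocks containing at least one visible voxel; the renderer
    # stores only these blocks and skips the rest during ray marching.
    b = resolution // block
    block_mask = occupied.reshape(b, block, b, block, b, block).any(axis=(1, 3, 5))
    keep = occupied.reshape(-1)
    return {
        "block_mask": block_mask,   # (b, b, b) which blocks to store
        "diffuse": diffuse[keep],   # values kept only for occupied voxels
        "features": feat[keep],
        "alpha": alpha[keep],
    }
```

With the baked representation in place, the paper showcases significant improvements: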

  • Rendering Speed: SNeRG reduces rendering time to 12 ms per frame, achieving well over 30 frames per second on standard hardware, a notable improvement over existing solutions.
  • Memory Efficiency: The representation remains compact, with average storage requirements under 90 MB per scene.
  • Quality: Rendered images retain competitive fidelity, as indicated by metrics such as PSNR, SSIM, and LPIPS (PSNR is defined in the sketch below).
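Of these metrics, PSNR has the simplest definition; a minimal version for images normalized to [0, 1]:

```python
import numpy as np

def psnr(pred, target):
    """Peak signal-to-noise ratio in dB for images in [0, 1]; higher is better."""
    mse = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    return -10.0 * np.log10(mse)
```

Higher PSNR corresponds to lower mean squared error on a log scale; SSIM and LPIPS instead measure structural and perceptual similarity, respectively.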

Furthermore, the authors detail extensive evaluations against various baselines, confirming the robust performance of their method in both synthetic and real-world scenarios.

Implications and Future Directions

The implications of this work are significant, offering immediate utility in fields such as virtual and augmented reality, where real-time rendering is critical. SNeRG's efficient encoding and rendering also suggest extensions beyond static scenes, such as dynamic environments and interactive media.

Looking forward, such techniques could inspire hybrid models that combine neural and traditional rendering, optimizing for both quality and efficiency in increasingly complex, interactive applications.

Conclusion

"Baking Neural Radiance Fields for Real-Time View Synthesis" is a substantial contribution to the paper of neural scene representations, demonstrating significant strides toward making photorealistic rendering accessible for real-time applications. The methodologies proposed not only enhance computational efficiency but also set the stage for future research in more dynamic and interactive rendering systems.