EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis (2410.01804v5)

Published 2 Oct 2024 in cs.CV

Abstract: We present Exact Volumetric Ellipsoid Rendering (EVER), a method for real-time differentiable emission-only volume rendering. Unlike recent rasterization based approach by 3D Gaussian Splatting (3DGS), our primitive based representation allows for exact volume rendering, rather than alpha compositing 3D Gaussian billboards. As such, unlike 3DGS our formulation does not suffer from popping artifacts and view dependent density, but still achieves frame rates of $\sim\!30$ FPS at 720p on an NVIDIA RTX4090. Since our approach is built upon ray tracing it enables effects such as defocus blur and camera distortion (e.g. such as from fisheye cameras), which are difficult to achieve by rasterization. We show that our method is more accurate with fewer blending issues than 3DGS and follow-up work on view-consistent rendering, especially on the challenging large-scale scenes from the Zip-NeRF dataset where it achieves sharpest results among real-time techniques.

Summary

  • The paper presents EVER, a novel method that replaces 3D Gaussian splatting with constant-density ellipsoids for exact volumetric rendering.
  • It leverages analytical integration to eliminate view-dependent artifacts, achieving real-time performance of roughly 30 FPS at 720p on an NVIDIA RTX 4090.
  • Experimental results on Mip-NeRF 360 and Zip-NeRF datasets demonstrate improved image quality with superior LPIPS and SSIM metrics.

Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

The paper "Exact Volumetric Ellipsoid Rendering (EVER)" introduces a novel approach to real-time differentiable emission-only volume rendering. In contrast to the popular 3D Gaussian Splatting (3DGS) technique, the authors propose using a primitive-based representation that allows for exact volume rendering, thereby addressing several limitations of 3DGS.

Motivation and Background

The field of novel view synthesis has seen significant advances with the advent of 3D reconstruction techniques like Neural Radiance Fields (NeRF). These methods leverage differentiable rendering of volumetric scene representations to produce photorealistic 3D reconstructions. Recent developments, such as 3D Gaussian Splatting, combine the efficiency of point-based models with the differentiability of volume-based representations. However, 3DGS lacks a true volumetric density field and suffers from view-dependent artifacts, primarily noticeable as "popping".

Methodology

EVER employs constant-density ellipsoids as primitives rather than the Gaussian primitives used in 3DGS. Because each primitive represents a genuine volumetric density, the volume rendering integral along a ray can be computed exactly, which guarantees renderings free of the popping artifacts and view-dependent density that afflict 3DGS.
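
For concreteness, the state carried by one such primitive might look like the minimal sketch below. The field names, and the use of spherical-harmonic coefficients for view-dependent color, are assumptions for illustration (following the 3DGS lineage the paper builds on) rather than the authors' exact data layout.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class EllipsoidPrimitive:
    """One constant-density ellipsoid primitive (illustrative layout)."""
    center: np.ndarray     # (3,) world-space position
    rotation: np.ndarray   # (3, 3) orientation of the principal axes
    scales: np.ndarray     # (3,) semi-axis lengths
    density: float         # constant density inside the ellipsoid
    sh_coeffs: np.ndarray  # spherical-harmonic coefficients for view-dependent color
```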

Exact Primitive-based Rendering

Each primitive in EVER has a constant density and a constant view-dependent color, and is represented as an ellipsoid. The method performs exact volume rendering by ray tracing, exploiting the fact that constant-density primitives make the emission-only volume rendering integral

$$C = \int_0^\infty c_\mathbf{r}(t)\,\sigma_\mathbf{r}(t)\,\exp\!\left(-\int_0^t \sigma_\mathbf{r}(s)\,ds\right)\,dt$$

tractable analytically: along a ray, each primitive contributes a step function to the density and color, so no numerical quadrature is required. Splitting the ray at primitive boundaries into segments with constant density $\sigma_i$, color $c_i$, and length $\Delta t_i$, the integral reduces to

$$C = \sum_{i=1}^{N} c_i \left(1 - \exp(-\sigma_i \Delta t_i)\right) \prod_{j=1}^{i-1} \exp(-\sigma_j \Delta t_j)$$
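
This closed-form segment sum is easy to prototype. Below is a minimal NumPy sketch (not the authors' CUDA/OptiX implementation) that exactly renders one ray through a set of overlapping constant-density intervals; the function name and interval format are assumptions for illustration.

```python
import numpy as np

def render_ray(intervals, background=np.zeros(3)):
    """Exact emission-only volume rendering along one ray.

    intervals: list of (t_enter, t_exit, sigma, rgb) tuples, one per primitive
    the ray hits. Because every primitive has constant density, the density
    along the ray is piecewise constant between boundaries, so each segment's
    contribution has a closed form and no quadrature is needed.
    """
    sigmas = np.array([iv[2] for iv in intervals], dtype=float)
    colors = np.array([iv[3] for iv in intervals], dtype=float)

    # Boundary events: primitive i becomes active at t_enter, inactive at t_exit.
    events = []
    for i, (t0, t1, _, _) in enumerate(intervals):
        events.append((t0, i, True))
        events.append((t1, i, False))
    events.sort(key=lambda e: e[0])

    color = np.zeros(3)
    transmittance = 1.0
    active = set()
    t_prev = None

    for t, i, entering in events:
        if t_prev is not None and active:
            dt = t - t_prev
            idx = list(active)
            sigma_seg = sigmas[idx].sum()                       # densities add
            c_seg = (sigmas[idx][:, None] * colors[idx]).sum(0) / sigma_seg
            alpha = 1.0 - np.exp(-sigma_seg * dt)               # exact per segment
            color += transmittance * alpha * c_seg
            transmittance *= 1.0 - alpha
        if entering:
            active.add(i)
        else:
            active.discard(i)
        t_prev = t

    return color + transmittance * np.asarray(background, dtype=float)
```

Within each segment between consecutive boundaries the total density is constant, so the term $1 - \exp(-\sigma\,\Delta t)$ is exact; overlapping primitives simply add their densities, and the segment color is their density-weighted mean.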

Density Parameterization and Optimization

To avoid the optimization challenges associated with direct density parameterization, the paper introduces an alternative parameter $\alpha$, linking it to density via a softmax function. This allows for stable gradient descent even for high-density primitives. Further, the paper adapts the Adaptive Density Control (ADC) heuristics from 3DGS to accommodate the constant-density primitives, including adjusted thresholds for splitting and cloning based on gradient magnitudes.
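
One way to realize such a reparameterization, shown below as an illustrative sketch rather than the paper's exact formula, is to treat a primitive's learnable opacity $\alpha$ as the fraction of light absorbed along a reference chord of length $d$ through the primitive and recover the density from it; because $\alpha$ is bounded in $(0, 1)$, its gradients remain well behaved even when the implied density is very large.

```python
import numpy as np

def density_from_alpha(alpha, chord_length, eps=1e-6):
    """Map an opacity parameter alpha in (0, 1) to a constant density.

    Interpreting alpha as 1 - exp(-sigma * d) for a reference chord of
    length d through the primitive gives sigma = -log(1 - alpha) / d.
    (Both the interpretation and the choice of chord length, e.g. the
    primitive's mean diameter, are assumptions for illustration.)
    """
    alpha = np.clip(alpha, eps, 1.0 - eps)
    return -np.log(1.0 - alpha) / chord_length
```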

Implementation

The authors implemented EVER using PyTorch, CUDA, OptiX, and Slang for efficient ray tracing and backpropagation through the rendering process. The method achieves real-time performance, rendering at roughly 30 FPS at 720p on an NVIDIA RTX 4090.
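
The ray tracing itself hinges on ray-ellipsoid intersection, which is cheap because an ellipsoid maps to the unit sphere under an affine transform. The sketch below shows the standard construction in NumPy (the production path runs through OptiX); the function name and argument layout are illustrative.

```python
import numpy as np

def ray_ellipsoid_interval(origin, direction, center, rotation, scales):
    """Entry/exit distances of a ray against an ellipsoid, or None if missed.

    rotation: (3, 3) matrix mapping the ellipsoid's principal axes to world
    space; scales: (3,) semi-axis lengths. The ellipsoid is mapped to the
    unit sphere by the inverse affine transform, where the intersection
    reduces to a quadratic in t. Because the transform is affine and the
    direction is not renormalized, the returned t values are valid for the
    original world-space ray parameterization.
    """
    inv_scale = 1.0 / np.asarray(scales, dtype=float)
    o = inv_scale * (rotation.T @ (np.asarray(origin, dtype=float) - center))
    d = inv_scale * (rotation.T @ np.asarray(direction, dtype=float))

    # Solve |o + t d|^2 = 1.
    a = d @ d
    b = 2.0 * (o @ d)
    c = o @ o - 1.0
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return None                      # ray misses (or only grazes) the ellipsoid
    sqrt_disc = np.sqrt(disc)
    t0 = (-b - sqrt_disc) / (2.0 * a)
    t1 = (-b + sqrt_disc) / (2.0 * a)
    if t1 <= 0.0:
        return None                      # ellipsoid entirely behind the ray origin
    return max(t0, 0.0), t1              # clamp entry to the ray origin
```

The resulting (entry, exit) pairs, together with each primitive's density and color, are exactly the intervals consumed by the per-ray rendering sketch above.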

Experimental Results

The effectiveness of EVER is demonstrated on the Mip-NeRF 360 and Zip-NeRF datasets. EVER outperforms 3DGS and its variants in terms of image quality metrics like LPIPS and SSIM, especially in scenarios involving large-scale scenes. The paper provides quantitative as well as qualitative evidence of EVER's superior performance, illustrating enhanced sharpness and the absence of popping artifacts (see the paper's qualitative comparison and popping figures).

Comparison and Analysis

Ablation studies confirm the importance of each component of the model. Notably, the exact blending of primitive colors significantly improves visual quality, allowing EVER to handle lighting and textural details more accurately (see the paper's ablation figure). The authors also show that while splatting-based techniques see a degradation in performance due to improper blending, EVER maintains high fidelity even under complex lighting interactions.

Implications and Future Directions

From a practical standpoint, EVER presents a robust methodology for high-quality real-time 3D reconstructions, making it suitable for applications in virtual reality, augmented reality, and interactive media. Theoretically, it bridges the gap between fast but less accurate techniques and slower, high-fidelity methods, opening avenues for further research in hybrid models combining the strengths of both paradigms.

Future work may focus on extending this approach to handle more complex interactions, such as dynamic scenes or integration with global illumination models. Additionally, exploring adaptive primitives that can dynamically alter their shapes and densities based on scene requirements could further enhance both performance and quality.

In summary, EVER advances the state of real-time view synthesis, offering a method that combines the efficiency of primitive-based models with the accuracy of volumetric rendering, thus marking a significant contribution to the field of 3D computational graphics.
