Evaluation of Direct Voxel Grid Optimization for Fast Radiance Fields Reconstruction
The paper "Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction" presents a method that dramatically accelerates the training required to reconstruct radiance fields from image data for novel view synthesis. The method matches the quality of Neural Radiance Fields (NeRF) while cutting training time from hours or days to roughly 15 minutes on a single GPU.
Radiance field reconstruction aims to enable free-viewpoint navigation of a 3D scene by synthesizing photo-realistic novel views from a set of calibrated images. NeRF stands out for its state-of-the-art quality and flexibility; however, it is constrained by high computational demands and prolonged training durations. This paper addresses these limitations through efficient direct voxel grid optimization.
Key Contributions
- Voxel Grid Representation: The authors implement a dense voxel grid to explicitly represent the scene geometry and appearance. The voxel grid stores density and feature information, which is interpolated for rendering.
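The core rendering primitive of such a grid representation is trilinear interpolation: each query point along a ray is assigned a value blended from its eight surrounding voxel corners. A minimal NumPy sketch (not the authors' implementation; function names and the grid-coordinate convention are illustrative):

```python
import numpy as np

def trilinear_interpolate(grid, pts):
    """Sample a dense voxel grid at continuous points via trilinear interpolation.

    grid: (X, Y, Z) array of per-voxel values (e.g., raw density).
    pts:  (N, 3) array of query points in grid coordinates.
    """
    # Clamp the lower corner so all 8 surrounding corners stay inside the grid.
    lo = np.floor(pts).astype(int)
    lo = np.clip(lo, 0, np.array(grid.shape) - 2)
    frac = np.clip(pts - lo, 0.0, 1.0)

    out = np.zeros(len(pts))
    # Accumulate the 8 corner values, each weighted by its trilinear weight.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, frac[:, 0], 1 - frac[:, 0])
                     * np.where(dy, frac[:, 1], 1 - frac[:, 1])
                     * np.where(dz, frac[:, 2], 1 - frac[:, 2]))
                out += w * grid[lo[:, 0] + dx, lo[:, 1] + dy, lo[:, 2] + dz]
    return out
```

Because every grid value contributes linearly to the rendered output, gradients flow directly to the affected voxels, which is what makes the optimization so fast compared to querying a deep MLP.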
- Post-Activated Interpolation: The paper introduces post-activation, in which the density activation and alpha conversion are applied after trilinear voxel interpolation rather than before it, enabling sharper surface boundaries at lower grid resolutions. This contrasts with pre-activation (activating per-voxel values before interpolating) and in-activation, and produces more accurate geometry with fewer voxels.
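The difference can be illustrated in one dimension between two neighboring voxels, one empty and one occupied. A hedged sketch (the raw values and step size `delta` below are illustrative, not taken from the paper):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def alpha(raw, delta=0.5):
    # Standard density-to-opacity conversion from volume rendering:
    # alpha = 1 - exp(-softplus(raw) * step_size)
    return 1.0 - np.exp(-softplus(raw) * delta)

# Two neighboring grid values straddling a surface: empty space vs. solid.
raw_lo, raw_hi = -5.0, 10.0
t = np.linspace(0, 1, 5)  # interpolation weights between the two voxels

# Pre-activation: convert each voxel to alpha first, then interpolate.
pre = (1 - t) * alpha(raw_lo) + t * alpha(raw_hi)

# Post-activation: interpolate the raw values first, then convert to alpha.
post = alpha((1 - t) * raw_lo + t * raw_hi)
```

Because the nonlinearity is applied after interpolation, `post` stays near zero longer and then rises more steeply than the linear blend `pre`, so a single voxel pair can represent a crisp occupancy step instead of a smeared transition.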
- Optimization Process: Two techniques are applied to stabilize and enhance the optimization:
  - Low-Density Initialization: Initial densities are set to near-zero to avoid suboptimal geometries biased towards the camera near planes.
  - View-Count-Based Learning Rate: The learning rate for each voxel is scaled by how many training views observe it, so voxels seen by only a few views receive smaller updates.
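Both stabilizers can be sketched in a few lines. This is an illustrative approximation, not the authors' exact formulation: the bias solves for a raw-density shift that yields a target initial opacity (using the approximation softplus(x) ≈ exp(x) for very negative x), and the learning-rate scaling is a simple normalization by the maximum view count.

```python
import numpy as np

def low_density_bias(alpha_init=1e-6, delta=1.0):
    """Shift added to raw densities so every voxel starts nearly transparent.

    Chosen so that 1 - exp(-softplus(b) * delta) ~= alpha_init, using
    softplus(x) ~= exp(x), which holds for very negative x.
    """
    return np.log(-np.log(1 - alpha_init) / delta)

def per_voxel_lr(view_counts, base_lr=0.1):
    """Scale each voxel's learning rate by how many training views see it."""
    return base_lr * view_counts / max(view_counts.max(), 1)
```

Starting nearly transparent lets rays pass through the whole volume early in training, so gradients reach the true surface instead of piling density onto the first samples near the camera.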
Numerical Results
Empirical evaluations on multiple datasets indicate the efficacy of the proposed method:
- Synthetic-NeRF: Achieves a PSNR of 31.95 with under 15 minutes of training, outperforming NeRF's 31.01 PSNR, which requires over a day of training.
- Synthetic-NSVF: Reaches quality comparable to state-of-the-art methods on this benchmark as well, demonstrating that the approach is not tuned to a single dataset.
- BlendedMVS and Tanks and Temples: Consistently high visual quality and rapid convergence, confirming the method’s generalizability to real-world data.
Practical and Theoretical Implications
Practically, reducing training time to minutes facilitates real-time applications, such as interactive 3D visualizations and augmented reality, extending NeRF's usability for online product showcases and immersive user experiences. The significant reduction in the computational burden lowers barriers for deploying models in resource-constrained environments. From a theoretical standpoint, post-activation interpolation poses intriguing questions for future exploration of volumetric representations and their efficient optimization.
Future Developments
Potential advancements include:
- Generalization to Diverse Scenes: Adapting the method to handle dynamic or unbounded scenes, building upon the achieved speedup in bounded static environments.
- Integration with Hybrid Approaches: Combining the proposed method with other fast radiance field reconstruction techniques, such as those involving multi-plane images or layered depth images, to further enhance performance.
- Improved Data Structures: Exploring more advanced octree-based structures or implicit neural representations to refine sparse voxel grids.
The efficient direct voxel grid optimization introduced in this paper opens new avenues for accelerated neural scene reconstruction, offering a compelling alternative to traditional slow optimization processes. By achieving substantial speed improvements while maintaining high-quality outputs, this work marks a significant step forward in the practical application and theoretical understanding of volumetric scene representations.