- The paper introduces a joint optimization technique that simultaneously refines camera parameters and the NeRF model using a photometric reconstruction loss.
- The paper presents the BLEFF dataset for benchmarking novel view synthesis without relying on pre-computed SfM camera estimates.
- The paper demonstrates that NeRF-- achieves results comparable to COLMAP-based pipelines, recovering translational perturbations more reliably than rotational ones.
Overview of NeRF--: Neural Radiance Fields Without Known Camera Parameters
The paper presents NeRF--, a method that removes the need for pre-computed camera parameters when training Neural Radiance Fields (NeRF) for novel view synthesis. NeRF-- operates on forward-facing scenes using only a collection of 2D images, with no known camera intrinsics or poses. The motivation is to simplify the NeRF training pipeline, which traditionally relies on Structure from Motion (SfM) tools such as COLMAP to estimate camera parameters, a pre-processing step that adds complexity and can introduce inaccuracies of its own.
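To make the setup concrete, the following is a minimal PyTorch sketch of what such learnable cameras can look like: each image receives an axis-angle rotation and a translation, and a focal length is optimized alongside them. The paper refines separate fx and fy values; a single shared focal length is used here for brevity, and all names and initial values are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

class LearnableCameras(nn.Module):
    """Per-image camera parameters trained jointly with the NeRF MLP.

    Sketch of the parameterization: an axis-angle rotation plus a
    translation per image, and one focal length shared across images
    (single physical camera). Names and initial values are illustrative.
    """

    def __init__(self, num_images: int, height: int, width: int):
        super().__init__()
        self.h, self.w = height, width
        # Axis-angle rotation and translation per image, initialized to identity.
        self.rotations = nn.Parameter(torch.zeros(num_images, 3))
        self.translations = nn.Parameter(torch.zeros(num_images, 3))
        # Shared focal length; the image width is just a rough initial guess.
        self.focal = nn.Parameter(torch.tensor(float(width)))

    @staticmethod
    def rodrigues(rvec: torch.Tensor) -> torch.Tensor:
        """Axis-angle vector (3,) -> rotation matrix (3, 3), differentiably."""
        theta = rvec.norm() + 1e-8
        k = rvec / theta
        K = torch.zeros(3, 3, device=rvec.device)  # skew-symmetric cross-product matrix
        K[0, 1], K[0, 2] = -k[2], k[1]
        K[1, 0], K[1, 2] = k[2], -k[0]
        K[2, 0], K[2, 1] = -k[1], k[0]
        return torch.eye(3, device=rvec.device) + torch.sin(theta) * K \
            + (1.0 - torch.cos(theta)) * (K @ K)

    def get_rays(self, i: int):
        """World-space ray origins and directions for every pixel of image i."""
        R = self.rodrigues(self.rotations[i])
        ys, xs = torch.meshgrid(torch.arange(self.h, dtype=torch.float32),
                                torch.arange(self.w, dtype=torch.float32),
                                indexing="ij")
        # Pinhole model with the principal point at the image center.
        dirs = torch.stack([(xs - 0.5 * self.w) / self.focal,
                            -(ys - 0.5 * self.h) / self.focal,
                            -torch.ones_like(xs)], dim=-1)
        rays_d = dirs @ R.T                              # rotate into the world frame
        rays_o = self.translations[i].expand_as(rays_d)  # camera center for every ray
        return rays_o, rays_d
```

Initializing every pose to the identity is a natural choice for forward-facing captures, where all cameras point in roughly the same direction, which is one reason the method targets that setting.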
Key Contributions
- Joint Optimization of Camera Parameters and NeRF Model: The authors propose a training pipeline in which camera intrinsics and poses are treated as learnable parameters and refined jointly with the NeRF model under a photometric reconstruction loss (see the training-step sketch after this list). This removes the dependency on external estimation tools such as COLMAP and eliminates a pre-processing stage that can introduce errors, especially in scenes with homogeneous textures or rapidly changing view-dependent appearance.
- Blender Forward-Facing Dataset (BLEFF): To benchmark camera parameter estimation accuracy and novel view synthesis quality, the authors introduce BLEFF, a dataset of path-traced synthetic scenes designed for evaluating view synthesis without pre-computed camera parameters. Because ground-truth intrinsics and poses are available, BLEFF enables controlled evaluation of both rendering quality and pose recovery (a sketch of such a pose metric follows the training-step example below).
- Analysis of Training Behavior Under Varying Camera Motions: Extensive experiments examine how NeRF-- behaves under different camera motion patterns. In many cases it matches the novel view synthesis quality obtained with COLMAP pre-computed camera parameters, and it recovers translational perturbations more reliably than rotational ones.
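With the cameras expressed as ordinary learnable parameters, the joint optimization reduces to backpropagating the standard NeRF photometric loss into both the MLP and the camera parameters. The sketch below shows one training step under that assumption; `render_rays` is a hypothetical stand-in for any standard NeRF volume renderer, and `cameras` is the `LearnableCameras` module from the earlier sketch.

```python
import torch

def train_step(nerf, cameras, images, optimizer, rays_per_batch=1024):
    """One joint optimization step: the same photometric loss updates
    both the NeRF MLP and the camera parameters. `render_rays` is a
    stand-in for a standard NeRF volume renderer, not a library call."""
    i = torch.randint(len(images), (1,)).item()  # pick a random training image
    rays_o, rays_d = cameras.get_rays(i)         # rays from the *current* camera estimate
    target = images[i]                           # (H, W, 3) ground-truth pixels

    # Subsample pixels so a single step fits in memory.
    idx = torch.randint(rays_o.shape[0] * rays_o.shape[1], (rays_per_batch,))
    rays_o = rays_o.reshape(-1, 3)[idx]
    rays_d = rays_d.reshape(-1, 3)[idx]
    target = target.reshape(-1, 3)[idx]

    rgb = render_rays(nerf, rays_o, rays_d)      # predicted colors, (rays_per_batch, 3)
    loss = ((rgb - target) ** 2).mean()          # photometric reconstruction loss

    optimizer.zero_grad()
    loss.backward()  # gradients reach the poses and focal as well as the MLP weights
    optimizer.step()
    return loss.item()
```

A single optimizer over both parameter groups is enough in principle, e.g. `torch.optim.Adam([*nerf.parameters(), *cameras.parameters()], lr=1e-3)`, though in practice one may want separate learning rates or schedules for the network and the cameras.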
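Because BLEFF ships ground-truth cameras, recovered poses can be scored directly. One caveat: poses recovered from images alone are defined only up to a global similarity transform, so estimated and ground-truth trajectories must first be aligned (for example with an ATE-style Procrustes alignment). A minimal NumPy sketch of the per-camera metrics, assuming alignment has already been applied:

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Mean rotation error (degrees) and translation error between
    aligned estimated and ground-truth camera poses."""
    rot_errs, trans_errs = [], []
    for Re, te, Rg, tg in zip(R_est, t_est, R_gt, t_gt):
        # Geodesic distance on SO(3): the angle of the relative rotation.
        cos_angle = (np.trace(Re.T @ Rg) - 1.0) / 2.0
        rot_errs.append(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
        trans_errs.append(np.linalg.norm(te - tg))  # distance between camera centers
    return float(np.mean(rot_errs)), float(np.mean(trans_errs))
```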
Results and Implications
The results indicate that NeRF--, by folding camera parameter estimation into a single optimization, can match the performance of traditional two-stage systems that rely on COLMAP. This simplifies the workflow, is robust to translational perturbations, and avoids the failure modes of SfM-based initialization in scenes where feature correspondence breaks down.
The implications are significant for practical deployment: the method suits scenarios where users have limited control over capture setups or where pre-processing is impractical, making photo-realistic rendering for virtual reality, augmented reality, and related applications more accessible and efficient.
Future Directions
The simplification achieved by NeRF-- opens several avenues for further research. Potential directions include optimization techniques that handle larger rotational perturbations, extending the approach to full 360-degree scenes, and integrating temporal information from image sequences, which could help in dynamic scenes where moving content violates the static-scene assumption underlying joint camera estimation.
Overall, the paper marks a meaningful step toward reducing the complexity of novel view synthesis systems, showing that high-quality renderings are attainable with minimal reliance on external camera parameter estimation.