- The paper presents a framework that trains panoramic neural radiance fields from a single 360° image, overcoming the need for multi-view data.
- It employs collaborative RGBD inpainting and a progressive inpainting-and-erasing strategy to synthesize occluded regions and maintain geometric consistency.
- Experimental results show higher PSNR and SSIM and lower LPIPS than prior baselines, marking a significant advance in single-shot 3D scene reconstruction.
Overview of PERF: Panoramic Neural Radiance Field from a Single Panorama
The paper introduces PERF, a framework for training a panoramic Neural Radiance Field (NeRF) from a single 360-degree panorama. It tackles a central difficulty in extending NeRF to complex, real-world scenes: conventional novel view synthesis demands images from many viewpoints, while existing single-image methods are largely limited to narrow fields of view and break down under occlusion. PERF instead reconstructs a full 3D scene from one panorama by combining the visible scene content with predicted panoramic depth.
PERF sidesteps the dense multi-view datasets that previous NeRF frameworks required. By relying on a single panoramic view, and by introducing collaborative RGBD inpainting together with a progressive inpainting-and-erasing strategy, it moves NeRF closer to practical applications such as virtual tours, VR games, and telepresence.
Methodology
The approach turns a single 2D panoramic image into a 3D environment through three core components: collaborative RGBD inpainting, panoramic depth estimation, and a progressive inpainting-and-erasing strategy.
- Collaborative RGBD Inpainting: A pre-trained Stable Diffusion inpainting model synthesizes RGB content for occluded regions, while a monocular depth estimator completes the corresponding depth map. Because the diffusion model is trained on large-scale image collections, its prior covers a wide range of scene content beyond what the input panorama shows (see the RGBD inpainting sketch after this list).
- Progressive Inpainting-and-Erasing: To keep geometry consistent across views, the method detects view-specific occlusion conflicts, where a region completed from one viewpoint contradicts what another viewpoint observes, and erases the conflicting content from the supervision. This enables panoramic roaming that generates plausible unseen regions while staying faithful to the visible data (see the erasing sketch after this list).
- Panoramic Neural Radiance Field Training: The NeRF is optimized with volume rendering plus depth supervision derived from the panoramic depth prediction, so that generated and real observations are fused into a single coherent 3D representation (a ray-generation sketch follows this list).
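To make the panorama-to-3D step concrete, the sketch below shows how each pixel of an equirectangular image maps to a ray direction, and how a predicted depth map lifts those rays to 3D points. The coordinate convention and the function names `panorama_rays` and `lift_to_points` are illustrative assumptions, not the paper's code.

```python
import numpy as np

def panorama_rays(height, width):
    """Map every pixel of an equirectangular panorama to a unit ray direction.

    Assumes the common convention: image x spans longitude [-pi, pi],
    image y spans latitude [pi/2, -pi/2], camera at the origin.
    """
    xs = (np.arange(width) + 0.5) / width          # pixel centers in [0, 1)
    ys = (np.arange(height) + 0.5) / height
    lon = (xs - 0.5) * 2.0 * np.pi                 # longitude in (-pi, pi)
    lat = (0.5 - ys) * np.pi                       # latitude in (-pi/2, pi/2)
    lon, lat = np.meshgrid(lon, lat)               # each (H, W)
    return np.stack([
        np.cos(lat) * np.sin(lon),                 # x
        np.sin(lat),                               # y (up)
        np.cos(lat) * np.cos(lon),                 # z (forward)
    ], axis=-1)                                    # (H, W, 3), unit length

def lift_to_points(dirs, depth):
    """Back-project per-pixel depth along each ray to 3D points."""
    return dirs * depth[..., None]                 # (H, W, 3)
```

With the panorama's predicted depth map, `lift_to_points(panorama_rays(H, W), depth)` yields the visible 3D geometry that supervises density along each ray; novel views can then be rendered by casting rays from translated camera positions.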
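The collaborative RGBD inpainting step could be sketched as follows, using the Hugging Face diffusers inpainting pipeline for RGB and any monocular depth network (passed in as `estimate_depth`, e.g. MiDaS) for depth. The checkpoint id, the least-squares scale-and-shift alignment, and the helper names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Loads a pre-trained inpainting diffusion model (runs on GPU in practice).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"  # assumed checkpoint id
)

def inpaint_rgbd(rgb, depth, mask, estimate_depth, prompt=""):
    """Fill the masked (occluded) region of an RGBD view.

    rgb:   (H, W, 3) uint8 image rendered at a novel view
    depth: (H, W) float depth, valid where mask == False
    mask:  (H, W) bool, True where content must be synthesized
    estimate_depth: any monocular depth network (assumed callable)
    """
    # 1. RGB inpainting with the pre-trained diffusion model.
    rgb_filled = pipe(
        prompt=prompt,
        image=Image.fromarray(rgb).resize((512, 512)),
        mask_image=Image.fromarray(mask.astype(np.uint8) * 255).resize((512, 512)),
    ).images[0]
    rgb_filled = np.asarray(rgb_filled.resize(rgb.shape[1::-1]))

    # 2. Monocular depth on the completed image (relative scale only).
    d_pred = estimate_depth(rgb_filled)            # (H, W)

    # 3. Align predicted depth to the known depth on visible pixels with a
    #    least-squares scale/shift, then composite into the masked region.
    vis = ~mask
    A = np.stack([d_pred[vis], np.ones(vis.sum())], axis=1)
    scale, shift = np.linalg.lstsq(A, depth[vis], rcond=None)[0]
    depth_filled = np.where(mask, scale * d_pred + shift, depth)
    return rgb_filled, depth_filled
```

The alignment in step 3 matters because monocular depth predictors output relative depth; anchoring them to the already-known depth in visible regions keeps the completed geometry metrically consistent with the rest of the scene.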
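The erasing step can be pictured as a visibility test between views. In the sketch below, newly inpainted pixels are lifted to 3D points (e.g. with `lift_to_points` from the first sketch) and checked against an earlier supervised view; the `project` camera model and the relative-depth tolerance are stand-ins for the paper's actual conflict test.

```python
import numpy as np

def erase_conflicts(points_new, depth_ref, project, tol=0.05):
    """Keep-mask over newly inpainted 3D points, erasing those that
    contradict geometry already observed from a reference view.

    points_new: (N, 3) points lifted from the new view's inpainted depth
    depth_ref:  (H, W) depth map of an earlier supervised view
    project:    assumed helper mapping world points to integer pixel
                coordinates and depths in the reference view
    """
    pix, d_proj = project(points_new)              # (N, 2) int, (N,)
    d_obs = depth_ref[pix[:, 1], pix[:, 0]]
    # A point landing clearly in front of a surface the reference view has
    # already observed contradicts that observation; mark it for erasure.
    conflict = d_proj < (1.0 - tol) * d_obs
    return ~conflict
```

In the full progressive loop, each newly inpainted view is tested against the views collected so far; only the surviving pixels join the supervision set before the NeRF is re-optimized, so completions stay geometrically consistent as the camera roams.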
Experiments and Results
The authors conducted comprehensive experiments on the Replica dataset and a newly introduced "PERF-in-the-wild" dataset. The method outperforms established baselines such as DS-NeRF, DietNeRF, and Omni-NeRF in novel view synthesis, achieving higher PSNR and SSIM and lower LPIPS.
Quantitative Outcomes: The paper reports a markedly higher masked PSNR than the baselines, that is, reconstruction quality measured only over the occluded regions the model must generate.
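A masked PSNR of this kind restricts the error computation to the occluded (inpainted) pixels. A minimal sketch, assuming float images in [0, 1] and a boolean occlusion mask (the paper's exact masking protocol may differ):

```python
import numpy as np

def masked_psnr(pred, gt, mask):
    """PSNR over masked pixels only.

    pred, gt: (H, W, 3) float arrays in [0, 1]; mask: (H, W) bool,
    True on the occluded region being evaluated.
    """
    mse = np.mean((pred[mask] - gt[mask]) ** 2)
    return 10.0 * np.log10(1.0 / max(mse, 1e-12))  # guard against mse == 0
```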
Qualitative Visualization: Side-by-side renderings show the fidelity and semantic coherence of the synthesized views, with smooth transitions across occluded areas and none of the foggy, artifact-laden output produced by competing methods.
Implications and Future Work
This paper pushes single-view panoramic NeRF toward practical real-world deployment: it maintains visual accuracy while removing the data-collection burden of multi-view capture. Its contributions are likely to pave the way for scalable and efficient 3D scene synthesis in wider consumer applications.
Future research could add semantic understanding to improve inference in unseen regions and strengthen environmental coherence in complex scenes. Another promising direction is extending the framework to dynamic scenes, with moving objects or time-varying lighting.
The success of PERF marks a promising advance in neural rendering, with potential extensions to adjacent fields such as computer graphics, virtual reality, and computer vision.