- The paper presents a novel deep learning approach that synthesizes views through local light field fusion and multiplane images.
- It extends plenoptic sampling theory to reduce the required view sampling density by up to 4000× while maintaining rendering quality, as measured by PSNR, SSIM, and LPIPS.
- A real-time implementation on desktop and mobile platforms, together with prescriptive sampling guidelines, makes the method practical for AR/VR applications.
An Examination of Local Light Field Fusion for Practical View Synthesis
The paper entitled "Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines" presents a deep learning-based approach to view synthesis that addresses several limitations of existing methods for capturing and rendering complex real-world scenes. The authors synthesize new views from an irregular grid of sampled images by expanding each sampled view into a local light field and blending the resulting fields. The work also provides a principled way to determine the sampling rate needed to achieve a desired rendering quality, which broadens the applicability of view synthesis in virtual and augmented reality scenarios.
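For intuition about the final blending step, the sketch below combines the images rendered from several neighboring local light fields into one novel view using normalized per-pixel weights. The weighting scheme shown here is purely illustrative and is not the paper's exact formulation, which derives its blend weights from the MPI renderings themselves.

```python
import numpy as np

def blend_renderings(renderings, weights):
    """Blend images rendered from several neighboring local light fields (MPIs).

    renderings: list of (H, W, 3) images, one per neighboring MPI
    weights:    list of (H, W, 1) per-pixel weights (illustrative; e.g. derived
                from accumulated alpha and camera proximity)
    Returns a single (H, W, 3) blended image.
    """
    numerator = sum(r * w for r, w in zip(renderings, weights))
    denominator = sum(weights) + 1e-8  # avoid division by zero where all weights vanish
    return numerator / denominator
```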
Methodology
The core innovation of the paper is a layered scene representation termed the multiplane image (MPI). Promoting each sampled view to this layered format yields an effective local light field representation and substantially improves rendering quality. Each MPI is produced by a convolutional neural network (CNN) that processes several input views and outputs a stack of RGB and alpha planes; nearby viewpoints can then be rendered by simple alpha compositing, enabling real-time novel view synthesis.
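As a rough illustration of this rendering step, the minimal NumPy sketch below composites the RGB and alpha planes back to front with the standard "over" operator. It assumes the planes are already ordered from far to near; for a genuinely novel viewpoint, each plane would first be warped by the homography induced by its depth, which is omitted here.

```python
import numpy as np

def composite_mpi(rgb_planes, alpha_planes):
    """Render an MPI by back-to-front 'over' alpha compositing.

    rgb_planes:   (D, H, W, 3) color for each of D depth planes, ordered back to front
    alpha_planes: (D, H, W, 1) opacity for each plane, values in [0, 1]
    Returns an (H, W, 3) image as seen from the MPI's reference viewpoint.
    """
    out = np.zeros(rgb_planes.shape[1:], dtype=np.float32)
    for rgb, alpha in zip(rgb_planes, alpha_planes):
        out = rgb * alpha + out * (1.0 - alpha)  # standard over operator
    return out
```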
The authors extend traditional plenoptic sampling theory into a framework for determining how densely views must be sampled. Using MPIs, the view sampling density needed to achieve perceptual quality equivalent to continuous Nyquist-rate sampling can be reduced by a factor of up to 4000×. This reduction is achieved by partitioning scene depth into discrete planes and using the estimated per-plane opacities to handle occlusions.
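To make this kind of guideline concrete, the hypothetical helper below turns the rule of thumb "the nearest scene point should shift by no more than D pixels between adjacent views," where D is the number of MPI planes, into a maximum camera spacing. The function name, parameter values, and the back-of-the-envelope scaling comment are illustrative assumptions, not figures taken from the paper.

```python
def max_camera_spacing(num_planes, focal_px, min_depth):
    """Largest baseline between adjacent views such that the nearest scene point
    moves by at most `num_planes` pixels (disparity = focal_px * baseline / depth).

    num_planes: number of MPI depth planes D (illustrative value)
    focal_px:   camera focal length in pixels
    min_depth:  depth of the closest scene point, in the same units as the baseline
    """
    return num_planes * min_depth / focal_px

# Illustrative comparison: Nyquist-rate capture corresponds to at most ~1 pixel
# of disparity between adjacent views, so allowing D pixels widens the spacing
# by roughly D in each dimension of a 2D capture grid, i.e. ~D^2 fewer views.
nyquist_spacing = max_camera_spacing(1, focal_px=500.0, min_depth=1.0)
mpi_spacing = max_camera_spacing(32, focal_px=500.0, min_depth=1.0)
print(nyquist_spacing, mpi_spacing)
```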
Results and Implications
The paper quantitatively demonstrates the efficacy of the proposed Local Light Field Fusion method through extensive evaluations against prior methods such as Soft3D, unstructured lumigraph rendering (ULR), and standard light field interpolation techniques. Using PSNR, SSIM, and the LPIPS perceptual quality measure, it shows superior rendering quality across diverse datasets, particularly for challenging non-Lambertian reflections and complex geometry.
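For reference, per-view scores of this kind can be computed as in the sketch below, which uses scikit-image for PSNR and SSIM; LPIPS additionally requires a learned network (for example via the lpips package) and is omitted here. The function and its inputs are assumptions for illustration, not the paper's evaluation code.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_view(rendered, reference):
    """Compare a rendered view against a held-out ground-truth photograph.

    Both inputs are float arrays of shape (H, W, 3) with values in [0, 1].
    Returns (PSNR in dB, SSIM in [0, 1]).
    """
    psnr = peak_signal_noise_ratio(reference, rendered, data_range=1.0)
    ssim = structural_similarity(reference, rendered, channel_axis=-1, data_range=1.0)
    return psnr, ssim
```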
The research also culminates in practical implementations with real-time rendering on desktop and mobile platforms, including a smartphone application. An accompanying augmented reality capture app guides users in sampling real-world scenes according to the prescriptive guidelines.
Future Directions
This paper opens several avenues for future research and development. Enhancements to model architectures that further capitalize on the data-efficient MPI framework could drive even greater performance gains. Additionally, integrating more comprehensive sampling strategies and improving real-time processing capabilities on constrained hardware will bring about more robust applications in the emerging fields of AR and VR.
Addressing the challenges of scaling to higher resolutions and handling dynamically changing scenes could significantly impact practical deployments, making them more adaptable to the evolving landscape of immersive media technologies. Furthermore, the exploration of novel metric learning strategies that are better aligned with human visual perception may provide superior guidance for training view synthesis models.
In conclusion, the Local Light Field Fusion method presented in this paper marks a significant stride toward practical, efficient, and high-fidelity view synthesis. By reducing the required sampling density while maintaining quality, it provides a versatile toolset for developers and researchers aiming to expand the possibilities of immersive visual technologies.