Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting (2109.06061v2)

Published 13 Sep 2021 in cs.CV

Abstract: In this work, we address the problem of jointly estimating albedo, normals, depth and 3D spatially-varying lighting from a single image. Most existing methods formulate the task as image-to-image translation, ignoring the 3D properties of the scene. However, indoor scenes contain complex 3D light transport where a 2D representation is insufficient. In this paper, we propose a unified, learning-based inverse rendering framework that formulates 3D spatially-varying lighting. Inspired by classic volume rendering techniques, we propose a novel Volumetric Spherical Gaussian representation for lighting, which parameterizes the exitant radiance of the 3D scene surfaces on a voxel grid. We design a physics based differentiable renderer that utilizes our 3D lighting representation, and formulates the energy-conserving image formation process that enables joint training of all intrinsic properties with the re-rendering constraint. Our model ensures physically correct predictions and avoids the need for ground-truth HDR lighting which is not easily accessible. Experiments show that our method outperforms prior works both quantitatively and qualitatively, and is capable of producing photorealistic results for AR applications such as virtual object insertion even for highly specular objects.

Authors (4)
  1. Zian Wang (27 papers)
  2. Jonah Philion (15 papers)
  3. Sanja Fidler (184 papers)
  4. Jan Kautz (215 papers)
Citations (70)

Summary

Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting

The paper addresses inverse rendering for indoor scenes: estimating the intrinsic scene properties albedo, normals, depth, and lighting from a single image. Its key departure from prior work is to replace the traditional image-to-image translation formulation with a 3D spatially-varying lighting representation that captures both high dynamic range (HDR) intensities and the spatial variation of light through the scene.
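
To make the decomposition concrete: under a simple Lambertian image-formation model (a simplification of the paper's energy-conserving renderer, which also handles view-dependent effects), each pixel couples the albedo, the surface normal, and the spatially-varying incident lighting at that point, which is why these quantities must be estimated jointly. The sketch below uses hypothetical sampled light directions and radiances, not values or code from the paper.

```python
import numpy as np

def lambertian_pixel(albedo, normal, light_dirs, light_radiance, solid_angle):
    """Diffuse shading at one surface point:
    I = (albedo / pi) * sum_k L_k * max(0, dot(n, w_k)) * dw,
    where L_k is the incident HDR radiance from unit direction w_k with solid angle dw."""
    cos_term = np.clip(light_dirs @ normal, 0.0, None)                           # (K,)
    irradiance = (light_radiance * cos_term[:, None]).sum(axis=0) * solid_angle  # (3,)
    return albedo / np.pi * irradiance

# Toy example: two incident light samples at one pixel (all values illustrative).
albedo = np.array([0.6, 0.5, 0.4])
normal = np.array([0.0, 0.0, 1.0])
light_dirs = np.array([[0.0, 0.0, 1.0], [0.0, 0.7071, 0.7071]])
light_radiance = np.array([[2.0, 2.0, 2.0], [0.5, 0.6, 0.9]])
print(lambertian_pixel(albedo, normal, light_dirs, light_radiance, solid_angle=np.pi))
```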

The authors introduce a Volumetric Spherical Gaussian (VSG) representation that parameterizes the exitant radiance of scene surfaces on a voxel grid. Unlike spherical harmonics or 2D spatially-varying representations, it captures high-frequency, view-dependent lighting variation. A differentiable ray-tracing renderer built on this representation enforces an energy-conserving image formation model, so all intrinsic properties can be trained jointly through a re-rendering constraint, avoiding the need for ground-truth HDR lighting, which is rarely available. Direct and joint prediction modules refine the intrinsic properties together, producing the spatially coherent, realistic results needed for augmented reality (AR) applications.
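
The representation and renderer can be illustrated with a minimal sketch, assuming each voxel stores an opacity and one spherical Gaussian lobe (axis, sharpness, RGB amplitude); a camera ray then alpha-composites the radiance each lobe emits toward the viewer, in the spirit of classic volume rendering. The parameter names and sample values below are illustrative, not the paper's exact parameterization.

```python
import numpy as np

def eval_spherical_gaussian(view_dir, axis, sharpness, amplitude):
    """Radiance of one spherical Gaussian lobe toward view_dir:
    G(v) = amplitude * exp(sharpness * (dot(v, axis) - 1)), with v and axis unit vectors."""
    return amplitude * np.exp(sharpness * (np.dot(view_dir, axis) - 1.0))

def composite_ray(alphas, radiances):
    """Front-to-back alpha compositing of per-voxel radiance along one camera ray,
    as in classic volume rendering: L = sum_i T_i * alpha_i * c_i, T_i = prod_{j<i}(1 - alpha_j)."""
    transmittance, color = 1.0, np.zeros(3)
    for a, c in zip(alphas, radiances):
        color += transmittance * a * c
        transmittance *= 1.0 - a
    return color

# Toy ray passing through three voxels (illustrative opacities and lobe parameters).
view_dir = np.array([0.0, 0.0, 1.0])  # direction from voxel toward the camera
voxels = [  # (alpha, lobe axis, sharpness, RGB amplitude)
    (0.3, np.array([0.0, 0.0, 1.0]), 4.0, np.array([1.0, 0.9, 0.8])),
    (0.5, np.array([0.0, 1.0, 0.0]), 2.0, np.array([0.2, 0.4, 0.9])),
    (0.9, np.array([1.0, 0.0, 0.0]), 8.0, np.array([0.7, 0.7, 0.7])),
]
alphas = [v[0] for v in voxels]
radiances = [eval_spherical_gaussian(view_dir, ax, lam, mu) for _, ax, lam, mu in voxels]
print(composite_ray(alphas, radiances))  # HDR RGB radiance observed along this ray
```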

Experiments show the method outperforming prior work both quantitatively and qualitatively, and producing photorealistic renderings for AR tasks. Reported gains include lower scale-invariant mean squared error and lower angular error for surface normals, indicating more accurate recovery of scene properties and lighting. The approach supports AR applications such as virtual object insertion, handling complex indoor light transport well enough to insert even highly specular objects convincingly.
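
For reference, these two metrics have standard definitions; the sketch below gives one common formulation (mine, not code from the paper): scale-invariant MSE fits a single global scale factor before measuring the error, and the normal metric averages the per-pixel angular deviation in degrees.

```python
import numpy as np

def scale_invariant_mse(pred, gt):
    """Scale-invariant MSE: fit one scalar s minimizing ||s*pred - gt||^2 (closed form),
    then report the MSE. Useful when predictions are only defined up to a global scale."""
    s = np.sum(pred * gt) / max(np.sum(pred * pred), 1e-8)
    return np.mean((s * pred - gt) ** 2)

def mean_angular_error_deg(pred_normals, gt_normals):
    """Mean angular error in degrees between predicted and ground-truth normals, shape (N, 3)."""
    pred = pred_normals / np.linalg.norm(pred_normals, axis=-1, keepdims=True)
    gt = gt_normals / np.linalg.norm(gt_normals, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```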

The work opens new directions in inverse rendering research, particularly toward richer models of scene illumination. Natural extensions include multi-view settings and real-time applications built on the foundations established here.
