Extracting Triangular 3D Models, Materials, and Lighting From Images (2111.12503v5)

Published 24 Nov 2021 in cs.CV and cs.GR

Abstract: We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations. Unlike recent multi-view reconstruction approaches, which typically produce entangled 3D representations encoded in neural networks, we output triangle meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine unmodified. We leverage recent work in differentiable rendering, coordinate-based networks to compactly represent volumetric texturing, alongside differentiable marching tetrahedrons to enable gradient-based optimization directly on the surface mesh. Finally, we introduce a differentiable formulation of the split sum approximation of environment lighting to efficiently recover all-frequency lighting. Experiments show our extracted models used in advanced scene editing, material decomposition, and high quality view interpolation, all running at interactive rates in triangle-based renderers (rasterizers and path tracers). Project website: https://nvlabs.github.io/nvdiffrec/ .

Citations (321)

Summary

  • The paper introduces an end-to-end differentiable pipeline that uses a differentiable marching tetrahedra layer (DMTet) to jointly optimize topology, materials, and lighting.
  • It introduces a differentiable formulation of the split-sum approximation of environment lighting, recovering all-frequency illumination at low computational cost.
  • Comparative evaluations show competitive view interpolation, with outputs that deploy directly in interactive triangle-based renderers.

Analyzing the Extraction of 3D Models, Materials, and Lighting from Images

The paper presents an advanced method for the reconstruction of 3D models with spatially-varying materials and lighting from multi-view image datasets. This method addresses several limitations of traditional approaches in 3D reconstruction, particularly focusing on the practicality of integrating the output directly into game engines and rendering platforms.

The primary innovation lies in the integration of differentiable rendering with Deep Marching Tetrahedra (DMTet). By learning topology, material characteristics, and lighting conditions jointly, the method avoids conventional pipelines that require separate stages for topology optimization, texture assignment, and lighting integration. This is achieved with a deformable tetrahedral grid that is converted into a triangular mesh by a differentiable marching tetrahedra layer, allowing end-to-end optimization of the surface mesh driven by image-space losses.
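To make the mechanism concrete, the following is a minimal sketch (in PyTorch, assumed) of the learnable quantities behind such a differentiable marching tetrahedra layer: each grid vertex carries a signed-distance value and a small positional offset, and surface vertices are placed by linear interpolation along edges whose SDF values change sign. All names are illustrative rather than the paper's actual API.

```python
import torch

class DeformableTetGrid(torch.nn.Module):
    def __init__(self, verts, tet_edges):
        super().__init__()
        self.register_buffer("verts", verts)        # (V, 3) fixed grid positions
        self.register_buffer("edges", tet_edges)    # (E, 2) vertex index pairs
        self.sdf = torch.nn.Parameter(torch.randn(len(verts)))     # signed distance per vertex
        self.offset = torch.nn.Parameter(torch.zeros_like(verts))  # learnable deformation

    def surface_vertices(self):
        v = self.verts + self.offset                 # deformed grid
        a, b = self.edges[:, 0], self.edges[:, 1]
        sa, sb = self.sdf[a], self.sdf[b]
        crossing = (sa * sb) < 0                     # edges where the SDF changes sign
        a, b, sa, sb = a[crossing], b[crossing], sa[crossing], sb[crossing]
        # Linear interpolation places each surface vertex at the SDF zero crossing;
        # gradients flow back into both the SDF values and the vertex offsets.
        t = (sa / (sa - sb)).unsqueeze(-1)
        return v[a] * (1 - t) + v[b] * t
```

Because the extracted vertices are differentiable functions of the SDF and offsets, image-space losses from the renderer can update the grid parameters directly, which is what enables topology to change during optimization.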

A significant contribution of this work is a differentiable formulation of the split-sum approximation for environment lighting. While previous methods relied on spherical harmonics or spherical Gaussians to approximate lighting, these approaches often incurred high computational costs or struggled with high-frequency lighting. The split-sum approximation offers a computationally efficient alternative that captures high-frequency lighting using pre-filtered environment maps and lookup tables, making it well suited to interactive applications. This enhancement proves crucial in maintaining performance without elevating the complexity of the shading model.
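As an illustration of the underlying idea, the sketch below shows the standard split-sum shading step (in the style of the real-time approximation popularized by Karis) that the paper makes differentiable: the lighting integral is split into a roughness-dependent lookup into a pre-filtered environment map and an environment-independent BRDF lookup table. The two sampling functions are placeholders, not the paper's implementation.

```python
import torch

def split_sum_specular(normal, view_dir, roughness, f0,
                       sample_prefiltered_env, sample_brdf_lut):
    n_dot_v = (normal * view_dir).sum(-1, keepdim=True).clamp(min=1e-4)
    refl = 2.0 * n_dot_v * normal - view_dir          # mirror reflection direction
    # First sum: incoming radiance pre-filtered over the specular lobe,
    # indexed by reflection direction and roughness (e.g. a mip level).
    prefiltered = sample_prefiltered_env(refl, roughness)
    # Second sum: environment-independent BRDF integral stored as a 2D LUT,
    # indexed by (n.v, roughness) and returning a scale/bias pair.
    scale, bias = sample_brdf_lut(n_dot_v, roughness)
    return prefiltered * (f0 * scale + bias)
```

Since every step above is composed of differentiable tensor operations and texture lookups, gradients with respect to both the material parameters and the environment map itself can be propagated during optimization.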

The method also introduces a robust volumetric texturing scheme represented by multilayer perceptron (MLP) networks. Volumetric textures address the challenges associated with dynamically changing topology: because material properties are queried at 3D surface positions rather than through a fixed UV parameterization, they remain well defined as the surface deforms.
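A minimal sketch of such a coordinate-based material network is given below, assuming it maps 3D positions to diffuse albedo, specular parameters, and a normal perturbation; the layer sizes and output split are assumptions made for illustration, not the paper's exact architecture.

```python
import torch

class MaterialMLP(torch.nn.Module):
    def __init__(self, hidden=256, out_channels=9):  # e.g. kd(3) + ks(3) + normal(3), assumed split
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, out_channels),
        )

    def forward(self, xyz):
        # Query at surface positions; the texture lives in 3D space, so it is
        # unaffected by changes in mesh topology during optimization.
        out = self.net(xyz)
        kd, ks, nrm = out.split(3, dim=-1)
        return kd.sigmoid(), ks.sigmoid(), torch.nn.functional.normalize(nrm, dim=-1)
```

After optimization, such a volumetric texture can be baked into conventional 2D textures over the final mesh parameterization for use in standard engines.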

Experimental Evaluation and Results

The paper evaluates its approach against state-of-the-art methods such as NeRF and NeuS, observing that it performs competitively on several synthetic and real-world datasets in terms of view interpolation. Although PSNR is slightly lower than that of purely neural representations, the benefits of explicit geometry and fast rendering are substantial.

The empirical results demonstrate the system's capability in tasks such as advanced scene editing, relighting, and simulation within interactive renderers. The technique's ability to generate outputs that are directly deployable in standard rasterization engines marks a significant departure from volumetric and implicit techniques, which require cumbersome additional steps to extract usable geometry.

Implications and Future Prospects

The research advances the practicality of integrating sophisticated computer vision techniques into gaming and virtual simulation. By improving the compatibility and ease of deployment of the reconstructed 3D content, the method opens avenues for cost-effective, high-fidelity graphical applications, with implications for industries ranging from entertainment and digital media to augmented and virtual reality, where rapid generation of customized interactive content is highly desirable.

The approach could potentially extend to more nuanced lighting and material interactions, exploring differentiable formulations of complex global illumination effects such as reflections and refractions. Future research could focus on integrating differentiable path tracing to further enhance realism, though balancing this against computational demands remains a challenge.

In summary, this paper offers a strong framework for the real-time, resource-efficient extraction of triangulated 3D models with spatially-varying materials and lighting, contributing significantly to the automation of 3D content creation and its seamless integration into existing rendering ecosystems.
