
Intrinsic Image Diffusion for Indoor Single-view Material Estimation (2312.12274v2)

Published 19 Dec 2023 in cs.CV, cs.AI, and cs.GR

Abstract: We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets. To address this issue, we advocate for a probabilistic formulation, where instead of attempting to directly predict the true material properties, we employ a conditional generative model to sample from the solution space. Furthermore, we show that utilizing the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and highly improves the generalization to real images. Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by $1.5dB$ on PSNR and by $45\%$ better FID score on albedo prediction. We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.


Summary

  • The paper introduces a probabilistic generative model that samples intrinsic material maps via diffusion, estimating material properties with high fidelity.
  • It leverages a BRDF representation and over 50,000 rendered images to provide detailed and consistent predictions for indoor scenes.
  • Quantitative evaluations show superior performance with improved PSNR, SSIM, LPIPS, and FID scores compared to existing methods.

Intrinsic Image Diffusion for Material Estimation

Introduction to Appearance Decomposition

Appearance decomposition is a critical but challenging area in computer vision. It involves separating an image into its fundamental components: material properties and lighting. This process is essential for numerous applications, including content editing, virtual reality, and relighting of scenes. The main challenge lies in the fact that visual appearances result from the complex interplay between lighting and material properties, leading to inherent ambiguity in separating these components.

Probabilistic Approach to Estimation

Traditional methods have adopted a deterministic approach, aiming to provide a single solution, which often results in loss of high-frequency details and averaged-out solutions that fail to represent the true complexity of materials. The paper introduces Intrinsic Image Diffusion, a conditional generative model that embraces the probabilistic nature of the appearance decomposition problem. By generating multiple solutions, the model allows for a more comprehensive exploration of the solution space. This method leverages recent diffusion models, which have been pre-trained with large-scale real-world images to better generalize across real and synthetic data.
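The key idea, that a single view admits many valid material explanations and a conditional model should sample several of them, can be illustrated with a toy sampler. This is not the paper's actual network or sampler; `denoise` is a hypothetical stand-in for the conditional denoising model, and the blending schedule is a deliberately simplified sketch.

```python
import random

def sample_materials(cond, denoise, steps=50, n_samples=4, seed=0):
    """Draw several candidate material maps for one conditioning input.
    `denoise` stands in for the conditional network: given the current
    noisy map and the image features, it predicts the clean map."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in cond]   # start from pure noise
        for t in range(steps, 0, -1):
            x0_hat = denoise(x, cond)             # conditional prediction
            w = 1.0 / t                           # blend fraction this step
            x = [(1.0 - w) * xi + w * x0 for xi, x0 in zip(x, x0_hat)]
        samples.append(x)
    return samples
```

Because each chain starts from a different noise draw, the samples can disagree wherever the input is ambiguous, which is exactly the behavior a deterministic regressor averages away.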

Material Representation and Dataset

The material properties are represented with a microfacet BRDF model comprising albedo, roughness, and metallic maps, a parameterization standard in physically based rendering. The model was trained on a rendered dataset of over 50,000 images with corresponding ground-truth material maps, providing high-fidelity supervision. This dataset, coupled with the model's ability to adapt the image prior of pre-trained diffusion models, yields predictions that are more detailed, consistent, and faithful to the actual materials than those of existing approaches.
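To make the three material channels concrete, the sketch below evaluates a minimal Cook-Torrance/GGX BRDF from albedo, roughness, and metallic values. This is a generic physically based shading formula, not code from the paper, and the geometry terms are passed in as precomputed cosines for simplicity.

```python
import math

def ggx_brdf(albedo, roughness, metallic, n_dot_l, n_dot_v, n_dot_h, v_dot_h):
    """Minimal Cook-Torrance BRDF over the paper's three material channels.
    All direction-dependent inputs are cosines assumed to be in (0, 1]."""
    a2 = max(roughness, 1e-4) ** 4                 # alpha^2, alpha = roughness^2
    d = a2 / (math.pi * ((n_dot_h ** 2) * (a2 - 1.0) + 1.0) ** 2)  # GGX NDF
    k = (roughness + 1.0) ** 2 / 8.0               # Schlick-GGX visibility
    g = (n_dot_l / (n_dot_l * (1.0 - k) + k)) * (n_dot_v / (n_dot_v * (1.0 - k) + k))
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic   # base reflectance
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5         # Schlick Fresnel
    spec = d * g * f / max(4.0 * n_dot_l * n_dot_v, 1e-6)
    diff = (1.0 - metallic) * albedo / math.pi         # Lambertian diffuse
    return diff + spec
```

Note how metallic interpolates both the base reflectance and the diffuse weight: a fully metallic surface has no Lambertian lobe, which is why the three maps together pin down the surface appearance.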

Methodology and Evaluation

The Intrinsic Image Diffusion model is trained to predict the noise added to the material maps during the diffusion process, conditioned on the input image, and it leverages the strong learned prior of pre-trained diffusion models. During inference, the model can sample multiple plausible explanations for a single input view, predicting albedo and BRDF features. The paper evaluates the model quantitatively and qualitatively on both synthetic and real-world datasets, showing that it outperforms state-of-the-art methods on PSNR, SSIM, LPIPS, and FID.
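The noise-prediction objective can be sketched as a single DDPM-style training step: corrupt the ground-truth material map at a random timestep and regress the injected noise. This is a generic illustration of the DDPM objective under a precomputed `alpha_bar` schedule; `eps_model` is a hypothetical stand-in for the conditional network.

```python
import math
import random

def ddpm_loss(x0, cond, eps_model, alpha_bar, rng):
    """One noise-prediction training step: x0 is the clean material map,
    cond the input-image conditioning, alpha_bar the cumulative noise
    schedule. Returns the mean squared error on the injected noise."""
    t = rng.randrange(len(alpha_bar))               # random timestep
    ab = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]         # noise to inject
    x_t = [math.sqrt(ab) * xi + math.sqrt(1.0 - ab) * e
           for xi, e in zip(x0, eps)]               # forward diffusion
    eps_hat = eps_model(x_t, cond, t)               # predicted noise
    return sum((eh - e) ** 2 for eh, e in zip(eps_hat, eps)) / len(x0)
```

In the paper's setting the clean signal is the stack of material maps and the conditioning is the input view, so the same objective teaches the network to denoise materials consistently with the observed image.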

Furthermore, the paper uses the model's consistent and precise material predictions to optimize lighting in indoor scenes. This optimization recovers detailed, controllable lighting, improving the realism of the reconstructed scene.
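The lighting-optimization stage can be caricatured in one dimension: hold the sampled materials fixed and fit a light parameter so the re-rendered image matches the observation. The sketch below is deliberately minimal (a single scalar intensity and a diffuse-only rendering model), an assumption for illustration rather than the paper's differentiable renderer.

```python
def fit_light_intensity(albedo, observed, lr=0.1, steps=200):
    """Toy analogue of the lighting-optimization stage: with the sampled
    albedo held fixed, fit a scalar light intensity L so that albedo * L
    reproduces the observed pixels, by gradient descent on the MSE."""
    L = 1.0
    for _ in range(steps):
        grad = sum(2.0 * a * (a * L - o)
                   for a, o in zip(albedo, observed)) / len(albedo)
        L -= lr * grad
    return L
```

The point of the caricature is the division of labor: the better the material estimate, the more of the residual appearance can be safely attributed to lighting during this fit.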

Conclusion and Potential

The paper concludes by highlighting the Intrinsic Image Diffusion model's significant advancements in single-view material estimation. By using a probabilistic formulation and tapping into the learned priors of diffusion models, the technique opens up new possibilities for accurate and detailed material estimation. The approach also paves the way for future work, including weak supervision and expanded inverse rendering frameworks, making the field of appearance decomposition richer for new exploration and applications.
