Intrinsic Image Diffusion for Indoor Single-view Material Estimation (2312.12274v2)
Abstract: We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets. To address this issue, we advocate for a probabilistic formulation: instead of attempting to directly predict the true material properties, we employ a conditional generative model to sample from the solution space. Furthermore, we show that the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and substantially improves generalization to real images. Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by $1.5\,\mathrm{dB}$ in PSNR and achieving a $45\%$ better FID score on albedo prediction. We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.
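To give the reported PSNR gain some scale, the following is a minimal sketch of the standard PSNR definition, not the authors' evaluation code. It assumes images are given as flat sequences of values in $[0, 1]$ with a peak value of 1.0; the function names `psnr` and `mse_ratio_for_psnr_gain` are hypothetical and chosen only for this illustration.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length flat sequences.

    Assumption: values lie in [0, max_val]; the abstract does not state
    the value range used in the paper's evaluation.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Mean squared error over all values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return float("inf")  # identical inputs
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_psnr_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain at a fixed peak value."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
    print(round(psnr([0.5, 0.5], [0.6, 0.6]), 3))
    # MSE factor implied by a 1.5 dB gain for a single image.
    print(round(mse_ratio_for_psnr_gain(1.5), 3))
```

Under this definition, a $1.5\,\mathrm{dB}$ gain on a single image corresponds to multiplying the mean squared error by $10^{-0.15} \approx 0.71$, that is, roughly 29% lower MSE. When PSNR is averaged over a dataset, this conversion is only approximate.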