
RGB$\leftrightarrow$X: Image decomposition and synthesis using material- and lighting-aware diffusion models (2405.00666v1)

Published 1 May 2024 in cs.CV and cs.GR

Abstract: The three areas of realistic forward rendering, per-pixel inverse rendering, and generative image synthesis may seem like separate and unrelated sub-fields of graphics and vision. However, recent work has demonstrated improved estimation of per-pixel intrinsic channels (albedo, roughness, metallicity) based on a diffusion architecture; we call this the RGB$\rightarrow$X problem. We further show that the reverse problem of synthesizing realistic images given intrinsic channels, X$\rightarrow$RGB, can also be addressed in a diffusion framework. Focusing on the image domain of interior scenes, we introduce an improved diffusion model for RGB$\rightarrow$X, which also estimates lighting, as well as the first diffusion X$\rightarrow$RGB model capable of synthesizing realistic images from (full or partial) intrinsic channels. Our X$\rightarrow$RGB model explores a middle ground between traditional rendering and generative models: we can specify only certain appearance properties that should be followed, and give freedom to the model to hallucinate a plausible version of the rest. This flexibility makes it possible to use a mix of heterogeneous training datasets, which differ in the available channels. We use multiple existing datasets and extend them with our own synthetic and real data, resulting in a model capable of extracting scene properties better than previous work and of generating highly realistic images of interior scenes.
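The abstract's key mechanism is that the X$\rightarrow$RGB model accepts *full or partial* intrinsic channels, masking out whatever a given training dataset does not provide so that heterogeneous datasets can be mixed. A minimal sketch of that conditioning idea is below; the channel names, shapes, and the `assemble_condition` helper are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

# Hypothetical sketch of partial-channel conditioning: the generative model
# is conditioned on whichever intrinsic channels (X) are available, with
# missing channels zero-filled and a per-channel availability mask appended
# so the model can tell "absent" apart from "black" and hallucinate the rest.
CHANNELS = ["albedo", "normal", "roughness", "metallicity", "irradiance"]
H, W = 8, 8  # toy resolution

def assemble_condition(available: dict) -> np.ndarray:
    """Stack intrinsic channels into one fixed-size conditioning tensor."""
    planes, masks = [], []
    for name in CHANNELS:
        if name in available:
            planes.append(available[name])          # provided channel
            masks.append(np.ones((1, H, W)))        # mark as present
        else:
            planes.append(np.zeros((3, H, W)))      # placeholder for missing X
            masks.append(np.zeros((1, H, W)))       # mark as absent
    return np.concatenate(planes + masks, axis=0)

# Example: a dataset that only provides albedo and normals still yields a
# fixed-size conditioning tensor with the other channels masked out.
cond = assemble_condition({
    "albedo": np.random.rand(3, H, W),
    "normal": np.random.rand(3, H, W),
})
print(cond.shape)  # (5 channels * 3 + 5 mask planes, H, W) = (20, 8, 8)
```

In an actual latent-diffusion setup, such a tensor would be encoded and concatenated with the noisy latent at each denoising step; randomly dropping channels during training doubles as the mechanism that lets the model fill in unspecified appearance properties at inference time.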

