DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models (2405.17176v1)
Abstract: Textures distilled from a 2D diffusion model often contain unwanted baked-in shading effects and produce unrealistic results in downstream applications. Generating Physically Based Rendering (PBR) materials instead of plain RGB textures is a promising solution. However, directly distilling PBR material parameters from 2D diffusion models still suffers from incorrect material decomposition, such as shading effects baked into the albedo. We introduce DreamMat, an approach that resolves this problem and generates high-quality PBR materials from text descriptions. We find that the main cause of incorrect material distillation is that large-scale 2D diffusion models are trained only to generate final shading colors, providing insufficient constraints on material decomposition during distillation. To tackle this, we first finetune a light-aware 2D diffusion model that is conditioned on a given lighting environment and generates shading results under that specific lighting. Then, by applying the same environment lights during material distillation, DreamMat generates high-quality PBR materials that are not only consistent with the given geometry but also free from baked-in shading effects in the albedo. Extensive experiments demonstrate that materials produced by our method are more visually appealing to users and achieve significantly higher rendering quality than baseline methods, making them preferable for downstream tasks such as game and film production.
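The core argument above — that decomposition into albedo and shading is underconstrained unless the lighting is known — can be illustrated with a toy numeric sketch. This is not the paper's actual pipeline (which distills from a diffusion model via score distillation); it is a minimal Lambertian stand-in, with all array names and the uniform irradiance assumption chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Lambertian model: per-pixel rendered color = albedo * irradiance.
true_albedo = rng.uniform(0.2, 0.9, size=(4, 4, 3))  # ground-truth base color
irradiance = rng.uniform(0.3, 1.5, size=(4, 4, 1))   # environment lighting term
rendered = true_albedo * irradiance                  # stand-in for the diffusion output

# Light-agnostic distillation: with lighting unknown, one (wrong) solution
# is to assume uniform unit irradiance, so shading variation is baked
# into the recovered albedo.
baked_albedo = rendered / 1.0

# Light-aware distillation: dividing by the *same* known irradiance used
# during rendering recovers a shading-free albedo exactly.
light_aware_albedo = rendered / irradiance

print("max baked-in albedo error:",
      float(np.abs(baked_albedo - true_albedo).max()))
print("light-aware albedo exact:",
      bool(np.allclose(light_aware_albedo, true_albedo)))
```

The same principle drives DreamMat's design: because the distillation renders with the identical environment lights the finetuned diffusion model was conditioned on, shading can no longer be absorbed into the albedo channel.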