DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models (2405.17176v1)

Published 27 May 2024 in cs.GR and cs.AI

Abstract: 2D diffusion model, which often contains unwanted baked-in shading effects and results in unrealistic rendering effects in the downstream applications. Generating Physically Based Rendering (PBR) materials instead of just RGB textures would be a promising solution. However, directly distilling the PBR material parameters from 2D diffusion models still suffers from incorrect material decomposition, such as baked-in shading effects in albedo. We introduce DreamMat, an innovative approach to resolve the aforementioned problem, to generate high-quality PBR materials from text descriptions. We find out that the main reason for the incorrect material distillation is that large-scale 2D diffusion models are only trained to generate final shading colors, resulting in insufficient constraints on material decomposition during distillation. To tackle this problem, we first finetune a new light-aware 2D diffusion model to condition on a given lighting environment and generate the shading results on this specific lighting condition. Then, by applying the same environment lights in the material distillation, DreamMat can generate high-quality PBR materials that are not only consistent with the given geometry but also free from any baked-in shading effects in albedo. Extensive experiments demonstrate that the materials produced through our methods exhibit greater visual appeal to users and achieve significantly superior rendering quality compared to baseline methods, which are preferable for downstream tasks such as game and film production.

References (112)
  1. Single-Image 3D Human Digitization with Shape-guided Diffusion. In SIGGRAPH Asia. 1–11.
  2. Jonathan T Barron and Jitendra Malik. 2014. Shape, illumination, and reflectance from shading. TPAMI 37, 8 (2014), 1670–1687.
  3. Neural reflectance fields for appearance acquisition. arXiv preprint arXiv:2008.03824 (2020).
  4. Deep 3d capture: Geometry and reflectance from sparse multi-view images. In CVPR.
  5. Nerd: Neural reflectance decomposition from image collections. In CVPR.
  6. Neural-pil: Neural pre-integrated lighting for reflectance decomposition. In NeurIPS.
  7. Brent Burley and Walt Disney Animation Studios. 2012. Physically-based shading at Disney. In SIGGRAPH.
  8. TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models. In ICCV.
  9. Scenetex: High-quality texture synthesis for indoor scenes via diffusion priors. arXiv preprint arXiv:2311.17261 (2023).
  10. Text2Tex: Text-driven Texture Synthesis via Diffusion Models. In ICCV.
  11. Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation. In ICCV.
  12. TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition. In NeurIPS.
  13. L-Tracing: Fast Light Visibility Estimation on Neural Surfaces by Sphere Tracing. In ECCV.
  14. Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects. arXiv preprint arXiv:2401.05236 (2024).
  15. Multi-view 3d reconstruction of a texture-less smooth surface of unknown generic reflectance. In CVPR.
  16. Geometry Aware Texturing. In SIGGRAPH Asia. 1–2.
  17. Robert L Cook and Kenneth E. Torrance. 1982. A reflectance model for computer graphics. ACM Transactions on Graphics (ToG) 1, 1 (1982), 7–24.
  18. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In CVPR.
  19. Pandora: Polarization-aided neural decomposition of radiance. In ECCV.
  20. Objaverse: A universe of annotated 3d objects. In CVPR.
  21. DIP: Differentiable Interreflection-aware Physics-based Inverse Rendering. arXiv preprint arXiv:2212.04705 (2022).
  22. Deep polarization imaging for 3D shape and SVBRDF acquisition. In CVPR.
  23. Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images. ACM Transactions on Graphics (ToG) 38, 4 (2019), 1–15.
  24. Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing. arXiv preprint arXiv:2311.16043 (2023).
  25. MaterialGAN: reflectance capture using a generative SVBRDF model. ACM Transactions on Graphics (ToG) 39, 6 (2020), 1–13.
  26. threestudio: A unified framework for 3D content generation. https://github.com/threestudio-project/threestudio.
  27. Shape, light, and material decomposition from images using Monte Carlo rendering and denoising. NeurIPS.
  28. CLIPScore: A Reference-free Evaluation Metric for Image Captioning. In EMNLP.
  29. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems 30 (2017).
  30. Denoising diffusion probabilistic models. In NeurIPS.
  31. Text2room: Extracting textured 3d meshes from 2d text-to-image models. arXiv preprint arXiv:2303.11989 (2023).
  32. Humannorm: Learning normal diffusion model for high-quality and realistic 3d human generation. arXiv preprint arXiv:2310.01406 (2023).
  33. GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces. arXiv preprint arXiv:2311.17977 (2023).
  34. TensoIR: Tensorial Inverse Rendering. In CVPR.
  35. James T. Kajiya. 1986. The rendering equation. In SIGGRAPH.
  36. Brian Karis and Epic Games. 2013. Real shading in unreal engine 4. Proc. Physically Based Shading Theory Practice 4, 3 (2013), 1.
  37. Noise-free score distillation. arXiv preprint arXiv:2310.17590 (2023).
  38. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics (ToG) 42, 4 (July 2023).
  39. Julian Knodt and Xifeng Gao. 2023. Consistent Mesh Diffusion. arXiv preprint arXiv:2312.00971 (2023).
  40. Intrinsic Image Diffusion for Single-view Material Estimation. arXiv preprint arXiv:2312.12274 (2023).
  41. NeROIC: Neural Rendering of Objects from Online Image Collections. In SIGGRAPH.
  42. Content creation for a 3D game with Maya and Unity 3D. Institute of Computer Graphics and Algorithms, Vienna University of Technology 6 (2011), 124.
  43. EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth. arXiv preprint arXiv:2311.15573 (2023).
  44. NeISF: Neural Incident Stokes Field for Geometry and Material Estimation. arXiv preprint arXiv:2311.13187 (2023).
  45. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. In ICML.
  46. Junxuan Li and Hongdong Li. 2022. Neural Reflectance for Shape Recovery with Shadow Handling. In CVPR.
  47. SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D. arxiv:2310.02596 (2023).
  48. Inverse rendering for complex indoor scenes: Shape, spatially-varying lighting and svbrdf from a single image. In CVPR.
  49. Learning to reconstruct shape and spatially-varying reflectance from a single image. In SIGGRAPH Asia.
  50. GS-IR: 3D Gaussian Splatting for Inverse Rendering. arXiv preprint arXiv:2311.16473 (2023).
  51. Magic3D: High-Resolution Text-to-3D Content Creation. In CVPR.
  52. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. arXiv preprint arXiv:2306.16928 (2023).
  53. Zero-1-to-3: Zero-shot one image to 3d object. In ICCV.
  54. SyncDreamer: Learning to Generate Multiview-consistent Images from a Single-view Image. arXiv preprint arXiv:2309.03453 (2023).
  55. NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images. In SIGGRAPH.
  56. Text-Guided Texturing by Synchronized Multi-View Diffusion. arXiv preprint arXiv:2311.12891 (2023).
  57. UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation. arXiv preprint arXiv:2312.08754 (2023).
  58. Unified shape and svbrdf recovery using differentiable monte carlo rendering. In Computer Graphics Forum, Vol. 40. Wiley Online Library, 101–113.
  59. Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering. ACM Transactions on Graphics (TOG) 42, 6 (2023), 1–14.
  60. X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. In ICCV.
  61. Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures. arXiv preprint arXiv:2211.07600 (2022).
  62. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV.
  63. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41, 4 (2022), 1–15.
  64. Extracting Triangular 3D Models, Materials, and Lighting From Images. In CVPR.
  65. Practical SVBRDF Acquisition of 3D Objects with Unstructured Flash Photography. ACM Transactions on Graphics (ToG) 37, 6, Article 267 (2018), 12 pages.
  66. Mitsuba 2: A Retargetable Forward and Inverse Renderer. ACM Transactions on Graphics (ToG) 38, 6, Article 203 (2019), 17 pages.
  67. ControlDreamer: Stylized 3D Generation with Multi-View ControlNet. arXiv preprint arXiv:2312.01129 (2023).
  68. DreamFusion: Text-to-3D using 2D Diffusion. In ICLR.
  69. Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors. arXiv preprint arXiv:2306.17843 (2023).
  70. Learning transferable visual models from natural language supervision. In ICML.
  71. Texture: Text-guided texturing of 3d shapes. In SIGGRAPH.
  72. High-resolution image synthesis with latent diffusion models. In CVPR.
  73. Sam Sartor and Pieter Peers. 2023. MatFusion: A Generative Diffusion Model for SVBRDF Capture. In SIGGRAPH Asia.
  74. Sketchfab. [n. d.]. Sketchfab - The best 3D viewer on the web. https://www.sketchfab.com
  75. Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In CVPR.
  76. Neural-PBIR reconstruction of shape, material, and illumination. In CVPR.
  77. Dreamcraft3d: Hierarchical 3d generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818 (2023).
  78. DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars. In ICCV.
  79. MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion. (2023).
  80. Zhibin Tang and Tiantong He. 2023. Text-guided High-definition Consistency Texture Model. arXiv preprint arXiv:2305.05901 (2023).
  81. Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision. In NeurIPS.
  82. Neural BSSRDF: Object Appearance Representation Including Heterogeneous Subsurface Scattering. arXiv preprint arXiv:2312.15711 (2023).
  83. ControlMat: A Controlled Generative Approach to Material Capture. arXiv preprint arXiv:2309.01700 (2023).
  84. MatFuse: Controllable Material Generation with Diffusion Models. arXiv preprint arXiv:2308.11408 (2023).
  85. ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. In NeurIPS.
  86. AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes. arXiv preprint arXiv:2312.06644 (2023).
  87. De-rendering 3d objects in the wild. In CVPR.
  88. Recovering shape and spatially-varying surface reflectance under unknown illumination. ACM Transactions on Graphics (ToG) 35, 6 (2016), 1–12.
  89. MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR. arXiv preprint arXiv:2308.09278 (2023).
  90. DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation. arXiv preprint arXiv:2310.13119 (2023).
  91. PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo. In ECCV.
  92. SIRe-IR: Inverse Rendering for BRDF Reconstruction with Shadow and Illumination Removal in High-Illuminance Scenes. arXiv preprint arXiv:2310.13030 (2023).
  93. Neilf: Neural incident light field for physically-based material estimation. In ECCV.
  94. Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance. In NeurIPS.
  95. Intrinsicnerf: Learning intrinsic neural radiance fields for editable novel view synthesis. In ICCV.
  96. Jonathan Young. 2021. xatlas. https://github.com/jpcy/xatlas.git
  97. Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. arXiv preprint arXiv:2312.11360 (2023).
  98. Texture Generation on 3D Meshes with Point-UV Diffusion. In ICCV.
  99. Text-to-3d with classifier score distillation. arXiv preprint arXiv:2310.19415 (2023).
  100. Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models. arXiv preprint arXiv:2312.13913 (2023).
  101. Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting. arXiv preprint arXiv:2312.13271 (2023).
  102. NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation. arXiv preprint arXiv:2303.17147 (2023).
  103. Iron: Inverse rendering by optimizing neural sdfs and materials from photometric images. In CVPR.
  104. PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting. In CVPR.
  105. Adding Conditional Control to Text-to-Image Diffusion Models. In ICCV.
  106. Nerfactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Transactions on Graphics (ToG) 40, 6 (2021), 1–18.
  107. Modeling Indirect Illumination for Inverse Rendering. In CVPR.
  108. Polarimetric multi-view inverse rendering. TPAMI (2022).
  109. TileGen: Tileable, Controllable Material Generation and Capture. In SIGGRAPH Asia.
  110. Zhizhuo Zhou and Shubham Tulsiani. 2023. SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction. In CVPR.
  111. I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs. In CVPR.
  112. Junzhe Zhu and Peiye Zhuang. 2023. HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance. arXiv preprint arXiv:2305.18766 (2023).

Summary

  • The paper introduces a novel geometry- and light-aware diffusion framework that generates high-quality PBR materials from text by eliminating baked shading effects.
  • It employs a modified Classifier Score Distillation loss and a hash-grid-based material representation to ensure consistency with object geometry and realistic lighting.
  • Quantitative comparisons and user studies demonstrate DreamMat’s superior performance in overall quality, fidelity to text prompts, and effective material-light disentanglement.

DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models

Introduction

In computer graphics, high-quality object appearances are essential for rendering realism, especially in applications such as movies, games, and AR/VR. Generating physically based rendering (PBR) materials from text descriptions is a promising way to reduce the labor-intensive, expertise-demanding process of authoring object appearances.

The paper "DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models" introduces a novel approach to ameliorate the challenges associated with generating PBR materials. The method, dubbed DreamMat, leverages geometry- and light-aware diffusion models to refine material generation, thereby eliminating baked-in shading effects within albedo maps and enhancing the overall rendering quality.

Methodology

The DreamMat framework addresses several fundamental challenges in distilling PBR materials from 2D diffusion models. Conventional 2D diffusion models, while powerful at generating final shading colors, often fail at accurate material decomposition, producing baked-in highlights and shadows that look unrealistic once the object is relit. DreamMat's key innovations lie in two components: a geometry- and light-aware 2D diffusion model conditioned on a given lighting environment, and the application of the same known, randomly sampled environment lights in both the diffusion conditioning and the material distillation process.

Material Representation and Rendering

The PBR materials are represented with a hash-grid-based field parameterized by albedo, roughness, and metallic values. The rendering equation, split into diffuse and specular components, is evaluated with Monte Carlo sampling to render the object's appearance under varying lighting conditions. By constraining the environment lighting to known HDR environment maps, the otherwise ill-posed material decomposition problem becomes better conditioned, and the generated materials stay consistent with realistic lighting contexts.
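For concreteness, the diffuse/specular split referred to above follows the standard microfacet form of the rendering equation; the notation below is generic graphics notation rather than a quotation from the paper:

$$
L_o(\omega_o) = \underbrace{\int_{\Omega} \frac{a}{\pi}\, L_i(\omega_i)\,(\omega_i \cdot n)\, d\omega_i}_{\text{diffuse}}
\;+\;
\underbrace{\int_{\Omega} \frac{D\,F\,G}{4\,(\omega_o \cdot n)(\omega_i \cdot n)}\, L_i(\omega_i)\,(\omega_i \cdot n)\, d\omega_i}_{\text{specular}},
$$

where the albedo $a$ and the roughness- and metallic-dependent microfacet terms $D$, $F$, $G$ are the quantities being distilled, and the incident radiance $L_i$ comes from the known HDR environment map, estimated in practice by Monte Carlo sampling.

A minimal PyTorch sketch of a hash-grid material field in this spirit is given below. The nearest-vertex hash lookup (instead of trilinear interpolation), the level count, and the layer sizes are simplifying assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class HashGridMaterialField(nn.Module):
    """Maps surface points in [0, 1]^3 to PBR parameters (albedo, roughness, metallic)."""

    def __init__(self, n_levels=8, table_size=2**16, feat_dim=2, hidden=64):
        super().__init__()
        self.table_size = table_size
        self.resolutions = [16 * 2**i for i in range(n_levels)]
        # One learnable feature table per resolution level.
        self.tables = nn.ParameterList(
            nn.Parameter(1e-4 * torch.randn(table_size, feat_dim)) for _ in range(n_levels)
        )
        self.mlp = nn.Sequential(
            nn.Linear(n_levels * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5),  # 3 albedo + 1 roughness + 1 metallic
        )
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def _hash(self, idx):
        # Spatial hash of integer grid coordinates into the feature table
        # (nearest grid vertex only, for brevity).
        return (idx * self.primes).sum(-1) % self.table_size

    def forward(self, x):  # x: (N, 3) surface points in [0, 1]^3
        feats = [self.tables[l][self._hash(torch.floor(x * r).long())]
                 for l, r in enumerate(self.resolutions)]
        out = self.mlp(torch.cat(feats, dim=-1))
        albedo = torch.sigmoid(out[:, :3])
        roughness = torch.sigmoid(out[:, 3:4])
        metallic = torch.sigmoid(out[:, 4:5])
        return albedo, roughness, metallic

# Usage: query material parameters at sampled surface points.
albedo, roughness, metallic = HashGridMaterialField()(torch.rand(1024, 3))
```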

Distillation with Geometry- and Light-aware Diffusion Models

The distillation process employs a modified Score-Distillation Sampling (SDS) loss, termed Classifier Score Distillation (CSD) loss, to refine the generated materials. The CSD loss leverages both positive and negative text prompts, enhancing the fidelity of the generated appearances relative to the given descriptions. The incorporation of geometric conditions (depth and normal maps) and light conditions (predefined materials under a specified environment light) in the diffusion model’s training ensures that the generated images remain consistent with both the object’s geometry and the lighting environment.
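The sketch below (PyTorch) illustrates one distillation step in this spirit. The `denoiser` callable, assumed to take noisy latents, a timestep, a prompt embedding, and the geometry/light condition, as well as the timestep range and the positive/negative weights, are illustrative assumptions rather than the paper's reference implementation:

```python
import torch

def csd_step(denoiser, image_latents, alphas_cumprod,
             pos_emb, neg_emb, uncond_emb, cond, w_pos=1.0, w_neg=0.5):
    """One classifier-score-style distillation step: diffuse the rendered latents,
    query the conditioned diffusion model with positive, negative, and empty prompts,
    and backpropagate the resulting direction through the differentiable renderer."""
    t = torch.randint(20, 980, (1,), device=image_latents.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(image_latents)
    noisy = a.sqrt() * image_latents.detach() + (1.0 - a).sqrt() * noise
    with torch.no_grad():
        eps_pos = denoiser(noisy, t, pos_emb, cond)     # conditioned on the positive prompt
        eps_neg = denoiser(noisy, t, neg_emb, cond)     # conditioned on the negative prompt
        eps_unc = denoiser(noisy, t, uncond_emb, cond)  # unconditional baseline
    # Pull toward the positive prompt, push away from the negative one.
    grad = w_pos * (eps_pos - eps_unc) - w_neg * (eps_neg - eps_unc)
    # Treat `grad` as d(loss)/d(latents) and let autograd carry it back to the materials.
    image_latents.backward(gradient=grad)
```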

Training and Implementation

The geometry- and light-aware diffusion model is finetuned on a large dataset of images rendered from Objaverse objects, each paired with conditioning images derived from the geometry (depth and normal maps) and the lighting (renderings of predefined materials under known environment lights). Material generation for a given mesh is then performed by distilling this finetuned diffusion model against renderings of the mesh under varying environment lights.
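As a concrete but assumed illustration, the conditioning input to such a finetuned, ControlNet-style branch could be a multi-channel image stacking the geometry conditions (depth and normal maps) with renderings of the object in preset materials under the chosen environment light; the helper, channel layout, and preset count below are hypothetical:

```python
import torch

def build_condition(depth, normal, light_renders):
    """depth: (B, 1, H, W); normal: (B, 3, H, W); light_renders: list of (B, 3, H, W)
    renderings of the object with fixed preset materials (e.g. purely diffuse,
    purely metallic) under the same environment light used later in distillation."""
    return torch.cat([depth, normal, *light_renders], dim=1)

cond = build_condition(
    torch.rand(1, 1, 512, 512),                                 # depth map
    torch.rand(1, 3, 512, 512),                                 # normal map
    [torch.rand(1, 3, 512, 512), torch.rand(1, 3, 512, 512)],   # two light-condition renders
)
print(cond.shape)  # torch.Size([1, 10, 512, 512])
```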

Results

Extensive qualitative and quantitative comparisons with state-of-the-art methods, including TANGO, TEXTure, Text2Tex, and Fantasia3D, demonstrate DreamMat's superiority in generating high-quality, light-consistent materials. A user study involving 42 respondents rated DreamMat highest on several criteria, including overall quality, fidelity to the text prompts, and the effectiveness of material-light disentanglement.

Implications and Future Work

DreamMat sets a precedent for integrating strong diffusion priors with physically informed rendering constraints, paving the way for more capable and intuitive material-generation tools in computer graphics. Future work may extend the approach to more complex scenes and a wider range of material types, and may improve the computational efficiency of the distillation process.

Conclusion

The DreamMat methodology introduces a significant advancement in the domain of text-guided PBR material generation, adeptly handling the nuanced challenges of lighting and geometric consistency. The framework not only demonstrates superior performance over existing methods but also significantly broadens the spectrum of practical applications, ensuring high-quality material generation suitable for modern rendering engines and robust enough for detailed scene compositions.