DreamMat: Prompt-Driven PBR Synthesis
- DreamMat is a generative framework that produces relightable, editable, and physically accurate PBR materials directly from text descriptions.
- It integrates geometry- and light-aware diffusion with a hash-grid-encoded MLP to generate clean SVBRDF maps free from baked shading artifacts.
- Empirical validations demonstrate that DreamMat outperforms previous methods by ensuring prompt fidelity and realistic material behavior under varied lighting.
DreamMat is a generative framework designed to produce high-quality, relightable, and editable physically based rendering (PBR) materials directly from text descriptions for arbitrary 3D mesh geometry. The central innovation is the integration of geometry- and light-aware diffusion models to address well-documented failures of standard 2D text-to-image diffusion pipelines, particularly baked-in shading artifacts and ill-posed material decomposition, which impede realistic appearance when assets are relit in downstream graphics engines (Zhang et al., 27 May 2024).
1. Problem Formulation and Motivation
Digital asset pipelines demand SVBRDF (spatially-varying bidirectional reflectance distribution function) representations: typically albedo ($\mathbf{a}$), roughness ($r$), and metallic ($m$) maps parameterized over each surface location $\mathbf{x}$. Existing solutions usually generate RGB textures conditioned on user prompts, then attempt to decompose these outputs into PBR parameters. However, since most large-scale 2D diffusion models are trained to produce final shaded appearances under unknown lighting, decomposition methods inherit baked-in shadows, highlights, and other illumination artifacts. The result is texture maps with lighting entangled into the material: when relit in real time in a game engine, they produce double lighting, color shifts, and physically implausible material responses.
Prior methods such as Fantasia3D, TEXTure, and Text2Tex applied score distillation of material parameters directly to diffusion outputs but failed to robustly separate lighting from intrinsic properties. The lack of lighting control and missing geometry constraints in such models leads to incorrect material decompositions.
2. Method Architecture and Material Representation
DreamMat models PBR material maps as SVBRDF functions via a hash-grid-encoded MLP (using the Instant-NGP architecture):

$$(\mathbf{a}, r, m) = \Gamma_\theta(\mathbf{x}),$$

where $\Gamma_\theta$ is a learnable function mapping surface coordinates $\mathbf{x}$ to material property vectors. This parameterization supports detailed yet spatially smooth materials suitable for high-resolution mesh assets.
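The hash-grid material representation can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the level count, table size, hash primes, and the tiny randomly initialized MLP head are all illustrative assumptions standing in for a trained Instant-NGP encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

L, T, F = 8, 2**14, 2                        # levels, hash-table size, features/entry
tables = rng.normal(scale=1e-2, size=(L, T, F))   # learnable in the real system
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(x):
    """Instant-NGP-style multiresolution hash encoding of points x in [0,1]^3."""
    feats = []
    for l in range(L):
        res = 16 * 2**l                      # grid resolution at this level
        p = x * res
        p0 = np.floor(p).astype(np.uint64)
        w = p - p0                           # trilinear interpolation weights
        acc = np.zeros((x.shape[0], F))
        for corner in range(8):              # 8 corners of the enclosing cell
            off = np.array([(corner >> i) & 1 for i in range(3)], dtype=np.uint64)
            c = p0 + off
            idx = (c[:, 0] * PRIMES[0] ^ c[:, 1] * PRIMES[1] ^ c[:, 2] * PRIMES[2]) % T
            cw = np.prod(np.where(off == 1, w, 1 - w), axis=1)
            acc += cw[:, None] * tables[l][idx]
        feats.append(acc)
    return np.concatenate(feats, axis=1)     # (N, L*F)

# tiny MLP head mapping encoded features to (albedo_rgb, roughness, metallic)
W1 = rng.normal(scale=0.1, size=(L * F, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 5));     b2 = np.zeros(5)

def material_mlp(x):
    h = np.maximum(hash_encode(x) @ W1 + b1, 0.0)      # ReLU hidden layer
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))         # sigmoid keeps outputs in [0,1]
    return out[:, :3], out[:, 3], out[:, 4]            # a, r, m

pts = rng.uniform(size=(4, 3))
albedo, rough, metal = material_mlp(pts)
print(albedo.shape, rough.shape)  # (4, 3) (4,)
```

The sigmoid output layer enforces the physical range $[0,1]$ for all three material channels, which is one reason this parameterization avoids implausible SVBRDF values during optimization.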
The mesh is rendered using physically accurate rendering equations. Specifically, DreamMat employs Monte Carlo estimation of the rendering equation:

$$L(\omega_o, \mathbf{x}) = \int_\Omega L_i(\omega_i)\, f(\omega_i, \omega_o)\, (\omega_i \cdot \mathbf{n})\, d\omega_i,$$

where $L_i$ is incident radiance from a known HDRI environment, $f$ is the Cook-Torrance BRDF, and $\mathbf{n}$ is the surface normal. This yields decomposed diffuse and specular components:

$$f(\omega_i, \omega_o) = \underbrace{\frac{\mathbf{a}}{\pi}(1-m)}_{\text{diffuse}} \;+\; \underbrace{\frac{D(\mathbf{h})\, F(\omega_o, \mathbf{h})\, G(\omega_i, \omega_o)}{4\,(\omega_i \cdot \mathbf{n})(\omega_o \cdot \mathbf{n})}}_{\text{specular}},$$

where $F$ (Fresnel), $G$ (geometric attenuation), $D$ (normal distribution), and $\mathbf{h}$ (halfway vector) are standard microfacet model elements.
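A minimal Monte Carlo estimator of this shading model can be sketched in NumPy. The GGX/Smith/Schlick term choices and the constant environment radiance are assumptions for illustration; the actual renderer importance-samples a full HDRI environment.

```python
import numpy as np

def ggx_d(nh, alpha):
    """GGX normal distribution D(h)."""
    a2 = alpha * alpha
    return a2 / (np.pi * ((nh * nh) * (a2 - 1.0) + 1.0) ** 2)

def smith_g(nv, nl, alpha):
    """Smith geometric attenuation G."""
    k = alpha / 2.0
    return (nv / (nv * (1 - k) + k)) * (nl / (nl * (1 - k) + k))

def fresnel_schlick(vh, f0):
    """Schlick approximation of the Fresnel term F."""
    return f0 + (1.0 - f0) * (1.0 - vh) ** 5

def shade(albedo, rough, metal, n, v, n_samples=4096, L_env=1.0, rng=None):
    """Monte Carlo estimate of outgoing radiance under a constant environment."""
    rng = rng or np.random.default_rng(0)
    # uniform hemisphere sampling around n (pdf = 1 / (2*pi))
    d = rng.normal(size=(n_samples, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    d[np.einsum('ij,j->i', d, n) < 0] *= -1.0
    nl = d @ n
    h = d + v
    h /= np.linalg.norm(h, axis=1, keepdims=True)
    nh, vh, nv = h @ n, np.clip(h @ v, 1e-4, 1.0), max(n @ v, 1e-4)
    f0 = 0.04 * (1 - metal) + albedo * metal          # per-channel base reflectance
    spec = (ggx_d(nh, rough**2)[:, None] * smith_g(nv, nl, rough**2)[:, None]
            * fresnel_schlick(vh[:, None], f0)) / (4 * nv * nl[:, None] + 1e-6)
    diff = (albedo / np.pi) * (1 - metal)             # Lambertian diffuse lobe
    integrand = L_env * (diff + spec) * nl[:, None]   # Cook-Torrance BRDF * cosine
    return integrand.mean(axis=0) * (2 * np.pi)       # divide by the uniform pdf

n = np.array([0.0, 0.0, 1.0])
v = np.array([0.0, 0.0, 1.0])
rgb = shade(np.array([0.8, 0.2, 0.2]), rough=0.5, metal=0.0, n=n, v=v)
```

For a dielectric (`metal=0`) the red channel dominates because the diffuse lobe inherits the albedo, while the specular lobe uses the achromatic `f0 = 0.04`.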
3. Geometry- and Light-aware Diffusion Guidance
A key advancement is the training of a diffusion model (via ControlNet) explicitly conditioned on both geometry and lighting. Conditioning includes:
- Geometry: Input depth maps and normal maps from the mesh.
- Lighting: Rendered references of six canonical materials under the current environment light, fully specifying illumination context.
- Text: User prompt describing material style or class.
This geometry- and light-aware model is trained on synthetic multi-light renderings from Objaverse, eliminating ambiguity in material decomposition by ensuring every diffusion output is generated under known lighting and mesh geometry. During material parameter optimization, rendered mesh images (matching the fixed lighting context) are distilled via this guided diffusion model.
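The conditioning assembly above can be sketched as a small data-preparation step. The exact channel layout fed to the ControlNet is an assumption here (the paper does not prescribe this packing), as are the toy resolution and the `make_condition` helper name.

```python
import numpy as np

H = W = 64  # toy resolution for illustration

def make_condition(depth, normals, probe_renders, prompt):
    """Assemble the geometry + lighting condition for the ControlNet (sketch).

    depth:         (H, W)      normalized depth rendered from the mesh
    normals:       (H, W, 3)   camera-space normals in [-1, 1]
    probe_renders: six (H, W, 3) renders of canonical materials under the
                   current environment light, fixing the illumination context
    prompt:        text describing the desired material style or class
    """
    assert len(probe_renders) == 6, "six canonical-material renders expected"
    channels = [depth[..., None], 0.5 * (normals + 1.0)] + list(probe_renders)
    cond_image = np.concatenate(channels, axis=-1)   # (H, W, 1 + 3 + 6*3) = (H, W, 22)
    return {"image_cond": cond_image, "text_cond": prompt}

cond = make_condition(
    depth=np.zeros((H, W)),
    normals=np.tile([0.0, 0.0, 1.0], (H, W, 1)),
    probe_renders=[np.zeros((H, W, 3)) for _ in range(6)],
    prompt="rusted cast-iron teapot",
)
print(cond["image_cond"].shape)  # (64, 64, 22)
```

The key design point is that the same environment light appears in both the probe renders and the renderer during distillation, so the diffusion model never has to guess the illumination.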
4. Material Distillation and Loss Functions
DreamMat optimizes SVBRDF parameters using a Classifier Score Distillation (CSD) loss. For a rendered image $x$ and time step $t$, noise $\epsilon$ is added to yield $x_t$; the diffusion model predicts the noise $\epsilon_\phi(x_t; y, t)$ conditioned on the prompt $y$ and the geometry/lighting context. The gradient guiding the material parameters $\theta$ is:

$$\nabla_\theta \mathcal{L}_{\text{CSD}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \varnothing, t)\big)\, \frac{\partial x}{\partial \theta} \right].$$

CSD additionally uses negative prompts penalizing qualities such as oversaturation, underexposure, or semantic errors, ensuring material attributes do not drift from prompt intent. A smoothness loss regularizes the hash-grid parameterization:

$$\mathcal{L}_{\text{smooth}} = \sum_{\mathbf{x}} \big\| \Gamma_\theta(\mathbf{x}) - \Gamma_\theta(\mathbf{x} + \delta) \big\|_1,$$

where $\delta$ is a small random surface perturbation.
This reduces high-frequency noise in SVBRDF maps and ensures physically plausible material transitions.
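Both losses can be sketched in a few lines of NumPy. The noising schedule, the weighting constants, and the stub `denoiser` and `material_fn` are toy stand-ins, not the real diffusion model or hash-grid MLP; the sketch only shows the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothness_loss(material_fn, pts, eps=1e-2):
    """L1 penalty on material change under small random surface perturbations."""
    delta = rng.normal(scale=eps, size=pts.shape)
    return np.abs(material_fn(pts) - material_fn(pts + delta)).mean()

def csd_pixel_gradient(denoiser, x, t, prompt, neg_prompt, w_pos=1.0, w_neg=0.5):
    """Classifier-score-distillation gradient in image space (sketch).

    denoiser(x_t, t, cond) -> predicted noise; the classifier score is the gap
    between conditional and unconditional predictions.
    """
    eps = rng.normal(size=x.shape)
    x_t = np.sqrt(1 - t) * x + np.sqrt(t) * eps     # toy noising schedule
    e_pos = denoiser(x_t, t, prompt)
    e_unc = denoiser(x_t, t, None)
    e_neg = denoiser(x_t, t, neg_prompt)
    # positive term pulls renders toward the prompt; negative term pushes away
    # from undesired qualities (oversaturation, underexposure, ...)
    return w_pos * (e_pos - e_unc) - w_neg * (e_neg - e_unc)

# toy stand-ins for the real networks
material_fn = lambda p: np.sin(4 * p)                       # pretend hash-grid MLP
denoiser = lambda x_t, t, c: 0.1 * x_t if c else 0.0 * x_t  # pretend diffusion model

x = rng.uniform(size=(8, 8, 3))                             # "rendered image"
g = csd_pixel_gradient(denoiser, x, t=0.3, prompt="gold", neg_prompt="oversaturated")
s = smoothness_loss(material_fn, rng.uniform(size=(16, 3)))
```

In the full pipeline the pixel-space gradient `g` is backpropagated through the differentiable renderer into the hash-grid MLP parameters, and the smoothness term is added to the total loss.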
5. Elimination of Baked-in Shading Effects
By conditioning all synthesis and distillation steps explicitly on lighting, DreamMat eliminates the entanglement of illumination with material. The fixed environment map, used consistently during diffusion guidance and rendering, prevents albedo maps from inheriting shadows or directional color shifts. Random selection of multiple environment lights during distillation ensures material robustness and prevents overfitting SVBRDFs to any single lighting scenario. This approach yields albedo, roughness, and metallic maps that are clean and physically correct when rendered under novel lighting conditions.
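The light-randomization loop reduces to a simple invariant: each distillation step draws one environment map and uses it for both the renderer and the diffusion conditioning. The HDRI filenames and helper below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
env_maps = [f"hdri_{i:02d}.exr" for i in range(8)]   # hypothetical HDRI pool

def sample_light_context():
    """Pick a random environment light per distillation step, so the SVBRDF
    cannot overfit to any single illumination."""
    env = env_maps[rng.integers(len(env_maps))]
    # the SAME env map conditions both the renderer and the diffusion guidance,
    # which is what keeps lighting out of the recovered albedo
    return {"renderer_env": env, "diffusion_cond_env": env}

ctx = sample_light_context()
```

Breaking this invariant (rendering under one light while conditioning the diffusion model on another) would reintroduce exactly the lighting/material ambiguity the method is designed to remove.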
6. Quantitative and Qualitative Validation
Empirical tests benchmark DreamMat against TANGO, TEXTure, Text2Tex, and Fantasia3D. Visual comparison demonstrates that DreamMat's albedo maps are free of baked highlights and shadows, with plausible, spatially-varying roughness and metallicity. When rendered under new lighting (unseen during optimization), DreamMat assets retain correct appearance, unlike prior methods that produce double lighting or color distortions.
A user study with 42 respondents rates DreamMat highest for overall quality, prompt fidelity, material map realism, disentanglement, and photorealistic rendering. Quantitative evaluations include CLIP scores (semantic alignment) and FID (distribution similarity to diffusion outputs); ablation studies confirm that only the combination of geometry- and light-aware conditioning with the CSD loss produces robust SVBRDFs.
7. Limitations and Practical Impact
DreamMat is bounded by the limitations of the underlying BRDF and diffusion models, such as insufficient modeling of transparency, subsurface scattering, and highly specular phenomena. Challenging cases may manifest less accurate roughness or metallic maps. Generation time is approximately 20 minutes per asset, subject to future improvements in diffusion model efficiency. Dependence on underlying text-to-image diffusion restricts prompt fidelity and diversity to the capabilities of these models.
Nevertheless, DreamMat outputs can be exported as high-resolution UV textures for immediate use in production graphics engines (Blender, Unreal Engine), fully supporting editable and relightable asset workflows. The method enables significant reduction in artist labor for PBR material creation and rapid iteration over style via prompt changes.
| Key Issue | DreamMat Solution | Application |
|---|---|---|
| Baked shading in albedo | Light-aware diffusion, fixed lighting | Relightable, artifact-free appearance |
| Disentanglement failure | Geometry + lighting conditioned ControlNet | Accurate SVBRDF decomposition |
| Noisy or rough output | CSD + smoothness loss, hash-grid representation | High-res, artifact-free UV PBR material maps |
DreamMat is the first method to provide robust, fully relightable and editable PBR materials from textual prompts for arbitrary geometry, enabled by geometry- and light-aware diffusion model guidance and tailored score distillation loss. This establishes a new benchmark paradigm for autonomous, prompt-driven material synthesis in digital content workflows (Zhang et al., 27 May 2024).