
DreamMat: Prompt-Driven PBR Synthesis

Updated 3 November 2025
  • DreamMat is a generative framework that produces relightable, editable, and physically accurate PBR materials directly from text descriptions.
  • It integrates geometry- and light-aware diffusion with a hash-grid-encoded MLP to generate clean SVBRDF maps free from baked shading artifacts.
  • Empirical validations demonstrate that DreamMat outperforms previous methods by ensuring prompt fidelity and realistic material behavior under varied lighting.

DreamMat is a generative framework designed to produce high-quality, relightable, and editable physically based rendering (PBR) materials directly from text descriptions for arbitrary 3D mesh geometry. The central innovation is the integration of geometry- and light-aware diffusion models to address well-documented failures of standard 2D text-to-image diffusion pipelines, particularly baked-in shading artifacts and ill-posed material decomposition, which impede realistic appearance when assets are relit in downstream graphics engines (Zhang et al., 27 May 2024).

1. Problem Formulation and Motivation

Digital asset pipelines demand SVBRDF (spatially-varying bidirectional reflectance distribution function) representations: typically albedo ($\mathbf{c}$), roughness ($\alpha$), and metallic ($m$) maps parameterized over each surface location $\mathbf{p}$. Existing solutions usually generate RGB textures conditioned on user prompts, then attempt to decompose these outputs into PBR parameters. However, since most large-scale 2D diffusion models are trained to produce final shaded appearances under unknown lighting, decomposition methods inherit baked-in shadows, highlights, and other illumination artifacts. The result is texture maps with unwanted lighting entanglement: when such assets are lit in real time in game engines, they exhibit double lighting, color shifts, and physically implausible material responses.

Prior methods such as Fantasia3D, TEXTure, and Text2Tex attempted direct score distillation of material parameters from diffusion outputs but failed to robustly separate lighting from intrinsic properties. Without lighting control or geometry constraints, these models produce incorrect material decompositions.

2. Method Architecture and Material Representation

DreamMat models PBR material maps as SVBRDF functions via a hash-grid-encoded MLP (using Instant-NGP architecture):

$$(\mathbf{c},\, \alpha,\, m) = \Gamma_\theta(\mathbf{p})$$

where $\Gamma_\theta$ is a learnable function mapping surface coordinates $\mathbf{p}$ to material property vectors. This parameterization supports detailed and spatially smooth materials suitable for high-resolution mesh assets.
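The mapping $\Gamma_\theta$ can be sketched as follows. This is a deliberately simplified stand-in for the paper's Instant-NGP representation: a toy multi-level spatial-hash lookup (nearest vertex, no trilinear interpolation) feeding a tiny MLP, with all names and sizes chosen for illustration only.

```python
import numpy as np

# Toy hash-grid + MLP mapping a surface point p in [0,1]^3 to
# (albedo rgb, roughness, metallic), all squashed to [0,1] by a sigmoid.
rng = np.random.default_rng(0)
TABLE_SIZE, FEAT_DIM, LEVELS = 2**14, 2, 4
tables = rng.normal(0, 1e-2, (LEVELS, TABLE_SIZE, FEAT_DIM))
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(p):
    """Concatenate per-level features looked up by spatial hashing."""
    feats = []
    for lvl in range(LEVELS):
        res = 16 * 2**lvl                  # grid resolution at this level
        idx = (p * res).astype(np.uint64)  # nearest-vertex lookup (no interpolation)
        h = int((idx * PRIMES).sum() % TABLE_SIZE)
        feats.append(tables[lvl, h])
    return np.concatenate(feats)           # shape (LEVELS * FEAT_DIM,)

# Tiny MLP head: hashed features -> 5 material channels.
W1 = rng.normal(0, 0.1, (32, LEVELS * FEAT_DIM)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (5, 32));                 b2 = np.zeros(5)

def gamma_theta(p):
    h = np.maximum(W1 @ hash_encode(p) + b1, 0)   # ReLU hidden layer
    out = 1 / (1 + np.exp(-(W2 @ h + b2)))        # sigmoid -> (0, 1)
    albedo, roughness, metallic = out[:3], out[3], out[4]
    return albedo, roughness, metallic

albedo, roughness, metallic = gamma_theta(np.array([0.3, 0.7, 0.5]))
```

In the actual method the hash tables and MLP weights together form $\theta$ and are optimized by the distillation losses described below; the sigmoid keeps every material channel in a physically valid range.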

The mesh is rendered using physically accurate rendering equations. Specifically, DreamMat employs Monte Carlo estimation of the rendering equation:

$$L(\mathbf{p}, \omega_o) = \int_\Omega L_i(\omega_i)\, f(\omega_i, \omega_o)\, (\omega_i \cdot n)\, d\omega_i$$

where $L_i(\omega_i)$ is incident radiance from a known HDRI environment, $f(\omega_i, \omega_o)$ is the Cook-Torrance BRDF, and $n$ is the surface normal. This yields decomposed diffuse and specular components:

$$L_\text{diffuse} = \frac{\mathbf{c}}{N_d} \sum_{i=1}^{N_d} L(\omega_i)$$

$$L_\text{specular} = \frac{1}{N_s} \sum_{i=1}^{N_s} \frac{F(\mathbf{c}, m)\, G(\omega_o, \omega_i, n, \alpha)\, (\omega_o \cdot h)}{(n \cdot h)(n \cdot \omega_o)}\, L(\omega_i)$$

where $F$ (Fresnel term), $G$ (geometric attenuation), and $h$ (halfway vector) are standard microfacet model elements.
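As a minimal numeric illustration of the Monte Carlo diffuse term, the sketch below assumes a constant environment radiance in place of the paper's HDRI maps. With cosine-weighted hemisphere sampling, the $\cos\theta/\pi$ factor cancels against the sampling pdf, leaving exactly the averaged form of $L_\text{diffuse}$ above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cosine_hemisphere(n, rng):
    """Cosine-weighted directions about the +z normal (pdf = cos(theta)/pi)."""
    u1, u2 = rng.random(n), rng.random(n)
    r, phi = np.sqrt(u1), 2 * np.pi * u2
    return np.stack([r * np.cos(phi), r * np.sin(phi), np.sqrt(1 - u1)], axis=1)

def diffuse_mc(albedo, env_radiance, n_samples=4096, rng=rng):
    dirs = sample_cosine_hemisphere(n_samples, rng)   # sampled incident directions
    L_i = np.full(n_samples, env_radiance)            # constant-light stand-in for the HDRI
    return albedo * L_i.mean()                        # (c / N_d) * sum_i L(omega_i)

# Under a constant light, the estimate equals albedo * env_radiance exactly.
est = diffuse_mc(np.array([0.8, 0.5, 0.2]), env_radiance=1.0)
```

A real implementation would evaluate $L_i$ per sampled direction from the environment map and add the importance-sampled specular sum; here the constant light makes the expected result easy to verify.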

3. Geometry- and Light-aware Diffusion Guidance

A key advancement is the training of a diffusion model (via ControlNet) explicitly conditioned on both geometry and lighting. Conditioning includes:

  • Geometry: Input depth maps and normal maps from the mesh.
  • Lighting: Rendered references of six canonical materials under the current environment light, fully specifying illumination context.
  • Text: User prompt describing material style or class.

This geometry- and light-aware model is trained on synthetic multi-light renderings from Objaverse, eliminating ambiguity in material decomposition by ensuring every diffusion output is generated under known lighting and mesh geometry. During material parameter optimization, rendered mesh images (matching the fixed lighting context) are distilled via this guided diffusion model.
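The conditioning bundle described above can be sketched as a simple data structure. This is a hypothetical layout, not the actual ControlNet interface: array shapes, key names, and the example prompt are all illustrative assumptions.

```python
import numpy as np

H = W = 64  # illustrative resolution

# Hypothetical conditioning bundle for the geometry- and light-aware model:
# depth and normal maps come from rasterizing the mesh, and the six
# canonical-material renders encode the chosen environment light.
conditioning = {
    "depth": np.zeros((H, W), dtype=np.float32),        # mesh depth map
    "normal": np.zeros((H, W, 3), dtype=np.float32),    # mesh normal map
    # Renders of six canonical materials under the current environment light,
    # which fully specify the illumination context to the diffusion model.
    "light_probes": np.zeros((6, H, W, 3), dtype=np.float32),
    "prompt": "weathered bronze statue, realistic PBR material",
}
```

The key point is that every channel the model sees is computed under a known light and known geometry, so nothing about illumination is left ambiguous at generation time.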

4. Material Distillation and Loss Functions

DreamMat optimizes SVBRDF parameters using a Classifier Score Distillation (CSD) loss. For a rendered image $I$ and time step $t$, noise $\epsilon_t$ is added to yield $I_t$; the diffusion model denoises $I_t$ to produce $I'_t$. The loss guiding material parameter gradients is:

$$\nabla_{\theta} \mathcal{L}_\text{Distill} = \mathbb{E}_t \left[ \delta(I_t)\, \frac{\partial I}{\partial \theta} \right]$$

$$\delta(I_t) = I'_t - I$$

CSD uses prompt-based penalties for qualities such as oversaturation, underexposure, or semantic errors, ensuring material attributes do not drift from prompt intent. A smoothness loss regularizes grid parameterization:

$$\mathcal{L}_\text{smooth} = \| \Gamma_\theta(\mathbf{p}) - \Gamma_\theta(\mathbf{p} + \epsilon) \|^2$$

This reduces high-frequency noise in SVBRDF maps and ensures physically plausible material transitions.
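The two losses above can be sketched numerically as follows. This is a hedged stand-in, not the paper's implementation: `delta` is treated as a given residual array rather than a diffusion-model output, and the Jacobian $\partial I / \partial \theta$ is passed in explicitly.

```python
import numpy as np

def distill_grad(delta, dI_dtheta):
    """Distillation gradient: mean over pixels of delta(I_t) * dI/dtheta.

    delta:      (P,)   residual I'_t - I per pixel
    dI_dtheta:  (P, T) Jacobian of the rendered image wrt material params
    """
    return dI_dtheta.T @ delta / delta.size

def smooth_loss(gamma, p, eps=1e-2, rng=np.random.default_rng(0)):
    """|| Gamma(p) - Gamma(p + eps) ||^2 with a small random positional jitter."""
    jitter = eps * rng.standard_normal(p.shape)
    d = gamma(p) - gamma(p + jitter)
    return float(np.sum(d * d))

# Sanity checks: an identity Jacobian just averages the residual, and a
# perfectly smooth (constant) material field incurs zero smoothness loss.
g = distill_grad(np.ones(4), np.eye(4))
constant_material = lambda p: np.array([0.5, 0.5, 0.5, 0.4, 0.0])
loss = smooth_loss(constant_material, np.array([0.2, 0.3, 0.4]))
```

In practice the Jacobian is never materialized; autodiff backpropagates $\delta(I_t)$ through the differentiable renderer into the hash-grid parameters, and the smoothness term is evaluated on sampled surface points each iteration.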

5. Elimination of Baked-in Shading Effects

By conditioning all synthesis and distillation steps explicitly on lighting, DreamMat eliminates the entanglement of illumination with material. The fixed environment map, used consistently during diffusion guidance and rendering, prevents albedo maps from inheriting shadows or directional color shifts. Random selection of multiple environment lights during distillation ensures material robustness and prevents overfitting SVBRDFs to any single lighting scenario. This approach yields albedo, roughness, and metallic maps that are clean and physically correct when rendered under novel lighting conditions.
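The multi-light strategy amounts to a simple sampling loop: each distillation step draws one environment map from a fixed pool, and both the renderer and the light-aware diffusion guidance are conditioned on that same known light. The pool contents below are illustrative placeholders, not the paper's actual environment maps.

```python
import random

env_maps = ["studio.hdr", "sunset.hdr", "forest.hdr", "indoor.hdr"]
rng = random.Random(0)

def pick_light(rng):
    """Sample the environment light for one distillation step; the same
    choice conditions both rendering and diffusion guidance."""
    return rng.choice(env_maps)

schedule = [pick_light(rng) for _ in range(8)]
```

Because no single lighting scenario dominates optimization, the recovered SVBRDF cannot absorb the shading of any one environment into its albedo.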

6. Quantitative and Qualitative Validation

Empirical tests benchmark DreamMat against TANGO, TEXTure, Text2Tex, and Fantasia3D. Visual comparison demonstrates that DreamMat's albedo maps are free of baked highlights and shadows, with plausible, spatially varying roughness and metallicity. When rendered under new lighting (unseen at training time), DreamMat assets retain correct appearance, unlike prior methods that produce double lighting or color distortions.

A user study with 42 respondents rates DreamMat highest for overall quality, prompt fidelity, material map realism, disentanglement, and photorealistic rendering. Quantitative evaluations include CLIP scores (semantic alignment) and FID (distributional similarity to diffusion outputs), and ablation studies confirm that only the combination of geometry- and light-aware conditioning with the CSD loss produces robust SVBRDFs.

7. Limitations and Practical Impact

DreamMat is bounded by the limitations of the underlying BRDF and diffusion models, such as insufficient modeling of transparency, subsurface scattering, and highly specular phenomena. In challenging cases, the recovered roughness or metallic maps may be less accurate. Generation takes approximately 20 minutes per asset, subject to future improvements in diffusion model efficiency. Dependence on the underlying text-to-image diffusion model restricts prompt fidelity and diversity to the capabilities of that model.

Nevertheless, DreamMat outputs can be exported as high-resolution UV textures for immediate use in production graphics engines (Blender, Unreal Engine), fully supporting editable and relightable asset workflows. The method enables significant reduction in artist labor for PBR material creation and rapid iteration over style via prompt changes.


| Key Issue | DreamMat Solution | Application |
|---|---|---|
| Baked shading in albedo | Light-aware diffusion, fixed lighting | Relightable, artifact-free appearance |
| Disentanglement failure | Geometry- and lighting-conditioned ControlNet | Accurate SVBRDF decomposition |
| Noisy or rough output | CSD + smoothness loss, hash-grid representation | High-res, artifact-free UV PBR material maps |

DreamMat is the first method to provide robust, fully relightable and editable PBR materials from textual prompts for arbitrary geometry, enabled by geometry- and light-aware diffusion model guidance and tailored score distillation loss. This establishes a new benchmark paradigm for autonomous, prompt-driven material synthesis in digital content workflows (Zhang et al., 27 May 2024).
