
Paint-it: Generative PBR Texture Synthesis

Updated 3 November 2025
  • Paint-it is a generative system for creating full PBR texture maps from text prompts, integrating deep neural re-parameterization with diffusion-based guidance.
  • It employs a U-Net architecture and score distillation sampling (SDS) to optimize textures, ensuring semantic control and suppression of noisy gradients.
  • The system delivers relightable, artifact-free PBR maps for diverse applications like gaming, AR/VR, and film, outperforming prior methods in quality and user scores.

Paint-it is a generative system for text-driven, high-fidelity physically based rendering (PBR) texture synthesis on 3D meshes, integrating deep convolutional neural re-parameterization with modern diffusion-based guidance. It produces full sets of PBR texture maps (diffuse, roughness, metalness, normal) directly from a text prompt via a synthesis-through-optimization approach that combines Score Distillation Sampling (SDS) with a U-Net-based neural parameterization and differentiable physically based rendering. Paint-it addresses key practical and methodological bottlenecks in mesh texturing: it enables semantic-level user control, suppresses optimization artifacts induced by noisy gradient signals from diffusion models, and supports direct material-level manipulation in downstream engines.

1. System Architecture and Texture Parameterization

Paint-it replaces pixel-level parameterization of mesh UV textures with a deep convolutional neural network (DCNN), specifically a randomly initialized U-Net with skip connections, denoted $\mathcal{T}_\theta(\cdot)$. Instead of updating each texel directly, the texture maps are the outputs of this DCNN given a fixed spatial noise field:

$$[K^{\text{d}}_\theta, K^{\text{rm}}_\theta, K^{\text{n}}_\theta] = \mathcal{T}_\theta(\eta)$$

where:

  • $K^{\text{d}}_\theta$: 3-channel diffuse texture
  • $K^{\text{rm}}_\theta$: 2-channel roughness and metalness maps
  • $K^{\text{n}}_\theta$: 3-channel normal map
  • $\eta$: fixed 2D noise input

This neural re-parameterization supports structured, frequency-ordered optimization (spectral bias), naturally filters out high-frequency noise from gradient updates, and aligns with the hierarchical patterns of real-world material properties. The U-Net prior regularizes the texture maps, providing smooth spatial correlations and robustness to noisy or underspecified text prompts.
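The re-parameterization above can be sketched in a few lines of PyTorch. This is a minimal illustrative stand-in (a plain convolutional stack rather than the paper's full U-Net with skip connections), with eight output channels packing diffuse (3), roughness/metalness (2), and normal (3):

```python
import torch
import torch.nn as nn

# Minimal sketch of deep convolutional texture re-parameterization.
# Illustrative only: the paper's network is a U-Net with skip connections;
# layer sizes and activations here are assumptions for the demo.
class TinyTextureNet(nn.Module):
    def __init__(self, noise_ch=16, out_ch=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(noise_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1), nn.Sigmoid(),  # maps in [0, 1]
        )

    def forward(self, eta):
        maps = self.net(eta)
        k_d = maps[:, :3]    # diffuse albedo
        k_rm = maps[:, 3:5]  # roughness, metalness
        k_n = maps[:, 5:8]   # tangent-space normal (rescaled to [-1, 1] downstream)
        return k_d, k_rm, k_n

# The noise input eta is sampled once and held fixed; only theta is optimized.
eta = torch.randn(1, 16, 64, 64)
model = TinyTextureNet()
k_d, k_rm, k_n = model(eta)
```

Because the texels are never free parameters, every SDS gradient must pass through the convolutional weights, which is what imposes the spatial smoothness prior described above.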

2. Optimization Objective via Score Distillation Sampling (SDS)

Core supervision for text alignment and visual realism comes from Score Distillation Sampling (SDS), as introduced in DreamFusion. At each optimization step:

  1. The mesh $\mathcal{M}$ is rendered with the current textures (physically based, multi-view, under environment lighting), producing images $I_\theta$.
  2. A pre-trained diffusion model (frozen weights) receives $I_\theta$ along with the text prompt $y$ and a sampled noise level $t$.
  3. The SDS update is formulated as

$$\nabla_{\theta} \mathcal{L}_\text{SDS}(\phi, I_\theta) = \mathbb{E}_{t,\epsilon}\Big[ \big(\hat{\epsilon}_{\phi}(I_{\theta,t}; y, t) - \epsilon\big) \frac{\partial I_{\theta}}{\partial \theta} \Big]$$

where $\hat{\epsilon}_\phi$ predicts the denoising residual for the noised render $I_{\theta,t}$ given prompt $y$, and $\epsilon$ is the sampled noise.

Optimization thus encourages the rendered view distribution to score highly under the text-conditioned diffusion model, distilling the generative prior into the texture synthesis.
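The per-step update can be sketched as follows. The denoiser, the cosine noise schedule, and the image-as-parameter setup are toy stand-ins for the frozen diffusion model and the differentiable renderer, and the timestep weighting $w(t)$ used in practice is omitted:

```python
import torch

# Schematic SDS step. Assumptions: `denoiser` stands in for a frozen
# pre-trained diffusion model, the cosine alpha-bar schedule is a toy choice,
# and `image` plays the role of the differentiable render I_theta.
def sds_grad_step(image, denoiser, text_emb, optimizer, num_steps=1000):
    # Sample a timestep and noise, form the noised image I_{theta, t}.
    t = torch.randint(1, num_steps, (1,))
    alpha_bar = torch.cos(0.5 * torch.pi * t / num_steps) ** 2  # toy schedule
    eps = torch.randn_like(image)
    noised = alpha_bar.sqrt() * image + (1 - alpha_bar).sqrt() * eps

    # The residual (eps_hat - eps) is treated as a gradient on the image;
    # no gradient flows through the frozen denoiser itself.
    with torch.no_grad():
        eps_hat = denoiser(noised, t, text_emb)
    grad = eps_hat - eps

    # Surrogate loss whose gradient w.r.t. theta matches the SDS update.
    loss = (grad.detach() * image).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: optimize the image directly with a denoiser that predicts zero noise.
image = torch.zeros(1, 3, 8, 8, requires_grad=True)
opt = torch.optim.SGD([image], lr=0.1)
toy_denoiser = lambda x, t, y: torch.zeros_like(x)
loss = sds_grad_step(image, toy_denoiser, None, opt)
```

In Paint-it the `image` above is itself the output of the differentiable renderer fed by $\mathcal{T}_\theta(\eta)$, so the same surrogate loss backpropagates all the way into the U-Net weights.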

3. DC-PBR: Effect of Deep Convolutional PBR Parameterization

The DC-PBR method contrasts with pixel-wise UV parameterizations and per-point MLP-based schemes. Its advantages include:

  • Frequency Curriculum: U-Nets learn low frequencies before high, suppressing the immediate adoption of high-frequency noise from the SDS gradients, and yielding smoother, more realistic early-stage texture synthesis (frequency scheduling).
  • Noise Filtering: The spatial inductive bias of convolutions prevents accumulation of incoherent, high-frequency artifacts, especially prevalent in SDS-driven optimization.
  • Material Expressiveness: Full PBR maps (diffuse, roughness, metalness, normal) support high-fidelity relighting, material variation, and view-dependent effects in standard rendering pipelines.

Ablation studies show that replacing the DC-PBR architecture with pixel-wise or MLP-based parameterization results in excessive noise, patchiness, or limited expressiveness, and in lower FID and user scores relative to DC-PBR.
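The noise-filtering effect of the convolutional prior can be illustrated with a toy experiment (a hypothetical demo, not the paper's code): pushing a white-noise image through even a single box-filter "convolution" attenuates its high-frequency spectral energy.

```python
import numpy as np

# Toy illustration of the convolutional inductive bias: a 3x3 box filter
# stands in for a learned convolution and damps high frequencies in a
# white-noise image, mirroring the filtering attributed to DC-PBR above.
rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))

kernel = np.ones((3, 3)) / 9.0  # simple averaging kernel
pad = np.pad(noise, 1, mode="wrap")
smoothed = sum(
    kernel[i, j] * pad[i:i + 64, j:j + 64]
    for i in range(3) for j in range(3)
)

def high_freq_energy(img):
    # Sum spectral magnitude outside a low-frequency core around DC.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    c = img.shape[0] // 2
    mask = np.ones_like(spec, dtype=bool)
    mask[c - 8:c + 8, c - 8:c + 8] = False
    return spec[mask].sum()

hf_noise = high_freq_energy(noise)
hf_smooth = high_freq_energy(smoothed)
```

A trained U-Net is of course far more expressive than a fixed box filter, but the same mechanism explains why incoherent SDS gradients do not imprint directly onto the texture.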

4. Rendering Pipeline: Physically-Based and Differentiable

Paint-it uses a standard physically based rendering (PBR) model with a Cook-Torrance BRDF, rendering images as

$$L_\theta(\boldsymbol{x}, \omega_o) = \int_\Omega L_i(\boldsymbol{x}, \omega_i)\, f_\theta(\boldsymbol{x}, \omega_i, \omega_o)\, (\omega_i \cdot \mathbf{n}_\theta)\, d\omega_i$$

where each surface point $\boldsymbol{x}$ is assigned spatially varying BRDF properties via the neural-generated textures, and $\mathbf{n}_\theta$ is the normal from $K^{\text{n}}_\theta$. This process is differentiable, so SDS gradients backpropagate through both rendering and texture generation.

Integration with differentiable rasterization frameworks (e.g., NVDiffRast) facilitates end-to-end optimization compatible with standard graphics hardware (15–30 min per mesh at full resolution on RTX A6000).
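As a concrete reference for how the per-texel maps enter the shading integral, here is a single-point, single-light Cook-Torrance evaluation (GGX normal distribution, Schlick Fresnel, Smith geometry). This is an illustrative sketch of the BRDF family, not the paper's renderer:

```python
import numpy as np

# Single-light, single-point Cook-Torrance shading. The albedo, roughness,
# and metalness arguments correspond to one texel of the K^d and K^rm maps;
# constants (0.04 base reflectance, k remapping) follow common PBR practice.
def cook_torrance(n, v, l, albedo, roughness, metalness, light=1.0):
    h = (v + l) / np.linalg.norm(v + l)          # half vector
    nl, nv = max(n @ l, 1e-4), max(n @ v, 1e-4)
    nh, vh = max(n @ h, 0.0), max(v @ h, 0.0)

    a2 = roughness ** 4                               # alpha = roughness^2
    d = a2 / (np.pi * (nh * nh * (a2 - 1) + 1) ** 2)  # GGX / Trowbridge-Reitz
    f0 = 0.04 * (1 - metalness) + albedo * metalness  # base reflectance
    f = f0 + (1 - f0) * (1 - vh) ** 5                 # Schlick Fresnel
    k = (roughness + 1) ** 2 / 8
    g = (nl / (nl * (1 - k) + k)) * (nv / (nv * (1 - k) + k))  # Smith masking

    spec = d * f * g / (4 * nl * nv)
    diff = (1 - metalness) * albedo / np.pi           # Lambertian lobe
    return (diff + spec) * nl * light

# Head-on view and light on a reddish dielectric texel.
n = np.array([0.0, 0.0, 1.0])
v = l = np.array([0.0, 0.0, 1.0])
rgb = cook_torrance(n, v, l, albedo=np.array([0.8, 0.2, 0.2]),
                    roughness=0.5, metalness=0.0)
```

In the full pipeline this evaluation is performed per pixel inside the differentiable rasterizer, with the light integral approximated by environment-map sampling.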

5. Experimental Evaluation and Quantitative Results

Comprehensive validation across mesh sources (Objaverse, RenderPeople, SMAL) and general object domains demonstrates:

| Method | PBR Maps | FID (↓) | User Score (↑, /5) |
|---|---|---|---|
| Latent-Paint | No | 57.35 | 2.14 |
| Fantasia3D | No | 51.01 | 2.52 |
| TEXTure | No | 37.28 | 3.21 |
| Paint-it (DC-PBR) | Yes | 34.46 | 4.37 |

Paint-it achieves the best FID and is the only method with a user score above the "realistic" threshold (4.0). User studies confirm high material and semantic quality, and ablations confirm the necessity of both DC-PBR and full multi-channel PBR supervision.

6. Practical Applications and Generalization

Paint-it directly yields multi-channel, relightable, and physically correct PBR texture maps compatible with industry-standard engines (Blender, Unreal, Unity), supporting:

  • Text-prompt-based 3D asset creation from scratch for arbitrary meshes.
  • Relighting and material editing at test time (diffuse/roughness/metalness/normal).
  • Support for animated and dynamic meshes (as UVs are preserved).
  • View-consistent, artifact-suppressed detail as required in AR, VR, gaming, and film production contexts.

Generalization is demonstrated by successful application to objects, clothed humans, and animals with comparable fidelity.

7. Methodological Impact and Future Directions

Paint-it establishes the importance of neural re-parameterization in optimization-based texture synthesis. By embedding the DC-PBR architecture within the SDS-optimized pipeline, it overcomes the optimization instability typical of diffusion-model supervision, especially in high-dimensional texture spaces.

A plausible implication is the possibility of feedforward, supervised training for real-time applications, as Paint-it's fundamental design decouples texture parameterization from per-pixel or per-point artifacts. The approach also sets the groundwork for future multi-modal and interactive 3D asset creation pipelines, potentially integrating direct user guidance or extensions to other material representations.
