IntrinsiX: High-Quality PBR Generation using Image Priors (2504.01008v1)

Published 1 Apr 2025 in cs.CV and cs.AI

Abstract: We introduce IntrinsiX, a novel method that generates high-quality intrinsic images from text description. In contrast to existing text-to-image models whose outputs contain baked-in scene lighting, our approach predicts physically-based rendering (PBR) maps. This enables the generated outputs to be used for content creation scenarios in core graphics applications that facilitate re-lighting, editing, and texture generation tasks. In order to train our generator, we exploit strong image priors, and pre-train separate models for each PBR material component (albedo, roughness, metallic, normals). We then align these models with a new cross-intrinsic attention formulation that concatenates key and value features in a consistent fashion. This allows us to exchange information between each output modality and to obtain semantically coherent PBR predictions. To ground each intrinsic component, we propose a rendering loss which provides image-space signals to constrain the model, thus facilitating sharp details also in the output BRDF properties. Our results demonstrate detailed intrinsic generation with strong generalization capabilities that outperforms existing intrinsic image decomposition methods used with generated images by a significant margin. Finally, we show a series of applications, including re-lighting, editing, and text-conditioned room-scale PBR texture generation.

Summary

  • The paper introduces a novel method that generates editable PBR maps (albedo, roughness, metallic, and normal) directly from textual input using strong image priors.
  • It employs a two-step training process with fine-tuned LoRAs and cross-intrinsic attention to ensure high-fidelity rendering and semantic alignment.
  • Evaluations show that it outperforms existing intrinsic image decomposition methods applied to generated images, and the generated maps support relighting, editing, and scene texturing in digital content creation.

Introduction to IntrinsiX Methodology

IntrinsiX introduces a novel approach for generating high-quality Physically-Based Rendering (PBR) maps directly from textual input. Unlike traditional text-to-image models that produce images with baked-in lighting, IntrinsiX uses a generative model to output albedo, roughness, metallic, and normal maps. These intrinsic images can be further manipulated in typical rendering pipelines for applications in realistic scene rendering, material editing, and content creation for gaming and virtual reality environments.

Model Architecture and Training

IntrinsiX leverages pre-trained text-to-image (T2I) models to generate intrinsic images, utilizing strong image priors to produce coherent PBR maps. The model undergoes a two-step training process:

  1. PBR Prior Training:
    • Separate low-rank adapters (LoRAs) are fine-tuned for each PBR component: albedo, roughness, metallic, and normal maps. They are trained on curated datasets with a strong T2I model as the backbone.
    • Each LoRA learns the distribution of one intrinsic property. Intrinsic training data is scarce, so the models rely on the prior of the pre-trained T2I diffusion model to produce diverse, high-quality results.
  2. PBR Prior Alignment:
    • In this stage, the LoRAs are aligned by incorporating cross-intrinsic attention, which facilitates communication between different PBR components.
    • A novel rendering loss, utilizing importance-based light sampling, grounds the model in image space and improves the sharpness and detail of the generated PBR maps.
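The per-component LoRAs from the first stage can be pictured with a small NumPy sketch of standard low-rank adaptation: a frozen weight matrix shared by all components, plus a small trainable update per component. This is not the authors' training code. The class name `LoRALinear`, the rank, the `alpha` scaling, and the 64x64 layer are illustrative assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight plus a trainable low-rank update: W + (alpha / r) * B @ A."""

    def __init__(self, base_weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = base_weight.shape
        self.base_weight = base_weight                        # shared with other adapters, never updated
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                    # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # Base path plus low-rank path; with B == 0 the adapter is a no-op.
        return x @ self.base_weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size

# Hypothetical stand-in for one frozen layer of the T2I backbone.
base = np.random.default_rng(1).normal(size=(64, 64))

# One adapter per PBR component, all sharing the same frozen weight.
adapters = {
    name: LoRALinear(base, rank=4, seed=i)
    for i, name in enumerate(["albedo", "roughness", "metallic", "normal"])
}
x = np.ones((2, 64))
```

In this sketch each adapter adds 512 trainable values on top of a 4096-value frozen layer, which illustrates why per-component fine-tuning can stay cheap while reusing the backbone's image prior.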

      Figure 1: Method Overview. We generate the intrinsic properties of an image given text as input.
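The abstract describes cross-intrinsic attention as concatenating key and value features so that the output modalities can exchange information. One way to read that is sketched below in NumPy: each modality keeps its own queries but attends over the keys and values of all modalities. The function name, tensor layout `(M, N, d)`, and single-head form are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_intrinsic_attention(q, k, v):
    """q, k, v: arrays of shape (M, N, d) for M modalities, N tokens, d channels.

    Keys and values are concatenated along the token axis across all
    modalities, so every query sees tokens from every PBR component.
    """
    M, N, d = q.shape
    k_all = k.reshape(M * N, d)               # (M*N, d): keys of all modalities
    v_all = v.reshape(M * N, d)               # (M*N, d): values of all modalities
    scores = q @ k_all.T / np.sqrt(d)         # (M, N, M*N)
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ v_all                    # (M, N, d)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16, 8))               # 4 modalities: albedo, roughness, metallic, normal
k = rng.normal(size=(4, 16, 8))
v = rng.normal(size=(4, 16, 8))
out = cross_intrinsic_attention(q, k, v)
```

Because the keys and values are shared, changing the features of one modality changes the attention output of the others, which is the information exchange the alignment stage relies on.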

Applications and Practical Use Cases

IntrinsiX's ability to generate editable PBR maps allows several downstream applications:

  1. Editable Image Generation:
    • The generated PBR maps enable relighting and editing of images: light positions, albedo colors, and material properties such as metallic and roughness can all be changed after generation, as the edited artistic renderings in Figure 2 show.

Figure 2: Editable Image Generation. Our generated PBR maps can be edited and utilized in standard physically-based rendering frameworks to produce diverse RGB renderings.
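To make the relighting and editing workflow concrete, the following NumPy sketch shades a set of PBR maps under a directional light and then repeats the render with a different light and an edited albedo. It uses a deliberately simplified Blinn-Phong lobe with a conventional roughness-to-exponent mapping, which is an assumption for illustration and not the BRDF or renderer used in the paper. The function `shade` and all map values are hypothetical.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def shade(albedo, roughness, metallic, normal, light_dir,
          view_dir=(0.0, 0.0, 1.0), light_rgb=(1.0, 1.0, 1.0)):
    """albedo, normal: (H, W, 3); roughness, metallic: (H, W, 1). Returns (H, W, 3)."""
    n = normalize(normal)
    l = normalize(np.asarray(light_dir, dtype=float))
    v = normalize(np.asarray(view_dir, dtype=float))
    h = normalize(l + v)                                    # half vector
    n_dot_l = np.clip(n @ l, 0.0, 1.0)[..., None]
    n_dot_h = np.clip(n @ h, 0.0, 1.0)[..., None]
    diffuse = albedo * (1.0 - metallic)                     # metals have no diffuse term
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic        # specular color at normal incidence
    alpha = np.clip(roughness, 0.05, 1.0) ** 2
    shininess = 2.0 / alpha ** 2 - 2.0                      # assumed roughness-to-exponent mapping
    specular = f0 * n_dot_h ** shininess
    return (diffuse + specular) * n_dot_l * np.asarray(light_rgb)

H = W = 4
maps = dict(
    albedo=np.full((H, W, 3), [0.8, 0.2, 0.2]),
    roughness=np.full((H, W, 1), 0.5),
    metallic=np.zeros((H, W, 1)),
    normal=np.tile([0.0, 0.0, 1.0], (H, W, 1)),
)
frontal = shade(**maps, light_dir=(0, 0, 1))                # light facing the surface
grazing = shade(**maps, light_dir=(1, 0, 0.2))              # relight from a low angle
blue = {**maps, "albedo": np.full((H, W, 3), [0.2, 0.2, 0.8])}
recolored = shade(**blue, light_dir=(0, 0, 1))              # edit the albedo, keep the light
```

The point of the sketch is that lighting and material are separate inputs: moving the light or recoloring the albedo only requires another call to `shade`, with no regeneration of the maps.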

  2. PBR Scene Texturing:
    • By leveraging score distillation, IntrinsiX can generate text-conditioned PBR textures for room-scale 3D scenes. This makes it an attractive tool for game developers and VR content creators who need relightable environment textures.

Figure 3: Scene Texturing. We can use our method for scene texturing using score distillation.
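Score distillation optimizes scene parameters by noising a rendering, asking a diffusion model to predict the noise, and using the weighted difference between predicted and injected noise as a gradient. The toy NumPy sketch below shows only that update rule. The "diffusion model" is a hypothetical closed-form predictor whose data distribution is a single target image, the renderer is the identity, and the weighting `1 - alpha_bar` is one common choice; none of these are taken from the paper.

```python
import numpy as np

def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a diffusion model: exact noise prediction
    when the data distribution is concentrated on the single image `target`."""
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def sds_step(texture, target, rng, lr=1.0):
    alpha_bar = rng.uniform(0.2, 0.8)                 # sample a noise level
    eps = rng.normal(size=texture.shape)              # noise injected into the rendering
    rendered = texture                                # identity renderer, so d(rendered)/d(texture) = 1
    x_t = np.sqrt(alpha_bar) * rendered + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    grad = (1.0 - alpha_bar) * (eps_hat - eps)        # w(t) * (predicted noise - injected noise)
    return texture - lr * grad                        # gradient step on the texture itself

rng = np.random.default_rng(0)
target = np.full((8, 8, 3), 0.5)                      # what the toy model considers likely
texture = rng.uniform(size=(8, 8, 3))                 # randomly initialized texture
for _ in range(200):
    texture = sds_step(texture, target, rng)
```

In a real texturing setup the identity renderer would be replaced by a differentiable render of the 3D scene from sampled viewpoints, and the gradient would flow back through that render into the texture maps.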

Comparative Performance and Evaluation

IntrinsiX has been evaluated against existing techniques that perform intrinsic image decomposition from traditional RGB images. The results show:

  • Higher fidelity and coherence in rendered outputs: IntrinsiX avoids the pitfalls of baked-in lighting and texture artifacts common in traditional RGB decomposition methods.
  • User Studies and Quantitative Metrics: In the user study, participants preferred IntrinsiX's albedo and specular outputs and rated its rendering quality and semantic alignment with the text prompt higher than the baselines.

Figure 4: Rendering comparisons. Sample PBR maps and rendered images under different lighting conditions showcase the model's ability to capture semantic essence accurately.

Implementation Considerations

Deploying IntrinsiX requires budgeting for the compute cost of diffusion-based PBR map generation. It is best suited to settings where rendering quality and editability matter most; in resource-constrained or real-time settings, generation cost is the main trade-off to weigh.

Conclusion

IntrinsiX establishes a new standard for intrinsic image generation from text, offering substantial improvements in realism and versatility of output. Its ability to produce sophisticated PBR maps opens new avenues in digital content creation, particularly in areas requiring high-quality, editable visuals. Future enhancements could integrate more robust datasets and advanced sampling strategies, improving the model's capability to handle even broader application scenarios.
