- The paper introduces a novel method that generates editable PBR maps (albedo, roughness, metallic, and normal) directly from textual input using strong image priors.
- It employs a two-step training process with fine-tuned LoRAs and cross-intrinsic attention to ensure high-fidelity rendering and semantic alignment.
- Evaluations show it outperforms methods that decompose RGB images into intrinsic maps, and the generated maps support real-time image relighting and scene texturing in digital content creation.
IntrinsiX: High-Quality PBR Generation using Image Priors
Introduction to IntrinsiX Methodology
IntrinsiX introduces a novel approach for generating high-quality Physically-Based Rendering (PBR) maps directly from textual input. Unlike traditional text-to-image models that produce images with baked-in lighting, IntrinsiX uses a generative model to output albedo, roughness, metallic, and normal maps. These intrinsic images can be further manipulated in typical rendering pipelines for applications in realistic scene rendering, material editing, and content creation for gaming and virtual reality environments.
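To make the four maps concrete, here is a minimal sketch of how they might be held together and turned into the quantities a renderer consumes. The `PBRMaps` container, the `split_material` helper, and the array shapes are hypothetical names chosen for illustration. The split follows the common metallic-roughness convention (metals have no diffuse term, dielectrics get a fixed specular reflectance of about 0.04), which is an assumption and not something the paper summary specifies.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PBRMaps:
    """Hypothetical container for the four intrinsic maps."""
    albedo: np.ndarray     # (H, W, 3), values in [0, 1]
    roughness: np.ndarray  # (H, W), values in [0, 1]
    metallic: np.ndarray   # (H, W), values in [0, 1]
    normal: np.ndarray     # (H, W, 3), unit vectors


def split_material(maps: PBRMaps, dielectric_f0: float = 0.04):
    """Derive diffuse color and specular reflectance at normal incidence (F0).

    Assumed convention: metallic pixels lose their diffuse term and take
    their specular tint from albedo; non-metallic pixels get a fixed F0.
    """
    m = maps.metallic[..., None]  # add a channel axis for broadcasting
    diffuse = maps.albedo * (1.0 - m)
    f0 = dielectric_f0 * (1.0 - m) + maps.albedo * m
    return diffuse, f0


# Two tiny 2x2 examples: a fully metallic surface and a plastic-like one.
flat_normal = np.full((2, 2, 3), [0.0, 0.0, 1.0])
gold = PBRMaps(
    albedo=np.full((2, 2, 3), [1.0, 0.77, 0.34]),
    roughness=np.full((2, 2), 0.3),
    metallic=np.ones((2, 2)),
    normal=flat_normal,
)
plastic = PBRMaps(
    albedo=np.full((2, 2, 3), [0.8, 0.1, 0.1]),
    roughness=np.full((2, 2), 0.6),
    metallic=np.zeros((2, 2)),
    normal=flat_normal,
)
gold_diffuse, gold_f0 = split_material(gold)
plastic_diffuse, plastic_f0 = split_material(plastic)
```

Because lighting is not part of any of these arrays, editing one map (for example, recoloring `albedo`) changes the material without touching the others.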
Model Architecture and Training
IntrinsiX leverages pre-trained text-to-image (T2I) models to generate intrinsic images, utilizing strong image priors to produce coherent PBR maps. The model undergoes a two-step training process:
- PBR Prior Training:
- A separate low-rank adapter (LoRA) is fine-tuned for each PBR component: albedo, roughness, metallic, and normal maps. Each is trained on a curated dataset with a strong T2I model as the backbone.
- Each LoRA learns the probability distribution of one intrinsic property. Training data for intrinsic maps is scarce, but the pre-trained T2I diffusion model supplies a prior that keeps the generated results diverse and high quality.
- PBR Prior Alignment:
Applications and Practical Use Cases
IntrinsiX's ability to generate editable PBR maps allows several downstream applications:
- Editable Image Generation:
- The generated PBR maps enable relighting and real-time editing of images: light positions, albedo colors, and material properties such as metallic and roughness can all be changed, as the edited renderings in Figure 2 show.
Figure 2: Editable Image Generation. Our generated PBR maps can be edited and utilized in standard physically-based rendering frameworks to produce diverse RGB renderings.
- PBR Scene Texturing:
- Using score distillation, IntrinsiX can texture entire 3D scenes with PBR materials. This makes it attractive to game developers and VR content creators who need accurate, editable environment textures.
Figure 3: Scene Texturing. We can use our method for scene texturing using score distillation.
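The relighting described under Editable Image Generation can be illustrated with a deliberately simplified shader. This sketch is an assumption-laden stand-in, not the rendering framework used in the paper: it handles one directional light, uses a diffuse (Lambertian) term only, and ignores roughness and specular reflection entirely. The function names `decode_normals` and `relight` are hypothetical.

```python
import numpy as np


def decode_normals(normal_rgb):
    """Map an RGB-encoded normal map from [0, 1] to unit vectors."""
    n = normal_rgb * 2.0 - 1.0
    length = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(length, 1e-8)


def relight(albedo, normals, metallic, light_dir,
            light_color=(1.0, 1.0, 1.0), ambient=0.05):
    """Diffuse-only shading under a single directional light.

    Simplification: metallic pixels only lose their diffuse term here;
    a full PBR renderer would add a roughness-dependent specular lobe.
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    n_dot_l = np.clip(normals @ l, 0.0, None)[..., None]  # (H, W, 1)
    diffuse = albedo * (1.0 - metallic[..., None])
    shaded = diffuse * (ambient + n_dot_l * np.asarray(light_color))
    return np.clip(shaded, 0.0, 1.0)


# A flat, mid-gray, non-metallic 2x2 patch facing the camera (+z).
albedo = np.full((2, 2, 3), 0.5)
metallic = np.zeros((2, 2))
normals = decode_normals(np.full((2, 2, 3), [0.5, 0.5, 1.0]))

front_lit = relight(albedo, normals, metallic, light_dir=(0.0, 0.0, 1.0))
side_lit = relight(albedo, normals, metallic, light_dir=(1.0, 0.0, 0.0))

# Editing a map and re-rendering: tint the albedo red, keep the light fixed.
red_albedo = albedo * np.array([1.0, 0.2, 0.2])
red_front_lit = relight(red_albedo, normals, metallic, light_dir=(0.0, 0.0, 1.0))
```

Moving the light or recoloring the albedo only requires calling `relight` again on the same maps, which is the property that baked-in lighting in ordinary text-to-image outputs lacks.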
Evaluation and Comparison
IntrinsiX has been evaluated against existing techniques that perform intrinsic image decomposition on RGB images. The results show:
- Higher fidelity and coherence in rendered outputs: IntrinsiX avoids the pitfalls of baked-in lighting and texture artifacts common in traditional RGB decomposition methods.
- User studies and quantitative metrics: study participants significantly preferred IntrinsiX for albedo and specular quality, and rated it higher for rendering quality and semantic alignment with the text prompts.
Figure 4: Rendering comparisons. Sample PBR maps and images rendered under different lighting conditions show how closely the outputs follow the semantics of the text prompts.
Implementation Considerations
Deploying IntrinsiX in practical applications requires budgeting for the compute cost of diffusion models and PBR map generation. It is well suited to environments where adaptability and high-quality rendering are critical; in resource-constrained settings, the trade-offs against real-time processing need careful consideration.
Conclusion
IntrinsiX establishes a new standard for intrinsic image generation from text, with substantial improvements in the realism and versatility of its output. Its ability to produce editable PBR maps opens new avenues in digital content creation, particularly in areas that require high-quality, editable visuals. Future work could integrate more robust datasets and advanced sampling strategies to extend the model to a broader range of applications.