InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting (2403.11878v1)

Published 18 Mar 2024 in cs.CV

Abstract: Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models. Existing methods primarily adopt a combination of pretrained depth-aware diffusion and inpainting models, yet they exhibit shortcomings such as 3D inconsistency and limited controllability. To address these challenges, we introduce InteX, a novel framework for interactive text-to-texture synthesis. 1) InteX includes a user-friendly interface that facilitates interaction and control throughout the synthesis process, enabling region-specific repainting and precise texture editing. 2) Additionally, we develop a unified depth-aware inpainting model that integrates depth information with inpainting cues, effectively mitigating 3D inconsistencies and improving generation speed. Through extensive experiments, our framework has proven to be both practical and effective in text-to-texture synthesis, paving the way for high-quality 3D content creation.


Summary

  • The paper introduces InteX, a unified framework that enhances 3D texture synthesis using depth-aware inpainting to achieve rapid generation in about 30 seconds.
  • The paper develops a unified model that conditions inpainting on depth information, overcoming the 3D inconsistency and efficiency limitations of prior two-model pipelines.
  • The paper incorporates an interactive GUI that empowers users to directly control texture synthesis, offering precise adjustments for realistic 3D outputs.

Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting

Introduction

Efforts to automate 3D content generation have largely focused on the crucial task of producing high-quality textures, a process traditionally dependent on skilled artists' manual work. Text-to-texture synthesis, propelled by the success of text-to-image models, aims to automate the creation of realistic textures guided by textual descriptions. However, adapting these advances to 3D objects presents a unique set of challenges, notably 3D inconsistency and limited user control over the generated textures. InteX, a framework for interactive text-to-texture synthesis built on a unified depth-aware inpainting model, targets these issues head-on, offering enhanced user interaction and improved 3D consistency across synthesized textures.

Related Work

Two main approaches have dominated recent research in text-to-texture synthesis. The first trains 3D diffusion models directly on 3D datasets; the second leverages pretrained 2D diffusion models in an iterative inpainting process to generate textures. However, scaling 3D diffusion models to high-resolution, diverse generation remains challenging, and the iterative nature of the latter approach often leads to 3D inconsistency and an efficiency bottleneck. This backdrop motivates InteX, which is designed to address both shortcomings.

Methodology

At its core, InteX streamlines texture synthesis via a unified depth-aware inpainting model trained on extensive 3D datasets, notably enhancing generation speed and mitigating 3D inconsistency issues. The framework integrates several key components:

  • A user-friendly GUI enabling precise control over texture synthesis, including retexturing operations based on user interactions.
  • A pretrained unified depth-aware inpainting model that conditions inpainting on the scene's 3D geometry and depth, simplifying the synthesis pipeline and improving efficiency (a minimal sketch of the resulting painting loop follows this list).
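
To make this pipeline concrete, the following is a minimal, self-contained sketch of a view-by-view texture painting loop. Everything in it (the toy renderer over texture strips, the `DummyInpainter`, the array shapes) is an invented stand-in so the example runs on dummy data; InteX's actual renderer, UV back-projection, and unified depth-aware inpainting model are substantially more involved.

```python
import numpy as np

# Illustrative view-by-view texture painting loop. Every helper below is an
# invented stand-in; this is not InteX's actual implementation.

class DummyInpainter:
    """Stand-in for the unified depth-aware inpainting model."""
    def inpaint(self, rgb, mask, depth, prompt):
        # Fill masked pixels with a depth-shaded gray (a real model would
        # synthesize content conditioned on depth, mask, and the text prompt).
        filled = rgb.copy()
        shade = (depth[..., None] * 255).astype(np.uint8)
        filled[mask] = shade[mask]
        return filled

def render_view(texture, cam_idx):
    """Toy 'renderer': each viewpoint sees one horizontal strip of the texture."""
    h = texture.shape[0] // 4
    rgb = texture[cam_idx * h:(cam_idx + 1) * h].copy()
    depth = np.tile(np.linspace(0.2, 1.0, texture.shape[1]), (h, 1))
    mask = rgb.sum(axis=-1) == 0              # never-painted texels are holes
    return rgb, depth, mask

def write_back(texture, painted, mask, cam_idx):
    """Back-project newly painted pixels into the UV texture."""
    h = texture.shape[0] // 4
    texture[cam_idx * h:(cam_idx + 1) * h][mask] = painted[mask]
    return texture

texture = np.zeros((256, 64, 3), dtype=np.uint8)   # blank UV texture
model = DummyInpainter()
for cam_idx in range(4):                           # iterate over viewpoints
    rgb, depth, mask = render_view(texture, cam_idx)
    painted = model.inpaint(rgb, mask, depth, prompt="a wooden chair")
    texture = write_back(texture, painted, mask, cam_idx)
```

In the real system, the renderer produces per-view depth and visibility masks from the mesh, and the unified model replaces the depth-shaded fill with diffusion-based, prompt-conditioned synthesis.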

Unified Depth-aware Inpainting

Unlike prior methods, which rely on separate depth-to-image and image-inpainting models and thereby introduce 3D inconsistencies, InteX uses a single unified model that combines both tasks. This model, based on the ControlNet architecture, is trained with a hybrid mask generation strategy to better simulate inpainting scenarios, yielding depth-aligned inpainting results.
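
The summary does not spell out the hybrid mask generation strategy, so the sketch below is only one plausible illustration of the idea: training masks are drawn from a mix of mask types (coarse boxes and free-form strokes here) so the model is exposed to both large occlusions and thin holes. The mask types and probabilities are assumptions, not the paper's actual recipe.

```python
import numpy as np

# Plausible (illustrative) hybrid mask generator for inpainting training data.
rng = np.random.default_rng(0)

def box_mask(h, w):
    """One random rectangular hole."""
    mask = np.zeros((h, w), dtype=bool)
    bh, bw = rng.integers(h // 8, h // 2), rng.integers(w // 8, w // 2)
    y, x = rng.integers(0, h - bh), rng.integers(0, w - bw)
    mask[y:y + bh, x:x + bw] = True
    return mask

def stroke_mask(h, w, steps=12, radius=8):
    """A free-form stroke approximated by a random walk of filled discs."""
    mask = np.zeros((h, w), dtype=bool)
    yy, xx = np.mgrid[0:h, 0:w]
    y, x = rng.integers(0, h), rng.integers(0, w)
    for _ in range(steps):
        mask |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
        y = int(np.clip(y + rng.integers(-20, 21), 0, h - 1))
        x = int(np.clip(x + rng.integers(-20, 21), 0, w - 1))
    return mask

def hybrid_mask(h=512, w=512):
    """Union of randomly chosen mask types for one training sample."""
    mask = np.zeros((h, w), dtype=bool)
    if rng.random() < 0.7:
        mask |= box_mask(h, w)
    if rng.random() < 0.7:
        mask |= stroke_mask(h, w)
    return mask

mask = hybrid_mask()   # boolean (512, 512) inpainting mask
```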

Interactive Texture Synthesis Interface

InteX provides a graphical interface designed for practical use: users can directly steer the texture synthesis process, including selecting camera viewpoints for inpainting and specifying regions to repaint. This lets users fix isolated problematic areas without retexturing the entire model.
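
As a sketch of how region-specific repainting can plug into the same loop: the user-selected region becomes the inpainting mask for the chosen viewpoint, previously painted content inside it is discarded, and a single inpainting pass is re-run. The function below is an illustration with invented names; `model` is assumed to expose the same `inpaint(rgb, mask, depth, prompt)` interface as the stand-in in the Methodology sketch, not InteX's actual API.

```python
def repaint_region(view_rgb, view_depth, user_mask, model, prompt):
    """Re-run inpainting only where the user asked for changes.

    user_mask is a boolean image marking the region selected in the GUI;
    model is any object exposing inpaint(rgb, mask, depth, prompt), such as
    the DummyInpainter stand-in from the earlier sketch.
    """
    rgb = view_rgb.copy()
    rgb[user_mask] = 0          # discard the old content inside the region
    return model.inpaint(rgb, user_mask, view_depth, prompt)
```

The repainted view would then be back-projected into the UV texture exactly as in the earlier loop, touching only the selected texels.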

Experiments

Evaluation against existing methods shows that the framework generates textures of higher quality and with better 3D consistency. InteX also significantly improves generation speed, reducing texture synthesis time to approximately 30 seconds per instance. User studies further corroborate its practical efficacy, showing its text-to-texture results to be more accurate and preferred by users.

Discussion and Future Work

The introduction of InteX marks a significant step forward in interactive texture synthesis, combining an intuitive user interface with a robust unified depth-aware inpainting model to address long-standing challenges in the field. While the method offers substantial improvements over existing techniques, further exploration of multi-view depth-aware inpainting models is a promising avenue toward completely eliminating 3D inconsistencies. Continued advances in leveraging user inputs for more nuanced control during texture synthesis are also anticipated, marking an exciting trajectory for the future of 3D content creation tools.

Acknowledgements

The work was supported by multiple grants and benefited from collaboration with industry partners, setting a precedent for academic-industry partnerships in pushing the boundaries of AI-driven 3D content generation.