- The paper introduces TexGen, a method that combines attention-guided multi-view sampling with Text-and-Texture-Guided Resampling (T²GR) to create coherent 3D textures.
- It overcomes view inconsistency and over-smoothing by recalculating noise estimations to preserve high-frequency texture details.
- Experiments report lower FID and KID and higher CLIP scores than prior methods, demonstrating state-of-the-art performance in texture synthesis.
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
The paper "TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling" presents an innovative approach to synthesizing high-quality 3D textures from textual descriptions. This work addresses the longstanding challenges of view inconsistency and overly smoothed textures in prior methods by introducing TexGen, a method leveraging a pre-trained text-to-image (T2I) diffusion model.
Methodology
TexGen employs a multi-view sampling and resampling framework designed to produce view-consistent 3D textures rich in detail. The process involves two key components: Attention-Guided Multi-view Sampling and Text-and-Texture-Guided Resampling (T²GR).
- Attention-Guided Multi-view Sampling: This component enforces view consistency during texture generation. By aligning Key and Value features across views inside the self-attention layers of the T2I model, TexGen keeps textures coherent as it sequentially denoises observations from multiple viewpoints around the object, progressively assembling them into a consistent texture map (see the attention sketch after this list).
- Text-and-Texture-Guided Resampling (T²GR): To enhance texture details and avoid over-smoothing, this step recalculates noise estimations under the guidance of the current texture map. By combining texture- and text-guided noise estimations through multi-conditioned classifier-free guidance, TexGen preserves high-frequency details and ensures high-quality texture synthesis (a guidance sketch also follows this list).
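The cross-view attention idea can be illustrated with a short PyTorch sketch. This is a minimal, single-head simplification, not the paper's implementation: the projection layers, tensor shapes, and the use of a single reference view are assumptions made for illustration. The essential point is that the current view's queries attend over Key/Value features pooled from a reference view as well as its own.

```python
# Minimal sketch of cross-view K/V sharing in self-attention.
# Single-head, single reference view; all names are illustrative.
import torch

def cross_view_attention(q_proj, k_proj, v_proj, x_view, x_ref):
    # Queries come only from the view currently being denoised.
    q = q_proj(x_view)                                      # (B, N, d)
    # Keys/values pool tokens from the current view AND a reference view,
    # so attention can borrow appearance from already-denoised viewpoints.
    k = torch.cat([k_proj(x_view), k_proj(x_ref)], dim=1)   # (B, 2N, d)
    v = torch.cat([v_proj(x_view), v_proj(x_ref)], dim=1)   # (B, 2N, d)
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v                                         # (B, N, d)

# Toy usage: 256 tokens per view, 64-dim features.
d = 64
q_proj, k_proj, v_proj = (torch.nn.Linear(d, d) for _ in range(3))
x_view = torch.randn(1, 256, d)   # view being denoised
x_ref = torch.randn(1, 256, d)    # previously denoised reference view
out = cross_view_attention(q_proj, k_proj, v_proj, x_view, x_ref)
```

Sharing Keys and Values this way lets earlier views anchor the appearance of later ones, which is what keeps the progressively assembled texture map coherent.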
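The resampling step's multi-conditioned classifier-free guidance can likewise be sketched in a few lines. The blending below follows the standard classifier-free-guidance pattern with an extra texture-conditioned term; the weight values and the exact way TexGen conditions on the rendered texture are assumptions, not the paper's published hyperparameters.

```python
# Hedged sketch of multi-conditioned classifier-free guidance: blend
# unconditional, text-conditioned, and texture-conditioned noise
# predictions from the diffusion UNet into a single estimate.
import torch

def multi_cond_cfg(eps_uncond, eps_text, eps_texture, w_text=7.5, w_tex=1.5):
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)      # steer toward the prompt
            + w_tex * (eps_texture - eps_uncond))   # keep texture detail

# Toy usage with latent-shaped tensors (batch, channels, H, W).
eps = [torch.randn(1, 4, 64, 64) for _ in range(3)]
combined = multi_cond_cfg(*eps)
```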
Results
The experimental evaluation shows that TexGen surpasses existing state-of-the-art methods in producing high-fidelity, view-consistent textures. Quantitatively, it achieves lower Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) alongside higher CLIP scores. A user study likewise shows a significant preference for TexGen over competing methods.
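For readers who want to run this style of comparison themselves, all three metrics are available off the shelf in torchmetrics. The sketch below uses random placeholder tensors where rendered views of the textured meshes would go; the prompt, image sizes, and sample counts are illustrative, and this is not the paper's evaluation code.

```python
# Sketch of FID / KID / CLIP-score evaluation with torchmetrics.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

# Placeholders: real usage would load rendered views as uint8 images.
real = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 255, (16, 3, 299, 299), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())          # lower is better

kid = KernelInceptionDistance(subset_size=8)  # subset_size <= sample count
kid.update(real, real=True)
kid.update(fake, real=False)
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item())               # lower is better

clip = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
prompts = ["a leather backpack"] * 16        # hypothetical prompt
print("CLIP score:", clip(fake, prompts).item())  # higher is better
```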
Implications
The practical implications of TexGen are substantial, especially in industries such as film, gaming, and AR/VR, where high-quality 3D content is paramount. Generating rich, consistent textures directly from textual descriptions streamlines the workflow for artists and developers, potentially reducing production time and cost. Theoretically, the work contributes to the broader field of 3D generative models by addressing the difficult trade-off between cross-view consistency and texture detail.
Future Developments
Future research directions could explore the disentanglement of material properties and lighting conditions, potentially incorporating more sophisticated models like neural radiance fields. Additionally, expanding the framework to handle more complex scenes and multiple objects could further improve its utility.
In conclusion, TexGen represents a significant advancement in text-driven 3D texture generation, offering a robust solution to enduring issues of consistency and detail preservation. The methodological innovations introduced have broad implications, paving the way for more efficient and high-quality texture synthesis in various applications.