- The paper presents the TEXGen model that combines UV map convolutions with point cloud-based attention to generate high-resolution texture maps.
- It leverages a hybrid 2D-3D framework to overcome UV mapping fragmentation and eliminate the need for extensive test-time optimization.
- Empirical results confirm state-of-the-art performance with superior FID and KID scores in text-guided texture inpainting and sparse-view texture completion.
TEXGen: Advancements in Mesh Texture Synthesis Through Generative Diffusion Models
The paper "TEXGen: a Generative Diffusion Model for Mesh Textures" presents a novel approach to the synthesis of textures for 3D meshes, engaging a large-scale generative diffusion model capable of direct texture map generation. The authors address the limitations of previous methods that rely heavily on pre-trained 2D diffusion models and necessitate extensive test-time optimization. Through the development of a new hybrid network architecture, TEXGen allows for efficient and effective learning of high-resolution UV texture maps, delivering substantial improvements over predecessor methods in mesh texturing.
Model Architecture and Approach
TEXGen introduces a scalable network architecture that combines convolutions on UV maps with point cloud-based attention mechanisms. This hybrid 2D-3D framework extends operations beyond the 2D plane into the 3D spatial domain, producing features that are locally detailed yet globally coherent, which is crucial for high-quality texture synthesis. The architecture benefits from the structured representation of UV space while incorporating surface adjacency information, so that neighborhood relationships on the mesh itself inform seamless global texturing. In doing so, it tackles the fragmentation problem inherent to UV mapping, where UV islands that are close in the 2D map may correspond to distant regions on the mesh surface. By operating on sparse point features in 3D space, TEXGen balances computational cost against 3D surface continuity, yielding an efficient yet precise texturing pipeline. A minimal sketch of such a hybrid block is given below.
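The following PyTorch sketch is not the authors' implementation; it only illustrates the kind of hybrid layer described above: a convolution refines features locally in UV space, features are then gathered at a sparse set of surface points, attention runs over those points in 3D, and the results are scattered back onto the UV map. Module names, tensor shapes, and the nearest-texel gather/scatter are assumptions made for illustration.

```python
# Minimal sketch (assumed structure, not TEXGen's actual code) of a hybrid
# 2D-3D block: local UV-space convolution + global attention over sparse
# 3D surface points gathered from, and scattered back to, the UV map.
import torch
import torch.nn as nn


class HybridUV3DBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # 2D branch: local refinement in UV space.
        self.uv_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.GroupNorm(8, channels),
            nn.SiLU(),
        )
        # 3D branch: attention over sparse surface points.
        self.point_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.point_norm = nn.LayerNorm(channels)
        # Positional encoding of 3D point locations (assumption: a small MLP).
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, channels), nn.SiLU(), nn.Linear(channels, channels)
        )

    def forward(self, uv_feat, point_xyz, point_uv):
        """
        uv_feat:   (B, C, H, W) feature map in UV space
        point_xyz: (B, N, 3) 3D positions of sparse surface points
        point_uv:  (B, N, 2) UV coordinates of those points in [0, 1]
        """
        B, C, H, W = uv_feat.shape
        uv_feat = uv_feat + self.uv_conv(uv_feat)  # local 2D refinement

        # Gather each point's feature from its nearest texel in the UV map.
        u = (point_uv[..., 0] * (W - 1)).long().clamp(0, W - 1)   # (B, N)
        v = (point_uv[..., 1] * (H - 1)).long().clamp(0, H - 1)
        flat = uv_feat.flatten(2)                                  # (B, C, H*W)
        idx = (v * W + u).unsqueeze(1).expand(-1, C, -1)           # (B, C, N)
        pts = flat.gather(2, idx).transpose(1, 2)                  # (B, N, C)

        # Attention in 3D space: points interact regardless of UV island layout.
        pts = pts + self.pos_mlp(point_xyz)
        q = self.point_norm(pts)
        attn_out, _ = self.point_attn(q, q, q)
        pts = pts + attn_out

        # Scatter the updated point features back onto the UV map.
        out = flat.scatter(2, idx, pts.transpose(1, 2))
        return out.view(B, C, H, W)


# Usage: a 256x256 feature map with 2048 sampled surface points.
block = HybridUV3DBlock(channels=64)
uv = torch.randn(1, 64, 256, 256)
xyz = torch.rand(1, 2048, 3)
uvc = torch.rand(1, 2048, 2)
print(block(uv, xyz, uvc).shape)  # torch.Size([1, 64, 256, 256])
```

The key design point this sketch tries to capture is that the attention step is indexed by 3D position rather than UV layout, so texels belonging to different UV islands can still exchange information when their surface points are close in 3D.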
Performance and Evaluation
The 700-million-parameter diffusion model trained within this framework generates UV texture maps conditioned on input prompts such as images or descriptive text. Eliminating test-time optimization is a significant step toward seamless texture generation. Empirical assessments show state-of-the-art results across several generative tasks, including text-guided texture inpainting and sparse-view texture completion. The model renders realistic, high-fidelity textures and is far less prone to the 'Janus problem', the artifact of prior 2D-lifting methods in which repeated content (for example, multiple faces) appears on different sides of an object. TEXGen's efficacy is quantitatively confirmed by superior FID and KID scores relative to existing models, alongside substantial reductions in inference time. A sketch of how such a feed-forward diffusion sampler produces a texture map follows.
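To make concrete why a feed-forward denoising chain replaces per-object test-time optimization, here is a minimal, hypothetical DDPM-style sampling loop in UV space. The `denoiser` network, the conditioning embedding `cond_emb`, and the linear noise schedule are illustrative placeholders rather than TEXGen's actual model or schedule.

```python
# Hypothetical sketch of feed-forward texture sampling: a fixed number of
# denoising steps, no per-object optimization loop.
import torch


@torch.no_grad()
def sample_texture(denoiser, cond_emb, resolution=1024, steps=50, device="cpu"):
    """DDPM-style ancestral sampling of a UV texture map of shape (1, 3, H, W)."""
    betas = torch.linspace(1e-4, 2e-2, steps, device=device)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, 3, resolution, resolution, device=device)  # start from pure noise
    for t in reversed(range(steps)):
        # The network predicts the noise component given the timestep and the
        # text/image conditioning embedding.
        eps = denoiser(x, torch.tensor([t], device=device), cond_emb)
        # Reverse-step posterior mean.
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x.clamp(-1, 1)  # texture values in [-1, 1], ready to map onto the mesh
```

Because the cost is a fixed number of network evaluations, inference time depends only on the step count and resolution, which is the practical source of the speedups reported over optimization-based texturing pipelines.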
Implications and Future Directions
Practically, the implications of TEXGen extend into domains requiring detailed and consistent texturing—virtual reality, animation, and game design stand to benefit from this model's capabilities. Theoretically, this work nudges the field closer to realizing fully autonomous, feed-forward models for complex 3D object representation and manipulation. The hybrid framework set forth offers a blueprint for further exploration into multimodal interactions between 2D and 3D data representations in model designs.
Looking forward, extending TEXGen's framework to Physically Based Rendering (PBR) materials is a promising direction, in which material properties such as albedo, roughness, and metallic maps could be generated alongside textures to enhance realism and interactivity. The model's adaptability and scalability hint at potential applications in broader AI-driven generative fields. Additionally, the approach underscores the importance of large-scale data and scalable network architectures in reaching generative AI's next milestones.
In summary, TEXGen marks a pivotal advancement in generative diffusion models for mesh texturing, effectively bridging 2D texture generation methods and coherent 3D applications, and setting the stage for innovative AI solutions in visual content creation.