- The paper introduces a flexible diffusion-based framework that generates high-quality textures with consistent multi-view results.
- The paper employs a shared conditional embedding and image-based Classifier-Free Guidance to seamlessly integrate textual and visual prompts.
- The paper’s view synchronization module ensures coherent texture mapping, significantly improving FID and KID metrics in experimental evaluations.
FlexPainter: Flexible and Multi-View Consistent Texture Generation
The paper "FlexPainter: Flexible and Multi-View Consistent Texture Generation" presents a framework for generating high-quality texture maps for 3D models with flexible conditioning and consistent results across viewpoints. The work addresses two central challenges in texture synthesis: integrating multi-modal conditional prompts and maintaining cross-view consistency during texture generation.
Overview
FlexPainter improves texture-map quality by leveraging diffusion-based generative methods. Texture maps play a critical role in applications such as gaming, virtual reality (VR), augmented reality (AR), and digital animation. Current approaches often rely on a single input modality, typically text or a single image, which limits the creative control available to designers. In addition, ensuring consistency across viewpoints is essential to avoid artifacts and preserve realism in textures applied to 3D models.
Key Methodologies and Contributions
The approach introduces a shared conditional embedding space in which textual and visual prompts are encoded into a unified representation. Linear operations within this space provide multi-granularity control, letting designers blend and weight prompts to make nuanced adjustments to texture characteristics and tailor outputs to specific requirements.
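The idea of blending prompts via linear operations in a shared embedding space can be illustrated with a minimal sketch. The function name and the toy vectors below are hypothetical, not taken from the paper; the point is only that once text and image prompts live in one space, a weighted linear combination yields a single conditioning vector:

```python
import numpy as np

def blend_embeddings(embeddings, weights):
    """Linearly combine prompt embeddings (text or image) assumed to
    live in one shared conditional space.

    embeddings: list of (D,) vectors.
    weights: blending coefficients, normalized here to sum to 1.
    """
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    E = np.stack(embeddings)  # (N, D)
    return w @ E              # (D,) blended condition vector

# Toy usage: blend a "text" embedding with an "image" embedding 70/30.
text_emb = np.array([1.0, 0.0, 0.0])
image_emb = np.array([0.0, 1.0, 0.0])
cond = blend_embeddings([text_emb, image_emb], [0.7, 0.3])
```

Because the operation is linear, the relative weights act as an intuitive dial between prompt sources, which is what enables the multi-granularity control described above.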
Furthermore, an image-based Classifier-Free Guidance (CFG) mechanism increases the adaptability and quality of generated textures. It decomposes reference images into structural and stylistic components, so that aesthetic stylization can be guided by the visual reference while unwanted structural information is suppressed.
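The paper's image-based variant adds its own structure/style decomposition, which is not reproduced here, but the classifier-free guidance update it builds on is standard and can be sketched as follows (assuming the usual formulation where the model predicts noise with and without the condition):

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Standard CFG update used in diffusion sampling.

    Extrapolates from the unconditional noise prediction toward the
    conditional one; scale > 1 strengthens the effect of the condition.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy usage with placeholder predictions.
eps_u = np.zeros(4)
eps_c = np.ones(4)
guided = classifier_free_guidance(eps_u, eps_c, scale=2.0)
```

With an image prompt as the condition, raising `scale` pushes generations toward the reference; the paper's contribution is in what the image condition encodes (style rather than structure), not in the update rule itself.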
The authors address inter-view consistency by adopting a multi-view image grid representation, which strengthens the model's global understanding of the object and promotes coherence across views. To further enforce alignment, a view synchronization module is applied during diffusion sampling. This module combines repeated projection and rasterization steps with an adaptive WeighterNet, a neural network that robustly aggregates partial UV maps across the denoising stages.
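The aggregation step can be sketched in simplified form. Here the learned WeighterNet is replaced by a stand-in tensor of per-texel confidence logits, and the projection/rasterization stages are assumed to have already produced per-view partial UV maps; only the weighted fusion is shown:

```python
import numpy as np

def aggregate_partial_uv(partial_maps, logits):
    """Fuse per-view partial UV textures with per-texel weights.

    partial_maps: (V, H, W, C) array, one partial UV map per view.
    logits: (V, H, W) unnormalized per-view confidences (a stand-in
        for the learned WeighterNet output described in the paper).
    """
    # Softmax over the view axis gives normalized blending weights.
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    return (w[..., None] * partial_maps).sum(axis=0)  # (H, W, C)

# Toy usage: two views with equal confidence average their textures.
maps = np.stack([np.zeros((2, 2, 3)), np.ones((2, 2, 3))])
fused = aggregate_partial_uv(maps, np.zeros((2, 2, 2)))
```

A learned weighting like this lets the fusion adapt per texel and per denoising stage, for example down-weighting views that see a texel at a grazing angle.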
Strong Results and Claims
Experiments demonstrate FlexPainter's advantages over existing methods in both conditioning flexibility and generation quality. The framework outperforms state-of-the-art approaches, achieving lower (better) FID and KID scores and generating coherent textures that align accurately with user prompts across diverse input types. User preference studies additionally rank FlexPainter's output highly, supporting its practical applicability.
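For reference, FID (Fréchet Inception Distance) compares Gaussians fitted to Inception features of real and generated images; lower is better. A minimal NumPy-only sketch of the closed-form distance, using the symmetrized matrix square root to stay with symmetric PSD matrices:

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def fid_gaussian(mu1, cov1, mu2, cov2):
    """Fréchet distance between N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})."""
    s1 = _sqrtm_psd(cov1)
    covmean = _sqrtm_psd(s1 @ cov2 @ s1)  # same trace as sqrtm(cov1 @ cov2)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))
```

KID is computed differently (an unbiased MMD estimate over Inception features), but both metrics measure distributional distance between generated and real image sets, which is why lower scores indicate higher-quality textures.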
Implications and Future Directions
Practically, FlexPainter offers substantial time savings for designers by facilitating intuitive control over texture features and reducing the frequency of repetitive adjustments. Theoretically, its methodology advances the understanding of conditional embedding manipulation and synchronization in texture generation tasks.
Looking ahead, the paper suggests exploring methods to incorporate explicit lighting effects within textures, further reducing the necessary adjustments in real-world applications. Additionally, maintaining high fidelity to the original 3D mesh details when utilizing depth maps as geometric conditioning remains a challenge to be addressed.
Overall, FlexPainter provides a substantial contribution to the field of texture synthesis by enhancing control, consistency, and quality. It opens avenues for future research in multi-view consistent generation and the integration of more comprehensive environmental factors within texture maps, promoting advancements in AI-driven creative processes in 3D modeling.