
TexPainter: Generative Mesh Texturing with Multi-view Consistency

Published 17 May 2024 in cs.CV and cs.GR (arXiv:2406.18539v1)

Abstract: The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a multi-view consistent texture image poses a major obstacle to the output quality. In this paper, we propose a novel method to enforce multi-view consistency. Our method is based on the observation that latent space in a pre-trained diffusion model is noised separately for each camera view, making it difficult to achieve multi-view consistency by directly manipulating the latent codes. Based on the celebrated Denoising Diffusion Implicit Models (DDIM) scheme, we propose to use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation. Our method further relaxes the sequential dependency assumption among the camera views. By evaluating on a series of general 3D models, we find our simple approach improves consistency and overall quality of the generated textures as compared to competing state-of-the-arts. Our implementation is available at: https://github.com/Quantuman134/TexPainter

Citations (4)

Summary

  • The paper introduces TexPainter, a method that achieves multi-view consistent texture generation for 3D meshes using diffusion-based latent optimization.
  • It employs parallel denoising and color-space fusion techniques to blend textures across camera views, significantly reducing view-dependent artifacts.
  • Quantitative evaluations using FID scores and ablation studies demonstrate its superior texture quality compared to methods like TEXTure and TexFusion.

TexPainter is a method designed for automatic texture generation for arbitrary 3D meshes. It achieves this by leveraging the capabilities of pre-trained diffusion models with a specific focus on multi-view consistency in the generated textures. The following sections detail the methodology, advantages, and evaluations of the approach.

Introduction and Motivation

Generative models have progressed significantly with the success of diffusion models, particularly in 2D image generation. Transferring these advances to 3D mesh texturing, however, is challenging because the generated texture must remain consistent across camera views. TexPainter addresses this by applying the Denoising Diffusion Implicit Models (DDIM) sampling scheme to a pre-trained diffusion model, using an optimization-based color-fusion step to indirectly modify the latent codes across views and thereby improve texture quality and consistency.

The core insight is that while diffusion models independently denoise each camera view in latent space, TexPainter introduces a process to harmonize these views in the color space, achieving a cohesive final texture. This method avoids assumptions about sequential dependencies, which are inherent limitations in previous techniques.
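For context, the DDIM step the method builds on produces, at each timestep, a noise-free estimate of the latent; the equations below are the standard deterministic DDIM update (η = 0) in the usual notation, not taken from this summary:

```latex
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t)
```

TexPainter's fusion and optimization act on the per-view noise-free estimates $\hat{x}_0$ rather than on the noised latents $x_t$ directly, which is what makes cross-view manipulation tractable.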

Methodology

TexPainter operates by running separate denoising processes for each camera view in parallel using DDIM. The method involves the following key steps:

  1. Parallel Denoising: At each denoising step, each camera view's DDIM process produces a predicted noise-free latent code (an estimate of the latent at timestep 0).
  2. Color-space Fusion: These latent codes are decoded into images, which are then blended in color space to create a consistent texture image.
  3. Optimization of Latents: The predicted latent codes are optimized to ensure the decoded images match the rendered scenes using the fused texture.
  4. Iterative Refinement: The process iterates over denoising steps, continually refining the texture and latent codes until convergence.

Figure 1: Illustration of TexPainter's pipeline, including parallel denoising, color-space fusion, and latent optimization.
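The fuse-then-optimize loop above can be sketched with a toy model. This is purely illustrative (my own simplification, not the authors' code): the real method decodes latents with a pre-trained VAE and back-propagates through it, whereas here the "decoder" is a fixed linear map so the gradient of the per-view reconstruction loss is analytic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_views, latent_dim = 4, 8

decode = np.eye(latent_dim)                        # stand-in linear "decoder"
latents = rng.normal(size=(n_views, latent_dim))   # per-view predicted noise-free latents
weights = np.full(n_views, 1.0 / n_views)          # view blend weights (sum to 1)

for _ in range(200):
    images = latents @ decode.T        # decode each view's latent to color space
    fused = weights @ images           # color-space fusion: weighted average (held fixed this step)
    grad = (images - fused) @ decode   # gradient of 0.5 * ||decode(latent) - fused||^2 per view
    latents -= 0.1 * grad              # indirect latent update, as in gradient back-propagation

images = latents @ decode.T
spread = np.abs(images - images.mean(axis=0)).max()
print(f"max inter-view disagreement: {spread:.2e}")
```

Because the blend weights sum to one, the fused image is a fixed point of the update, and each view's decoded image contracts toward it, which is the consistency behavior the pipeline relies on.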

The blending in color space using weighted averaging ensures that textures align with surface normals and camera views, promoting consistency across views. This is followed by an optimization step, using gradient backpropagation to adjust the latent space representation for each view, ensuring that they collectively form a unified and consistent texture when rendered.
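A minimal sketch of such normal-aware weighted averaging for a single texel follows; the exact weighting scheme here (clamped cosine between surface normal and view direction) is an assumption for illustration, not necessarily the paper's formula:

```python
import numpy as np

def fuse_texel(colors, cos_angles, eps=1e-8):
    """Blend one texel's per-view colors.

    colors:     (n_views, 3) observed RGB for this texel from each view.
    cos_angles: (n_views,) cosine between surface normal and view direction;
                larger means the view sees this texel more head-on.
    """
    w = np.clip(cos_angles, 0.0, None)   # back-facing / grazing views get zero weight
    w = w / (w.sum() + eps)              # normalize so weights sum to 1
    return w @ colors                    # weighted average in color space

colors = np.array([[1.0, 0.0, 0.0],      # view 0 sees this texel as red
                   [0.0, 1.0, 0.0],      # view 1 sees it as green
                   [0.0, 0.0, 1.0]])     # view 2 is back-facing here
cosines = np.array([0.8, 0.2, -0.5])
print(fuse_texel(colors, cosines))       # view 2 contributes nothing
```

Views that observe a texel nearly head-on dominate the blend, while oblique or occluded views are suppressed, which is what keeps the fused texture seam-free across view boundaries.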

Evaluation

TexPainter's effectiveness was compared against other state-of-the-art methods, such as TEXTure, TexFusion, and Fantasia3D. Evaluations involve both qualitative and quantitative assessments:

  1. Qualitative Comparison: Visual inspections show TexPainter produces clearer, more consistent textures across multiple views compared to methods like TEXTure and TexFusion, which suffer from view-dependent artifacts.
  2. Quantitative Metrics: The method was evaluated using Fréchet Inception Distance (FID), where lower scores indicate closer agreement with natural image statistics. TexPainter achieved better FID scores than competing methods, reflecting more visually coherent textures that align with the text prompts.
  3. Ablation Studies: Experiments demonstrate the necessity of TexPainter's components, particularly the color-space fusion and optimization, which are critical to its success in achieving consistency and high-quality textures.

Figure 2: Ablation study results highlighting the importance of color-space fusion for maintaining texture consistency.

Implications and Future Work

The development of TexPainter marks a significant step toward automated high-fidelity texturing of 3D models using diffusion models. Its approach to multi-view consistency extends beyond just texturing and has implications for broader 3D content creation tasks.

Future work includes optimizing runtime performance and exploring dynamic camera view selection to improve coverage and further enhance texture fidelity. Additionally, integrating more advanced diffusion models and exploring their application in other 3D representation modalities could further expand the capabilities of generative texturing.

Conclusion

TexPainter advances mesh texturing by leveraging diffusion models for consistent multi-view texturing. Its novel color-space fusion strategy and optimization of latent representations yield significant improvements in quality and consistency over existing methods. This approach presents a substantial advancement in generative 3D texturing, with potential implications for broader applications in automated content creation within computer graphics.
