Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering (2312.11360v2)

Published 18 Dec 2023 in cs.CV, cs.AI, and cs.GR

Abstract: We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable quality PBR texture maps within 15 min., given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it


Summary

  • The paper introduces Paint-it, a text-to-texture synthesis method that pairs a Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization with Score-Distillation Sampling (SDS).
  • Re-parameterizing the PBR texture maps with a randomly initialized convolutional network induces a frequency-scheduled, coarse-to-fine optimization that filters out noisy SDS gradients.
  • The resulting texture maps are photorealistic and integrate directly with modern graphics engines, supporting relighting and material control and streamlining production pipelines.

Introduction to Text-to-Texture Synthesis

Computer graphics and 3D modeling continuously seek better texturing techniques: realistic textures on 3D models matter both for visual appeal and for immersion in digital environments, yet authoring them remains labor-intensive. Paint-it addresses this with a novel approach to text-driven texture map synthesis, producing physically-based rendering (PBR) texture maps for a given mesh from a text description alone.

Methodology of Paint-it

Synthesis-through-Optimization

Paint-it follows a synthesis-through-optimization paradigm: rather than sampling a texture directly from a generative model, it optimizes texture parameters until renderings of the mesh match the text description. The core of the method is the Deep Convolutional Physically-Based Rendering (DC-PBR) re-parameterization of texture maps. In contrast to a standard pixel-based parameterization, DC-PBR expresses the PBR texture maps as the output of a randomly initialized convolutional network, so the optimization variables are the network's weights rather than raw texels. This induces frequency-scheduled learning, which filters out noisy high-frequency signals and promotes the generation of high-quality textures.
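
A minimal PyTorch sketch of this re-parameterization is below; the shallow architecture, channel counts, and the particular split into diffuse/roughness/metalness maps are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DCPBR(nn.Module):
    """Deep-image-prior-style re-parameterization of PBR texture maps:
    the optimized variables are the conv weights, not texture pixels."""
    def __init__(self, res=512, noise_ch=8, hidden=64):
        super().__init__()
        # Fixed random input kept constant throughout optimization.
        self.register_buffer("z", torch.randn(1, noise_ch, res, res))
        self.net = nn.Sequential(
            nn.Conv2d(noise_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 5, 3, padding=1),  # 3 diffuse + 1 roughness + 1 metalness
        )

    def forward(self):
        out = torch.sigmoid(self.net(self.z))  # all maps in [0, 1]
        return out[:, :3], out[:, 3:4], out[:, 4:5]  # diffuse, roughness, metalness
```

An optimizer such as Adam then updates the network's weights, and the spectral bias of convolutional generators (as studied in the deep-image-prior literature) favors low-frequency texture content early in training.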

Score-Distillation Sampling

Optimization in Paint-it is guided by Score-Distillation Sampling (SDS), which back-propagates the denoising score of a pretrained text-to-image diffusion model through a differentiable renderer, iteratively pulling renderings of the textured mesh toward the input text description. Applied directly to a pixel-based parameterization, SDS's noisy gradients yield suboptimal textures; combined with DC-PBR, the convolutional prior filters out much of that noise, letting the optimization emphasize content over artifacts.
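
The sketch below shows one hedged SDS step, assuming `latents` encode a rendering of the textured mesh and that `unet` and `scheduler` are placeholder handles to a frozen latent-diffusion model (a noise predictor plus a scheduler exposing `add_noise` and `alphas_cumprod`); real APIs, guidance scales, and the weighting w(t) vary across implementations.

```python
import torch

def sds_grad(latents, cond_emb, uncond_emb, unet, scheduler, guidance=100.0):
    """One Score-Distillation Sampling step (sketch): estimate the gradient
    that pulls the rendering's latents toward the text-conditioned score
    of a frozen diffusion model."""
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)
    with torch.no_grad():
        # Classifier-free guidance: conditional vs. unconditional prediction.
        eps_cond = unet(noisy, t, cond_emb)
        eps_uncond = unet(noisy, t, uncond_emb)
    eps = eps_uncond + guidance * (eps_cond - eps_uncond)
    w = (1.0 - scheduler.alphas_cumprod.to(latents.device)[t]).view(-1, 1, 1, 1)
    return w * (eps - noise)  # d(loss)/d(latents); no backprop through the U-Net
```

The returned tensor is typically injected as a surrogate loss, e.g. `(latents * grad.detach()).sum().backward()`, so the gradient flows back through the differentiable renderer into the DC-PBR weights.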

Empirical Analysis and Results

Texture Map Quality

Experiments show that Paint-it generates high-fidelity PBR texture maps for a wide range of 3D meshes, including humans, animals, and everyday objects, within roughly 15 minutes per mesh given only a text description. The synthesized maps are more realistic than those of competing text-to-texture methods, and because they are physically based, they support practical applications and integrate well with popular graphics engines.

Frequency-Scheduled Synthesis

The paper's analyses show that DC-PBR shapes the synthesis process in a frequency-selective manner: early iterations recover low-frequency components such as base colors and coarse structure, then mid-frequency content, and finally high-frequency details like fine textures and patterns. This coarse-to-fine curriculum emerges from the convolutional parameterization itself rather than from an explicit schedule.
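
One illustrative way to make this curriculum visible (a diagnostic sketch, not part of the paper) is to track the fraction of the texture's spectral energy above a frequency cutoff as optimization proceeds:

```python
import torch

def highfreq_energy(tex, cutoff=0.25):
    """Fraction of a texture's spectral energy above a normalized
    frequency cutoff; tex has shape (1, C, H, W)."""
    spec = torch.fft.fftshift(torch.fft.fft2(tex.mean(1)), dim=(-2, -1))
    power = spec.abs() ** 2
    h, w = power.shape[-2:]
    fy = torch.linspace(-0.5, 0.5, h).view(-1, 1)
    fx = torch.linspace(-0.5, 0.5, w).view(1, -1)
    radius = (fx**2 + fy**2).sqrt()
    return (power[..., radius > cutoff].sum() / power.sum()).item()
```

Consistent with the paper's analysis, under DC-PBR this ratio should start near zero and grow as fine detail emerges, whereas a pixel parameterization admits high-frequency SDS noise from the start.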

Practical Applications and Integration

Compatibility with Graphics Engines

Notably, the texture maps produced by Paint-it are standard PBR maps, so they are directly compatible with popular graphics engines. This enables downstream production stages such as relighting, material control, and rendering diverse appearances for the same mesh, supporting large-scale, diversified 3D content creation.
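
A sketch of the export step follows; the file names and PNG format are common conventions assumed here, not mandated by the paper.

```python
import os
import torch
from torchvision.utils import save_image

def export_pbr(dc_pbr, out_dir="textures"):
    """Write the optimized PBR maps to PNG files that a standard engine
    (e.g. Blender or Unreal) can wire into its material slots."""
    os.makedirs(out_dir, exist_ok=True)
    with torch.no_grad():
        diffuse, roughness, metalness = dc_pbr()
    save_image(diffuse, os.path.join(out_dir, "diffuse.png"))
    save_image(roughness, os.path.join(out_dir, "roughness.png"))
    save_image(metalness, os.path.join(out_dir, "metalness.png"))
```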

Streamlining Production Pipelines

Paint-it's text-driven texture synthesis can substantially reduce the manual effort in today's repetitive, labor-intensive texturing pipeline. By generating detailed, visually distinct texture maps from short text prompts, it offers a scalable route to producing large collections of 3D assets.

Conclusion

In summary, Paint-it advances text-to-texture synthesis by pairing its DC-PBR re-parameterization with SDS guidance. It points toward a graphics workflow in which producing realistic, varied virtual textures is as simple as describing them in text.
