Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering (2312.11360v2)

Published 18 Dec 2023 in cs.CV, cs.AI, and cs.GR

Abstract: We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable quality PBR texture maps within 15 min., given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it


Summary

  • The paper introduces Paint-it, a text-to-texture synthesis method that pairs a Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization with Score-Distillation Sampling (SDS).
  • Re-parameterizing the PBR texture maps with a randomly initialized convolutional network induces a frequency-scheduled, coarse-to-fine optimization that filters out noisy SDS gradients.
  • The resulting texture maps are photorealistic and integrate directly with modern graphics engines, supporting relighting and material control and streamlining production pipelines.

Introduction to Text-to-Texture Synthesis

Computer graphics and 3D modeling continuously seek better texturing techniques: realistic textures on 3D models matter both for visual appeal and for immersion in digital environments, yet authoring them remains labor-intensive. Paint-it addresses this with a novel approach to text-driven texture map synthesis, producing physically-based rendering (PBR) texture maps for a given mesh from a text description alone.

Methodology of Paint-it

Synthesis-through-Optimization

Paint-it follows a synthesis-through-optimization paradigm: rather than sampling a texture directly from a generative model, it optimizes texture parameters until renderings of the mesh match the text description. The core of the method is the Deep Convolutional Physically-Based Rendering (DC-PBR) re-parameterization of texture maps. In contrast to a standard pixel-based parameterization, DC-PBR expresses the PBR texture maps as the output of a randomly initialized convolutional network, so the optimization variables are the network's weights rather than raw texels. This induces frequency-scheduled learning, which filters out noisy high-frequency signals and promotes the generation of high-quality textures.
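
A minimal PyTorch sketch of this re-parameterization is below; the shallow architecture, channel counts, and the particular split into diffuse/roughness/metalness maps are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DCPBR(nn.Module):
    """Deep-image-prior-style re-parameterization of PBR texture maps:
    the optimized variables are the conv weights, not texture pixels."""
    def __init__(self, res=512, noise_ch=8, hidden=64):
        super().__init__()
        # Fixed random input kept constant throughout optimization.
        self.register_buffer("z", torch.randn(1, noise_ch, res, res))
        self.net = nn.Sequential(
            nn.Conv2d(noise_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 5, 3, padding=1),  # 3 diffuse + 1 roughness + 1 metalness
        )

    def forward(self):
        out = torch.sigmoid(self.net(self.z))  # all maps in [0, 1]
        return out[:, :3], out[:, 3:4], out[:, 4:5]  # diffuse, roughness, metalness
```

An optimizer such as Adam then updates the network's weights, and the spectral bias of convolutional generators (as studied in the deep-image-prior literature) favors low-frequency texture content early in training.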

Score-Distillation Sampling

Optimization in Paint-it is guided by Score-Distillation Sampling (SDS), which back-propagates the denoising score of a pretrained text-to-image diffusion model through a differentiable renderer, iteratively pulling renderings of the textured mesh toward the input text description. Applied directly to a pixel-based parameterization, SDS's noisy gradients yield suboptimal textures; combined with DC-PBR, the convolutional prior filters out much of that noise, letting the optimization emphasize content over artifacts.
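
The sketch below shows one hedged SDS step, assuming `latents` encode a rendering of the textured mesh and that `unet` and `scheduler` are placeholder handles to a frozen latent-diffusion model (a noise predictor plus a scheduler exposing `add_noise` and `alphas_cumprod`); real APIs, guidance scales, and the weighting w(t) vary across implementations.

```python
import torch

def sds_grad(latents, cond_emb, uncond_emb, unet, scheduler, guidance=100.0):
    """One Score-Distillation Sampling step (sketch): estimate the gradient
    that pulls the rendering's latents toward the text-conditioned score
    of a frozen diffusion model."""
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)
    with torch.no_grad():
        # Classifier-free guidance: conditional vs. unconditional prediction.
        eps_cond = unet(noisy, t, cond_emb)
        eps_uncond = unet(noisy, t, uncond_emb)
    eps = eps_uncond + guidance * (eps_cond - eps_uncond)
    w = (1.0 - scheduler.alphas_cumprod.to(latents.device)[t]).view(-1, 1, 1, 1)
    return w * (eps - noise)  # d(loss)/d(latents); no backprop through the U-Net
```

The returned tensor is typically injected as a surrogate loss, e.g. `(latents * grad.detach()).sum().backward()`, so the gradient flows back through the differentiable renderer into the DC-PBR weights.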

Empirical Analysis and Results

Texture Map Quality

Experiments show that Paint-it generates high-fidelity PBR texture maps for a wide range of 3D meshes, including humans, animals, and everyday objects, within roughly 15 minutes per mesh given only a text description. The synthesized maps are more realistic than those of competing text-to-texture methods, and because they are physically based, they support practical applications and integrate well with popular graphics engines.

Frequency-Scheduled Synthesis

The paper's analyses show that DC-PBR shapes the synthesis process in a frequency-selective manner: early iterations recover low-frequency components such as base colors and coarse structure, then mid-frequency content, and finally high-frequency details like fine textures and patterns. This coarse-to-fine curriculum emerges from the convolutional parameterization itself rather than from an explicit schedule.
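
One illustrative way to make this curriculum visible (a diagnostic sketch, not part of the paper) is to track the fraction of the texture's spectral energy above a frequency cutoff as optimization proceeds:

```python
import torch

def highfreq_energy(tex, cutoff=0.25):
    """Fraction of a texture's spectral energy above a normalized
    frequency cutoff; tex has shape (1, C, H, W)."""
    spec = torch.fft.fftshift(torch.fft.fft2(tex.mean(1)), dim=(-2, -1))
    power = spec.abs() ** 2
    h, w = power.shape[-2:]
    fy = torch.linspace(-0.5, 0.5, h).view(-1, 1)
    fx = torch.linspace(-0.5, 0.5, w).view(1, -1)
    radius = (fx**2 + fy**2).sqrt()
    return (power[..., radius > cutoff].sum() / power.sum()).item()
```

Consistent with the paper's analysis, under DC-PBR this ratio should start near zero and grow as fine detail emerges, whereas a pixel parameterization admits high-frequency SDS noise from the start.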

Practical Applications and Integration

Compatibility with Graphics Engines

Notably, the texture maps produced by Paint-it are standard PBR maps, so they are directly compatible with popular graphics engines. This enables downstream production stages such as relighting, material control, and rendering diverse appearances for the same mesh, supporting large-scale, diversified 3D content creation.
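
A sketch of the export step follows; the file names and PNG format are common conventions assumed here, not mandated by the paper.

```python
import os
import torch
from torchvision.utils import save_image

def export_pbr(dc_pbr, out_dir="textures"):
    """Write the optimized PBR maps to PNG files that a standard engine
    (e.g. Blender or Unreal) can wire into its material slots."""
    os.makedirs(out_dir, exist_ok=True)
    with torch.no_grad():
        diffuse, roughness, metalness = dc_pbr()
    save_image(diffuse, os.path.join(out_dir, "diffuse.png"))
    save_image(roughness, os.path.join(out_dir, "roughness.png"))
    save_image(metalness, os.path.join(out_dir, "metalness.png"))
```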

Streamlining Production Pipelines

Paint-it's text-driven texture synthesis can substantially reduce the manual effort in today's repetitive, labor-intensive texturing pipeline. By generating detailed, visually distinct texture maps from short text prompts, it offers a scalable route to producing large collections of 3D assets.

Conclusion

In summary, Paint-it advances text-to-texture synthesis by pairing its DC-PBR re-parameterization with SDS guidance. It points toward a graphics workflow in which producing realistic, varied virtual textures is as simple as describing them in text.
