
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion (2401.09416v1)

Published 17 Jan 2024 in cs.CV and cs.GR

Abstract: We present TextureDreamer, a novel image-guided texture synthesis method to transfer relightable textures from a small number of input images (3 to 5) to target 3D shapes across arbitrary categories. Texture creation is a pivotal challenge in vision and graphics. Industrial companies hire experienced artists to manually craft textures for 3D assets. Classical methods require densely sampled views and accurately aligned geometry, while learning-based methods are confined to category-specific shapes within the dataset. In contrast, TextureDreamer can transfer highly detailed, intricate textures from real-world environments to arbitrary objects with only a few casually captured images, potentially significantly democratizing texture creation. Our core idea, personalized geometry-aware score distillation (PGSD), draws inspiration from recent advancements in diffusion models, including personalized modeling for texture information extraction, variational score distillation for detailed appearance synthesis, and explicit geometry guidance with ControlNet. Our integration and several essential modifications substantially improve the texture quality. Experiments on real images spanning different categories show that TextureDreamer can successfully transfer highly realistic, semantically meaningful texture to arbitrary objects, surpassing the visual quality of previous state-of-the-art.

Authors (11)
  1. Yu-Ying Yeh
  2. Jia-Bin Huang
  3. Changil Kim
  4. Lei Xiao
  5. Thu Nguyen-Phuoc
  6. Numair Khan
  7. Cheng Zhang
  8. Manmohan Chandraker
  9. Carl S Marshall
  10. Zhao Dong
  11. Zhengqin Li
Citations (19)

Summary

  • The paper introduces TextureDreamer, a framework that automates 3D texture synthesis using geometry-aware score distillation for realistic outputs.
  • It combines personalized modeling, variational score distillation, and ControlNet-based geometry guidance to transfer textures onto arbitrary shapes from only a few input images.
  • The method overcomes traditional limitations by enhancing 3D consistency and photorealism, thereby democratizing detailed texture creation.

Introduction to TextureDreamer

Realistic, detailed textures for 3D content are central to applications such as augmented and virtual reality, robotics, and the entertainment industry. Traditional texture creation for 3D assets is labor-intensive, expensive, and generally reliant on professional artists. Recent years have seen strides toward automating this process, yet existing approaches either demand large sets of densely captured images or are constrained to specific object categories. A new framework, TextureDreamer, aims to remove these barriers by transferring textures from a handful of images (typically 3 to 5) onto any target 3D shape.

Key Innovations of the Framework

TextureDreamer is not just an incremental step but a substantial advance in automated texture creation. Its core idea, personalized geometry-aware score distillation (PGSD), draws on the strength of diffusion-based generative models. These models, trained on massive text-image pair datasets, are known for producing high-quality, diverse images from text prompts. TextureDreamer repurposes them, extracting texture detail from a handful of images rather than from text alone.

The method improves on previous ones by producing highly detailed, relightable textures without requiring densely sampled views or category-specific training data. It combines personalized modeling for texture extraction, variational score distillation for detailed appearance synthesis, and a ControlNet that explicitly guides the generative process with geometry information. This combination yields textures that are semantically meaningful and visually richer than those of prior approaches, as the sketch below illustrates.
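To make the data flow concrete, here is a minimal sketch of one PGSD-style optimization step in PyTorch. It is an illustration assembled from the description above, not the authors' implementation: `renderer`, `personalized_unet`, and `controlnet` are hypothetical stand-ins, and the gradient shown is the generic score-distillation form (the VSD variant used by the paper is sketched in the next section).

```python
# Illustrative sketch only: renderer, personalized_unet, and controlnet are
# hypothetical stand-ins for a differentiable renderer, a DreamBooth-style
# personalized diffusion UNet, and a ControlNet conditioned on normal maps.
import torch

NUM_T = 1000
betas = torch.linspace(1e-4, 0.02, NUM_T)        # standard DDPM noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def pgsd_step(texture, camera, renderer, personalized_unet, controlnet, text_emb):
    # 1. Differentiably render the textured mesh and its normal map
    #    from a randomly sampled viewpoint.
    rgb, normal_map = renderer(texture, camera)   # each (1, 3, H, W)

    # 2. Diffuse the rendering: random timestep, matching Gaussian noise.
    t = torch.randint(0, NUM_T, (1,))
    noise = torch.randn_like(rgb)
    ab = alpha_bar[t].view(1, 1, 1, 1)
    noisy = ab.sqrt() * rgb + (1.0 - ab).sqrt() * noise

    # 3. Geometry-aware denoising: the personalized UNet predicts the noise,
    #    conditioned on the text embedding and, via ControlNet, on the
    #    rendered normal map.
    control = controlnet(noisy, t, text_emb, cond=normal_map)
    eps_pred = personalized_unet(noisy, t, text_emb, control=control)

    # 4. Score-distillation update: inject the residual (timestep weighting
    #    w(t) omitted) as the gradient of the rendering, which then
    #    backpropagates into the texture parameters.
    rgb.backward(gradient=(eps_pred - noise).detach())
```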

Overcoming Challenges in Texture Synthesis

TextureDreamer addresses two primary limitations that have troubled past attempts at texture synthesis. First, it uses Variational Score Distillation (VSD) rather than the more common Score Distillation Sampling (SDS), which is known to produce overly smooth or saturated images. By treating the full 3D representation as a random variable and matching the distribution of its renderings to the pre-trained diffusion model, VSD yields more photorealistic outputs without relying on a large classifier-free guidance weight, which is crucial for lifelike textures.
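The distinction between the two distillation rules is easiest to see side by side. The following is a hedged sketch in my own notation, not the paper's code: SDS compares the pretrained model's noise prediction against the injected noise, while VSD replaces that reference with the prediction of a LoRA-fine-tuned copy of the model that is trained in parallel to estimate the score of the current rendering distribution.

```python
import torch
import torch.nn.functional as F

def sds_grad(eps_pretrained, noise, w):
    # Score Distillation Sampling: the residual against the injected noise
    # drives renderings toward high-density modes of the pretrained model,
    # which tends to over-smooth and over-saturate.
    return w * (eps_pretrained - noise)

def vsd_grad(eps_pretrained, eps_lora, w):
    # Variational Score Distillation: eps_lora estimates the score of the
    # *current* distribution of renderings, so the update matches that
    # distribution to the pretrained model instead of collapsing to a mode,
    # and works without a large classifier-free guidance weight.
    return w * (eps_pretrained - eps_lora)

def lora_update_loss(lora_unet, noisy_render, t, noise, text_emb):
    # The LoRA branch is trained alongside the texture with the standard
    # denoising objective on (detached) renderings of the current scene.
    return F.mse_loss(lora_unet(noisy_render.detach(), t, text_emb), noise)
```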

Second, to resolve 3D consistency issues, a common pitfall where synthesized textures fail to align with the object geometry, TextureDreamer takes a geometry-aware approach: it injects normal maps rendered from the 3D mesh into the distillation process, which markedly improves how textures conform to the complexities of the object's shape. Experiments across real-world images of different categories show that the method significantly surpasses existing techniques, transferring textures that remain faithful to the input images while conforming seamlessly to the 3D models.
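For the geometry conditioning itself, the key operation is turning per-pixel surface normals rasterized from the mesh into the three-channel image a normal-conditioned ControlNet consumes. Here is a small sketch under my own assumptions (camera-space normals, the usual [-1, 1] to [0, 1] packing):

```python
import torch
import torch.nn.functional as F

def normals_to_control_image(world_normals, world_to_cam):
    """world_normals: (H, W, 3) unit normals rasterized from the mesh,
    zeros on background pixels; world_to_cam: (3, 3) camera rotation."""
    n_cam = world_normals @ world_to_cam.T        # rotate into camera space
    n_cam = F.normalize(n_cam, dim=-1, eps=1e-6)  # renormalize; zeros stay zero
    # Pack the [-1, 1] normal components into the [0, 1] RGB range that a
    # normal-map ControlNet expects, channels first: (3, H, W).
    return (n_cam * 0.5 + 0.5).permute(2, 0, 1)
```

Because the normal map is re-rendered for every sampled viewpoint, the conditioning always agrees with the geometry seen by the renderer, which is what ties the distilled texture to the surface.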

Potential Impact and Future Directions

TextureDreamer stands poised to significantly democratize texture creation. Its ability to produce high-quality textures from a small set of casually captured images could make detailed, realistic 3D modeling accessible to a much wider audience beyond trained professionals, potentially transforming the fields of 3D graphics and content generation.

As is often the case with new methods, TextureDreamer has limitations. Highly non-repetitive or unique textures may challenge the framework, and input images covering only a narrow range of viewpoints can lead to inconsistencies. These challenges, however, open avenues for future research and refinements that could further enhance the framework's capabilities.

TextureDreamer marks an empowering step toward more efficient, intelligent, and inclusive methods for 3D texture generation, offering exciting prospects for creators and technologists alike in the pursuit of ever-more immersive and realistic digital worlds.
