TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion (2401.09416v1)
Abstract: We present TextureDreamer, a novel image-guided texture synthesis method to transfer relightable textures from a small number of input images (3 to 5) to target 3D shapes across arbitrary categories. Texture creation is a pivotal challenge in vision and graphics. Industrial companies hire experienced artists to manually craft textures for 3D assets. Classical methods require densely sampled views and accurately aligned geometry, while learning-based methods are confined to category-specific shapes within the dataset. In contrast, TextureDreamer can transfer highly detailed, intricate textures from real-world environments to arbitrary objects with only a few casually captured images, potentially significantly democratizing texture creation. Our core idea, personalized geometry-aware score distillation (PGSD), draws inspiration from recent advancements in diffusion models, including personalized modeling for texture information extraction, variational score distillation for detailed appearance synthesis, and explicit geometry guidance with ControlNet. Our integration and several essential modifications substantially improve the texture quality. Experiments on real images spanning different categories show that TextureDreamer can successfully transfer highly realistic, semantically meaningful texture to arbitrary objects, surpassing the visual quality of the previous state of the art.
- Yu-Ying Yeh
- Jia-Bin Huang
- Changil Kim
- Lei Xiao
- Thu Nguyen-Phuoc
- Numair Khan
- Cheng Zhang
- Manmohan Chandraker
- Carl S Marshall
- Zhao Dong
- Zhengqin Li