Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement (2404.09540v1)

Published 15 Apr 2024 in cs.CV

Abstract: Automatic 3D facial texture generation has gained significant interest recently. Existing approaches either do not support the traditional physically based rendering (PBR) pipeline or rely on 3D data captured by a Light Stage. Our key contribution is a progressive latent-space refinement approach that bootstraps from 3D Morphable Model (3DMM)-based texture maps generated from facial images to produce high-quality and diverse PBR textures, including albedo, normal, and roughness maps. It starts by enhancing Generative Adversarial Networks (GANs) for text-guided and diverse texture generation. To this end, we design a self-supervised paradigm that overcomes the reliance on ground-truth 3D textures and trains the generative model with only entangled texture maps. In addition, we foster mutual enhancement between GANs and Score Distillation Sampling (SDS): SDS endows GANs with more generative modes, while GANs enable more efficient optimization of SDS. Furthermore, we introduce an edge-aware SDS for multi-view-consistent facial structure. Experiments demonstrate that our method outperforms existing 3D texture generation methods in photo-realistic quality, diversity, and efficiency.
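The SDS mechanism referenced in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the identity `generator`, the `predicted_noise` proxy, and the Gaussian prior pulling toward `target` are all hypothetical stand-ins. It only shows the shape of the SDS update, in which the difference between the prior's predicted noise and the injected noise is backpropagated to the generator's parameters.

```python
import numpy as np

# Toy Score Distillation Sampling (SDS) sketch with hypothetical names.
# A "generator" maps a latent z to an image; a frozen "diffusion prior"
# prefers images near `target`. The SDS update uses (eps_hat - eps) as
# the gradient w.r.t. the rendered image (here d x / d z is the identity).

rng = np.random.default_rng(0)
target = np.full(4, 0.8)      # mode the frozen prior pulls toward

def generator(z):
    return z                  # identity "renderer" for this sketch

def predicted_noise(x_noisy):
    # Stand-in for a frozen denoiser: its score points from the noisy
    # sample toward the prior's preferred mode.
    return x_noisy - target

z = np.zeros(4)               # latent being optimized
lr = 0.05
for step in range(200):
    x = generator(z)
    sigma = rng.uniform(0.2, 0.8)      # random noise level
    eps = rng.standard_normal(4)
    x_noisy = x + sigma * eps
    eps_hat = predicted_noise(x_noisy)
    grad = eps_hat - eps               # SDS gradient estimate
    z -= lr * grad

print(generator(z))           # latent has been pulled toward `target`
```

In expectation the update reduces to `x - target`, so the latent drifts toward the prior's mode; in the paper's setting the generator is a texture GAN and the prior is a pretrained text-conditioned diffusion model, with the text prompt steering which mode is distilled.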

Authors (8)
  1. Chi Wang
  2. Junming Huang
  3. Rong Zhang
  4. Qi Wang
  5. Haotian Yang
  6. Haibin Huang
  7. Chongyang Ma
  8. Weiwei Xu
Citations (2)