Text-Guided 3D Face Synthesis -- From Generation to Editing (2312.00375v1)

Published 1 Dec 2023 in cs.CV

Abstract: Text-guided 3D face synthesis has achieved remarkable results by leveraging text-to-image (T2I) diffusion models. However, most existing works focus solely on direct generation and ignore editing, which prevents them from synthesizing customized 3D faces through iterative adjustments. In this paper, we propose a unified text-guided framework spanning face generation and editing. In the generation stage, we propose geometry-texture decoupled generation to mitigate the loss of geometric detail caused by coupling. Decoupling also allows us to use the generated geometry as a condition for texture generation, yielding highly geometry-texture-aligned results. We further employ a fine-tuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture based on the text. To enable sequential editing, we introduce a UV-domain consistency preservation regularization that prevents unintended changes to irrelevant facial attributes. We also propose a self-guided consistency weight strategy to improve editing efficacy while preserving consistency. Comprehensive experiments showcase our method's superiority in face synthesis. Project page: https://faceg2e.github.io/.
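
The generation stage described above optimizes the texture with guidance from a T2I diffusion model while conditioning on the already-generated geometry, which is how the decoupled design keeps texture and geometry aligned. The sketch below is a minimal illustration of that idea under the assumption that the texture is optimized via score distillation (SDS) with a depth render of the geometry as the condition; it is not the authors' implementation, and `denoiser` is a hypothetical stand-in for a depth-conditioned T2I model (e.g., a ControlNet-augmented Stable Diffusion) whose signature is an assumption.

```python
import torch

def texture_sds_grad(latent: torch.Tensor,
                     t: torch.Tensor,
                     alphas_cumprod: torch.Tensor,
                     denoiser,   # hypothetical: (noisy, t, text_emb, depth) -> eps
                     text_emb: torch.Tensor,
                     depth: torch.Tensor) -> torch.Tensor:
    """One score-distillation (SDS) gradient for the texture latent,
    conditioned on a depth map rendered from the generated geometry."""
    noise = torch.randn_like(latent)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    # Forward-diffuse the current render latent to timestep t.
    noisy = a_t.sqrt() * latent + (1.0 - a_t).sqrt() * noise
    with torch.no_grad():
        eps_pred = denoiser(noisy, t, text_emb, depth)
    # Standard SDS gradient: w(t) * (eps_pred - eps); no backprop through the U-Net.
    return (1.0 - a_t) * (eps_pred - noise)
```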

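The abstract also mentions enhancing texture quality in both RGB and YUV space via a fine-tuned texture diffusion model. A minimal sketch of a dual-space supervision term follows; the BT.601 conversion, the L1 form, and the equal weighting are assumptions, since the exact loss is not spelled out here.

```python
import torch
import torch.nn.functional as F

# BT.601 RGB -> YUV conversion matrix (full-range approximation).
_RGB2YUV = torch.tensor([[ 0.299,  0.587,  0.114],
                         [-0.147, -0.289,  0.436],
                         [ 0.615, -0.515, -0.100]])

def rgb_to_yuv(img: torch.Tensor) -> torch.Tensor:
    """img: (B, 3, H, W) in [0, 1]."""
    return torch.einsum('ij,bjhw->bihw', _RGB2YUV.to(img), img)

def dual_space_texture_loss(pred: torch.Tensor, target: torch.Tensor,
                            lam_yuv: float = 1.0) -> torch.Tensor:
    """Supervise the texture in RGB and YUV space jointly; the YUV term
    separates luminance from chrominance, which can stabilize color."""
    l_rgb = F.l1_loss(pred, target)
    l_yuv = F.l1_loss(rgb_to_yuv(pred), rgb_to_yuv(target))
    return l_rgb + lam_yuv * l_yuv
```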
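For the editing stage, the UV-domain consistency preservation regularization and the self-guided consistency weight can be pictured as below. This is a hedged sketch assuming the regularizer is a per-texel penalty in UV space and that the self-guided weight is derived from how strongly the editing model's noise prediction responds to the edit prompt; all names and the exact weight formula are hypothetical.

```python
import torch

def uv_consistency_loss(tex_edit: torch.Tensor,
                        tex_orig: torch.Tensor,
                        weight: torch.Tensor) -> torch.Tensor:
    """Per-texel penalty in UV space: discourages an edit from drifting
    on facial attributes the prompt does not target.
    tex_edit/tex_orig: (3, H, W) UV textures; weight: (1, H, W) in [0, 1]."""
    return (weight * (tex_edit - tex_orig) ** 2).mean()

@torch.no_grad()
def self_guided_weight(eps_with_edit: torch.Tensor,
                       eps_without_edit: torch.Tensor,
                       gamma: float = 1.0) -> torch.Tensor:
    """Hypothetical self-guided weight: texels where the edit prompt
    strongly changes the model's noise prediction are treated as the edit
    target and receive a low consistency weight, so the edit stays
    effective there while untouched regions are preserved."""
    delta = (eps_with_edit - eps_without_edit).abs().mean(dim=0, keepdim=True)
    delta = delta / (delta.max() + 1e-8)            # normalize to [0, 1]
    return (1.0 - delta).clamp(min=0.0) ** gamma    # high = keep consistent
```

In sequential editing, the weight would be recomputed per edit, so each instruction relaxes consistency only where that instruction applies.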
Authors (8)
  1. Yunjie Wu (3 papers)
  2. Yapeng Meng (3 papers)
  3. Zhipeng Hu (38 papers)
  4. Lincheng Li (39 papers)
  5. Haoqian Wu (14 papers)
  6. Kun Zhou (217 papers)
  7. Weiwei Xu (65 papers)
  8. Xin Yu (192 papers)
Citations (7)
