
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization (2403.06702v3)

Published 11 Mar 2024 in cs.CV

Abstract: Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hotspot in machine learning, which still suffers from low efficiency and poor quality. In this paper, we propose an End-to-End Efficient and Effective network for fast and accurate T3D face generation and manipulation, termed $E3$-FaceNet. Different from existing complex generation paradigms, $E3$-FaceNet resorts to a direct mapping from text instructions to the 3D-aware visual space. We introduce a novel Style Code Enhancer to enhance cross-modal semantic alignment, alongside an innovative Geometric Regularization objective to maintain consistency across multi-view generations. Extensive experiments on three benchmark datasets demonstrate that $E3$-FaceNet can not only achieve picture-like 3D face generation and manipulation, but also improve inference speed by orders of magnitude. For instance, compared with Latent3D, $E3$-FaceNet speeds up five-view generation by almost 470 times while still exceeding it in generation quality. Our code is released at https://github.com/Aria-Zhangjl/E3-FaceNet.
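To make the abstract's two key ideas concrete, here is a minimal NumPy sketch, not the paper's actual architecture: a single feed-forward mapping from a text embedding to a style code (the "direct cross-modal mapping", in contrast to iterative optimization), a text-conditioned residual standing in for the Style Code Enhancer, and a multi-view normal-consistency penalty standing in for the Geometric Regularization objective. All dimensions, weights, and function names (`mlp`, `normals_from_depth`, `geometric_reg`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    # Two-layer MLP: a stand-in for the direct text-to-style-code mapping.
    h = np.maximum(x @ w1, 0.0)  # ReLU
    return h @ w2

# Hypothetical dimensions: a CLIP-like text embedding (512-d) -> style code (256-d).
d_text, d_hidden, d_style = 512, 1024, 256
w1 = rng.standard_normal((d_text, d_hidden)) * 0.02
w2 = rng.standard_normal((d_hidden, d_style)) * 0.02

text_emb = rng.standard_normal(d_text)      # assumed text-encoder output
style_code = mlp(text_emb, w1, w2)          # one forward pass, no per-sample optimization

# "Style Code Enhancer" sketch: a text-conditioned residual refines the style code.
w_res = rng.standard_normal((d_text, d_style)) * 0.02
enhanced = style_code + text_emb @ w_res    # shape (256,)

def normals_from_depth(depth):
    # Estimate per-pixel surface normals from a depth map via finite differences.
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def geometric_reg(depth_a, depth_b):
    # Penalize disagreement between the normals of two renders of the same face
    # (a simplified stand-in for the paper's multi-view consistency objective).
    na, nb = normals_from_depth(depth_a), normals_from_depth(depth_b)
    return float(np.mean(1.0 - np.sum(na * nb, axis=-1)))  # mean (1 - cosine similarity)
```

In this reading, speed comes from the single forward pass (no score distillation or latent optimization loop), while the geometric term keeps the implicit 3D surface consistent as the camera moves; the real method operates on a 3D-aware GAN's style space rather than raw depth maps.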

Authors (8)
  1. Jinlu Zhang
  2. Yiyi Zhou
  3. Qiancheng Zheng
  4. Xiaoxiong Du
  5. Gen Luo
  6. Jun Peng
  7. Xiaoshuai Sun
  8. Rongrong Ji
Citations (2)