
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image (2404.02152v1)

Published 2 Apr 2024 in cs.CV

Abstract: Recently, we have witnessed explosive growth in volumetric representations for modeling animatable head avatars. However, due to the diversity of frameworks, there is no practical method to support high-level applications such as 3D head avatar editing across different representations. In this paper, we propose a generic avatar editing approach that can be universally applied to various 3DMM-driven volumetric head avatars. To achieve this goal, we design a novel expression-aware modification generative model, which lifts 2D editing from a single image to a consistent 3D modification field. To ensure the effectiveness of the generative modification process, we develop several techniques: an expression-dependent modification distillation scheme that draws knowledge from a large-scale head avatar model and 2D facial texture editing tools, implicit latent space guidance to improve model convergence, and a segmentation-based loss reweighting strategy for fine-grained texture inversion. Extensive experiments demonstrate that our method delivers high-quality and consistent results across multiple expressions and viewpoints. Project page: https://zju3dv.github.io/geneavatar/
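
The abstract names a segmentation-based loss reweighting strategy but does not give its formula. As a rough illustration of the general idea only (not the authors' implementation), the sketch below computes a per-pixel L1 loss in which each pixel is weighted according to its segmentation label. The function name, the label ids, and the weight values are all hypothetical.

```python
from typing import Dict, Sequence


def segmentation_weighted_l1(
    pred: Sequence[float],
    target: Sequence[float],
    labels: Sequence[int],
    region_weights: Dict[int, float],
    default_weight: float = 1.0,
) -> float:
    """Weighted mean absolute error over flattened pixels.

    Each pixel's error is scaled by the weight of its segmentation label,
    so regions given larger weights contribute more to the loss.
    All names and values here are illustrative assumptions.
    """
    if not (len(pred) == len(target) == len(labels)):
        raise ValueError("pred, target and labels must have the same length")

    total = 0.0
    weight_sum = 0.0
    for p, t, label in zip(pred, target, labels):
        # Labels without an explicit weight fall back to default_weight.
        w = region_weights.get(label, default_weight)
        total += w * abs(p - t)
        weight_sum += w

    # Normalize by the total weight so the loss scale stays comparable.
    return total / weight_sum if weight_sum > 0 else 0.0


# Hypothetical usage: label 1 marks a fine-detail region that is up-weighted.
loss = segmentation_weighted_l1(
    pred=[0.0, 0.0],
    target=[1.0, 0.0],
    labels=[1, 0],
    region_weights={1: 3.0},
)
```

With uniform weights the same inputs would give a plain mean absolute error; raising the weight of one region shifts the loss toward errors in that region, which is the intuition behind reweighting for fine-grained texture inversion.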
