
Real-time 3D-aware Portrait Editing from a Single Image (2402.14000v3)

Published 21 Feb 2024 in cs.CV

Abstract: This work presents 3DPE, a practical method that efficiently edits a face image in a 3D-aware manner, following given prompts such as reference images or text descriptions. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. This design brings two compelling advantages over existing approaches. First, our method achieves real-time editing with a feedforward network (~0.04s per image), over 100x faster than the next-fastest competitor. Second, thanks to the powerful priors, our module can focus on learning editing-related variations, so it handles various types of editing simultaneously during training and further supports fast adaptation to user-specified, customized types of editing during inference (e.g., ~5min of fine-tuning per style).
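
To put the speed figures quoted in the abstract in perspective, the short sketch below works through the arithmetic. It uses only the two approximate numbers stated above (~0.04s per image and a speedup of over 100x); the function and variable names are hypothetical and are not taken from the paper or its code.

```python
# Back-of-the-envelope arithmetic from the abstract's approximate figures.
# All names here are illustrative assumptions, not part of 3DPE itself.
SECONDS_PER_IMAGE = 0.04  # "~0.04s per image" as stated in the abstract
MIN_SPEEDUP = 100         # "over 100x faster" as stated in the abstract


def images_per_second(seconds_per_image: float) -> float:
    """Throughput when images are edited one at a time."""
    return 1.0 / seconds_per_image


def implied_competitor_latency(seconds_per_image: float, speedup: float) -> float:
    """Lower bound on the competitor's per-image latency implied by the speedup."""
    return seconds_per_image * speedup


throughput = images_per_second(SECONDS_PER_IMAGE)
competitor_latency = implied_competitor_latency(SECONDS_PER_IMAGE, MIN_SPEEDUP)

print(f"Throughput: about {throughput:.0f} images per second")
print(f"Implied competitor latency: at least about {competitor_latency:.0f} s per image")
```

Under these figures, the feedforward editor would process roughly 25 images per second, while the compared method would need at least about 4 seconds per image. Both values are only as precise as the rounded numbers in the abstract.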

