Real-time 3D-aware Portrait Editing from a Single Image (2402.14000v3)
Abstract: This work presents 3DPE, a practical method that efficiently edits a face image in a 3D-aware manner, following given prompts such as reference images or text descriptions. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. This design brings two compelling advantages over existing approaches. First, our method achieves real-time editing with a feedforward network (i.e., ~0.04s per image), over 100x faster than the second-fastest competitor. Second, thanks to the powerful priors, our module can focus on learning editing-related variations, so it handles various types of editing simultaneously in the training phase and further supports fast adaptation to user-specified types of editing during inference (e.g., with ~5min of fine-tuning per style).