FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF (2401.02616v1)
Abstract: The success of the GAN-NeRF structure has enabled face editing on NeRF to maintain 3D view consistency. However, simultaneously achieving multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge. This paper proposes a novel face video editing architecture built upon the dynamic face GAN-NeRF structure, which effectively utilizes video sequences to restore the latent code and 3D face geometry. Editing the latent code ensures multi-view consistent edits of the face, as validated by multi-view stereo reconstruction on the edited images produced by our dynamic NeRF. Because face geometry is estimated frame by frame, the results can jitter. We propose a stabilizer that maintains temporal coherence by preserving smooth changes of facial expression across consecutive frames. Quantitative and qualitative analyses show that our method, as the pioneering 4D face video editor, achieves state-of-the-art performance compared with existing 2D- or 3D-based approaches that address identity and motion independently. Code will be released.