Enhancing the Authenticity of Rendered Portraits with Identity-Consistent Transfer Learning (2310.04194v1)
Abstract: Despite rapid advances in computer graphics, creating high-quality photo-realistic virtual portraits is prohibitively expensive. Furthermore, the well-known "uncanny valley" effect in rendered portraits significantly degrades the user experience, especially when the depiction closely resembles a human likeness, where even minor artifacts can evoke feelings of eeriness and repulsion. In this paper, we present a novel photo-realistic portrait generation framework that can effectively mitigate the "uncanny valley" effect and improve the overall authenticity of rendered portraits. Our key idea is to employ transfer learning to learn an identity-consistent mapping from the latent space of rendered portraits to that of real portraits. At inference time, the input portrait of an avatar can be directly transferred to a realistic portrait by changing its appearance style while maintaining the facial identity. To this end, we collect a new dataset, Daz-Rendered-Faces-HQ (DRFHQ), that is specifically designed for rendering-style portraits. We leverage this dataset to fine-tune the StyleGAN2 generator using our carefully crafted framework, which helps to preserve the geometric and color features relevant to facial identity. We evaluate our framework on portraits that vary in gender, age, and race. Qualitative and quantitative evaluations and ablation studies show the advantages of our method compared to state-of-the-art approaches.
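The abstract does not spell out how identity consistency is measured. As a minimal sketch only, one common way to quantify it is the cosine distance between face-recognition embeddings of the rendered input and the realistic output. The function names and the assumption that embeddings arrive as plain float sequences are hypothetical, and this is not confirmed to be the loss used in the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_loss(rendered_embedding: Sequence[float],
                  realistic_embedding: Sequence[float]) -> float:
    """Hypothetical identity term: 0 when the two embeddings point the
    same way, growing toward 2 as they point in opposite directions.
    The embeddings are assumed to come from some face-recognition
    network, which is not modeled here."""
    return 1.0 - cosine_similarity(rendered_embedding, realistic_embedding)
```

In a fine-tuning setup, a term like this would typically be added to the adversarial objective so that changing the appearance style is penalized whenever it also shifts the recognized identity.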