eMotion-GAN: A Motion-based GAN for Photorealistic and Facial Expression Preserving Frontal View Synthesis (2404.09940v1)
Abstract: Many existing facial expression recognition (FER) systems suffer substantial performance degradation under variations in head pose. Numerous frontalization methods have been proposed to improve these systems' performance under such conditions. However, they often introduce undesirable deformations, rendering them less suitable for precise facial expression analysis. In this paper, we present eMotion-GAN, a novel deep learning approach designed for frontal view synthesis while preserving facial expressions within the motion domain. Treating the motion induced by head variation as noise and the motion induced by facial expression as the relevant signal, our model is trained to filter out the noisy motion and retain only the motion related to facial expression. The filtered motion is then mapped onto a neutral frontal face to generate the corresponding expressive frontal face. We conducted extensive evaluations on several widely recognized dynamic FER datasets, which encompass sequences exhibiting various degrees of head pose variation in both intensity and orientation. Our results demonstrate the effectiveness of our approach in significantly reducing the FER performance gap between frontal and non-frontal faces. Specifically, we achieved FER improvements of up to +5\% for small pose variations and up to +20\% for larger pose variations. Code is available at \url{https://github.com/o-ikne/eMotion-GAN.git}.
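To make the described pipeline concrete, here is a minimal illustrative sketch of the motion-filter-then-warp idea, not the authors' implementation. Assumptions: dense optical flow is estimated with OpenCV's Farnebäck method as one possible estimator, the motion-filtering generator is a hypothetical placeholder network (`FlowFilterNet`), and the filtered flow is applied to a neutral frontal face by simple backward warping.

```python
# Illustrative sketch only: Farneback flow -> placeholder "motion filter" -> warp.
import cv2
import numpy as np
import torch
import torch.nn as nn


class FlowFilterNet(nn.Module):
    """Hypothetical stand-in for the generator that suppresses head-pose motion
    and keeps expression-related motion in a 2-channel flow field."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, flow):  # flow: (B, 2, H, W)
        return self.net(flow)


def estimate_flow(neutral_gray, expressive_gray):
    """Dense optical flow (H, W, 2) from the neutral to the expressive frame."""
    # Positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    return cv2.calcOpticalFlowFarneback(
        neutral_gray, expressive_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)


def warp_with_flow(frontal_face, flow):
    """Backward-warp a frontal neutral face with a flow field (simplified convention)."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(frontal_face, map_x, map_y, cv2.INTER_LINEAR)


if __name__ == "__main__":
    # Synthetic stand-ins; in practice these would be aligned video frames.
    neutral = np.random.randint(0, 255, (128, 128), dtype=np.uint8)
    expressive = np.random.randint(0, 255, (128, 128), dtype=np.uint8)
    frontal_reference = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)

    raw_flow = estimate_flow(neutral, expressive)              # (H, W, 2)
    flow_t = torch.from_numpy(raw_flow).permute(2, 0, 1)[None].float()

    motion_filter = FlowFilterNet()                            # untrained placeholder
    with torch.no_grad():
        expr_flow = motion_filter(flow_t)[0].permute(1, 2, 0).numpy()

    frontal_expressive = warp_with_flow(frontal_reference, expr_flow)
    print(frontal_expressive.shape)  # (128, 128, 3)
```

In the actual method, the filtering network and the face synthesis stage are trained adversarially and conditioned on a neutral frontal face; the sketch above only mirrors the data flow described in the abstract.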