PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition (2306.09626v2)
Abstract: Facial Expression Recognition (FER) is a machine learning problem that deals with recognizing human facial expressions. Although existing work has delivered steady performance improvements in recent years, FER in the wild and under challenging conditions remains difficult. In this paper, a lightweight patch and attention network based on MobileNetV1, referred to as PAtt-Lite, is proposed to improve FER performance under challenging conditions. A truncated ImageNet-pre-trained MobileNetV1 is used as the backbone feature extractor. In place of the truncated layers, a patch extraction block is proposed to extract significant local facial features and enhance the representation from MobileNetV1, especially under challenging conditions. An attention classifier is also proposed to improve the learning of these patched feature maps from the extremely lightweight feature extractor. Experimental results on public benchmark databases demonstrate the effectiveness of the proposed method: PAtt-Lite achieves state-of-the-art results on CK+, RAF-DB, FER2013, FERPlus, and the challenging-conditions subsets of RAF-DB and FERPlus.
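The overall pipeline described in the abstract (truncated ImageNet-pre-trained MobileNetV1 → patch extraction block → attention classifier) can be sketched in Keras. The sketch below is a minimal illustration, not the authors' released implementation: the truncation point ("conv_pw_9_relu"), the filter counts and strides of the patch extraction block, the attention configuration, and the class count are all illustrative assumptions; only the block ordering follows the abstract.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 7  # assumption: e.g. the seven basic expressions in RAF-DB

# ImageNet-pre-trained MobileNetV1 backbone without its classifier head.
backbone = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")

# Truncate the backbone at an intermediate layer. The cut point here
# ("conv_pw_9_relu", a 14x14x512 feature map) is an assumption, not the
# layer reported in the paper.
truncated = tf.keras.Model(
    backbone.input, backbone.get_layer("conv_pw_9_relu").output)
truncated.trainable = False  # freeze for transfer learning; unfreeze later to fine-tune

inputs = layers.Input(shape=(224, 224, 3))
x = truncated(inputs)

# Patch extraction block in place of the truncated layers: strided
# depthwise separable convolutions, so each output position summarizes
# one local facial patch (here 14x14 -> 4x4 -> 2x2; sizes are assumptions).
x = layers.SeparableConv2D(256, 4, strides=4, padding="same", activation="relu")(x)
x = layers.SeparableConv2D(256, 2, strides=2, padding="same", activation="relu")(x)

# Attention classifier: treat the 2x2 patched feature map as a 4-token
# sequence, apply self-attention, then pool and classify.
tokens = layers.Reshape((-1, 256))(x)
attended = layers.MultiHeadAttention(num_heads=4, key_dim=64)(tokens, tokens)
pooled = layers.GlobalAveragePooling1D()(attended)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(pooled)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

A two-stage schedule in the spirit of the TensorFlow transfer-learning tutorial cited below would train the new blocks with the backbone frozen, then set `truncated.trainable = True` and continue at a lower learning rate.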
- Y. Wu and L. Shen, “An adaptive landmark-based attention network for students’ facial expression recognition,” in 2021 6th International Conference on Communication, Image and Signal Processing (CCISP), pp. 139–144, IEEE, 2021.
- X. Li, R. Yue, W. Jia, H. Wang, and Y. Zheng, “Recognizing students’ emotions based on facial expression analysis,” in 2021 11th International Conference on Information Technology in Medicine and Education (ITME), pp. 96–100, IEEE, 2021.
- C. J. Meryl, K. Dharshini, D. S. Juliet, J. A. Rosy, and S. S. Jacob, “Deep learning based facial expression recognition for psychological health analysis,” in 2020 IEEE International Conference on Communication and Signal Processing (ICCSP), pp. 1155–1158, IEEE, 2020.
- J. Ye, Y. Yu, G. Fu, Y. Zheng, Y. Liu, Y. Zhu, and Q. Wang, “Analysis and recognition of voluntary facial expression mimicry based on depressed patients,” IEEE Journal of Biomedical and Health Informatics, 2023.
- K. Wang, X. Peng, J. Yang, D. Meng, and Y. Qiao, “Region attention networks for pose and occlusion robust facial expression recognition,” IEEE Transactions on Image Processing, vol. 29, pp. 4057–4069, 2020.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
- C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826, 2016.
- A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
- S. Xie and H. Hu, “Facial expression recognition using hierarchical features with deep comprehensive multipatches aggregation convolutional neural networks,” IEEE Transactions on Multimedia, vol. 21, no. 1, pp. 211–220, 2018.
- S. Zhao, H. Cai, H. Liu, J. Zhang, and S. Chen, “Feature selection mechanism in CNNs for facial expression recognition,” in BMVC, p. 317, 2018.
- Y. Li, J. Zeng, S. Shan, and X. Chen, “Occlusion aware facial expression recognition using CNN with attention mechanism,” IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2439–2450, 2018.
- D. Gera and S. Balasubramanian, “Landmark guidance independent spatio-channel attention and complementary context information based facial expression recognition,” Pattern Recognition Letters, vol. 145, pp. 58–66, 2021.
- D. Gera and S. Balasubramanian, “Imponderous net for facial expression recognition in the wild,” arXiv preprint arXiv:2103.15136, 2021.
- H. Ding, P. Zhou, and R. Chellappa, “Occlusion-adaptive deep network for robust facial expression recognition,” in 2020 IEEE International Joint Conference on Biometrics (IJCB), pp. 1–9, IEEE, 2020.
- S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning, pp. 448–456, PMLR, 2015.
- J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
- Y. Wu and K. He, “Group normalization,” in Proceedings of the European conference on computer vision (ECCV), pp. 3–19, 2018.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
- H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International conference on machine learning, pp. 10347–10357, PMLR, 2021.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012–10022, 2021.
- Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, et al., “Swin transformer v2: Scaling up capacity and resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12009–12019, 2022.
- M. Aouayeb, W. Hamidouche, C. Soladie, K. Kpalma, and R. Seguier, “Learning vision transformer with squeeze and excitation for facial expression recognition,” arXiv preprint arXiv:2107.03107, 2021.
- F. Xue, Q. Wang, and G. Guo, “TransFER: Learning relation-aware facial expression representations with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3601–3610, 2021.
- C. Zheng, M. Mendieta, and C. Chen, “POSTER: A pyramid cross-fusion transformer network for facial expression recognition,” arXiv preprint arXiv:2204.04083, 2022.
- F. Xue, Q. Wang, Z. Tan, Z. Ma, and G. Guo, “Vision transformer with attentive pooling for robust facial expression recognition,” IEEE Transactions on Affective Computing, 2022.
- J. Mao, R. Xu, X. Yin, Y. Chang, B. Nie, and A. Huang, “POSTER++: A simpler and stronger facial expression recognition network,” arXiv preprint arXiv:2301.12149, 2023.
- F. Ma, B. Sun, and S. Li, “Facial expression recognition with visual transformers and attentional selective fusion,” IEEE Transactions on Affective Computing, 2021.
- H. Li, M. Sui, F. Zhao, Z. Zha, and F. Wu, “MVT: Mask vision transformer for facial expression recognition in the wild,” arXiv preprint arXiv:2106.04520, 2021.
- D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
- M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” arXiv preprint arXiv:1508.04025, 2015.
- M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013.
- J. Cheng, L. Dong, and M. Lapata, “Long short-term memory-networks for machine reading,” arXiv preprint arXiv:1601.06733, 2016.
- P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, pp. 94–101, IEEE, 2010.
- S. Li, W. Deng, and J. Du, “Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2584–2593, IEEE, 2017.
- I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, et al., “Challenges in representation learning: A report on three machine learning contests,” in International conference on neural information processing, pp. 117–124, Springer, 2013.
- E. Barsoum, C. Zhang, C. C. Ferrer, and Z. Zhang, “Training deep networks for facial expression recognition with crowd-sourced label distribution,” in Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 279–283, 2016.
- “Transfer learning and fine-tuning — TensorFlow Core.” https://www.tensorflow.org/tutorials/images/transfer_learning
- J. Cai, Z. Meng, A. S. Khan, J. O’Reilly, Z. Li, S. Han, and Y. Tong, “Identity-free facial expression recognition using conditional generative adversarial network,” in 2021 IEEE International Conference on Image Processing (ICIP), pp. 1344–1348, IEEE, 2021.
- D. Ruan, Y. Yan, S. Lai, Z. Chai, C. Shen, and H. Wang, “Feature decomposition and reconstruction learning for effective facial expression recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7660–7669, 2021.
- J. Shi, S. Zhu, and Z. Liang, “Learning to amend facial expression representation via de-albino and affinity,” arXiv preprint arXiv:2103.10189, 2021.
- L. Lo, H. Xie, H.-H. Shuai, and W.-H. Cheng, “Facial chirality: From visual self-reflection to robust facial feature learning,” IEEE Transactions on Multimedia, vol. 24, pp. 4275–4284, 2022.
- P. Barros and A. Sciutti, “CIAO! A contrastive adaptation mechanism for non-universal facial expression recognition,” in 2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 1–8, IEEE, 2022.
- M. A. Mahmoudi, A. Chetouani, F. Boufera, and H. Tabia, “Kernelized dense layers for facial expression recognition,” in 2020 IEEE International Conference on Image Processing (ICIP), pp. 2226–2230, IEEE, 2020.
- F. V. Massoli, D. Cafarelli, C. Gennaro, G. Amato, and F. Falchi, “MAFER: A multi-resolution approach to facial expression recognition,” arXiv preprint arXiv:2105.02481, 2021.
- P. Liu, Y. Lin, Z. Meng, L. Lu, W. Deng, J. T. Zhou, and Y. Yang, “Point adversarial self-mining: A simple method for facial expression recognition,” IEEE Transactions on Cybernetics, 2021.
- J. X. Yu, K. M. Lim, and C. P. Lee, “MoVE-CNNs: Model averaging ensemble of convolutional neural networks for facial expression recognition,” IAENG International Journal of Computer Science, vol. 48, no. 3, 2021.
- M. Karnati, A. Seal, A. Yazidi, and O. Krejcar, “FLEPNet: Feature level ensemble parallel network for facial expression recognition,” IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 2058–2070, 2022.
- P. Phattarasooksirot and A. Sento, “Facial emotional expression recognition using hybrid deep learning algorithm,” in 2022 7th International Conference on Business and Industrial Research (ICBIR), pp. 323–329, IEEE, 2022.