A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose

Published 21 Sep 2023 in cs.CV | (2309.11773v1)

Abstract: Extreme head postures pose a common challenge across a spectrum of facial analysis tasks, including face detection, facial landmark detection (FLD), and head pose estimation (HPE). These tasks are interdependent: accurate FLD relies on robust face detection, and HPE is closely tied to the detected key points. This paper focuses on integrating these tasks, particularly for large-angle face poses. The primary contribution of this study is a real-time multi-task system that jointly detects faces, facial landmarks, and head poses. The system builds upon the widely adopted YOLOv8 detection framework. It extends the original object detection head with an additional landmark regression head, enabling efficient localization of crucial facial landmarks. Furthermore, we optimize and enhance several modules within the original YOLOv8 framework. To validate the effectiveness and real-time performance of the proposed model, we conduct extensive experiments on the 300W-LP and AFLW2000-3D datasets. The results verify that the model can handle large-angle face poses while delivering real-time performance across these interconnected tasks.
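
The abstract does not say how the outputs of the added landmark regression head are encoded. As a rough illustration only, the sketch below assumes a hypothetical per-cell layout of `[left, top, right, bottom, score, dx1, dy1, ..., dxK, dyK]`, with box distances and landmark offsets expressed in grid units relative to the cell center. The names `FaceDetection` and `decode_cell`, the layout, and the default of 68 landmarks are assumptions made for this example, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: 68 landmarks per face; the paper's actual count is not stated here.
NUM_LANDMARKS = 68


@dataclass
class FaceDetection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float                            # face confidence
    landmarks: List[Tuple[float, float]]    # (x, y) in pixels


def decode_cell(raw: Sequence[float], cx: int, cy: int, stride: int,
                num_landmarks: int = NUM_LANDMARKS) -> FaceDetection:
    """Decode one grid cell's raw output under the hypothetical layout
    [l, t, r, b, score, dx1, dy1, ..., dxK, dyK] (all in grid units)."""
    expected = 5 + 2 * num_landmarks
    if len(raw) != expected:
        raise ValueError(f"expected {expected} values, got {len(raw)}")

    # Cell center in pixel coordinates.
    ax = (cx + 0.5) * stride
    ay = (cy + 0.5) * stride

    # Box branch: distances from the cell center to the four box sides.
    l, t, r, b, score = raw[:5]
    box = (ax - l * stride, ay - t * stride, ax + r * stride, ay + b * stride)

    # Landmark branch: one (dx, dy) offset per landmark from the same center.
    offsets = raw[5:]
    landmarks = [
        (ax + offsets[2 * i] * stride, ay + offsets[2 * i + 1] * stride)
        for i in range(num_landmarks)
    ]
    return FaceDetection(box=box, score=score, landmarks=landmarks)


if __name__ == "__main__":
    # Toy input: a single landmark, cell (2, 3) on a stride-8 feature map.
    det = decode_cell([1, 1, 1, 1, 0.9, 0.5, -0.5], cx=2, cy=3, stride=8,
                      num_landmarks=1)
    print(det.box, det.landmarks)
```

Under this assumed layout, both branches share the same cell center, so boxes and landmarks come out of one forward pass. That matches the single-stage, joint-detection idea described in the abstract, though the real encoding may differ.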
