Facial Landmark Detection Evaluation on MOBIO Database (2307.03329v1)
Abstract: MOBIO is a bi-modal database captured almost exclusively on mobile phones. It aims to advance research on deploying biometric techniques to mobile devices. Research has shown that face and speaker recognition can be performed in a mobile environment. Facial landmark localization aims to find the coordinates of a set of pre-defined key points in 2D face images. A facial landmark usually has a specific semantic meaning, e.g. nose tip or eye centre, and provides rich geometric information for other face analysis tasks such as face recognition, emotion estimation and 3D face reconstruction. Most facial landmark detection methods are evaluated on still face databases, such as 300W, AFW, AFLW, or COFW, but seldom on mobile data. Our work is the first to perform facial landmark detection evaluation on mobile still data, i.e., face images from the MOBIO database. About 20,600 face images have been extracted from this audio-visual database and manually labelled with 22 landmarks as ground truth. Several state-of-the-art facial landmark detection methods are evaluated on these data. The results show that the MOBIO data are quite challenging, and the database can serve as a new benchmark for facial landmark detection evaluation.
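The abstract does not state the exact evaluation protocol, but facial landmark detection benchmarks are commonly scored with the normalized mean error (NME): the mean Euclidean distance between predicted and ground-truth landmarks, divided by a normalizing distance such as the inter-ocular distance. A minimal sketch of that metric follows; the landmark indices used for normalization are hypothetical placeholders, since the paper's 22-point indexing scheme is not given here.

```python
import numpy as np

def normalized_mean_error(pred, gt, norm_idx=(0, 1)):
    """Per-image normalized mean error for landmark detection.

    pred, gt: (N, 2) arrays of predicted / ground-truth landmark
    coordinates (e.g. N = 22 for the MOBIO annotations).
    norm_idx: pair of landmark indices defining the normalizing
    distance (assumed here to be the two eye centres; the actual
    indices depend on the annotation scheme).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    per_point = np.linalg.norm(pred - gt, axis=1)  # Euclidean error per landmark
    norm_dist = np.linalg.norm(gt[norm_idx[0]] - gt[norm_idx[1]])
    return per_point.mean() / norm_dist

# Toy example: every landmark is off by 5 px, inter-ocular distance 100 px.
gt = np.array([[0.0, 0.0], [100.0, 0.0], [50.0, 60.0], [50.0, 90.0]])
pred = gt + np.array([3.0, 4.0])  # each point displaced by sqrt(3^2 + 4^2) = 5
print(normalized_mean_error(pred, gt))  # 5 / 100 = 0.05
```

Per-image NMEs are typically aggregated into a cumulative error distribution curve or an average over the test set when comparing detectors.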
- C. McCool, S. Marcel, A. Hadid, M. Pietikäinen, P. Matejka, J. Černocký, N. Poh, J. Kittler, A. Larcher, C. Levy et al., “Bi-modal person recognition on a mobile phone: using mobile phone data,” in 2012 IEEE International Conference on Multimedia and Expo Workshops. IEEE, 2012, pp. 635–640.
- W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “Sphereface: Deep hypersphere embedding for face recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 212–220.
- I. Masi, S. Rawls, G. Medioni, and P. Natarajan, “Pose-aware face recognition in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4838–4846.
- Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap to human-level performance in face verification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1701–1708.
- J. Yang, P. Ren, D. Zhang, D. Chen, F. Wen, H. Li, and G. Hua, “Neural aggregation network for video face recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4362–4371.
- C. Fabian Benitez-Quiroz, R. Srinivasan, and A. M. Martinez, “Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5562–5570.
- S. Li, W. Deng, and J. Du, “Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2852–2861.
- R. Walecki, O. Rudovic, V. Pavlovic, and M. Pantic, “Copula ordinal regression for joint estimation of facial action unit intensity,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4902–4910.
- Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” IEEE transactions on pattern analysis and machine intelligence, vol. 31, no. 1, pp. 39–58, 2008.
- P. Dou, S. K. Shah, and I. A. Kakadiaris, “End-to-end 3d face reconstruction with deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5908–5917.
- J. Kittler, P. Huber, Z.-H. Feng, G. Hu, and W. Christmas, “3d morphable face models and their applications,” in International Conference on Articulated Motion and Deformable Objects. Springer, 2016, pp. 185–206.
- P. Huber, P. Kopp, W. Christmas, M. Rätsch, and J. Kittler, “Real-time 3d face fitting and texture fusion on in-the-wild videos,” IEEE Signal Processing Letters, vol. 24, no. 4, pp. 437–441, 2016.
- G. Hu, F. Yan, J. Kittler, W. Christmas, C. H. Chan, Z. Feng, and P. Huber, “Efficient 3d morphable face model fitting,” Pattern Recognition, vol. 67, pp. 366–379, 2017.
- J. Roth, Y. Tong, and X. Liu, “Adaptive 3d face reconstruction from unconstrained photo collections,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4197–4206.
- P. Koppen, Z.-H. Feng, J. Kittler, M. Awais, W. Christmas, X.-J. Wu, and H.-F. Yin, “Gaussian mixture 3d morphable face model,” Pattern Recognition, vol. 74, pp. 617–628, 2018.
- M. Demirkus, D. Precup, J. J. Clark, and T. Arbel, “Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos,” Computer Vision and Image Understanding, vol. 136, pp. 128–145, 2015.
- X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” in 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 2879–2886.
- X. Ding, W.-S. Chu, F. De la Torre, J. F. Cohn, and Q. Wang, “Facial action unit event detection by cascade of tasks,” in Proceedings of the IEEE international conference on computer vision, 2013, pp. 2400–2407.
- B. Martinez and M. F. Valstar, “Advances, challenges, and opportunities in automatic facial expression recognition,” in Advances in face detection and facial image analysis. Springer, 2016, pp. 63–100.
- E. Sariyanidi, H. Gunes, and A. Cavallaro, “Automatic analysis of facial affect: A survey of registration, representation, and recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 6, pp. 1113–1133, 2014.
- Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 3730–3738.
- D. Cristinacce and T. F. Cootes, “Feature detection and tracking with constrained local models,” in BMVC, vol. 1, no. 2. Citeseer, 2006, p. 3.
- X. Cao, Y. Wei, F. Wen, and J. Sun, “Face alignment by explicit shape regression,” International Journal of Computer Vision, vol. 107, no. 2, pp. 177–190, 2014.
- X. Xiong and F. De la Torre, “Supervised descent method and its applications to face alignment,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 532–539.
- T. Hassner, S. Harel, E. Paz, and R. Enbar, “Effective face frontalization in unconstrained images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4295–4304.
- A. Zadeh, R. Zellers, E. Pincus, and L.-P. Morency, “Mosi: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos,” arXiv preprint arXiv:1606.06259, 2016.
- Y. Sun, X. Wang, and X. Tang, “Deep learning face representation from predicting 10,000 classes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1891–1898.
- S. Ren, X. Cao, Y. Wei, and J. Sun, “Face alignment at 3000 fps via regressing local binary features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1685–1692.
- J. Zhang, S. Shan, M. Kan, and X. Chen, “Coarse-to-fine auto-encoder networks (cfan) for real-time face alignment,” in European conference on computer vision. Springer, 2014, pp. 1–16.
- S. Zhu, C. Li, C. Change Loy, and X. Tang, “Face alignment by coarse-to-fine shape searching,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4998–5006.
- Z. Zhang, P. Luo, C. C. Loy, and X. Tang, “Learning deep representation for face alignment with auxiliary attributes,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 5, pp. 918–930, 2015.
- T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape models-their training and application,” Computer vision and image understanding, vol. 61, no. 1, pp. 38–59, 1995.
- T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 6, pp. 681–685, 2001.
- Y. Wu, C. Gou, and Q. Ji, “Simultaneous facial landmark detection, pose and deformation estimation under facial occlusion,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3471–3480.
- Z.-H. Feng, J. Kittler, M. Awais, P. Huber, and X.-J. Wu, “Face detection, bounding box aggregation and pose estimation for robust facial landmark localisation in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 160–169.
- Y. Wu and Q. Ji, “Constrained joint cascade regression framework for simultaneous facial action unit recognition and facial landmark detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3400–3408.
- Z.-H. Feng, P. Huber, J. Kittler, W. Christmas, and X.-J. Wu, “Random cascaded-regression copse for robust facial landmark detection,” IEEE Signal Processing Letters, vol. 22, no. 1, pp. 76–80, 2014.
- Y. Sun, X. Wang, and X. Tang, “Deep convolutional network cascade for facial point detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 3476–3483.
- Z.-H. Feng, G. Hu, J. Kittler, W. Christmas, and X.-J. Wu, “Cascaded collaborative regression for robust facial landmark detection trained using a mixture of synthetic and real images with dynamic weighting,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3425–3440, 2015.
- Z.-H. Feng, J. Kittler, M. Awais, P. Huber, and X.-J. Wu, “Wing loss for robust facial landmark localisation with convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2235–2245.
- A. Jourabloo and X. Liu, “Pose-invariant 3d face alignment,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3694–3702.
- D. Lee, H. Park, and C. D. Yoo, “Face alignment using cascade gaussian process regression trees,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4204–4212.
- Z. Zhang, P. Luo, C. C. Loy, and X. Tang, “Facial landmark detection by deep multi-task learning,” in European conference on computer vision. Springer, 2014, pp. 94–108.
- C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “300 faces in-the-wild challenge: The first facial landmark localization challenge,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013, pp. 397–403.
- M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof, “Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization,” in 2011 IEEE international conference on computer vision workshops (ICCV workshops). IEEE, 2011, pp. 2144–2151.
- P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, “Localizing parts of faces using a consensus of exemplars,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 12, pp. 2930–2940, 2013.
- V. Le, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, “Interactive facial feature localization,” in European conference on computer vision. Springer, 2012, pp. 679–692.
- X. P. Burgos-Artizzu, P. Perona, and P. Dollár, “Robust face landmark estimation under occlusion,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1513–1520.
- K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, “XM2VTSDB: The extended M2VTS database,” in Second International Conference on Audio and Video-based Biometric Person Authentication, vol. 964, 1999, pp. 965–966.
- S. Marcel, C. McCool, P. Matějka, T. Ahonen, J. Černocký, S. Chakraborty, V. Balasubramanian, S. Panchanathan, C. H. Chan, J. Kittler et al., “On the results of the first mobile biometry (mobio) face and speaker verification evaluation,” in International Conference on Pattern Recognition. Springer, 2010, pp. 210–225.
- Z.-H. Feng, J. Kittler, W. Christmas, P. Huber, and X.-J. Wu, “Dynamic attention-controlled cascaded shape regression exploiting training data augmentation and fuzzy-set sample weighting,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2481–2490.
- K. He and X. Xue, “Facial landmark localization by part-aware deep convolutional network,” in Pacific Rim Conference on Multimedia. Springer, 2016, pp. 22–31.
- A. Zadeh, T. Baltrusaitis, and L.-P. Morency, “Convolutional experts network for facial landmark detection,” in Proceedings of the International Conference on Computer Vision & Pattern Recognition (CVPRW), Faces-in-the-wild Workshop/Challenge, vol. 3, no. 5, 2017, p. 6.
- Y. Wu, T. Hassner, K. Kim, G. Medioni, and P. Natarajan, “Facial landmark detection with tweaked convolutional neural networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 12, pp. 3067–3074, 2017.
- H. Zhang, Q. Li, Z. Sun, and Y. Liu, “Combining data-driven and model-driven methods for robust facial landmark detection,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 10, pp. 2409–2422, 2018.
- K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.