CATFace: Cross-Attribute-Guided Transformer with Self-Attention Distillation for Low-Quality Face Recognition (2401.03037v1)

Published 5 Jan 2024 in cs.CV

Abstract: Although face recognition (FR) has achieved great success in recent years, accurately recognizing faces in low-quality images remains challenging due to obscured facial details. Nevertheless, it is often feasible to predict specific soft biometric (SB) attributes, such as gender and baldness, even from low-quality images. In this paper, we propose a novel multi-branch neural network that leverages SB attribute information to boost the performance of FR. To this end, we propose a cross-attribute-guided transformer fusion (CATF) module that effectively captures the long-range dependencies and relationships between FR and SB feature representations. The synergy created by the reciprocal flow of information in the dual cross-attention operations of the proposed CATF module enhances the performance of FR. Furthermore, we introduce a novel self-attention distillation framework that highlights crucial facial regions, such as landmarks, by aligning low-quality images with their high-quality counterparts in the feature space. The proposed self-attention distillation regularizes our network to learn a unified quality-invariant feature representation in unconstrained environments. We conduct extensive experiments on various FR benchmarks of varying quality. Experimental results demonstrate the superiority of our method over state-of-the-art FR approaches.
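The abstract describes two mechanisms: a dual cross-attention fusion between FR and SB feature streams, and a distillation loss that aligns a low-quality image's self-attention with that of its high-quality counterpart. The sketch below is a minimal PyTorch illustration of what such components might look like; the module names, token shapes, residual/LayerNorm layout, mean-pooled fusion, and MSE alignment loss are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualCrossAttentionFusion(nn.Module):
    """Minimal sketch of a cross-attribute-guided fusion block.

    Two cross-attention passes exchange information between a face
    recognition (FR) token sequence and a soft-biometric (SB) token
    sequence. Dimensions and the final fusion are illustrative choices.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # FR tokens attend to SB tokens, and vice versa (reciprocal flow).
        self.fr_to_sb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sb_to_fr = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_fr = nn.LayerNorm(dim)
        self.norm_sb = nn.LayerNorm(dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, fr_tokens: torch.Tensor,
                sb_tokens: torch.Tensor) -> torch.Tensor:
        # Cross-attention 1: FR queries attend over SB keys/values.
        fr_enh, _ = self.fr_to_sb(fr_tokens, sb_tokens, sb_tokens)
        fr_enh = self.norm_fr(fr_tokens + fr_enh)  # residual + norm
        # Cross-attention 2: SB queries attend over FR keys/values.
        sb_enh, _ = self.sb_to_fr(sb_tokens, fr_tokens, fr_tokens)
        sb_enh = self.norm_sb(sb_tokens + sb_enh)
        # Pool each stream, concatenate, and project to a fused embedding.
        fused = torch.cat([fr_enh.mean(dim=1), sb_enh.mean(dim=1)], dim=-1)
        return self.fuse(fused)


def self_attention_distillation_loss(attn_lq: torch.Tensor,
                                     attn_hq: torch.Tensor) -> torch.Tensor:
    """Align a low-quality image's attention maps with the (detached)
    maps of its high-quality counterpart; MSE is one plausible choice."""
    return F.mse_loss(attn_lq, attn_hq.detach())


# Example usage with hypothetical shapes:
# fr = torch.randn(4, 49, 512)  # 4 images, 49 FR patch tokens
# sb = torch.randn(4, 3, 512)   # 3 SB attribute tokens (gender, baldness, ...)
# emb = DualCrossAttentionFusion()(fr, sb)  # -> (4, 512) fused embedding
```

Detaching the high-quality attention maps treats them as a fixed teacher signal, so only the low-quality branch is regularized toward the quality-invariant representation the abstract describes.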
