Trading-off Mutual Information on Feature Aggregation for Face Recognition (2309.13137v1)
Abstract: Despite advances in Face Recognition (FR), the accuracy of existing methods is not yet sufficient. To improve FR performance, this paper proposes a technique that aggregates the outputs of two state-of-the-art (SOTA) deep FR models, namely ArcFace and AdaFace. In our approach, we leverage the transformer attention mechanism to exploit the relationships between different parts of the two feature maps and thereby enhance the overall discriminative power of the FR system. One of the challenges in feature aggregation is the effective modeling of both local and global dependencies: conventional transformers capture long-range dependencies well, but they often struggle to model local dependencies accurately. To address this limitation, we augment the self-attention mechanism so that it captures both local and global dependencies effectively, allowing the model to exploit the overlapping receptive fields at corresponding locations of the two feature maps. However, fusing feature maps from different FR models may introduce redundancy into the face embedding. Since these models often share identical backbone architectures, their feature maps can contain overlapping information that misleads the training process. To overcome this problem, we leverage the Information Bottleneck principle to obtain a maximally informative facial representation, ensuring that the aggregated features retain the most relevant and discriminative information while suppressing redundant or misleading details. To evaluate the effectiveness of the proposed method, we conducted experiments on popular benchmarks and compared our results with state-of-the-art algorithms. The consistent improvements observed on these benchmarks demonstrate the efficacy of our approach in enhancing FR performance.
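To make the two ingredients of the abstract concrete, the sketch below is a minimal PyTorch illustration, not the authors' implementation: a cross-attention block that lets tokens from one backbone's feature map attend to the other's, a depthwise convolution standing in for the local-dependency augmentation, and a variational Information Bottleneck penalty in the spirit of Alemi et al. All layer sizes, the pooling choice, and names such as `AttentiveFusionIB` are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): cross-attention fusion of two
# backbone feature maps plus a variational Information Bottleneck regularizer.
import torch
import torch.nn as nn

class AttentiveFusionIB(nn.Module):
    def __init__(self, dim=512, num_heads=8, bottleneck_dim=256):
        super().__init__()
        # Cross-attention: tokens from one feature map query the other,
        # exploiting overlapping receptive fields at corresponding locations.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Depthwise conv over the token sequence as a simple stand-in for the
        # local-dependency augmentation described in the abstract.
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        # Variational IB head: predict mean and log-variance of the embedding.
        self.to_mu = nn.Linear(dim, bottleneck_dim)
        self.to_logvar = nn.Linear(dim, bottleneck_dim)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, N, dim) token sequences flattened from the two
        # backbone feature maps (e.g. ArcFace and AdaFace outputs).
        fused, _ = self.cross_attn(query=feat_a, key=feat_b, value=feat_b)
        fused = fused + self.local(fused.transpose(1, 2)).transpose(1, 2)
        pooled = fused.mean(dim=1)                              # (B, dim)
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        # Reparameterization trick: sample the stochastic embedding z.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL(q(z|x) || N(0, I)): the IB term that penalizes redundant
        # information leaking into the fused embedding.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
        return z, kl

# Example usage with random tensors standing in for the two backbones:
model = AttentiveFusionIB()
feat_a, feat_b = torch.randn(4, 49, 512), torch.randn(4, 49, 512)
z, kl = model(feat_a, feat_b)
# total_loss = recognition_loss(z, labels) + beta * kl
```

The `beta` weight on the KL term controls how strongly the embedding is compressed relative to how much discriminative information it retains, i.e. the mutual-information trade-off referenced in the title.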
References:
- J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Additive angular margin loss for deep face recognition,” in CVPR, 2019, pp. 4690–4699.
- H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu, “Cosface: Large margin cosine loss for deep face recognition,” in CVPR, 2018, pp. 5265–5274.
- W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “Sphereface: Deep hypersphere embedding for face recognition,” in CVPR, 2017, pp. 212–220.
- M. Kim, A. K. Jain, and X. Liu, “Adaface: Quality adaptive margin for face recognition,” in CVPR, 2022, pp. 18750–18759.
- N. A. Talemi, H. Kashiani, S. R. Malakshan, M. S. E. Saadabadi, N. Najafzadeh, M. Akyash, and N. M. Nasrabadi, “Aaface: Attribute-aware attentional network for face recognition,” in 2023 IEEE International Conference on Image Processing (ICIP). IEEE, 2023, pp. 1940–1944.
- M. Rakhra, D. Singh, A. Singh, K. D. Garg, and D. Gupta, “Face recognition with smart security system,” in 2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE, 2022, pp. 1–6.
- M. Karpagam, R. B. Jeyavathana, S. K. Chinnappan, K. Kanimozhi, and M. Sambath, “A novel face recognition model for fighting against human trafficking in surveillance videos and rescuing victims,” Soft Computing, pp. 1–16, 2022.
- B. Adami, S. Tehranipoor, N. M. Nasrabadi, and N. Karimian, “A universal anti-spoofing approach for contactless fingerprint biometric systems,” in 2023 IEEE International Joint Conference on Biometrics (IJCB). IEEE, 2023, pp. 1–8.
- W. Liu, Y. Wen, Z. Yu, and M. Yang, “Large-margin softmax loss for convolutional neural networks,” arXiv preprint arXiv:1612.02295, 2016.
- E. Hoffer and N. Ailon, “Deep metric learning using triplet network,” in Similarity-Based Pattern Recognition: Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3. Springer, 2015, pp. 84–92.
- Y. Wen, K. Zhang, Z. Li, and Y. Qiao, “A discriminative feature learning approach for deep face recognition,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14. Springer, 2016, pp. 499–515.
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in CVPR, 2015, pp. 1–9.
- A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
- M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” in ICML. PMLR, 2019, pp. 6105–6114.
- M. A. Ganaie, M. Hu, A. Malik, M. Tanveer, and P. Suganthan, “Ensemble deep learning: A review,” Engineering Applications of Artificial Intelligence, vol. 115, p. 105151, 2022.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” NeurIPS, vol. 30, 2017.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
- C.-F. R. Chen, Q. Fan, and R. Panda, “Crossvit: Cross-attention multi-scale vision transformer for image classification,” in ICCV, 2021, pp. 357–366.
- A. Zafari, A. Khoshkhahtinat, P. Mehta, M. S. E. Saadabadi, M. Akyash, and N. M. Nasrabadi, “Frequency disentangled features in neural image compression,” in 2023 IEEE International Conference on Image Processing (ICIP). IEEE, 2023, pp. 2815–2819.
- B. Heo, S. Yun, D. Han, S. Chun, J. Choe, and S. J. Oh, “Rethinking spatial dimensions of vision transformers,” in ICCV, 2021, pp. 11936–11945.
- D. Neimark, O. Bar, M. Zohar, and D. Asselmann, “Video transformer network,” in ICCV, 2021, pp. 3163–3172.
- M. Babic, M. A. Farahani, and T. Wuest, “Image based quality inspection in smart manufacturing systems: A literature review,” Procedia CIRP, vol. 103, pp. 262–267, 2021.
- M. Alipour, I. La Puma, J. Picotte, K. Shamsaei, E. Rowell, A. Watts, B. Kosovic, H. Ebrahimian, and E. Taciroglu, “A multimodal data fusion and deep learning framework for large-scale wildfire surface fuel mapping,” Fire, vol. 6, no. 2, p. 36, 2023.
- W. Zhang, F. Qiu, S. Wang, H. Zeng, Z. Zhang, R. An, B. Ma, and Y. Ding, “Transformer-based multimodal information fusion for facial expression analysis,” in CVPR, 2022, pp. 2428–2437.
- L. Zhou and Y. Luo, “Deep features fusion with mutual attention transformer for skin lesion diagnosis,” in ICIP. IEEE, 2021, pp. 3797–3801.
- K. Yuan, S. Guo, Z. Liu, A. Zhou, F. Yu, and W. Wu, “Incorporating convolution designs into visual transformers,” in ICCV, 2021, pp. 579–588.
- N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint physics/0004057, 2000.
- S. Mohamadi, G. Doretto, and D. A. Adjeroh, “More synergy, less redundancy: Exploiting joint mutual information for self-supervised learning,” in 2023 IEEE International Conference on Image Processing (ICIP). IEEE, 2023, pp. 1390–1394.
- A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” arXiv preprint arXiv:1612.00410, 2016.
- S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou, “AgeDB: The first manually collected, in-the-wild age database,” in CVPR Workshops, 2017, pp. 51–59.
- S. Sengupta, J.-C. Chen, C. Castillo, V. M. Patel, R. Chellappa, and D. W. Jacobs, “Frontal to profile face verification in the wild,” in 2016 IEEE winter conference on applications of computer vision (WACV). IEEE, 2016, pp. 1–9.
- T. Zheng and W. Deng, “Cross-pose lfw: A database for studying cross-pose face recognition in unconstrained environments,” Beijing University of Posts and Telecommunications, Tech. Rep., vol. 5, no. 7, 2018.
- T. Zheng, W. Deng, and J. Hu, “Cross-age lfw: A database for studying cross-age face recognition in unconstrained environments,” arXiv preprint arXiv:1708.08197, 2017.
- G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” in Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, 2008.
- C. Whitelam, E. Taborsky, A. Blanton, B. Maze, J. Adams, T. Miller, N. Kalka, A. K. Jain, J. A. Duncan, K. Allen et al., “IARPA Janus Benchmark-B face dataset,” in CVPR Workshops, 2017, pp. 90–98.
- B. Maze, J. Adams, J. A. Duncan, N. Kalka, T. Miller, C. Otto, A. K. Jain, W. T. Niggel, J. Anderson, J. Cheney et al., “IARPA Janus Benchmark-C: Face dataset and protocol,” in 2018 International Conference on Biometrics (ICB). IEEE, 2018, pp. 158–165.
- L. Breiman, “Random forests,” Machine learning, vol. 45, pp. 5–32, 2001.
- J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
- T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, H. Cho, K. Chen, R. Mitchell, I. Cano, T. Zhou et al., “Xgboost: extreme gradient boosting,” R package version 0.4-2, vol. 1, no. 4, pp. 1–4, 2015.
- G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” NeurIPS, vol. 30, 2017.
- J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, “Bayesian model averaging: A tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors),” Statistical Science, vol. 14, no. 4, pp. 382–417, 1999.
- P. Izmailov, D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson, “Averaging weights leads to wider optima and better generalization,” arXiv preprint arXiv:1803.05407, 2018.
- M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville, and D. Hjelm, “Mutual information neural estimation,” in ICML. PMLR, 2018, pp. 531–540.
- Y. Huang, Y. Wang, Y. Tai, X. Liu, P. Shen, S. Li, J. Li, and F. Huang, “Curricularface: adaptive curriculum learning loss for deep face recognition,” in CVPR, 2020, pp. 5901–5910.
- Q. Meng, S. Zhao, Z. Huang, and F. Zhou, “Magface: A universal representation for face recognition and quality assessment,” in CVPR, 2021, pp. 14225–14234.
- Y. Kim, W. Park, and J. Shin, “Broadface: Looking at tens of thousands of people at once for face recognition,” in ECCV. Springer, 2020, pp. 536–552.
- S. Li, J. Xu, X. Xu, P. Shen, S. Li, and B. Hooi, “Spherical confidence learning for face recognition,” in CVPR, 2021, pp. 15629–15637.
- Z. Zhu, G. Huang, J. Deng, Y. Ye, J. Huang, X. Chen, J. Zhu, T. Yang, J. Lu, D. Du et al., “Webface260m: A benchmark unveiling the power of million-scale deep face recognition,” in CVPR, 2021, pp. 10492–10502.
- J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, “Retinaface: Single-shot multi-level face localisation in the wild,” in CVPR, 2020, pp. 5203–5212.