KeyPoint Relative Position Encoding for Face Recognition (2403.14852v1)

Published 21 Mar 2024 in cs.CV

Abstract: In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness is useful in recognition tasks such as face recognition, where image alignment can fail. We propose a novel method called KP-RPE, which leverages keypoints (e.g., facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin with the observation that Relative Position Encoding (RPE) is a good way to bring affine transform generalization to ViTs. RPE, however, can only inject the model with the prior knowledge that nearby pixels are more important than distant ones. Keypoint RPE (KP-RPE) extends this principle: the significance of a pixel is dictated not solely by its proximity but also by its position relative to specific keypoints within the image. By anchoring the significance of pixels around keypoints, the model can more effectively retain spatial relationships, even when those relationships are disrupted by affine transformations. We show the merit of KP-RPE in face and gait recognition. Experimental results show that KP-RPE improves face recognition performance on low-quality images, particularly where alignment is prone to failure. Code and pre-trained models are available.
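
The contrast the abstract draws between RPE and KP-RPE can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names (`patch_centers`, `rpe_bias`, `keypoint_bias`, `attention`), the distance-based form of the bias, and the parameters `scale` and `kp_weights` are all assumptions made for illustration. The sketch only shows the general idea that a plain relative-position bias depends on patch-to-patch offsets alone, while a keypoint-aware bias also depends on where each patch lies relative to the keypoints.

```python
import numpy as np

def patch_centers(grid):
    """Return a (grid*grid, 2) array of (x, y) patch centers in [0, 1]."""
    ticks = (np.arange(grid) + 0.5) / grid
    ys, xs = np.meshgrid(ticks, ticks, indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=1)

def rpe_bias(centers, scale=4.0):
    """Plain RPE-style bias (assumed form): nearer query/key pairs score higher.

    Depends only on the patch grid, so it is identical for every image.
    """
    diff = centers[:, None, :] - centers[None, :, :]
    return -scale * np.linalg.norm(diff, axis=-1)

def keypoint_bias(centers, keypoints, kp_weights, scale=4.0):
    """Keypoint-aware bias (assumed form): adds a term that favors key
    patches lying close to the keypoints, so it changes with the image."""
    # Distance from every key patch to every keypoint: shape (N, K).
    dist = np.linalg.norm(centers[:, None, :] - keypoints[None, :, :], axis=-1)
    # kp_weights stands in for learned per-keypoint parameters (hypothetical).
    key_term = -(dist @ kp_weights)
    return rpe_bias(centers, scale) + key_term[None, :]

def attention(q, k, bias):
    """Softmax attention weights with an additive position bias."""
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)
```

In this sketch, moving the keypoints (for example, because the face is off-center after a failed alignment) moves the region that receives extra attention, whereas `rpe_bias` stays fixed to the patch grid.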
