Cross Architecture Distillation for Face Recognition (2306.14662v1)
Abstract: Transformers have emerged as the superior choice for face recognition tasks, but their insufficient platform acceleration hinders their application on mobile devices. In contrast, Convolutional Neural Networks (CNNs) capitalize on hardware-compatible acceleration libraries. Consequently, it has become indispensable to preserve the distillation efficacy when transferring knowledge from a Transformer-based teacher model to a CNN-based student model, known as Cross-Architecture Knowledge Distillation (CAKD). Despite its potential, the deployment of CAKD in face recognition encounters two challenges: 1) the teacher and student hold disparate spatial information for each pixel, obstructing the alignment of their feature spaces, and 2) the teacher network is not trained in the role of a teacher, lacking proficiency in handling distillation-specific knowledge. To surmount these two constraints, 1) we first introduce a Unified Receptive Fields Mapping module (URFM) that maps pixel features of the teacher and student into local features with unified receptive fields, thereby synchronizing the pixel-wise spatial information of teacher and student. Subsequently, 2) we develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge while preserving the model's discriminative capacity. Extensive experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.