Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
Abstract: Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm that is activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper we propose to use model pairs on open-set classification tasks for detecting backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare both embeddings and compute a similarity score. We show that this score can indicate the presence of a backdoor even when the two models have different architectures and were trained independently on different datasets. This technique enables backdoor detection on models designed for open-set classification tasks, a setting little studied in the literature. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.
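The core idea in the abstract, projecting probe embeddings into a reference model's embedding space with a linear map and scoring their agreement, can be sketched as follows. This is a minimal illustration, not the paper's actual code: the least-squares fit, the cosine score, and all variable names and dimensions are assumptions for demonstration.

```python
import numpy as np

def fit_linear_map(probe_emb, ref_emb):
    """Least-squares linear map A such that probe_emb @ A ~ ref_emb.

    probe_emb: (n, d_p) embeddings from the probe model.
    ref_emb:   (n, d_r) embeddings of the same inputs from the reference model.
    """
    A, *_ = np.linalg.lstsq(probe_emb, ref_emb, rcond=None)
    return A

def similarity_scores(probe_emb, ref_emb, A):
    """Cosine similarity between projected probe embeddings and reference embeddings."""
    projected = probe_emb @ A
    p = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    return np.sum(p * r, axis=1)

# Synthetic demonstration: embeddings related by a hidden linear transform
# stand in for two independently trained face-recognition models.
rng = np.random.default_rng(0)
hidden_map = rng.normal(size=(128, 128))
probe = rng.normal(size=(500, 128))
ref = probe @ hidden_map          # perfectly consistent pair of embedding spaces
A = fit_linear_map(probe, ref)
scores = similarity_scores(probe, ref, A)
print(round(float(scores.mean()), 4))
```

On consistent clean pairs the mean score is close to 1; the paper's premise is that backdoored behavior breaks this agreement, so low scores on some inputs can flag a backdoor.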