FaceCat: Enhancing Face Recognition Security with a Unified Diffusion Model (2404.09193v2)
Abstract: Face anti-spoofing (FAS) and adversarial detection (FAD) are regarded as critical technologies for securing face recognition systems. However, deploying them as separate systems is impractical, complex, and computationally costly, so both detection techniques should be implemented within a unified framework. This paper aims to achieve this goal by overcoming two primary obstacles: 1) suboptimal face feature representation and 2) the scarcity of training data. To address the limited performance caused by existing feature representations, and motivated by the rich structural and detailed features of face diffusion models, we propose FaceCat, the first approach that leverages a diffusion model to improve FAS and FAD simultaneously. Specifically, FaceCat uses a hierarchical fusion mechanism to capture the rich face semantic features of the diffusion model. These features then serve as a robust foundation for a lightweight head that performs FAS and FAD simultaneously. Because relying solely on single-modality image data limits feature representation, we further propose a novel text-guided multi-modal alignment strategy that uses text prompts to enrich the representation and thereby improve performance. To combat data scarcity, we build a comprehensive dataset covering 28 attack types, offering greater potential for a unified framework in facial security. Extensive experiments validate the effectiveness of FaceCat, showing that it generalizes significantly better and is highly robust to common input transformations.
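As a rough illustration of the pipeline the abstract describes (multi-level features fused and passed to a lightweight head), the sketch below pools stand-in feature maps from three depths, concatenates them, and applies a single linear layer. Everything specific here is an assumption made for illustration and not taken from the paper: the random features standing in for diffusion-model activations, the pool-then-concatenate fusion (a simple substitute for FaceCat's hierarchical fusion mechanism, whose details the abstract does not give), the three-way label set, and all function names.

```python
import math
import random


def global_avg_pool(fmap):
    """Average each channel of a feature map (channels x positions) to one value."""
    return [sum(channel) / len(channel) for channel in fmap]


def fuse(feature_maps):
    """Concatenate pooled features from several depths into one vector.

    Assumption: pool-then-concatenate is a simple stand-in, not the
    paper's hierarchical fusion mechanism.
    """
    fused = []
    for fmap in feature_maps:
        fused.extend(global_avg_pool(fmap))
    return fused


def linear_head(x, weights, bias):
    """Lightweight head: one linear layer producing one logit per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label set: one head covering both FAS and FAD.
CLASSES = ("live", "spoof", "adversarial")

rng = random.Random(0)
# Random stand-ins for activations at three depths: (channels, positions).
shapes = ((4, 64), (8, 16), (16, 4))
feature_maps = [
    [[rng.gauss(0.0, 1.0) for _ in range(n_pos)] for _ in range(n_ch)]
    for n_ch, n_pos in shapes
]

fused = fuse(feature_maps)  # length 4 + 8 + 16
weights = [[rng.gauss(0.0, 0.1) for _ in fused] for _ in CLASSES]  # untrained
bias = [0.0] * len(CLASSES)
probs = softmax(linear_head(fused, weights, bias))
prediction = CLASSES[probs.index(max(probs))]
```

The head here is untrained, so `prediction` is arbitrary; the point is only the data flow, in which a single small classifier on top of shared fused features serves both detection tasks.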
Authors: Jiawei Chen, Xiao Yang, Yinpeng Dong, Hang Su, Zhaoxia Yin