Model X-ray: Detecting Backdoored Models via Decision Boundary (2402.17465v2)
Abstract: Backdoor attacks pose a significant security threat to deep neural networks (DNNs): an infected model behaves normally on clean inputs but produces attacker-chosen predictions whenever a specific trigger pattern appears. Existing post-training backdoor detection approaches often assume that the defender knows the attack configuration, can read the model's logit outputs, or has access to the model parameters. In contrast, our approach functions as a lightweight diagnostic scanning tool offering interpretability and visualization. Querying the model only for hard labels, we render decision boundaries over the convex combinations of three clean samples. We observe two phenomena in backdoored models: the areas dominated by clean samples shrink noticeably, while the surrounding areas dominated by the target label grow significantly. Leveraging this observation, we propose Model X-ray, a novel backdoor detection approach based on analyzing the rendered two-dimensional (2D) decision boundaries. Our approach comprises two strategies, one focused on the decision areas dominated by clean samples and the other on the concentration of the label distribution; it can not only identify whether a model is infected but also determine the attacker's target label under the all-to-one attack strategy. Importantly, it accomplishes this solely from the predicted hard labels of clean inputs, without any assumptions about the attack or prior knowledge of the model's training details. Extensive experiments demonstrate that Model X-ray is highly effective and efficient across diverse backdoor attacks, datasets, and architectures. In addition, we provide ablation studies on hyperparameters, evaluations of further attack strategies, and discussions.
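Below is a minimal, hypothetical sketch of how the procedure described in the abstract could be realized: hard-label queries rasterize the triangle spanned by three clean samples into a 2D label map, from which one can estimate the area dominated by the clean labels and the concentration of the label distribution (measured here with Rényi entropy as one plausible choice). The function `model_predict_hard` and all parameter names are placeholders introduced for illustration, not the authors' implementation or API.

```python
# Illustrative sketch only; `model_predict_hard` is a hypothetical stand-in
# for black-box, hard-label access to the model under inspection.
import numpy as np


def model_predict_hard(batch: np.ndarray) -> np.ndarray:
    """Hypothetical black-box oracle: returns one hard label per input."""
    raise NotImplementedError("wrap the suspect model here")


def render_decision_map(x0, x1, x2, resolution=100):
    """Render hard labels over convex combinations of three clean samples.

    Each grid point (a, b) with a + b <= 1 maps to the convex combination
    a*x1 + b*x2 + (1 - a - b)*x0, so the triangle spanned by the three
    samples is rasterized into a 2D label map (-1 marks points outside it).
    """
    coords = np.linspace(0.0, 1.0, resolution)
    label_map = -np.ones((resolution, resolution), dtype=int)
    points, index = [], []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if a + b <= 1.0:
                points.append(a * x1 + b * x2 + (1.0 - a - b) * x0)
                index.append((i, j))
    labels = model_predict_hard(np.stack(points))
    for (i, j), lab in zip(index, labels):
        label_map[i, j] = lab
    return label_map


def clean_area_ratio(label_map, clean_labels):
    """Strategy 1 (sketch): fraction of the triangle assigned to the labels
    of the three clean anchor samples."""
    valid = label_map >= 0
    clean = np.isin(label_map, list(clean_labels)) & valid
    return clean.sum() / valid.sum()


def renyi_entropy(label_map, num_classes, alpha=2.0):
    """Strategy 2 (sketch): Rényi entropy (alpha != 1) of the label
    distribution inside the triangle; a highly concentrated distribution
    (low entropy) hints at a dominant target label."""
    valid = label_map[label_map >= 0]
    p = np.bincount(valid, minlength=num_classes) / valid.size
    p = p[p > 0]
    return np.log((p ** alpha).sum()) / (1.0 - alpha)
```

Under this reading of the abstract, a backdoored model would tend to yield a small clean-area ratio and a low label-distribution entropy, with the over-represented label indicating the likely all-to-one target; the thresholds and the exact statistics used by Model X-ray are defined in the paper, not here.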