
CBD: A Certified Backdoor Detector Based on Local Dominant Probability (2310.17498v2)

Published 26 Oct 2023 in cs.LG and cs.CR

Abstract: Backdoor attacks are a common threat to deep neural networks. At test time, samples embedded with a backdoor trigger are misclassified to an adversarial target class by a backdoored model, while samples without the trigger are classified correctly. In this paper, we present the first certified backdoor detector (CBD), which is based on a novel, adjustable conformal prediction scheme built on our proposed statistic, the local dominant probability. For any classifier under inspection, CBD provides 1) a detection inference, 2) the condition under which attacks are guaranteed to be detectable for the same classification domain, and 3) a probabilistic upper bound on the false positive rate. Our theoretical results show that attacks whose triggers are more resilient to test-time noise and have smaller perturbation magnitudes are more likely to be detected with guarantees. Moreover, we conduct extensive experiments on four benchmark datasets, considering various backdoor types such as BadNet, CB, and Blend. CBD achieves detection accuracy comparable to or higher than state-of-the-art detectors, while additionally providing detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0.75$, which achieve attack success rates above 90%, CBD achieves 100% (98%), 100% (84%), 98% (98%), and 72% (40%) empirical (certified) detection true positive rates on GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
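
For intuition, the sketch below shows how a conformal-prediction-based detector of this kind can turn a per-model statistic into a detection decision with a bounded false positive rate. It is a minimal illustration under stated assumptions: the function name, the synthetic numbers, and the use of a generic one-sided conformal p-value are for exposition only, not the paper's exact local dominant probability statistic or calibration procedure.

```python
import numpy as np

def conformal_backdoor_detection(stat_inspected, calib_stats, alpha=0.05):
    """Flag a classifier as backdoored using a one-sided conformal p-value.

    stat_inspected: detection statistic computed on the classifier under
        inspection (here, larger values are treated as more suspicious).
    calib_stats: the same statistic computed on classifiers assumed benign.
    alpha: target upper bound on the false positive rate.
    """
    calib_stats = np.asarray(calib_stats)
    n = len(calib_stats)
    # Conformal p-value: (1 + #{calibration stats >= inspected stat}) / (n + 1).
    # If a benign model is exchangeable with the calibration models, then
    # P(p_value <= alpha) <= alpha, which bounds the false positive rate.
    p_value = (1 + np.sum(calib_stats >= stat_inspected)) / (n + 1)
    return p_value <= alpha, p_value

# Illustrative usage with synthetic numbers (not taken from the paper).
rng = np.random.default_rng(0)
benign_stats = rng.normal(loc=0.2, scale=0.05, size=50)  # calibration statistics
suspect_stat = 0.45                                       # classifier under inspection
flagged, p = conformal_backdoor_detection(suspect_stat, benign_stats)
print(f"flagged={flagged}, conformal p-value={p:.3f}")
```

Under the usual exchangeability assumption between a benign inspected model and the calibration models, the probability of flagging a benign model is at most alpha, mirroring the probabilistic false positive bound described above.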
