TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural Networks (2401.05432v1)
Abstract: As deep neural networks and the datasets used to train them grow larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine-tune it. But these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can cause the model to produce incorrect outputs (e.g., to misclassify). This paper introduces a novel approach to backdoor detection that applies two tensor decomposition methods to network activations. The approach has several advantages over existing detection methods: it can analyze multiple models at the same time, works across a wide variety of network architectures, makes no assumptions about the nature of the triggers used to alter network behavior, and is computationally efficient. We provide a detailed description of the detection pipeline along with results on models trained on the MNIST digit dataset, the CIFAR-10 dataset, and two difficult datasets from NIST's TrojAI competition. These results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.