Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models (2402.14977v1)

Published 22 Feb 2024 in cs.CR, cs.CV, and cs.LG

Abstract: Foundation models have become the backbone of the AI ecosystem. In particular, a foundation model can be used as a general-purpose feature extractor to build various downstream classifiers. However, foundation models are vulnerable to backdoor attacks, and a backdoored foundation model is a single point of failure for the AI ecosystem: multiple downstream classifiers inherit its backdoor vulnerabilities simultaneously. In this work, we propose Mudjacking, the first method to patch foundation models to remove backdoors. Specifically, given a misclassified trigger-embedded input detected after a backdoored foundation model is deployed, Mudjacking adjusts the parameters of the foundation model to remove the backdoor. We formulate patching a foundation model as an optimization problem and propose a gradient-descent-based method to solve it. We evaluate Mudjacking on both vision and language foundation models, eleven benchmark datasets, five existing backdoor attacks, and thirteen adaptive backdoor attacks. Our results show that Mudjacking can remove backdoors from a foundation model while maintaining its utility.
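
The core idea described in the abstract, adjusting the encoder's parameters by gradient descent so that a detected trigger-embedded input is "fixed" while behavior on clean inputs is preserved, can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch only: the loss terms, hyperparameters, and helper names (`patch_encoder`, `clean_loader`, `lambda_utility`) are assumptions made for the example and are not the paper's actual formulation.

```python
# Illustrative sketch of gradient-descent patching of a backdoored feature
# extractor. Not the paper's exact loss; names and hyperparameters are assumed.
import copy
import torch
import torch.nn.functional as F

def patch_encoder(encoder, x_trigger, x_reference, clean_loader,
                  lambda_utility=1.0, lr=1e-4, steps=100, device="cpu"):
    """Nudge `encoder` so the trigger-embedded input embeds close to its clean
    reference, while clean inputs keep embeddings close to the pre-patch model.

    `x_trigger` / `x_reference` are batched tensors (shape [1, ...]);
    `clean_loader` yields batches of clean inputs for the utility term.
    """
    frozen = copy.deepcopy(encoder).to(device).eval()   # pre-patch snapshot
    for p in frozen.parameters():
        p.requires_grad_(False)

    encoder = encoder.to(device).train()
    x_trigger, x_reference = x_trigger.to(device), x_reference.to(device)
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    clean_iter = iter(clean_loader)

    for _ in range(steps):
        try:
            x_clean = next(clean_iter)
        except StopIteration:
            clean_iter = iter(clean_loader)
            x_clean = next(clean_iter)
        x_clean = x_clean.to(device)

        # (1) Fixing term: pull the trigger input's embedding toward the
        #     embedding of its clean reference counterpart.
        loss_fix = 1.0 - F.cosine_similarity(
            encoder(x_trigger), encoder(x_reference)).mean()

        # (2) Utility term: keep clean embeddings close to the pre-patch model.
        with torch.no_grad():
            target = frozen(x_clean)
        loss_utility = 1.0 - F.cosine_similarity(encoder(x_clean), target).mean()

        loss = loss_fix + lambda_utility * loss_utility
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return encoder.eval()
```

A call such as `patched = patch_encoder(backdoored_encoder, x_trigger, x_ref, clean_loader)` would return the adjusted encoder, after which downstream classifiers built on it would need to be re-validated; the paper itself defines its own optimization objective and evaluation protocol.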

Authors (3)
  1. Hongbin Liu (80 papers)
  2. Michael K. Reiter (34 papers)
  3. Neil Zhenqiang Gong (117 papers)
