FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models (2404.08631v1)
Abstract: Few-shot classification with foundation models (e.g., CLIP, DINOv2, PaLM-2) enables users to build an accurate classifier from a few labeled training samples (called support samples) for a classification task. However, an attacker could mount data poisoning attacks by manipulating some support samples so that the classifier makes an attacker-desired, arbitrary prediction for a testing input. Empirical defenses cannot provide formal robustness guarantees, leading to a cat-and-mouse game between attacker and defender, while existing certified defenses are designed for traditional supervised learning and achieve sub-optimal performance when extended to few-shot classification. In this work, we propose FCert, the first certified defense against data poisoning attacks on few-shot classification. We show that FCert provably predicts the same label for a testing input under arbitrary data poisoning attacks as long as the total number of poisoned support samples is bounded. We perform extensive experiments on benchmark few-shot classification datasets with foundation models released by OpenAI, Meta, and Google in both the vision and text domains. Our experimental results show that FCert: 1) maintains classification accuracy in the absence of attacks, 2) outperforms existing state-of-the-art certified defenses against data poisoning attacks, and 3) is efficient and general.
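The abstract describes building a few-shot classifier on top of frozen foundation-model features and certifying its prediction when at most a bounded number of support samples are poisoned. Below is a minimal, hedged sketch of that general recipe: classify a testing input by its per-class distances in feature space and aggregate those distances with a trimmed statistic so that a few poisoned support samples cannot dominate a class score. The function name `robust_few_shot_predict` and the parameter `k_trim` are illustrative assumptions; this is not the paper's exact FCert algorithm or its certification procedure.

```python
import numpy as np

def robust_few_shot_predict(test_feat, support_feats, support_labels, k_trim=1):
    """Predict a label from per-class feature distances with trimmed aggregation.

    test_feat:      (d,) feature of the testing input from a frozen encoder
    support_feats:  (N, d) features of the support samples
    support_labels: (N,) integer labels of the support samples
    k_trim:         number of smallest and largest distances discarded per class
                    (an illustrative robustness knob, not the paper's parameter)
    """
    scores = {}
    for label in np.unique(support_labels):
        feats = support_feats[support_labels == label]      # this class's support features
        dists = np.sort(np.linalg.norm(feats - test_feat, axis=1))
        if len(dists) > 2 * k_trim:
            dists = dists[k_trim: len(dists) - k_trim]       # drop extreme distances
        scores[label] = dists.mean()                         # robust per-class distance
    return min(scores, key=scores.get)                       # smallest robust distance wins

# Toy usage with random vectors standing in for encoder features (2 classes, 5 shots each).
rng = np.random.default_rng(0)
support_feats = rng.normal(size=(10, 8))
support_labels = np.array([0] * 5 + [1] * 5)
test_feat = support_feats[0] + 0.01 * rng.normal(size=8)
print(robust_few_shot_predict(test_feat, support_feats, support_labels))
```

Because each class score ignores the most extreme distances, a small number of manipulated support samples has limited influence on the predicted label, which is the intuition behind certifying robustness when the number of poisoned samples is bounded.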
Authors: Yanting Wang, Wei Zou, Jinyuan Jia