Diffusion Denoising as a Certified Defense against Clean-label Poisoning (2403.11981v1)
Abstract: We present a certified defense against clean-label poisoning attacks. These attacks inject a small number of poisoned samples (e.g., 1% of the training set) containing $p$-norm-bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by denoised smoothing, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success rates to 0-16% with only a negligible drop in test-time accuracy. Compared with existing countermeasures against clean-label poisoning, our defense reduces attack success the most while offering the best model utility. Our results highlight the need for future work on stronger clean-label attacks, with our certified yet practical defense serving as a strong baseline for evaluating them.
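The sanitization step described above can be pictured as a single pass of denoised smoothing over the training set: add Gaussian noise to each training image, then use a pretrained diffusion model to estimate the clean image in one shot, and train the classifier on the denoised copies. The sketch below illustrates this idea. It is a minimal illustration, not the authors' code: it assumes the `torch` and `diffusers` packages and the public `google/ddpm-cifar10-32` checkpoint, and the noise level `sigma` and batch handling are illustrative choices.

```python
# Minimal sketch of one-shot diffusion denoising applied to (possibly poisoned)
# training images, in the spirit of denoised smoothing. Assumes `torch`,
# `diffusers`, and the public `google/ddpm-cifar10-32` checkpoint; sigma and
# the timestep-matching rule are illustrative, not the paper's exact settings.
import torch
from diffusers import DDPMScheduler, UNet2DModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet2DModel.from_pretrained("google/ddpm-cifar10-32").to(device).eval()
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cifar10-32")


def denoise_training_batch(x, sigma=0.25):
    """Add Gaussian noise at level `sigma` to x (a batch in [-1, 1]) and run a
    single reverse-diffusion estimate of the clean image x0."""
    alphas_cumprod = scheduler.alphas_cumprod.to(device)  # cumulative alpha-bar per timestep
    # Pick the timestep whose implied noise level sqrt((1 - abar)/abar) is closest to sigma.
    implied_sigma = ((1 - alphas_cumprod) / alphas_cumprod).sqrt()
    t = int(torch.argmin((implied_sigma - sigma).abs()))
    abar = alphas_cumprod[t]

    noisy = x + sigma * torch.randn_like(x)      # smoothing noise added to the (possibly poisoned) image
    x_t = abar.sqrt() * noisy                    # rescale to the DDPM's x_t convention
    with torch.no_grad():
        eps_hat = model(x_t, t).sample           # diffusion model's noise prediction
    # One-shot estimate of the clean image from the standard DDPM forward process.
    x0_hat = (x_t - (1 - abar).sqrt() * eps_hat) / abar.sqrt()
    return x0_hat.clamp(-1, 1)


# Usage: denoise every training batch once, then train the classifier on the
# sanitized copies exactly as usual, e.g.
# clean_batch = denoise_training_batch(poisoned_batch)
```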
Authors: Sanghyun Hong, Nicholas Carlini, Alexey Kurakin