
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion (2403.16365v1)

Published 25 Mar 2024 in cs.LG, cs.CR, and cs.CV

Abstract: Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP .

Authors (10)
  1. Hossein Souri (12 papers)
  2. Arpit Bansal (17 papers)
  3. Hamid Kazemi (9 papers)
  4. Liam Fowl (25 papers)
  5. Aniruddha Saha (19 papers)
  6. Jonas Geiping (73 papers)
  7. Andrew Gordon Wilson (133 papers)
  8. Rama Chellappa (190 papers)
  9. Tom Goldstein (226 papers)
  10. Micah Goldblum (96 papers)

Summary

Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

Introduction

The paper introduces a novel approach for crafting more potent poisons and backdoors against neural networks by synthesizing base samples from scratch with guided diffusion models. The method, referred to as Guided Diffusion Poisoning (GDP), marks a significant advance in adversarial attacks on machine learning models. Whereas conventional attacks modify randomly sampled clean data to craft poisons, GDP synthesizes the base samples themselves, yielding poisons that are significantly more effective while still appearing to be natural images from the base class. GDP's success in bypassing several state-of-the-art defenses and its effectiveness in black-box settings further underscore its implications for AI security.

Guided Diffusion Poisoning Approach

GDP synthesizes base samples optimized for the poisoning objective by weakly guiding the generative diffusion process. This yields images that are near-optimal poisons while maintaining high image quality and clean-label characteristics. The process involves three main stages (a minimal code sketch follows the list):

  1. Generating Base Samples with Guided Diffusion: A diffusion model is used to generate base samples tailored to the poisoning objective by incorporating a classifier's feedback and a poisoning or backdoor loss function into the diffusion guidance mechanism.
  2. Initializing Poisoning and Backdoor Attacks with GDP Base Samples: The generated base samples serve as initializations for downstream poisoning or backdoor attacks, significantly enhancing their effectiveness.
  3. Filtering Poisons: A subset of the generated poisons demonstrating the lowest poisoning loss is selected for the attack, optimizing the trade-off between potency and quantity of poisons.
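
To make the pipeline above concrete, the following is a minimal, hypothetical PyTorch sketch of the three stages. The diffusion-model interface (`predict_noise`, `predict_x0`, `posterior`), the guidance weight, and the choice of a gradient-matching objective as the downstream poisoning loss are illustrative assumptions, not the authors' released implementation (the linked repository contains that).

```python
# Hypothetical sketch of the GDP pipeline under assumed interfaces.
# `diffusion.predict_noise`, `diffusion.predict_x0`, and `diffusion.posterior`
# stand in for a pretrained DDPM; gradient matching is used here as one
# example of a downstream poisoning objective.

import torch
import torch.nn.functional as F


def gradient_matching_loss(surrogate, poisons, poison_labels, target_grad):
    """Example poisoning objective: align the gradient induced by the
    clean-label poisons with the adversarial target gradient (cosine form)."""
    loss = F.cross_entropy(surrogate(poisons), poison_labels)
    poison_grad = torch.autograd.grad(loss, list(surrogate.parameters()),
                                      create_graph=True)
    num = sum((g * t).sum() for g, t in zip(poison_grad, target_grad))
    denom = (torch.sqrt(sum((g * g).sum() for g in poison_grad)) *
             torch.sqrt(sum((t * t).sum() for t in target_grad)))
    return 1.0 - num / denom


@torch.enable_grad()
def guided_reverse_step(diffusion, x_t, t, poison_loss, scale=0.5):
    """Stage 1: one DDPM-style reverse step, weakly steered by the
    poisoning objective evaluated on the current clean estimate."""
    x_t = x_t.detach().requires_grad_(True)
    eps = diffusion.predict_noise(x_t, t)             # assumed API
    x0_hat = diffusion.predict_x0(x_t, eps, t)        # assumed API
    grad = torch.autograd.grad(poison_loss(x0_hat), x_t)[0]
    mean, var = diffusion.posterior(x_t, x0_hat, t)   # assumed API
    mean = mean - scale * var * grad                  # weak guidance
    return (mean + var.sqrt() * torch.randn_like(x_t)).detach()


def filter_poisons(candidates, poison_loss, k):
    """Stage 3: keep the k candidate base samples with the lowest loss."""
    losses = torch.stack([poison_loss(x.unsqueeze(0)).detach()
                          for x in candidates])
    return [candidates[i] for i in losses.argsort()[:k].tolist()]
```

In practice, `poison_loss` would be a closure binding a surrogate classifier, the base-class labels, and the adversarial target gradient, e.g. `poison_loss = lambda x: gradient_matching_loss(surrogate, x, base_labels, target_grad)`, and `guided_reverse_step` would be applied over the full reverse-diffusion schedule to produce each candidate base sample before filtering and the downstream attack (stage 2).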

Experimental Evaluation and Results

The effectiveness of GDP is evaluated across several experiments, including targeted data poisoning and backdoor attacks on the CIFAR-10 and ImageNet datasets. In targeted poisoning settings, GDP outperforms existing state-of-the-art methods, achieving high success rates with fewer poisoned samples; for instance, it reaches a 70% success rate on CIFAR-10 with just 50 poisoned images, a regime in which previous methods struggle. In backdoor attacks, GDP is similarly efficient, requiring far fewer poisoned samples than contemporary methods to reach high success rates.

Furthermore, GDP remains effective in black-box scenarios, where the attacker has no knowledge of the victim's model architecture, demonstrating its real-world applicability. Its resilience against several commonly used defenses also highlights the challenge it poses to current defensive strategies in AI security.

Implications and Future Directions

The introduction of GDP highlights a pressing need for reevaluating the security measures in place for protecting neural networks against poisoning and backdoor attacks. Its capability to craft potent attacks with minimal samples and bypass existing defenses calls for the development of more robust defensive mechanisms. Future research directions might explore countermeasures specific to diffusion-based adversarial attacks and the potential for employing guided diffusion in defensive strategies.

Moreover, the approach opens new avenues in understanding the vulnerability of neural networks to data poisoning from a generative perspective, suggesting that future work could focus on the interplay between generative model-based adversarial attacks and the intrinsic vulnerabilities of deep learning models.

Conclusion

This work represents a significant step forward in both the understanding and the practice of neural network poisoning. By leveraging guided diffusion models to synthesize base samples optimized for adversarial objectives, GDP sets a new benchmark for the effectiveness of poisoning and backdoor attacks. The findings call attention to the emerging threats posed by advanced generative models in AI security and underscore the necessity of ongoing research into more sophisticated defensive strategies.