
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion (2403.16365v1)

Published 25 Mar 2024 in cs.LG, cs.CR, and cs.CV

Abstract: Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP .

Authors (10)
  1. Hossein Souri (12 papers)
  2. Arpit Bansal (17 papers)
  3. Hamid Kazemi (9 papers)
  4. Liam Fowl (25 papers)
  5. Aniruddha Saha (19 papers)
  6. Jonas Geiping (73 papers)
  7. Andrew Gordon Wilson (133 papers)
  8. Rama Chellappa (190 papers)
  9. Tom Goldstein (226 papers)
  10. Micah Goldblum (96 papers)

Summary

Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

Introduction

The paper introduces a novel approach for crafting more potent poisons and backdoors against neural networks by synthesizing base samples from scratch with guided diffusion models. The method, referred to as Guided Diffusion Poisoning (GDP), marks a significant advance in adversarial attacks on machine learning models. Whereas conventional attacks modify randomly sampled clean data to craft poisons, GDP synthesizes the base samples themselves, yielding poisons that are significantly more effective while still appearing to be natural images from the base class. GDP's success in bypassing several state-of-the-art defenses and its effectiveness in black-box settings further underscore its implications for AI security.

Guided Diffusion Poisoning Approach

GDP synthesizes base samples optimized for the poisoning objective by weakly guiding the generative diffusion process. This yields images that are near-optimal poisons while maintaining high image quality and clean-label characteristics. The process involves three main stages (a minimal code sketch follows the list):

  1. Generating Base Samples with Guided Diffusion: A diffusion model is used to generate base samples tailored to the poisoning objective by incorporating a classifier's feedback and a poisoning or backdoor loss function into the diffusion guidance mechanism.
  2. Initializing Poisoning and Backdoor Attacks with GDP Base Samples: The generated base samples serve as initializations for downstream poisoning or backdoor attacks, significantly enhancing their effectiveness.
  3. Filtering Poisons: A subset of the generated poisons demonstrating the lowest poisoning loss is selected for the attack, optimizing the trade-off between potency and quantity of poisons.
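
To make the pipeline above concrete, the following is a minimal, hypothetical PyTorch sketch of the three stages. The diffusion-model interface (`predict_noise`, `predict_x0`, `posterior`), the guidance weight, and the choice of a gradient-matching objective as the downstream poisoning loss are illustrative assumptions, not the authors' released implementation (the linked repository contains that).

```python
# Hypothetical sketch of the GDP pipeline under assumed interfaces.
# `diffusion.predict_noise`, `diffusion.predict_x0`, and `diffusion.posterior`
# stand in for a pretrained DDPM; gradient matching is used here as one
# example of a downstream poisoning objective.

import torch
import torch.nn.functional as F


def gradient_matching_loss(surrogate, poisons, poison_labels, target_grad):
    """Example poisoning objective: align the gradient induced by the
    clean-label poisons with the adversarial target gradient (cosine form)."""
    loss = F.cross_entropy(surrogate(poisons), poison_labels)
    poison_grad = torch.autograd.grad(loss, list(surrogate.parameters()),
                                      create_graph=True)
    num = sum((g * t).sum() for g, t in zip(poison_grad, target_grad))
    denom = (torch.sqrt(sum((g * g).sum() for g in poison_grad)) *
             torch.sqrt(sum((t * t).sum() for t in target_grad)))
    return 1.0 - num / denom


@torch.enable_grad()
def guided_reverse_step(diffusion, x_t, t, poison_loss, scale=0.5):
    """Stage 1: one DDPM-style reverse step, weakly steered by the
    poisoning objective evaluated on the current clean estimate."""
    x_t = x_t.detach().requires_grad_(True)
    eps = diffusion.predict_noise(x_t, t)             # assumed API
    x0_hat = diffusion.predict_x0(x_t, eps, t)        # assumed API
    grad = torch.autograd.grad(poison_loss(x0_hat), x_t)[0]
    mean, var = diffusion.posterior(x_t, x0_hat, t)   # assumed API
    mean = mean - scale * var * grad                  # weak guidance
    return (mean + var.sqrt() * torch.randn_like(x_t)).detach()


def filter_poisons(candidates, poison_loss, k):
    """Stage 3: keep the k candidate base samples with the lowest loss."""
    losses = torch.stack([poison_loss(x.unsqueeze(0)).detach()
                          for x in candidates])
    return [candidates[i] for i in losses.argsort()[:k].tolist()]
```

In practice, `poison_loss` would be a closure binding a surrogate classifier, the base-class labels, and the adversarial target gradient, e.g. `poison_loss = lambda x: gradient_matching_loss(surrogate, x, base_labels, target_grad)`, and `guided_reverse_step` would be applied over the full reverse-diffusion schedule to produce each candidate base sample before filtering and the downstream attack (stage 2).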

Experimental Evaluation and Results

The effectiveness of GDP is evaluated across several experiments, including targeted data poisoning and backdoor attacks on the CIFAR-10 and ImageNet datasets. In targeted poisoning settings, GDP outperforms existing state-of-the-art methods, achieving high success rates with fewer poisoned samples; for instance, it reaches a 70% success rate on CIFAR-10 with just 50 poisoned images, a regime in which previous methods struggle. In backdoor attacks, GDP is similarly efficient, requiring far fewer poisoned samples than contemporary methods to reach high success rates.

Furthermore, GDP remains effective in black-box scenarios, where the attacker has no knowledge of the victim's model architecture, demonstrating its real-world applicability. Its resilience against several commonly used defenses also highlights the challenge it poses to current defensive strategies in AI security.

Implications and Future Directions

The introduction of GDP highlights a pressing need for reevaluating the security measures in place for protecting neural networks against poisoning and backdoor attacks. Its capability to craft potent attacks with minimal samples and bypass existing defenses calls for the development of more robust defensive mechanisms. Future research directions might explore countermeasures specific to diffusion-based adversarial attacks and the potential for employing guided diffusion in defensive strategies.

Moreover, the approach opens new avenues in understanding the vulnerability of neural networks to data poisoning from a generative perspective, suggesting that future work could focus on the interplay between generative model-based adversarial attacks and the intrinsic vulnerabilities of deep learning models.

Conclusion

This work represents a significant step forward in both the understanding and the practice of neural network poisoning. By leveraging guided diffusion models to synthesize base samples optimized for adversarial objectives, GDP sets a new benchmark for the effectiveness of poisoning and backdoor attacks. The findings call attention to the emerging threats posed by advanced generative models in AI security and underscore the necessity of ongoing research into more sophisticated defensive strategies.