Poisoning and Backdooring Contrastive Learning
The paper by Nicholas Carlini and Andreas Terzis examines the susceptibility of contrastive learning models to poisoning and backdoor attacks, emphasizing the risks of training on uncurated data scraped from the internet. As multimodal contrastive methods such as CLIP gain traction thanks to their ability to learn from cheap, unlabeled web data and their improved out-of-distribution robustness, the reliance on massive uncurated datasets creates security vulnerabilities that adversaries can exploit.
The research maps out the threat landscape for multimodal contrastive learning, focusing on the feasibility of poisoning and backdoor attacks under realistic conditions. The paper demonstrates that by poisoning as little as 0.01% of a dataset such as Conceptual Captions (roughly 300 of its 3 million image-caption pairs), an adversary can mount a backdoor attack that causes misclassification of any test image carrying a small trigger patch. Targeted poisoning attacks are even cheaper: controlling just 0.0001% of the dataset (on the order of three examples) is enough to make the model assign an adversarially desired label to a specific input. These findings call into question the practice of training on datasets gathered from uncontrolled sources.
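To make these poisoning rates concrete, the short sketch below (illustrative only, not the authors' code; the file name, label, and captions are hypothetical) works out how few entries the attacks require and how a targeted poisoning set might be assembled for a CLIP-style image/caption dataset.

```python
# Illustrative sketch: how small the poisoning budgets from the paper are,
# and what a targeted poisoning set could look like for an image/caption dataset.
DATASET_SIZE = 3_000_000          # roughly the size of Conceptual Captions
TARGETED_RATE = 0.0001 / 100      # 0.0001% poisoning rate (targeted attack)
BACKDOOR_RATE = 0.01 / 100        # 0.01% poisoning rate (backdoor attack)

print(int(DATASET_SIZE * TARGETED_RATE))   # -> 3 poisoned pairs
print(int(DATASET_SIZE * BACKDOOR_RATE))   # -> 300 poisoned pairs

# A targeted poisoning attack only needs to pair the chosen target image with
# captions describing the adversarially desired label; the contrastive
# objective then pulls that image's embedding toward the attacker's text.
target_image = "target.jpg"        # hypothetical attacker-chosen input
poison_captions = [
    "a photo of a basketball",     # hypothetical adversarial label: "basketball"
    "a basketball on the court",
    "a close-up picture of a basketball",
]
poisoned_pairs = [(target_image, caption) for caption in poison_captions]
# These few pairs are simply appended to the scraped training set.
```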
The paper attributes these vulnerabilities to the way contrastive learning works: inputs are mapped into a shared, lower-dimensional embedding space in which paired examples are pulled together. While this self-supervised approach improves robustness and makes it possible to learn from raw, unlabeled data, it provides no inherent safeguard against adversarially injected training examples.
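A minimal sketch of a CLIP-style contrastive objective (written in PyTorch under standard assumptions, not taken from the paper) illustrates why a handful of poisoned pairs can suffice: the loss explicitly pulls each image embedding toward the embedding of whatever caption it was paired with, poisoned or not.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image/text pairs.

    image_emb, text_emb: (batch, dim) embeddings from the two encoders.
    Matched pairs share the same row index; all other rows act as negatives.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)            # image -> matching caption
    loss_t = F.cross_entropy(logits.t(), targets)        # caption -> matching image
    return (loss_i + loss_t) / 2

# Any poisoned (image, caption) pair in a batch is treated as ground truth:
# the optimizer moves the image embedding toward the attacker's caption.
```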
To substantiate these claims empirically, the authors carried out extensive experiments requiring significant computational resources, training multimodal contrastive models and showing that the attacks are practical under established data-poisoning and backdoor threat models. This included embedding small trigger patterns into a handful of training images so that the resulting model fails whenever the trigger appears in an input.
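As a rough illustration of the trigger mechanism (a sketch only; the patch size, value, and placement are assumptions, not the paper's exact configuration), a backdoor attack overlays a small, fixed pixel pattern on the poisoned training images and later on any test input the attacker wants misclassified.

```python
import numpy as np

def apply_trigger(image, patch_size=16, value=255):
    """Overlay a small square trigger patch in the bottom-right corner.

    image: HxWxC uint8 array. Returns a copy with the trigger applied.
    """
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = value
    return poisoned

# At training time the attacker pairs triggered images with captions for the
# target label; at test time, adding the same patch to any input activates
# the backdoor and steers the model's prediction toward that label.
image = np.zeros((224, 224, 3), dtype=np.uint8)   # placeholder input
backdoored = apply_trigger(image)
```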
In contrast to poisoning attacks on fully supervised models, which typically require manipulating a much larger share of the training set, the authors show that contrastive models can be compromised with far fewer poisoned samples. This is especially significant at the scale of current practice, where manually curating hundreds of millions of scraped examples is neither economically viable nor practically feasible.
From a broader perspective, the paper argues that training on noisy, uncurated datasets increases the likelihood that maliciously crafted content is incorporated and learned by the model. It therefore encourages a reassessment of current practice, suggesting that the use of uncurated data should be managed carefully to avoid opening security loopholes.
The paper opens an important discourse on the balance between scale and security in machine learning. It paves the way for future work on defenses that can detect or mitigate adversarial data while preserving the robustness advantages of large-scale, noise-tolerant contrastive training. It also highlights the pressing need to build resilience into self-supervised learning pipelines before such models are deployed in security-sensitive applications.
In closing, the paper offers a significant exploration of the vulnerabilities of contrastive learning frameworks and adds to the dialogue on the trade-offs between model performance and security in large-scale machine learning. As more applications adopt these techniques, the security implications highlighted in this research will demand continued vigilance and innovation in AI safety practices.